Summary: | Handle lld 9+ split RW PT_LOAD segments correctly | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Paul Floyd <pjfloyd> |
Component: | general | Assignee: | Paul Floyd <pjfloyd> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | emaste, mark |
Priority: | NOR | ||
Version: | 3.20 GIT | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Patch for reading debuginfo correctly with lld 9+
Patch for reading debuginfo correctly with lld 9+ |
Description
Paul Floyd
2022-04-20 15:45:27 UTC
Looking at the segment handling code, maybe_merge_nsegments should be doing this. readelf -e says Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x0000000000000040 0x0000000000200040 0x0000000000200040 0x0000000000000268 0x0000000000000268 R 0x8 INTERP 0x00000000000002a8 0x00000000002002a8 0x00000000002002a8 0x0000000000000015 0x0000000000000015 R 0x1 [Requesting program interpreter: /libexec/ld-elf.so.1] LOAD 0x0000000000000000 0x0000000000200000 0x0000000000200000 0x00000000000005b4 0x00000000000005b4 R 0x1000 LOAD 0x00000000000005c0 0x00000000002015c0 0x00000000002015c0 0x00000000000003b0 0x00000000000003b0 R E 0x1000 LOAD 0x0000000000000970 0x0000000000202970 0x0000000000202970 0x0000000000000190 0x0000000000000190 RW 0x1000 LOAD 0x0000000000000b00 0x0000000000203b00 0x0000000000203b00 0x0000000000000048 0x0000000000000058 RW 0x1000 DYNAMIC 0x00000000000009a0 0x00000000002029a0 0x00000000002029a0 0x0000000000000160 0x0000000000000160 RW 0x8 GNU_RELRO 0x0000000000000970 0x0000000000202970 0x0000000000202970 0x0000000000000190 0x0000000000000690 R 0x1 GNU_EH_FRAME 0x0000000000000564 0x0000000000200564 0x0000000000200564 0x0000000000000014 0x0000000000000014 R 0x4 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RW 0 NOTE 0x00000000000002c0 0x00000000002002c0 0x00000000002002c0 0x0000000000000048 0x0000000000000048 R 0x4 I added some debug printfs to maybe_merge_nsegments and that gives me DEBUG: maybe_merge_nsegments R 1 W 1 X 0 DEBUG: maybe_merge_nsegments matching RWX dev and ino DEBUG: maybe_merge_nsegments s1 start 202000 DEBUG: maybe_merge_nsegments s1 end 202fff DEBUG: maybe_merge_nsegments s1 offset 0 DEBUG: maybe_merge_nsegments s2 start 203000 DEBUG: maybe_merge_nsegments s2 end 203fff DEBUG: maybe_merge_nsegments s2 offset 0 DEBUG: maybe_merge_nsegments s1 offset + delta 1000 That's for the varinfo5 executable in memcheck/tests. There's a similar block for the link loader /libexec/ld-elf.so.1 I'm struggling to see how that corresponds to what readelf shows. LOAD 0x0000000000000970 0x0000000000202970 0x0000000000202970 0x0000000000000190 0x0000000000000190 RW 0x1000 LOAD 0x0000000000000b00 0x0000000000203b00 0x0000000000203b00 0x0000000000000048 0x0000000000000058 RW 0x1000 start = PhysAddr - ofset I really don't understand where the offsets come from. Are they both being rounded down to zero? The condition for merging segments is if (s1->hasR == s2->hasR && s1->hasW == s2->hasW && s1->hasX == s2->hasX && s1->dev == s2->dev && s1->ino == s2->ino && s2->offset == s1->offset + ((Long)s2->start) - ((Long)s1->start) ) { OK, same perms, same file, But then the offset test? 0 == 0 + =0x203000 - 0x202000 That's false. What would work is && s2->start == s1->end + 1 ) { Or possibly some way of taking into consideration the value of 'Align' mapelf has res = VG_(am_mmap_file_fixed_client)( VG_PGROUNDDN(addr), VG_PGROUNDUP(bss)-VG_PGROUNDDN(addr), prot, /*VKI_MAP_FIXED|VKI_MAP_PRIVATE, */ e->fd, VG_PGROUNDDN(off) ); which shows off being rounded down to the nearest pagesize (4098 or 0x1000). I have a theory, not really based on a firm understanding or thorough debugging. For this res = VG_(am_mmap_file_fixed_client)( VG_PGROUNDDN(addr), VG_PGROUNDUP(bss)-VG_PGROUNDDN(addr), prot, /*VKI_MAP_FIXED|VKI_MAP_PRIVATE, */ e->fd, VG_PGROUNDDN(off) ); Should 'off' be the offset of the segment only, or the offset of the segment from the start of the merged? The readelf output gives the offsets as 0x970 and 0xb00. They both round down to 0 with VG_PGROUNDDN. If the offset of the second ought to be the sum of the two then that would round down to 0x1000. And I think that would resolve a lot of problems that I've seen (merging, symbol lookup in shared libraries). What I'm going to try is, in elf.c for (i = 0; i < e->e.e_phnum; i++) { ESZ(Phdr) *ph = &e->p[i]; ESZ(Phdr) *phprev = NULL; ESZ(Addr) addr, bss, brkaddr; ESZ(Off) off; ESZ(Word) filesz; ESZ(Word) memsz; unsigned prot = 0; if (ph->p_type != PT_LOAD) continue; if (i) phprev = &e->p[i-1]; if (ph->p_flags & PF_X) prot |= VKI_PROT_EXEC; if (ph->p_flags & PF_W) prot |= VKI_PROT_WRITE; if (ph->p_flags & PF_R) prot |= VKI_PROT_READ; addr = ph->p_vaddr+base; off = ph->p_offset; if (phprev && ph->p_flags == phprev->p_flags) off += phprev->p_offset; Needs more work. Maybe store 'offset' and a 'mergeoffset' Or more precisely 'offset' and 'merge_remainder' or something like that. where 'merge_remainder' would be PGRNDDN(off - PGRNDDN(off) + prevoff - PGRNDDN(prefoff)) (unless there is a page remainder macro). This would either be 0 or 1 page. I see a similar problem on Linux (llvm 12,0,1 on Fedora 34). memcheck/tests/varinfo5 fails, mainly failing to read debuginfo form the .so to get variable names. After barking up the wrong tree for some time (merging RW PT_LOADs), I think that I now see what I need to do. 1. ULong VG_(di_notify_mmap)( Addr a, Bool allow_SkFileV, Int use_fd ) This is probably the trickier one. It needs to handle 3 possible cases A. only 1 RW PT_LOAD, behaviour as present. B. first of 2 RW PT_LOADs, add di to XA and return 0 C. second of 2 RW PT OADs, add di to XA and return di_notify_ACHIEVE_ACCEPT_STATE ( di ) For B I can look at the next NSegment VG_(am_find_nsegment)(seg->end+1); And for C I can look at the previous NSegment VG_(am_find_nsegment)(seg->start-1); 2. Change Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) This should be able to see when there are 2 RW dinfos, and act accordingly to look for .data .got.plt .bss in the second one. No, not quite. It seems like Valgrind loads the exe (and normally the interpreter, which is does with VG_(load_ELF) and readelf. I think that these read the phdrs and shdrs, and pre-allocates the NSegements. That means that when a PT_LOAD gets mmap'd, if there is a subsequent mmap the NSeggment is already in place. For other libs (libc.so, varinfo50so.so for example) it's ld.so that is running and loading them. This is doing the mmaps of the PT_LOADs one at a time. So the NSegment lookahead does not work. If I'm going to to lookahead, that means I'd have to do something like parsing the ELF in the shared lib and working out if there will be further RW PT_LOAD. Sounds hard. Plan B. Do as now, then see the 2nd PT_LOAD and try to clean up the mess. I've gone with plan A. In ULong VG_(di_notify_mmap) I replaced the call to ML_(is_elf_object_file) with a call to a new function ML_(check_elf_and_get_rw_loads) . This is an ultra-stripped down version of ML_(read_elf_debug_info) that just 1. calls ML_(is_elf_object_file) and then 2. iterates over the PHDRs counting RW PT_LOADs Also modified ML_(read_elf_debug_info) to count RW mappings, look for a second one if the count is two and to use this second rw mapping if present for bss data and got.plt The good news is that this fixes all of the failures in varinfo5 like this - Address 0x........ is 7 bytes inside data symbol "static_local_undef.XXXX" + Address 0x........ is in a rw- mapped file /usr/home/paulf/scratch/valgrind/memcheck/tests/varinfo5so.so segment (not reading debuginfo in shared libraries correctly) The bad news is that I'm getting 3 new failures < == 759 tests, 13 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == --- > == 759 tests, 16 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == 2212a2220 > drd/tests/annotate_hb_race (stderr) 2217a2226,2227 > none/tests/mmap_fcntl_bug (stderr) > none/tests/vgprintf (stderr) --pid-- di_notify_mmap-0: --pid-- di_notify_mmap-1: 0x200000-0x200fff r-- --pid-- di_notify_mmap-2: /usr/home/paulf/scratch/valgrind/none/tests/vgprintf --pid-- di_notify_mmap-3: is_rx_map 0, is_rw_map 0, is_ro_map 1 --pid-- di_notify_mmap-4: noting details in DebugInfo* at 0x40247C220 --pid-- di_notify_mmap-6: no dinfo loaded /usr/home/paulf/scratch/valgrind/none/tests/vgprintf (no rx or no rw mapping) --pid-- di_notify_mmap-0: --pid-- di_notify_mmap-1: 0x201000-0x201fff r-x --pid-- di_notify_mmap-2: /usr/home/paulf/scratch/valgrind/none/tests/vgprintf --pid-- di_notify_mmap-3: is_rx_map 1, is_rw_map 0, is_ro_map 0 --pid-- di_notify_mmap-4: noting details in DebugInfo* at 0x40247C220 --pid-- di_notify_mmap-6: no dinfo loaded /usr/home/paulf/scratch/valgrind/none/tests/vgprintf (no rx or no rw mapping) --pid-- di_notify_mmap-0: --pid-- di_notify_mmap-1: 0x202000-0x203fff rw- --pid-- di_notify_mmap-2: /usr/home/paulf/scratch/valgrind/none/tests/vgprintf --pid-- di_notify_mmap-3: is_rx_map 0, is_rw_map 1, is_ro_map 0 --pid-- di_notify_mmap-4: noting details in DebugInfo* at 0x40247C220 --pid-- di_notify_mmap-6: no dinfo loaded /usr/home/paulf/scratch/valgrind/none/tests/vgprintf (no rx or no rw mapping) Here is the problem. Why no second notify_mmap for the other RW PT_LOAD? --pid-- di_notify_mmap-0: --pid-- di_notify_mmap-1: 0x4000000-0x4005fff r-- --pid-- di_notify_mmap-2: /libexec/ld-elf.so.1 --pid-- di_notify_mmap-3: is_rx_map 0, is_rw_map 0, is_ro_map 1 --pid-- di_notify_mmap-4: noting details in DebugInfo* at 0x40247C7C0 --pid-- di_notify_mmap-6: no dinfo loaded /libexec/ld-elf.so.1 (no rx or no rw mapping) OK, this is to do with how the ELF exe or shared lib gets loaded. There are 3 cases. 1. the tool itself, already loaded and we look at the maps and work backwards to get NSegments and dinfo. 2. The client (and interpreter). We read the ELF headers and do the loading afterwards. That means the NSegments are done before the mmaping and dinfo loading occurs. 3. Any client shared libraries. These get loaded by the interpreter when the client is running under the host, we intercept the mmap calls and add NSegments and mmap/dinfo. The problem is in point 2 when there is a merge of the NSegements. That will result in just one call to VG_(di_notify_mmap) for the merged segement. And VG_(di_notify_mmap) needs two calls when the ELF headers contain 2 RW PT_LOADs. Maybe I need to use both (looking at NSegments for client load, looking at the ELF headers for a shared lib load). In that case I need to know which context VG_(di_notify_mmap) is being called from. Mayb the elf header e_type could do that. Almost there. none/tests/mmap_fcntl_bug is failing mmap_fcntl_bug: Child got lock, we must have dropped it (TEST FAILED) And this happens whether the client was built with GCC of clang. It seems that the extra fopen/fclose that I'm doing is interfering with the fcntl to lock the file. I need to see if I can use pread (a bit like the existing code does with a 1k stack buffer). Or else fseek back to where the fd was before. Created attachment 148662 [details]
Patch for reading debuginfo correctly with lld 9+
On FreeBSD 13 amd64, OK with both clang and gcc. On Linux Fedora 34 amd64 also OK. On Fedora 34, git head and "./configure CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld" with thispatch the results look fairly similar to what I get on FreeBSD. I didn't get all of the problems that I had with FreeBSD .got.plt errors on Helgrind and DRD, but I think that is because my Fedora 34 was all or mostly built with GCC. That means that libc, libpthread etc don't have the split RW PT_LOADS. On the other hand my FreeBSD 13 is all or mostly built with clang and the system libs will have them split. The situation might be different on ChromeOS and Android. I wish I understood this part of the code better. But I also struggling a bit. Let me try to talk us through the patch to see if we agree on the why/what: di_notify_mmap is triggered by an actual mmap (and some startup code, for valgrind tools itself?). The problem we are dealing with is when "enough" or "the right" parts of the file have been mapped by ld.so to start searching for the debuginfo and setting up the offsets to map debug addresses to addresses in memory where the code is mapped. The goal of di_notify_mmap is to reach di_notify_ACHIEVE_ACCEPT_STATE at which point the actual debuginfo is read. This is guarded by: if (di->fsm.have_rx_map && di->fsm.have_rw_map) You patch changes that to: if (di->fsm.have_rx_map && rw_load_count >= 1 && di->fsm.have_rw_map == rw_load_count) Now have_rw_map is slightly misnamed, but ok. The rw_load_count comes from the new check_elf_and_get_rw_loads. Which tries to get an elf image from the file descriptor and traverses the phdrs looking for PT_LOAD RW mappings, with this funny counter example: /* * Hold your horses * Just because The ELF file contains 2 RW PT_LOAD segments it * doesn't mean that Valgrind will also make 2 calls to * VG_(di_notify_mmap). If the stars are all aligned * (which usually means that the ELF file is the client * executable with the segment offset for the * second PT_LOAD falls exactly on 0x1000) then the NSegements * will get merged and VG_(di_notify_mmap) only gets called once. */ if (*rw_load_count == 2 && ehdr_m.e_type == ET_EXEC && a_phdr.p_offset == VG_PGROUNDDN(a_phdr.p_offset) ) { *rw_load_count = 1; } I assume in that case ld.so merges the mmap request into one? I cannot say I really understand everything here, but I understand why you want to delay to actual debuginfo loading till all mapping are there. Is the above an accurate description of the proposed patch logic? (In reply to Mark Wielaard from comment #15) For macos I don't know if any of these changes are applicable so I wanted to keep the old code if #if blocks. I'll clean up unrelated changes and push them separately. > I wish I understood this part of the code better. But I also struggling > a bit. Let me try to talk us through the patch to see if we agree on > the why/what: > > di_notify_mmap is triggered by an actual mmap (and some startup code, > for valgrind tools itself?). The problem we are dealing with is when > "enough" or "the right" parts of the file have been mapped by ld.so to > start searching for the debuginfo and setting up the offsets to map > debug addresses to addresses in memory where the code is mapped. Essentially that's it. "enough" is when the RX and the RW PT_LOAD(s) have been seen. As I understand it, di_notify_mmap can be called in 2 situations "HOST" 1a. For the tool exe and tool/core shared libs. These are already mmap'd when the host starts so we look at something like the /proc filesystem to get the mapping after the event and build up the NSegments from that. 1b. Then the host loads ld.so and the guest exe. This is done in the sequence load_client -> VG_(do_exec) -> VG_(do_exec_inner) -> exe_handlers->load_fn ( == VG_(load_ELF) ). This does the mmap'ing and creats the associated NSegments. The NSegments may get merged, so there could be fewer mmaps and PT_LOADs than there are NSegemnts. This is around line 1893 of m_main.c: for (i = 0; i < n_seg_starts; i++) { anu.ull = VG_(di_notify_mmap)( seg_starts[i], True/*allow_SkFileV*/, -1/*Don't use_fd*/); "GUEST" 2. When the guest loads any further shared libs (libc, other dependencies, dlopens). There are a few variations for syswraps/platforms, but the one that we are concerned with is syswrap-generic.c: /* Load symbols? */ di_handle = VG_(di_notify_mmap)( (Addr)sr_Res(sres), False/*allow_SkFileV*/, (Int)arg5 ); In this case the NSegment could possibly be merged, but that is irrelevant because di_notify_mmap is being called based directy on the mmap result. > The goal of di_notify_mmap is to reach di_notify_ACHIEVE_ACCEPT_STATE > at which point the actual debuginfo is read. This is guarded by: > > if (di->fsm.have_rx_map && di->fsm.have_rw_map) > > You patch changes that to: > > if (di->fsm.have_rx_map && > rw_load_count >= 1 && > di->fsm.have_rw_map == rw_load_count) > > Now have_rw_map is slightly misnamed, but ok. Yes, count_rw_map would be better. > > The rw_load_count comes from the new check_elf_and_get_rw_loads. > Which tries to get an elf image from the file descriptor and traverses > the phdrs looking for PT_LOAD RW mappings, with this funny counter > example: > > /* > * Hold your horses > * Just because The ELF file contains 2 RW PT_LOAD segments it > * doesn't mean that Valgrind will also make 2 calls to > * VG_(di_notify_mmap). If the stars are all aligned > * (which usually means that the ELF file is the client > * executable with the segment offset for the > * second PT_LOAD falls exactly on 0x1000) then the NSegements > * will get merged and VG_(di_notify_mmap) only gets called once. */ > if (*rw_load_count == 2 && > ehdr_m.e_type == ET_EXEC && > a_phdr.p_offset == VG_PGROUNDDN(a_phdr.p_offset) ) > { > *rw_load_count = 1; > } > > I assume in that case ld.so merges the mmap request into one? No it's Valgrind that merges the NSegements, as above. > I cannot say I really understand everything here, but I understand why > you want to delay to actual debuginfo loading till all mapping are > there. > > Is the above an accurate description of the proposed patch logic? Yes, that's about it. Lastly, I thought that the code to handle either 1 or 2 segments in ML_(read_elf_debug_info) is a bit messy. Do you have any ideas how it could be done more cleanly? Created attachment 149540 [details]
Patch for reading debuginfo correctly with lld 9+
Added more comments, some extra checks and simplified some code.
(In reply to Paul Floyd from comment #16) > > di_notify_mmap is triggered by an actual mmap (and some startup code, > > for valgrind tools itself?). The problem we are dealing with is when > > "enough" or "the right" parts of the file have been mapped by ld.so to > > start searching for the debuginfo and setting up the offsets to map > > debug addresses to addresses in memory where the code is mapped. > > Essentially that's it. "enough" is when the RX and the RW PT_LOAD(s) have > been seen. OK. > As I understand it, di_notify_mmap can be called in 2 situations > > "HOST" > 1a. For the tool exe and tool/core shared libs. These are already mmap'd > when the host starts so we look at something like the /proc filesystem to > get the mapping after the event and build up the NSegments from that. > > 1b. Then the host loads ld.so and the guest exe. This is done in the > sequence load_client -> VG_(do_exec) -> VG_(do_exec_inner) -> > exe_handlers->load_fn ( == VG_(load_ELF) ). This does the mmap'ing and > creats the associated NSegments. > > The NSegments may get merged, so there could be fewer mmaps and PT_LOADs > than there are NSegemnts. > > This is around line 1893 of m_main.c: > > for (i = 0; i < n_seg_starts; i++) { > anu.ull = VG_(di_notify_mmap)( seg_starts[i], True/*allow_SkFileV*/, > -1/*Don't use_fd*/); > > "GUEST" > 2. When the guest loads any further shared libs (libc, other dependencies, > dlopens). > > There are a few variations for syswraps/platforms, but the one that we are > concerned with is syswrap-generic.c: > > > /* Load symbols? */ > di_handle = VG_(di_notify_mmap)( (Addr)sr_Res(sres), > False/*allow_SkFileV*/, (Int)arg5 ); > > In this case the NSegment could possibly be merged, but that is irrelevant > because di_notify_mmap is being called based directy on the mmap result. Thanks for this overview. I forgot about the "HOST" part. > > The goal of di_notify_mmap is to reach di_notify_ACHIEVE_ACCEPT_STATE > > at which point the actual debuginfo is read. This is guarded by: > > > > if (di->fsm.have_rx_map && di->fsm.have_rw_map) > > > > You patch changes that to: > > > > if (di->fsm.have_rx_map && > > rw_load_count >= 1 && > > di->fsm.have_rw_map == rw_load_count) > > > > Now have_rw_map is slightly misnamed, but ok. > > Yes, count_rw_map would be better. Thanks for updating that in the new patch. > > The rw_load_count comes from the new check_elf_and_get_rw_loads. > > Which tries to get an elf image from the file descriptor and traverses > > the phdrs looking for PT_LOAD RW mappings, with this funny counter > > example: > > > > /* > > * Hold your horses > > * Just because The ELF file contains 2 RW PT_LOAD segments it > > * doesn't mean that Valgrind will also make 2 calls to > > * VG_(di_notify_mmap). If the stars are all aligned > > * (which usually means that the ELF file is the client > > * executable with the segment offset for the > > * second PT_LOAD falls exactly on 0x1000) then the NSegements > > * will get merged and VG_(di_notify_mmap) only gets called once. */ > > if (*rw_load_count == 2 && > > ehdr_m.e_type == ET_EXEC && > > a_phdr.p_offset == VG_PGROUNDDN(a_phdr.p_offset) ) > > { > > *rw_load_count = 1; > > } > > > > I assume in that case ld.so merges the mmap request into one? > > No it's Valgrind that merges the NSegements, as above. Right, this is the do_exec case. Thanks. > Lastly, I thought that the code to handle either 1 or 2 segments in > ML_(read_elf_debug_info) is a bit messy. > > Do you have any ideas how it could be done more cleanly? Yes, it is a bit messy, but I don't really have an idea to do is more cleanly. I think the cleaned up patch is good to go. But I am a little hesitant about the new varinfo5.stderr.exp-freebsd Are you sure it is really correct now? Are the differences really because of different debuginfo generated by clang vs gcc? Or do we still get some addresses/offsets wrong? For the diffs, I see 1 scope problem twice the following two 1 naming 1 static 1 paths + inlining I haven't gone into it in detail. About half of the differences are reasonable. I still want to keep this expected as it's the only one I know of that indicates problems reading debuginfo in shared libraries. |