Created attachment 59514 [details] valgrind output Looking at a wine bug that only occurs when compiling with Clang trunk (http://bugs.winehq.org/show_bug.cgi?id=26754), I found that Valgrind crashes: preloader: Warning: failed to reserve range 00110000-68000000 --9831-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --9831-- si_code=2; Faulting address: 0x805BF6E8; sp: 0x6848b574 valgrind: the 'impossible' happened: Killed by fatal signal ==9831== at 0x380A6DDB: vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:280) ==9831== by 0x3805AE45: vgModuleLocal_read_elf_debug_info (readelf.c:2208) ==9831== by 0x38055E57: vgPlain_di_notify_mmap (debuginfo.c:835) ==9831== by 0x38076F0F: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2071) ==9831== by 0x3809CA02: vgSysWrap_x86_linux_sys_mmap2_before (syswrap-x86-linux.c:1381) ==9831== by 0x3806BE4D: vgPlain_client_syscall (syswrap-main.c:1498) ==9831== by 0x38068292: handle_syscall (scheduler.c:900) ==9831== by 0x380693DE: vgPlain_scheduler (scheduler.c:1096) ==9831== by 0x38097254: run_a_thread_NORETURN (syswrap-linux.c:95) this is with wine from git (e594268421fbaba5e8db4ee8a929beaae0effcf0), clang from trunk (clang version 3.0 (trunk 130453)), and valgrind from svn (revision 11719). To reproduce: $ compile wine with: #!/bin/bash export CC="clang" export CXX="$CC" export CFLAGS="-std=gnu89 -g -O0 -no-integrated-as" ./configure make then run: valgrind -q --trace-children=yes --track-origins=yes --gen-suppressions=all --suppressions=valgrind-suppressions --leak-check=full --num-callers=20 --workaround-gcc296-bugs=yes --vex-iropt-precise-memory-exns=yes loader/wine regedit valgrind output attached. If Wine is compiled with gcc 4.4.5, works fine. Machine is running gentoo 32-bit, kernel 2.6.37, glibc 2.11.3.
This happened when V tried to read line number info from something clang compiled, either a .so or an executable. Easiest for me would be if you could figure out what file is involved and email it to me (or put it somewhere I can download, if large). You can identify the offending file by running Valgrind with -v.
Created attachment 59582 [details] problem file, bzip2 -9'ed
Created attachment 59583 [details] valgrind verbose output
(In reply to comment #1) > This happened when V tried to read line number info from something > clang compiled, either a .so or an executable. Easiest for me would > be if you could figure out what file is involved and email it to me > (or put it somewhere I can download, if large). > > You can identify the offending file by running Valgrind with -v. (In reply to comment #1) > This happened when V tried to read line number info from something > clang compiled, either a .so or an executable. Easiest for me would > be if you could figure out what file is involved and email it to me > (or put it somewhere I can download, if large). > > You can identify the offending file by running Valgrind with -v. ==22366== Warning: set address range perms: large range [0x82000000, 0xbee10000) (noaccess) --22366-- REDIR: 0x42180e0 (bcmp) redirected to 0x40276e0 (bcmp) --22366-- REDIR: 0x42127b0 (calloc) redirected to 0x402448f (calloc) --22366-- Reading syms from /home/austin/src/wine-clang/dlls/ntdll/ntdll.dll.so (0x7bc00000) --22366-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --22366-- si_code=2; Faulting address: 0x805BF6E8; sp: 0x6848f574 ntdll.dll.so attached, as well as the output of: valgrind -v --trace-children=yes --track-origins=yes --gen-suppressions=all --suppressions=valgrind-suppressions --leak-check=full --num-callers=20 --workaround-gcc296-bugs=yes --vex-iropt-precise-memory-exns=yes loader/wine regedit
(In reply to comment #2) > Created an attachment (id=59582) [details] > problem file, bzip2 -9'ed Thanks. I can successfully reproduce with this. Will investigate.
This somehow pertains to r11705, a fix in which (I thought) I fixed reading of line number info from LLVM-2.9 generated dwarf. There's something very odd about this. The problem pertains to the DW_AT_stmt_list fields in DW_TAG_compile_units. These specify the offset in the .debug_line section of a line number "program" for the compilation unit. There are multiple of these. The first one provides line numbers for an "actctx.c" and is sane enough: <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit) < c> DW_AT_producer : clang version 3.0 (trunk 130453) <2d> DW_AT_language : 1 (ANSI C) <2f> DW_AT_name : actctx.c <38> DW_AT_entry_pc : 0x0 <3c> DW_AT_stmt_list : 0x0 <40> DW_AT_comp_dir : /home/austin/src/wine-clang/dlls/ntdll so the line number program is at 0x0 offset in .debug_line, viz, right at the start. The problem is the next one: <0><8c0b>: Abbrev Number: 1 (DW_TAG_compile_unit) <8c0c> DW_AT_producer : clang version 3.0 (trunk 130453) <8c2d> DW_AT_language : 1 (ANSI C) <8c2f> DW_AT_name : atom.c <8c36> DW_AT_entry_pc : 0x0 <8c3a> DW_AT_stmt_list : 0x7bc018cf <8c3e> DW_AT_comp_dir : /home/austin/src/wine-clang/dlls/ntdll The DW_AT_stmt_list is claimed to be at offset 0x7bc018cf in .debug_line, that is, around about 2GB past the start. This is completely bogus, especially considering the entire ntdll.dll.so is only about 3MB long. Valgrind duly tries to access at the stated offset in .debug_line and segfaults immediately. I noticed, when doing the fix in r11705, that LLVM-2.9 encodes DW_AT_stmt_list using DW_FORM_addr, whereas gcc has always used DW_FORM_data4. I don't know why 2.9 uses that (2.8 didn't iirc), nor whether it's relevant. The dwarf3 spec says that DW_FORM_addr is for holding an address on the target machine, and is subject to relocation. Hence it sounds completely inappropriate for holding an offset into the .debug_info section. /me confused.
Forgot to add: even stranger: readelf --debug-dump=line manages to read all the line number info despite the apparently bogus DW_AT_stmt_list field. No idea how/why.
Well dwarfdump doesn't seem to like it - it decodes the offsets in the same way that you report and when asked to dump line numbers it complains and gives up: vauxhall [~/vgtest] % dwarfdump -l ntdll.dll.so .debug_line: line number info for a single cu dwarfdump ERROR: dwarf_srclines: DW_DLE_ATTR_FORM_BAD (114)
Latest dwarfdump/libdwarf built from source also objects, though slightly differently: vauxhall [~/vgtest] % ~/src/libdwarf/dwarfdump/dwarfdump -l ntdll.dll.so .debug_line: line number info for a single cu /home/thh/src/libdwarf/dwarfdump/dwarfdump ERROR: dwarf_srclines: DW_DLE_BAD_REF_FORM (118) CU Name = CU Producer = "clang version 3.0 (trunk 130453)" DIE OFF = 0x00000000 GOFF = 0x00000000, Low PC = 0x00000000, High PC = 0x00000000 It is basically complaining about the use of DW_FORM_addr for the stmt_list which it doesn't think is valid.
Looking at binutils, the reason that readelf works is that it doesn't actually go via the CU's in the debug_info section when dumping line number data - it just reads the debug_line section straight through from one end to the other.
(In reply to comment #10) Thanks for investigating. > Looking at binutils, the reason that readelf works is that it doesn't actually > go via the CU's in the debug_info section when dumping line number data - it > just reads the debug_line section straight through from one end to the other. Is that an approach that we could switch to? Our current scheme we appears to scan through the main DIE list, to find DW_TAG_compile_units, and read only the parts of debug_line pointed to by DW_AT_stmt_list entries. I don't know why it's done this way instead of the binutils way -- I didn't write this code. It does presumably have the advantage that we don't end up reading line number info for compilation units not referenced by the DIE list. It also doesn't assume that the blocks of line number info in debug_line are placed back-to-back w/ no holes in between. That said I have no idea if either of those points are important. This seems to me like a bug in LLVM 2.9. I don't remember any such problems with 2.8. Much as I'd like to forget about this and tell the LLVM crew to fix their compiler, I'm sure we'll get a bunch of hassle along the lines of "binutils reads it, so what's the problem?" So if we can hack up a workaround, perhaps we should.
Well readelf works because it is only trying to dump the whole section. Whether other users of binutils work is a whole different question - there are routines in bfd to do things like finding line number information for a given address which look like they would fail. That said "objdump -dl" does work somehow, which I thought would fail...
The change in llvm to using DW_FORM_addr was made in r112678: Use absolute label for DW_AT_stmt_list if a target does not prefer offset here. This patch was developed on top of original patch by Artur Pietrek. The only platform where addr4 is still used is Darwin. This change seems to have been made as a result of this mailing list discussion: http://groups.google.com/group/llvm-dev/browse_thread/thread/8964413cbab85774?pli=1
The earlier discussion is also relevant: http://groups.google.com/group/llvm-dev/browse_thread/thread/aeae5b63349fab6b They seem to recognise that it isn't strictly valid (and that it doesn't work on Darwin) but they are doing it so that link time optimisation can adjust the locations by changing the relocations or something.
(In reply to comment #13) > The change in llvm to using DW_FORM_addr was made in r112678: > > Use absolute label for DW_AT_stmt_list if a target does not prefer offset > here. > This patch was developed on top of original patch by Artur Pietrek. > > The only platform where addr4 is still used is Darwin. Uh, but the problem we're seeing on Linux is because it's using DW_FORM_addr, not DW_FORM_data4 like gcc does.
(In reply to comment #12) > Whether other users of binutils work is a whole different question - there are > routines in bfd to do things like finding line number information for a given > address which look like they would fail. austinenglish, does GDB have any difficulty working with the offending file?
(In reply to comment #15) > (In reply to comment #13) > > The change in llvm to using DW_FORM_addr was made in r112678: > > > > Use absolute label for DW_AT_stmt_list if a target does not prefer offset > > here. > > This patch was developed on top of original patch by Artur Pietrek. > > > > The only platform where addr4 is still used is Darwin. > > Uh, but the problem we're seeing on Linux is because it's using > DW_FORM_addr, not DW_FORM_data4 like gcc does. Exactly - they have changed all platforms except Darwin to use addr. They only didn't change Darwin because doing so broke the debugger.
Rafael Espindola tells me that it "Should be fixed in r130846." (which presumably happened in the last few hours). Austin, can you give it a go?
(In reply to comment #18) > Rafael Espindola tells me that it "Should be fixed in r130846." > (which presumably happened in the last few hours). Austin, can > you give it a go? Yep, looks good here, thanks. Resolved upstream?
(In reply to comment #19) > Resolved upstream? Not sure what you're referring to here. Is what resolved in whose upstream?
(In reply to comment #20) > (In reply to comment #19) > > Resolved upstream? > > Not sure what you're referring to here. Is what resolved > in whose upstream? Your comment 18. I updated built clang from svn (r130866), recompiled Wine, and ran: valgrind -q --trace-children=yes --track-origins=yes --gen-suppressions=all --suppressions=valgrind-suppressions --leak-check=full --num-callers=20 --workaround-gcc296-bugs=yes --vex-iropt-precise-memory-exns=yes loader/wine regedit which no longer crashes valgrind. I never updated valgrind, so it seems the problem was fixed upstream in LLVM/Clang, unless I'm misunderstanding something.
(In reply to comment #21) Austin, Thanks for the feedback. > which no longer crashes valgrind. I never updated valgrind, so it seems the > problem was fixed upstream in LLVM/Clang, unless I'm misunderstanding > something. You understand correctly: it was fixed in LLVM upstream. There's one final thing to check though. Valgrind already contained a half-baked attempt to fix this (ineffective, as you saw), and I need to back it out now (svn rev 11705). Can you try the diff below and verify that you still get line numbers on LLVM generated code? Thanks. Index: coregrind/m_debuginfo/readdwarf.c =================================================================== --- coregrind/m_debuginfo/readdwarf.c (revision 11731) +++ coregrind/m_debuginfo/readdwarf.c (working copy) @@ -1111,9 +1111,9 @@ unconditionally to cval? */ case 0x01: /* FORM_addr */ if (addr_size == 4) { - cval = *(UInt*)p; + //cval = *(UInt*)p; } else if (addr_size == 8) { - cval = *(ULong*)p; + //cval = *(ULong*)p; } else { /* wtf, Houston? */ }
Yep, still looks good. This was on top of r11726.
Thanks for the followup check. r11705 is now backed out by r11738. Closing as Invalid (since it's not our bug).