After upgrading from 3.3.0 to 3.4.1 on a powerpc-based Linux system, the backtraces no longer contain any debug information, just the address and library. The compiler we are using hasn't changed; it's still gcc 4.1.1. I looked through the FAQ and the release notes, and tried to find something similar in the bug database, but couldn't find anything.
Urr, that's ungood. Could you compile memcheck/tests/errs1.c with -g, verify that 3.4.1 doesn't produce line numbers on it (it should report 2 errors), and send me the executable?
Hmm... on this simple test program, it's actually not finding any errors, except problems attributed to ld-2.5.so. It doesn't even seem to be aware that memory was allocated and not freed.

==29003== Memcheck, a memory error detector.
==29003== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==29003== Using LibVEX rev 1884, a library for dynamic binary translation.
==29003== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==29003== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==29003== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==29003== For more details, rerun with: -v
==29003==
==29003== Conditional jump or move depends on uninitialised value(s)
==29003==    at 0x4002754: (within /lib/ld-2.5.so)
==29003==    by 0x4014D6F: (within /lib/ld-2.5.so)
==29003==
==29003== Conditional jump or move depends on uninitialised value(s)
==29003==    at 0x4002788: (within /lib/ld-2.5.so)
==29003==    by 0x4014D6F: (within /lib/ld-2.5.so)
==29003==
==29003== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 1 from 1)
==29003== malloc/free: in use at exit: 0 bytes in 0 blocks.
==29003== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==29003== For counts of detected errors, rerun with: -v
==29003== Use --track-origins=yes to see where uninitialised values come from
==29003== All heap blocks were freed -- no leaks are possible.
Created attachment 33168: errs1 executable
Hmm. That means the intercept mechanism isn't working, hence it doesn't see calls to malloc/free et al. What Linux distro is this?
The weird part is that it seemed to be working on bigger programs, appearing to find known issues, just not showing the line numbers. Using some other tests, I can generate a few cases related to invalid stack accesses that are tracked correctly but don't show line numbers, although reading/writing uninitialized stack variables doesn't seem to trigger any warnings. As for the system, unfortunately this is not a standard distribution, but our own Linux system, built from scratch. For testing 3.4.1, we just dropped the new release into our system and built it using the same build options as for 3.3.0.
I realize we have a non-standard setup, but if you have any suggestions on where we should look or what we might try to find the source of the problem(s), it would be much appreciated. Could there be anything related to endianness, library versions, gcc versions, or build options that might have changed between releases and that could account for both the lack of line numbers and the failure to detect malloc errors?
I have a similar problem, possibly the same. Sometimes I have debug info in backtraces and sometimes not. I was able to track it down though:

$ gcc xyz.c
gives me backtraces with debug info

$ gcc xyz.c -Wl,--bss-plt
yields backtraces *without* debug info

Such backtraces worked OK in version 3.3.2. I am using Debian PPC unstable.
This bug also happens with the DENX ELDK 4.2 targeting PowerPC 4xx, when using valgrind 3.4.1. The bug does not happen with valgrind 3.3.0 and the same setup. The ELDK is freely available, and we're using the cross tool chain (gcc 4.2.2, etc. running on x86_64) that it provides. I tried compiling memcheck/tests/errs1.c with -g as mentioned above, and did not see any errors. Is there anything else I can provide or do to help address this bug? I'm happy to provide access to a system if that will help.
It would be useful to know if this problem still happens on the trunk, as there has been a lot of futzing around with the debuginfo reader there recently. Can you check out and test it? Details at http://www.valgrind.org/downloads/repository.html
I tried this with SVN as of 2009-06-02, in the same ELDK 4.2 PowerPC environment that worked fine with valgrind 3.3.0, and the problem is still there.
This is why helgrind/tests/hg05_race2 fails during the nightly PPC build.
(In reply to comment #11)
> This is why helgrind/tests/hg05_race2 fails during the nightly PPC build.

Please ignore the above comment -- hg05_race2 fails on PPC for another reason.
I'm happy to look into this -- it is listed as a 3.5.0 blocker -- but I'll need remote ssh access to a system which can reproduce the problem. Can anyone provide that?
Downgrading this from "blocker3.5.0" to "wanted3.5.0" due to the lack of response to Julian's requests for assistance.
At least w.r.t. Comment #8, the problem happens because the toolchain generated an executable with a data segment to be mapped rwx. This breaks Valgrind's logic for identifying the data segment on ppc32-linux, as it is looking for a rw- (non executable) data segment. As a result of failing to conclusively identify both the text and data segments for the executable, Valgrind declines to read debuginfo for it. The change below "fixes" it. It would be good to get feedback on whether this helps other folks experiencing this problem on ppc32-linux.

Index: coregrind/m_debuginfo/debuginfo.c
===================================================================
--- coregrind/m_debuginfo/debuginfo.c (revision 10816)
+++ coregrind/m_debuginfo/debuginfo.c (working copy)
@@ -702,10 +702,10 @@
 */
 is_rx_map = False;
 is_rw_map = False;
-# if defined(VGA_x86)
+# if defined(VGA_x86) || defined(VGA_ppc32)
 is_rx_map = seg->hasR && seg->hasX;
 is_rw_map = seg->hasR && seg->hasW;
-# elif defined(VGA_amd64) || defined(VGA_ppc32) || defined(VGA_ppc64)
+# elif defined(VGA_amd64) || defined(VGA_ppc64)
 is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
 is_rw_map = seg->hasR && seg->hasW && !seg->hasX;
 # else
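To illustrate why an rwx data segment defeats the stricter test, here is a minimal standalone sketch of the two rules from the patch above. It is not Valgrind code: SegPerms, MapKind, classify_strict and classify_relaxed are hypothetical names introduced only for this illustration, and only the hasR/hasW/hasX conditions are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mapped segment's permissions. Only the
   three flags tested in the patch (hasR, hasW, hasX) are modelled. */
typedef struct {
    bool hasR;
    bool hasW;
    bool hasX;
} SegPerms;

/* Result of classifying one mapping. */
typedef struct {
    bool is_rx_map;   /* looks like a text mapping */
    bool is_rw_map;   /* looks like a data mapping */
} MapKind;

/* Strict rule: the branch used for amd64 and ppc64, and for ppc32
   before the patch. Text must not be writable and data must not be
   executable. */
MapKind classify_strict(SegPerms s)
{
    MapKind k;
    k.is_rx_map = s.hasR && s.hasX && !s.hasW;
    k.is_rw_map = s.hasR && s.hasW && !s.hasX;
    return k;
}

/* Relaxed rule: the branch used for x86, and for ppc32 after the
   patch. The extra permission bit is simply ignored. */
MapKind classify_relaxed(SegPerms s)
{
    MapKind k;
    k.is_rx_map = s.hasR && s.hasX;
    k.is_rw_map = s.hasR && s.hasW;
    return k;
}
```

For a segment mapped rwx, classify_strict reports neither a text nor a data mapping, which corresponds to the case where no data segment is identified and debuginfo is not read. classify_relaxed reports the same segment as both, so a data segment is found. How Valgrind then combines these flags across mappings is outside the scope of this sketch.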
The above patch (slightly modified) fixed the original problem I filed. The patch didn't apply with 3.4.1 though. The change I ended up with was:

===== coregrind/m_debuginfo/debuginfo.c 1.3 vs edited =====
--- 1.3/coregrind/m_debuginfo/debuginfo.c 2009-03-01 17:02:37 -05:00
+++ edited/coregrind/m_debuginfo/debuginfo.c 2009-08-14 13:24:47 -04:00
@@ -701,11 +701,11 @@
 */
 is_rx_map = False;
 is_rw_map = False;
-# if defined(VGP_x86_linux)
+# if defined(VGP_x86_linux) || defined(VGP_ppc32_linux)
 is_rx_map = seg->hasR && seg->hasX;
 is_rw_map = seg->hasR && seg->hasW;
 # elif defined(VGP_amd64_linux) \
- || defined(VGP_ppc32_linux) || defined(VGP_ppc64_linux)
+ || defined(VGP_ppc64_linux)
 is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
 is_rw_map = seg->hasR && seg->hasW && !seg->hasX;
 # else
Committed, r10828. Closing.