Bug 190820

Summary: No debug information on powerpc-linux
Product: [Developer tools] valgrind    Reporter: ben
Component: general    Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED
Severity: normal    CC: bart.vanassche+kde, gregs, maps4711, njn
Priority: NOR    
Version: 3.4.1   
Target Milestone: wanted3.5.0   
Platform: Unlisted Binaries   
OS: Linux   
Attachments: errs1 executable

Description ben 2009-04-27 15:08:15 UTC
After upgrading from 3.3.0 to 3.4.1 on a powerpc-based linux system, the backtraces no longer contain any debug information, just the address and library.  The compiler we are using hasn't changed; it's still gcc 4.1.1.

I looked through the FAQ and the release notes, and tried to find something similar in the bug database, but couldn't find anything.
Comment 1 Julian Seward 2009-04-27 18:40:15 UTC
Urr, that's ungood.  Could you compile memcheck/tests/errs1.c with
-g, verify that 3.4.1 doesn't produce line numbers on it (it should
report 2 errors), and send me the executable?
Comment 2 ben 2009-04-27 19:19:14 UTC
Hmm... on this simple test program, it's actually not finding any errors, except problems attributed to ld-2.5.so.  It doesn't even seem to be aware that memory was allocated and not freed.

==29003== Memcheck, a memory error detector.
==29003== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==29003== Using LibVEX rev 1884, a library for dynamic binary translation.
==29003== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==29003== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==29003== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==29003== For more details, rerun with: -v
==29003== 
==29003== Conditional jump or move depends on uninitialised value(s)
==29003==    at 0x4002754: (within /lib/ld-2.5.so)
==29003==    by 0x4014D6F: (within /lib/ld-2.5.so)
==29003== 
==29003== Conditional jump or move depends on uninitialised value(s)
==29003==    at 0x4002788: (within /lib/ld-2.5.so)
==29003==    by 0x4014D6F: (within /lib/ld-2.5.so)
==29003== 
==29003== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 1 from 1)
==29003== malloc/free: in use at exit: 0 bytes in 0 blocks.
==29003== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==29003== For counts of detected errors, rerun with: -v
==29003== Use --track-origins=yes to see where uninitialised values come from
==29003== All heap blocks were freed -- no leaks are possible.
Comment 3 ben 2009-04-27 19:20:29 UTC
Created attachment 33168 [details]
errs1 executable
Comment 4 Julian Seward 2009-04-27 21:19:10 UTC
Hmm.  That means the intercept mechanism isn't working, hence it
doesn't see calls to malloc/free et al.  What Linux distro is this?
Comment 5 ben 2009-04-28 00:41:13 UTC
The weird part is that it seemed to be working on bigger programs, appearing to find known issues, just not showing the line numbers.  

Using some other tests, I can generate a few cases involving invalid stack accesses that are detected correctly but still don't show line numbers. Reading/writing uninitialized stack variables, however, doesn't seem to trigger any warnings.

As for the system, unfortunately this is not a standard distribution, but our own linux system, built from scratch.  For testing 3.4.1, we just dropped the new release into our system and built it using the same build options as for 3.3.0.
Comment 6 ben 2009-05-06 19:42:18 UTC
I realize we have a non-standard setup, but if you have any suggestions on where we should look or what we might try to find the source of the problem(s), it would be much appreciated.  Could anything related to endianness, library versions, gcc versions, or build options have changed between releases in a way that could account for both the lack of line numbers and the failure to detect malloc errors?
Comment 7 Matthias Grimrath 2009-05-18 21:31:56 UTC
I have a similar problem, possibly the same. Sometimes I have debug info in backtraces and sometimes not. I was able to track it down, though:

$ gcc xyz.c 

gives me backtraces with debug info

$ gcc xyz.c -Wl,--bss-plt

yields backtraces *without* debug info

Such backtraces worked ok in version 3.3.2

I am using Debian PPC unstable.
Comment 8 Greg Snyder 2009-06-01 09:07:16 UTC
This bug also happens with the DENX ELDK 4.2 targeting PowerPC 4xx when using valgrind 3.4.1.  The bug does not happen with valgrind 3.3.0 and the same setup.
The ELDK is freely available, and we're using the cross tool chain (gcc 4.2.2, etc. running on x86_64) that it provides.

I tried compiling memcheck/tests/errs1.c with -g as mentioned above, and did not see any errors.

Is there anything else I can provide or do to help address this bug?  I'm happy to provide access to a system if that will help.
Comment 9 Julian Seward 2009-06-02 13:41:38 UTC
It would be useful to know if this problem still happens on the
trunk, as there has been a lot of futzing around with the debuginfo
reader there recently.  Can you check out and test it?  Details at
http://www.valgrind.org/downloads/repository.html
Comment 10 Greg Snyder 2009-06-02 21:11:08 UTC
I tried this with SVN as of 2009-06-02, in the same ELDK 4.2 PowerPC environment that worked fine with valgrind 3.3.0, and the problem is still there.
Comment 11 Bart Van Assche 2009-07-26 09:58:17 UTC
This is why helgrind/tests/hg05_race2 fails during the nightly PPC build.
Comment 12 Bart Van Assche 2009-07-26 10:11:47 UTC
(In reply to comment #11)
> This is why helgrind/tests/hg05_race2 fails during the nightly PPC build.

Please ignore the above comment -- hg05_race2 fails on PPC for another reason.
Comment 13 Julian Seward 2009-08-02 15:17:51 UTC
I'm happy to look into this -- it is listed as a 3.5.0 blocker -- but
I'll need remote ssh access to a system which can reproduce the problem.
Can anyone provide that?
Comment 14 Nicholas Nethercote 2009-08-11 02:03:39 UTC
Downgrading this from "blocker3.5.0" to "wanted3.5.0" due to the lack of response to Julian's requests for assistance.
Comment 15 Julian Seward 2009-08-14 17:00:00 UTC
At least w.r.t. Comment #8, the problem happens because the
toolchain generated an executable with a data segment to be
mapped rwx.  This breaks Valgrind's logic for identifying the
data segment on ppc32-linux, as it is looking for a rw-
(non-executable) data segment.  As a result of failing to
conclusively identify both the text and data segments for
the executable, Valgrind declines to read debuginfo for it.
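To check whether a particular executable has this kind of rwx data segment, one can look at the flags of its PT_LOAD program headers (`readelf -l` shows the same information in its Flg column). Below is a minimal C sketch, assuming a 32-bit ELF file as on ppc32-linux; the function names `elf32_count_rwx_loads` and `file_count_rwx_loads` are hypothetical and not part of Valgrind. It byte-swaps fields so that a big-endian ppc32 binary can also be inspected on an x86 build host.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if this host stores integers little-endian. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Byte-swap a field when the file's byte order differs from the host's
   (e.g. a big-endian ppc32 binary inspected on an x86 build host). */
static uint16_t to_host16(uint16_t v, int swap)
{
    return swap ? (uint16_t)((v >> 8) | (v << 8)) : v;
}

static uint32_t to_host32(uint32_t v, int swap)
{
    return swap ? ((v >> 24) | ((v >> 8) & 0x0000ff00u) |
                   ((v << 8) & 0x00ff0000u) | (v << 24))
                : v;
}

/* Hypothetical helper: count the PT_LOAD segments of an in-memory ELF32
   image that are both writable and executable (PF_W and PF_X set),
   printing each LOAD segment's permissions when 'verbose' is nonzero.
   Returns -1 if the image is not a well-formed ELF32 file. */
int elf32_count_rwx_loads(const unsigned char *img, size_t len, int verbose)
{
    Elf32_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    if (eh.e_ident[EI_DATA] != ELFDATA2LSB &&
        eh.e_ident[EI_DATA] != ELFDATA2MSB)
        return -1;

    int swap = (eh.e_ident[EI_DATA] == ELFDATA2LSB) != host_is_little_endian();
    uint32_t phoff = to_host32(eh.e_phoff, swap);
    uint16_t phentsize = to_host16(eh.e_phentsize, swap);
    uint16_t phnum = to_host16(eh.e_phnum, swap);
    if (phnum > 0 && phentsize < sizeof(Elf32_Phdr))
        return -1;

    int rwx = 0;
    for (uint16_t i = 0; i < phnum; i++) {
        size_t off = (size_t)phoff + (size_t)i * phentsize;
        Elf32_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return -1;
        memcpy(&ph, img + off, sizeof ph);
        if (to_host32(ph.p_type, swap) != PT_LOAD)
            continue;
        uint32_t fl = to_host32(ph.p_flags, swap);
        if (verbose)
            printf("LOAD %c%c%c\n", (fl & PF_R) ? 'r' : '-',
                   (fl & PF_W) ? 'w' : '-', (fl & PF_X) ? 'x' : '-');
        if ((fl & PF_W) && (fl & PF_X))
            rwx++;
    }
    return rwx;
}

/* Hypothetical helper: read a whole file and scan it.
   Returns -1 on I/O or format errors. */
int file_count_rwx_loads(const char *path, int verbose)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size <= 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    unsigned char *buf = malloc((size_t)size);
    int result = -1;
    if (buf && fread(buf, 1, (size_t)size, f) == (size_t)size)
        result = elf32_count_rwx_loads(buf, (size_t)size, verbose);
    free(buf);
    fclose(f);
    return result;
}
```

Called from a small `main` on the ppc32 executable, a result greater than zero means at least one loadable segment is mapped rwx. Comparing a plain build with a `-Wl,--bss-plt` build, as in Comment #7, would be one way to test whether those reports share the same cause.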

The change below "fixes" it.  It would be good to get feedback
on whether this helps other folks experiencing this problem on
ppc32-linux.


Index: coregrind/m_debuginfo/debuginfo.c
===================================================================
--- coregrind/m_debuginfo/debuginfo.c   (revision 10816)
+++ coregrind/m_debuginfo/debuginfo.c   (working copy)
@@ -702,10 +702,10 @@
    */
    is_rx_map = False;
    is_rw_map = False;
-#  if defined(VGA_x86)
+#  if defined(VGA_x86) || defined(VGA_ppc32)
    is_rx_map = seg->hasR && seg->hasX;
    is_rw_map = seg->hasR && seg->hasW;
-#  elif defined(VGA_amd64) || defined(VGA_ppc32) || defined(VGA_ppc64)
+#  elif defined(VGA_amd64) || defined(VGA_ppc64)
    is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
    is_rw_map = seg->hasR && seg->hasW && !seg->hasX;
 #  else
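For illustration, the effect of the patch can be modelled in a few lines. This is a minimal sketch, not Valgrind's code: `SegPerms`, `classify_strict`, `classify_relaxed` and `debuginfo_would_be_read` are hypothetical names, and the last function assumes, following the explanation above, that debuginfo is read only when the text mapping is identified as rx and the data mapping as rw.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the permission bits Valgrind tracks for one
   mapped segment (the hasR/hasW/hasX fields tested in the diff above). */
typedef struct {
    bool hasR, hasW, hasX;
} SegPerms;

/* Strict rule: amd64 and ppc64, and ppc32 before the patch.
   A text mapping must be r-x and a data mapping rw-, never W and X together. */
static void classify_strict(SegPerms s, bool *is_rx_map, bool *is_rw_map)
{
    *is_rx_map = s.hasR && s.hasX && !s.hasW;
    *is_rw_map = s.hasR && s.hasW && !s.hasX;
}

/* Relaxed rule: x86, and ppc32 after the patch.
   The extra permission bit is ignored, so an rwx mapping passes both tests. */
static void classify_relaxed(SegPerms s, bool *is_rx_map, bool *is_rw_map)
{
    *is_rx_map = s.hasR && s.hasX;
    *is_rw_map = s.hasR && s.hasW;
}

/* Simplified model of the outcome described in this comment: debuginfo is
   read only if the text mapping counts as rx and the data mapping as rw.
   (Valgrind's real bookkeeping is more involved than this.) */
bool debuginfo_would_be_read(SegPerms text, SegPerms data, bool relaxed)
{
    void (*classify)(SegPerms, bool *, bool *) =
        relaxed ? classify_relaxed : classify_strict;
    bool text_rx, text_rw, data_rx, data_rw;
    classify(text, &text_rx, &text_rw);
    classify(data, &data_rx, &data_rw);
    return text_rx && data_rw;
}
```

In this model, an r-x text mapping paired with an rwx data mapping (the ELDK case) is rejected by the strict rule and accepted by the relaxed one, while a conventional rw- data mapping is accepted by both. One consequence visible in the model is that, under the relaxed rule, a single rwx mapping satisfies both predicates.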
Comment 16 ben 2009-08-14 20:01:18 UTC
The above patch (slightly modified) fixed the original problem I filed.  The patch didn't apply with 3.4.1 though.  The change I ended up with was:

===== coregrind/m_debuginfo/debuginfo.c 1.3 vs edited =====
--- 1.3/coregrind/m_debuginfo/debuginfo.c       2009-03-01 17:02:37 -05:00
+++ edited/coregrind/m_debuginfo/debuginfo.c    2009-08-14 13:24:47 -04:00
@@ -701,11 +701,11 @@
    */
    is_rx_map = False;
    is_rw_map = False;
-#  if defined(VGP_x86_linux)
+#  if defined(VGP_x86_linux) || defined(VGP_ppc32_linux)
    is_rx_map = seg->hasR && seg->hasX;
    is_rw_map = seg->hasR && seg->hasW;
 #  elif defined(VGP_amd64_linux) \
-        || defined(VGP_ppc32_linux) || defined(VGP_ppc64_linux)
+        || defined(VGP_ppc64_linux)
    is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
    is_rw_map = seg->hasR && seg->hasW && !seg->hasX;
 #  else
Comment 17 Julian Seward 2009-08-16 03:49:41 UTC
Committed, r10828.  Closing.