Bug 127980 - Valgrind reports errors for ld.so if it is stripped
Status: RESOLVED INTENTIONAL
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.1.1
Platform: Compiled Sources Linux
Importance: NOR minor
Target Milestone: ---
Assignee: Julian Seward
Reported: 2006-05-24 22:58 UTC by Eero Tamminen
Modified: 2009-07-01 08:29 UTC

Attachments
Test code for dlopen() (1.68 KB, text/plain)
2006-05-24 23:01 UTC, Eero Tamminen
Complete list of Valgrind errors for the stripped ld.so (8.45 KB, text/plain)
2006-05-24 23:10 UTC, Eero Tamminen

Description Eero Tamminen 2006-05-24 22:58:34 UTC
How to reproduce: 
- Build gcc (3.4) with glibc (2.3.6) 
- Install the built toolchain and C-library 
  - Or otherwise acquire similarly built toolchain 
    and _non-stripped_ libc libraries and install 
    them to /lib where programs find them at run-time 
- Compile Valgrind 3.1.1 with the toolchain 
- Compile a test program using dlopen() 
- Valgrind the program 
- strip -s /lib/ld-2.3.6.so 
- Valgrind the program 
 
Expected result: 
- Stripping debug symbols from the dynamic linker 
  doesn't affect what errors Valgrind reports 
 
Actual result: 
- No errors on first Valgrind run 
- After debug symbols are stripped from the dynamic linker: 
 
  - When the program starts, Valgrind reports that: 
==32705== Conditional jump or move depends on uninitialised value(s) 
==32705==    at 0x40091C5: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4002B73: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x400F08D: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x40011F3: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x40008F6: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
 
  - When the first dlopen() happens, Valgrind reports: 
==32705== Invalid read of size 4 
==32705==    at 0x40122E9: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x40051D4: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4006C11: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4116B18: dl_open_worker (in /targets/PC-TEST/lib/libc-2.3.6.so)
==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4117457: _dl_open (in /targets/PC-TEST/lib/libc-2.3.6.so) 
==32705==    by 0x4021CEC: dlopen_doit (in /targets/PC-TEST/lib/libdl-2.3.6.so) 
==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x40222DD: _dlerror_run (in /targets/PC-TEST/lib/libdl-2.3.6.so)
==32705==    by 0x4021D3C: dlopen@@GLIBC_2.1 (in /targets/PC-TEST/lib/libdl-2.3.6.so)
==32705==    by 0x8048769: test_dl (in /home/etammine/tmp/test-gcc/test-dlopen) 
==32705==    by 0x80488A6: main (in /home/etammine/tmp/test-gcc/test-dlopen) 
==32705==  Address 0x414603C is 20 bytes inside a block of size 23 alloc'd 
==32705==    at 0x401D419: malloc (vg_replace_malloc.c:149) 
==32705==    by 0x4004565: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4006B82: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4116B18: dl_open_worker (in /targets/PC-TEST/lib/libc-2.3.6.so)
==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x4117457: _dl_open (in /targets/PC-TEST/lib/libc-2.3.6.so) 
==32705==    by 0x4021CEC: dlopen_doit (in /targets/PC-TEST/lib/libdl-2.3.6.so) 
==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so) 
==32705==    by 0x40222DD: _dlerror_run (in /targets/PC-TEST/lib/libdl-2.3.6.so)
==32705==    by 0x4021D3C: dlopen@@GLIBC_2.1 (in /targets/PC-TEST/lib/libdl-2.3.6.so)
==32705==    by 0x8048769: test_dl (in /home/etammine/tmp/test-gcc/test-dlopen) 
==32705==    by 0x80488A6: main (in /home/etammine/tmp/test-gcc/test-dlopen) 
 
I was testing this in Scratchbox (see scratchbox.org) with a toolchain 
compiled for it, but these errors are also output on the Ubuntu Breezy 
desktop using a pre-built Valgrind 3.0.1 binary. 
 
I'm not sure whether this is an issue Valgrind has with optimized binaries, 
but glibc package cannot be compiled without optimizations.
Comment 1 Eero Tamminen 2006-05-24 23:01:58 UTC
Created attachment 16261
Test code for dlopen()
Comment 2 Eero Tamminen 2006-05-24 23:10:11 UTC
Created attachment 16262
Complete list of Valgrind errors for the stripped ld.so

/lib, along with other system directories, is symbolically linked to
the appropriate directories under the currently selected target[1] within
the Scratchbox[2] chroot environment. In my test, the target directory
was /targets/PC-TEST/, which is the reason for the funny paths in
the Valgrind log.

[1] target = toolchain/C-library, distribution helpers and optional
    CPU target emulation.
[2] Scratchbox is a tool for cross-compiling Linux distributions.
    For more info, see http://www.scratchbox.org/.
Comment 3 Eero Tamminen 2006-05-24 23:19:16 UTC
Forgot to mention, I'm running the test program like this:
  valgrind --tool=memcheck --num-callers=50 \
    ./test-dlopen /lib/libnss_files.so.2 open

The Ubuntu Breezy /lib/ld-*.so is stripped and I get the errors,
but SUSE 9.1 (gcc 3.3.3, glibc 2.3.3) seems to be shipping
non-stripped /lib/ld-*.so and I'm not getting the errors there.
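
For readers without access to the attachment, a test of this kind can be sketched as below. This is not the attached test code: only the name test_dl is taken from the stack trace above, and its signature here is a guess. The meaning of the second command-line argument ("open") is not stated in this report, so the sketch does not model it and only loads and unloads the library.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical stand-in for the attached test: load the shared library
   at `path` and unload it again.  Returns 0 on success, 1 on failure. */
int test_dl(const char *path)
{
    /* RTLD_NOW resolves all symbols at load time, so the dynamic
       linker does its relocation work inside this call. */
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    printf("loaded %s\n", path);

    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return 1;
    }
    return 0;
}
```

Calling test_dl(argv[1]) from main gives a program that can be run under valgrind as shown above. Older glibc versions need -ldl on the link line.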
Comment 4 Tom Hughes 2006-05-27 17:56:55 UTC
Unfortunately the suppression system which valgrind uses to ignore certain known issues in system libraries like glibc and the dynamic linker relies on being able to match symbol names against a list of things to ignore, so stripping out too much information will stop valgrind being able to suppress these errors. I don't think there is much we can do to improve this I'm afraid.
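
As a toy illustration of why name-based matching breaks on a stripped object, consider the sketch below. It is not Valgrind's implementation: the frame structure, the matches_by_function helper and the function name some_ld_internal_fn are all made up for this example. The only real API used is POSIX fnmatch().

```c
#include <fnmatch.h>
#include <stddef.h>

/* One call-stack frame as an error report might record it.
   fn is NULL when no symbol covers the address, which is what the
   "(within .../ld-2.3.6.so)" lines in the log above correspond to. */
struct frame {
    const char *fn;
};

/* Return 1 if each of the first npat frames has a function name that
   matches the corresponding glob pattern, 0 otherwise. */
int matches_by_function(const char *const *patterns, size_t npat,
                        const struct frame *stack, size_t nframes)
{
    if (npat > nframes)
        return 0;
    for (size_t i = 0; i < npat; i++) {
        if (stack[i].fn == NULL)
            return 0;   /* no name available: nothing to match against */
        if (fnmatch(patterns[i], stack[i].fn, 0) != 0)
            return 0;   /* a name is available but it does not match */
    }
    return 1;
}
```

With symbols present, a frame named some_ld_internal_fn matches the pattern of the same name. Once the name is stripped the frame carries NULL, the match fails, and the error would be reported instead of suppressed.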
Comment 5 Eero Tamminen 2006-05-29 13:06:35 UTC
> Unfortunately the suppression system which valgrind uses to ignore certain 
> known issues in system libraries like glibc and the dynamic linker

Are bugs reported for these Glibc/dynamic linker issues (Bugzilla URL?)?
Are they fixed in some newer Glibc version (which one?)?


> I don't think there is much we can do to improve this I'm afraid.

If Valgrind notices that it has error suppressions for a library which
doesn't have symbols, maybe it could output something like this to the log:
  Because library /lib/ld-2.3.6.so is stripped, suppressions for its errors
  might not have effect.
(I spent quite a while debugging this, and there are bug reports, e.g. in
Gnome Bugzilla, mentioning these invalid reads.)

In the FAQ it could then be explained that error suppressions require
the libraries to have debug symbols if the errors are in functions that
are not exported.
Comment 6 Tom Hughes 2006-05-29 13:19:16 UTC
Most of the glibc issues are, I believe, cases where glibc is being 'too clever' and valgrind is not able to understand that what it is doing is safe. There are some patches around on the net to clean up glibc and stop it generating various false warnings, but the glibc maintainers have refused them.

As far as a warning goes, this is actually quite tricky, as the .so will probably still have a symbol table in it, and it may even have a few symbols in it, so it is not easy to tell that it is stripped.
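
The point about the remaining symbol table can be checked with a small ELF reader. The sketch below assumes a Linux system with <elf.h> and <link.h>, and only handles files of the machine's native ELF class and byte order. The function name elf_symbol_tables and the HAS_* flags are invented for this example. It reports whether a file has a full symbol table (SHT_SYMTAB), a dynamic symbol table (SHT_DYNSYM), or both.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): selects Elf32_* or Elf64_* for this machine */
#include <stdio.h>
#include <string.h>

enum { HAS_SYMTAB = 1, HAS_DYNSYM = 2 };

/* Scan the section headers of a native-class ELF file.
   Returns a bitmask of HAS_SYMTAB / HAS_DYNSYM, or -1 on any error. */
int elf_symbol_tables(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    ElfW(Ehdr) eh;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        goto out;                       /* not an ELF file */
    if (eh.e_ident[EI_CLASS] != (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32))
        goto out;                       /* other word size: not handled */
    if (eh.e_shentsize != sizeof(ElfW(Shdr)))
        goto out;                       /* unexpected section header size */

    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        ElfW(Shdr) sh;
        long off = (long)eh.e_shoff + (long)i * (long)sizeof sh;
        if (fseek(f, off, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
            result = -1;
            goto out;
        }
        if (sh.sh_type == SHT_SYMTAB)
            result |= HAS_SYMTAB;       /* full table, removed by strip -s */
        if (sh.sh_type == SHT_DYNSYM)
            result |= HAS_DYNSYM;       /* needed for dynamic linking */
    }
out:
    fclose(f);
    return result;
}
```

For a shared object processed with strip -s one would typically expect HAS_DYNSYM without HAS_SYMTAB, so the mere presence of a symbol table does not show that the names a suppression needs are still there.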
Comment 7 Julian Seward 2006-05-29 14:04:38 UTC
> I'm not sure whether this is an issue Valgrind has with optimized binaries,
> but glibc package cannot be compiled without optimizations.


This is a problem we ran into first on SuSE 9.3 I believe.  Because V's
error suppression machinery depends on spotting certain symbols in 
ld-2.3.X.so, there's not much that can be done about this.  The SuSE 
folks in the end switched to shipping a non-stripped ld-2.3.X.so.  
I think it also gave them problems when using gdb to debug threaded
apps (IIRC).  Your best bet is to ensure that whoever assembles your
distribution doesn't strip ld-2.3.X.so.
Comment 8 Eero Tamminen 2006-05-29 21:15:58 UTC
Ok, the symbols don't increase the ld-2.3.x.so size that much (~90KB -> 110KB),
so that seems quite reasonable, maybe even on embedded devices (for whose
development Scratchbox is intended).  Are there similar problems with other 
libraries, or just with ld-2.3.x.so?

Btw. Is there some link to these potential Gdb threaded app debugging issues?
Comment 9 Nicholas Nethercote 2009-07-01 08:29:45 UTC
This seems too hard to fix.