How to reproduce:
- Build gcc (3.4) with glibc (2.3.6)
- Install the built toolchain and C-library
  - Or otherwise acquire similarly built toolchain and _non-stripped_ libc libraries and install them to /lib where programs find them at run-time
- Compile Valgrind 3.1.1 with the toolchain
- Compile a test program using dlopen()
- Valgrind the program
- strip -s /lib/ld-2.3.6.so
- Valgrind the program

Expected result:
- Stripping debug symbols from the dynamic linker doesn't affect what errors Valgrind reports

Actual result:
- No errors on first Valgrind run
- After debug symbols are stripped from the dynamic linker:
  - When the program starts, Valgrind reports that:

    ==32705== Conditional jump or move depends on uninitialised value(s)
    ==32705==    at 0x40091C5: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4002B73: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x400F08D: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x40011F3: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x40008F6: (within /targets/PC-TEST/lib/ld-2.3.6.so)

  - When the first dlopen() happens, Valgrind reports:

    ==32705== Invalid read of size 4
    ==32705==    at 0x40122E9: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x40051D4: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4006C11: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4116B18: dl_open_worker (in /targets/PC-TEST/lib/libc-2.3.6.so)
    ==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4117457: _dl_open (in /targets/PC-TEST/lib/libc-2.3.6.so)
    ==32705==    by 0x4021CEC: dlopen_doit (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x40222DD: _dlerror_run (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x4021D3C: dlopen@@GLIBC_2.1 (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x8048769: test_dl (in /home/etammine/tmp/test-gcc/test-dlopen)
    ==32705==    by 0x80488A6: main (in /home/etammine/tmp/test-gcc/test-dlopen)
    ==32705==  Address 0x414603C is 20 bytes inside a block of size 23 alloc'd
    ==32705==    at 0x401D419: malloc (vg_replace_malloc.c:149)
    ==32705==    by 0x4004565: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4006B82: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4116B18: dl_open_worker (in /targets/PC-TEST/lib/libc-2.3.6.so)
    ==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x4117457: _dl_open (in /targets/PC-TEST/lib/libc-2.3.6.so)
    ==32705==    by 0x4021CEC: dlopen_doit (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x400BB7D: (within /targets/PC-TEST/lib/ld-2.3.6.so)
    ==32705==    by 0x40222DD: _dlerror_run (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x4021D3C: dlopen@@GLIBC_2.1 (in /targets/PC-TEST/lib/libdl-2.3.6.so)
    ==32705==    by 0x8048769: test_dl (in /home/etammine/tmp/test-gcc/test-dlopen)
    ==32705==    by 0x80488A6: main (in /home/etammine/tmp/test-gcc/test-dlopen)

I was testing this in Scratchbox (see scratchbox.org) with a toolchain compiled for it, but these errors are output also on the Ubuntu Breezy desktop using pre-built Valgrind 3.0.1 binary.

I'm not sure whether this is an issue Valgrind has with optimized binaries, but glibc package cannot be compiled without optimizations.
Created attachment 16261 [details] Test code for dlopen()
Created attachment 16262 [details]
Complete list of Valgrind errors for the stripped ld.so

/lib, along with other system directories, is symbolically linked to the appropriate directories under the currently selected target[1] within the Scratchbox[2] chroot environment. In my test, the target directory was /targets/PC-TEST/, which is the reason for the funny paths in the Valgrind log.

[1] target = toolchain/C-library, distribution helpers and optional CPU target emulation.
[2] Scratchbox is a tool for cross-compiling Linux distributions. For more info, see http://www.scratchbox.org/.
Forgot to mention, I'm running the test program like this:

  valgrind --tool=memcheck --num-callers=50 \
      ./test-dlopen /lib/libnss_files.so.2 open

The Ubuntu Breezy /lib/ld-*.so is stripped and I get the errors, but SUSE 9.1 (gcc 3.3.3, glibc 2.3.3) seems to be shipping non-stripped /lib/ld-*.so and I'm not getting the errors there.
Unfortunately, the suppression system which valgrind uses to ignore certain known issues in system libraries like glibc and the dynamic linker relies on being able to match symbol names against a list of things to ignore, so stripping out too much information will stop valgrind being able to suppress these errors.

I don't think there is much we can do to improve this, I'm afraid.
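As a possible user-side workaround, a suppression can match on the object file instead of on function names, since the object path is still shown in the stack traces above even when the symbols are gone. The following is only a sketch of such a suppression file: the file name ld-stripped.supp and the entry names are made up for illustration, the `*` prefix assumes that wildcards in `obj:` lines also match the Scratchbox target prefix, and the entries have not been checked against Valgrind 3.1.1.

```
# ld-stripped.supp (hypothetical file name)
# Matches the "Conditional jump or move depends on uninitialised value(s)"
# report whose innermost frame is inside the stripped dynamic linker.
{
   stripped-ld-cond
   Memcheck:Cond
   obj:*/lib/ld-2.3.6.so
}
# Matches the "Invalid read of size 4" report seen at the first dlopen().
{
   stripped-ld-addr4
   Memcheck:Addr4
   obj:*/lib/ld-2.3.6.so
}
```

It would be passed with something like `valgrind --suppressions=ld-stripped.supp --tool=memcheck ./test-dlopen ...`. Note that suppressing by object is much coarser than suppressing by function name: it hides every matching error whose innermost frame is in ld.so, including any genuine ones.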
> Unfortunately the suppression system which valgrind uses to ignore certain
> known issues in system libraries like glibc and the dynamic linker

Are bugs reported for these Glibc/dynamic linker issues (Bugzilla URL?)? Are they fixed in some newer Glibc version (which one?)?

> I don't think there is much we can do to improve this I'm afraid.

If Valgrind notices that it has error suppressions for a library which doesn't have symbols, maybe it could output something like this to the log:

  Because library /lib/ld-2.3.6.so is stripped, suppressions for its errors might not have effect.

(I spent quite a while debugging this, and there are bug reports, e.g. in Gnome Bugzilla, mentioning these invalid reads.)

In the FAQ it could then be explained that error suppressions require the libraries to have debug symbols if the errors are in functions that are not exported.
Most of the glibc issues are, I believe, cases where glibc is being 'too clever' and valgrind is not able to understand that what it is doing is safe. There are some patches around on the net to clean up glibc and stop it generating various false warnings, but the glibc maintainers have refused them.

As far as a warning goes, this is actually quite tricky, as the .so will probably still have a symbol table in it, and it may even have a few symbols in it, so it is not easy to tell that it is stripped.
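To make the symbol table point concrete: a shared object that has been through strip -s normally keeps its dynamic symbol table (section type SHT_DYNSYM), which the dynamic linker needs at run-time, and loses only the full table (SHT_SYMTAB). The sketch below is not Valgrind code; `elf_symbol_tables` is a hypothetical helper that reports which of the two tables a file carries, assuming a Linux system with <elf.h> and a file in the host's byte order. A missing SHT_SYMTAB is only a heuristic for "stripped", which is part of why a reliable warning is hard.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Valgrind.
 * Returns -1 on error, otherwise a bit mask:
 *   bit 0 set: SHT_DYNSYM present (exported symbols, kept by strip)
 *   bit 1 set: SHT_SYMTAB present (full table, removed by "strip -s")
 * Assumes the file uses the same byte order as the host. */
int elf_symbol_tables(const char *path)
{
    unsigned char ident[EI_NIDENT];
    int result = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(ident, 1, EI_NIDENT, f) != EI_NIDENT ||
        memcmp(ident, ELFMAG, SELFMAG) != 0)
        goto fail;                      /* too short or not an ELF file */
    rewind(f);

    if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        Elf32_Shdr sh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto fail;
        /* Walk the section header table and note the symbol sections. */
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            long off = (long)(eh.e_shoff + (Elf32_Off)i * eh.e_shentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&sh, sizeof sh, 1, f) != 1)
                goto fail;
            if (sh.sh_type == SHT_DYNSYM) result |= 1;
            if (sh.sh_type == SHT_SYMTAB) result |= 2;
        }
    } else if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        Elf64_Shdr sh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto fail;
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            long off = (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&sh, sizeof sh, 1, f) != 1)
                goto fail;
            if (sh.sh_type == SHT_DYNSYM) result |= 1;
            if (sh.sh_type == SHT_SYMTAB) result |= 2;
        }
    } else {
        goto fail;                      /* unknown ELF class */
    }

    fclose(f);
    return result;

fail:
    fclose(f);
    return -1;
}
```

Under these assumptions, a result with bit 0 set and bit 1 clear for /lib/ld-2.3.6.so would be consistent with a stripped dynamic linker, while both bits set would suggest the full symbol table is still there.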
> I'm not sure whether this is an issue Valgrind has with optimized binaries,
> but glibc package cannot be compiled without optimizations.

This is a problem we ran into first on SuSE 9.3, I believe. Because V's error suppression machinery depends on spotting certain symbols in ld-2.3.X.so, there's not much that can be done about this.

The SuSE folks in the end switched to shipping a non-stripped ld-2.3.X.so. I think it also gave them problems when using gdb to debug threaded apps (IIRC).

Your best bet is to ensure that whoever assembles your distribution doesn't strip ld-2.3.X.so.
Ok, the symbols don't increase the ld-2.3.x.so size that much (~90KB -> 110KB), so that seems quite reasonable, maybe even on embedded devices (for whose development Scratchbox is intended).

Are there similar problems with other libraries, or just with ld-2.3.x.so?

Btw, is there some link to these potential Gdb threaded app debugging issues?
This seems too hard to fix.