Valgrind doesn't output any debug info when printing a backtrace that has function calls inside dlopen'ed modules. The debug info is there and usable, because I got one backtrace from glibc (free(): invalid pointer error) with the complete backtrace, where valgrind wasn't able to display anything useful.
Uhm, it doesn't seem to do instrumentation at all with dlopen'ed libraries: I changed the shared libraries to be linked at compile time (rather than using dlopen), and now valgrind shows many (valid) warnings in those libraries that it wasn't showing previously.
> uhm, it doesn't seem to do instrumentation at all with dlopen'ed
> libraries,
That just can't be the case (well, it isn't). If it was, Valgrind would be completely useless for debugging KDE apps and OpenOffice, both of which are totally dependent on dlopen. There must be something else wrong. Can you make a small test case which shows the problem?
There's no mystery here Julian - it's just the well known problem of not getting any names for addresses that come from a library that has been dlclosed surely?
Ah, I think I interpreted "doesn't do instrumentation" too literally. Apologies to Nuno.
In reply to Tom, no the libraries are only dlclose'd when the program ends, so that's not the problem. However the files are unlink'ed after dlopen, so I don't know if that makes any difference or not. This is a complex program and extracting a test case is always difficult, but I'll try hard ;)
So are we talking about memory leaks (where the trace will be printed after the program has ended) or other sorts of errors that are printed in real time as it is running? I don't think the unlinking will matter, as valgrind will read the symbols when the mmap that the dlopen does is seen by valgrind, which will be before the unlink.
Uhm, I still don't know what is really happening. Sometimes it prints errors, sometimes not. But it never has debug info. Compare the following (the .bin.tmp file is the dlopened library):
A Glibc backtrace (using a wrapper around the atoi function):
./OPENRloader(__wrap_atoi+0x60) [0x8053cf0]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN7Segment9configureE10Dictionary+0x102) [0xb2457b58]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN3Pam17initPamParametersEPc+0x151) [0xb24503fd]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN3Pam4initEv+0x98) [0xb2450720]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN8ORLRobot6DoInitERK12OSystemEvent+0x32e) [0xb242f1a0]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(__start_module+0x50) [0xb242c9f0]
./OPENRloader(_Z15module_executorPv+0x8c) [0x805a21c]
/lib/libpthread.so.0 [0xb7f3d4bb]
/lib/libc.so.6(__clone+0x5e) [0xb7da89ce]
valgrind's backtrace:
==13761== Thread 13:
==13761== Invalid read of size 1
==13761==    at 0x41A3A1B: (within /lib/libc-2.5.so)
==13761==    by 0x41A377E: __strtol_internal (in /lib/libc-2.5.so)
==13761==    by 0x8053C15: __wrap_atoi (stdlib.h:336)
==13761==    by 0x7BEFB57: ???
==13761==    by 0x7BE83FC: ???
==13761==    by 0x7BE871F: ???
==13761==    by 0x7BC719F: ???
==13761==    by 0x7BC49EF: ???
==13761==    by 0x805A0FB: module_executor(void*) (helper.cc:40)
==13761==    by 0x40414BA: start_thread (in /lib/libpthread-2.5.so)
==13761==    by 0x42349CD: clone (in /lib/libc-2.5.so)
==13761==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
I don't know what kind of information may help you. Do you want the binaries to try to run locally or some dump from some tool? Just ask ;)
For the record, Julian found the cause of the problem: chroot(). After doing a chroot(), valgrind isn't capable of loading the debug information from dlopen'ed libraries.
My initial observation is that the bogus addresses ...
==14269==    by 0x7BEFB57: ???
etc do not look like the normal code addresses you get on x86-linux, and from the log valgrind did not load any .so's anywhere near there. Also looking at the log I see this
--14269-- Reading syms from /cvs/openSDK/openSDK/OPENRloader (0x8048000)
and then
Loaded module: MS/OPEN-R/MW/OBJS/ORLROBOT.BIN (thread id: 184982416::3)
etc so my guess is your application is using some custom loader of its own to load these .BIN files. Still, it should not crash. Can you try again with --smc-check=all ?
I agree with Tom's previous analysis re dlclose, but on the other hand something is not right here.
> I don't know what kind of information may help you.
The complete log of a run, with -v, would be a good start.
> no, no. I'm not doing any special magic. those .BIN files are .so gziped.
> My application links with zlib to gunzip them at run-time, then writes the
> output to the xx.BIN.tmp file and then calls dlopen() on that temporary
> file.
[scratches head] Ok, I give up. It is too hard to debug remotely. If you can get me a tarball to easily reproduce this with, I'll have a look.
I'm closing crashing and similar bugs that are more than two years old. If you still see this problem with Valgrind 3.4.1 please reopen the bug report. Thanks.
The bug is still there. I did some debugging with Julian and we came to the conclusion that the problem is with chroot() after dlopen'ing the library.
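For anyone who wants to try this outside the original application, here is a rough sketch of the sequence described above: dlopen() a library, chroot(), then call into the library. The helper name call_after_chroot and its parameters are made up for illustration and are not taken from the OPENRloader sources. It assumes you supply your own shared library, built with -g, that exports a void(void) function which does something valgrind will complain about. It has not been confirmed to reproduce the bug.

```c
#define _DEFAULT_SOURCE   /* exposes chroot() in <unistd.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report): dlopen() lib_path,
 * optionally chroot() into new_root, then call the void(void) function
 * named by symbol. Returns 0 on success, -1 on any failure.
 */
int call_after_chroot(const char *lib_path, const char *symbol,
                      const char *new_root)
{
    void (*fn)(void) = NULL;
    void *handle = dlopen(lib_path, RTLD_NOW);

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Resolve the symbol using the cast idiom POSIX suggests for dlsym(). */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return -1;
    }

    /* chroot() needs root privileges; pass NULL to skip it for comparison. */
    if (new_root != NULL && (chroot(new_root) != 0 || chdir("/") != 0)) {
        perror("chroot");
        dlclose(handle);
        return -1;
    }

    fn();  /* errors reported in here should show frames from lib_path */

    /* Deliberately no dlclose(), so an unloaded mapping is not a factor. */
    return 0;
}
```

Called from a small main() and run under valgrind as root, once with new_root set to NULL and once with a real directory, this should allow the two backtraces to be compared side by side. On older glibc versions the program has to be linked with -ldl.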
*** Bug 291380 has been marked as a duplicate of this bug. ***