Bug 142086 - valgrind doesn't output debug info with dlopen'ed libraries
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.2.3
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
Reported: 2007-02-23 00:50 UTC by Nuno Lopes
Modified: 2012-01-12 16:09 UTC
Description Nuno Lopes 2007-02-23 00:50:55 UTC
Valgrind doesn't output any debug info when printing a backtrace that has 
function calls inside dlopen'ed modules. The debug info is there and usable, 
because I got one backtrace from glibc (free(): invalid pointer error) with 
the complete backtrace, where valgrind wasn't able to display anything useful.
Comment 1 Nuno Lopes 2007-03-16 22:51:05 UTC
Uhm, it doesn't seem to do instrumentation at all with dlopen'ed libraries: I changed the shared libraries to be linked at compile time (rather than using dlopen), and now valgrind shows many (valid) warnings in those libraries that it wasn't showing previously.
Comment 2 Julian Seward 2007-03-16 22:58:09 UTC
> uhm, it doesn't seem to do instrumentation at all with dlopen'ed
> libraries,


That just can't be the case (well, it isn't).  If it was, Valgrind would be
completely useless for debugging KDE apps and OpenOffice, both of which are
totally dependent on dlopen.

There must be something else wrong.  Can you make a small test case which
shows the problem?
Comment 3 Tom Hughes 2007-03-17 00:16:09 UTC
There's no mystery here, Julian. Surely it's just the well-known problem of not getting any names for addresses that come from a library that has been dlclosed?
Comment 4 Julian Seward 2007-03-17 00:32:41 UTC
Ah, I think I interpreted "doesn't do instrumentation" too literally.
Apologies to Nuno.
Comment 5 Nuno Lopes 2007-03-17 00:50:07 UTC
In reply to Tom: no, the libraries are only dlclose'd when the program ends, so that's not the problem. However, the files are unlink'ed after dlopen, so I don't know whether that makes any difference.
This is a complex program and extracting a test case is always difficult, but I'll try hard ;)
Comment 6 Tom Hughes 2007-03-17 01:52:07 UTC
So are we talking about memory leaks (where the trace will be printed after the program has ended) or other sorts of errors that are printed in real time as it is running?

I don't think the unlinking will matter, as valgrind will read the symbols when the mmap that the dlopen does is seen by valgrind, which will be before the unlink.
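
The ordering described above relies on a file mapping staying valid after its name is unlinked. A minimal sketch of that behaviour follows, using a plain mmap() on a scratch file rather than a real dlopen(). The helper name and the /tmp path template are made up for illustration, and the sketch says nothing about when valgrind itself reads symbols.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map a scratch file, unlink it, then read through the
 * mapping. Returns 0 if the mapped bytes are still readable after unlink(),
 * 1 if they differ, and -1 if any setup step fails. */
int mapping_survives_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    const char payload[] = "still mapped";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (write(fd, payload, sizeof payload) != (ssize_t)sizeof payload) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Same order as the application: map first ... */
    char *p = mmap(NULL, sizeof payload, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    /* ... then remove the name from the file system. */
    if (unlink(path) != 0) {
        munmap(p, sizeof payload);
        return -1;
    }

    /* The mapping is still backed by the (now nameless) file. */
    int result = (memcmp(p, payload, sizeof payload) == 0) ? 0 : 1;
    munmap(p, sizeof payload);
    return result;
}
```

Calling mapping_survives_unlink() from a small main() and checking for 0 is enough to confirm the behaviour on a given system.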
Comment 7 Nuno Lopes 2007-03-23 23:37:12 UTC
Uhm, I still don't know what is really happening. Sometimes it prints errors, sometimes not. But it never has debug info. Compare the following (the .bin.tmp file is the dlopened library):


A Glibc backtrace (using a wrapper around the atoi function):

./OPENRloader(__wrap_atoi+0x60) [0x8053cf0]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN7Segment9configureE10Dictionary+0x102) [0xb2457b58]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN3Pam17initPamParametersEPc+0x151) [0xb24503fd]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN3Pam4initEv+0x98) [0xb2450720]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(_ZN8ORLRobot6DoInitERK12OSystemEvent+0x32e) [0xb242f1a0]
MS/OPEN-R/MW/OBJS/ORLROBOT.BIN.tmp(__start_module+0x50) [0xb242c9f0]
./OPENRloader(_Z15module_executorPv+0x8c) [0x805a21c]
/lib/libpthread.so.0 [0xb7f3d4bb]
/lib/libc.so.6(__clone+0x5e) [0xb7da89ce]


valgrind's backtrace:
==13761== Thread 13:
==13761== Invalid read of size 1
==13761==    at 0x41A3A1B: (within /lib/libc-2.5.so)
==13761==    by 0x41A377E: __strtol_internal (in /lib/libc-2.5.so)
==13761==    by 0x8053C15: __wrap_atoi (stdlib.h:336)
==13761==    by 0x7BEFB57: ???
==13761==    by 0x7BE83FC: ???
==13761==    by 0x7BE871F: ???
==13761==    by 0x7BC719F: ???
==13761==    by 0x7BC49EF: ???
==13761==    by 0x805A0FB: module_executor(void*) (helper.cc:40)
==13761==    by 0x40414BA: start_thread (in /lib/libpthread-2.5.so)
==13761==    by 0x42349CD: clone (in /lib/libc-2.5.so)
==13761==  Address 0x0 is not stack'd, malloc'd or (recently) free'd


I don't know what kind of information may help you. Do you want the binaries to try to run locally, or a dump from some tool? Just ask ;)
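
For readers wondering how a glibc backtrace like the one above can be produced, here is a rough sketch. It assumes the wrapper used glibc's backtrace() and backtrace_symbols_fd(), which the thread does not confirm. The name traced_atoi and the frame limit are invented for illustration; the reporter's real function is __wrap_atoi and its source is not shown here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_FRAMES 32 /* arbitrary limit chosen for this sketch */

/* Hypothetical stand-in for the reporter's __wrap_atoi: dump a glibc
 * backtrace to stderr when handed NULL, otherwise defer to atoi(). */
int traced_atoi(const char *s)
{
    if (s == NULL) {
        void *frames[MAX_FRAMES];
        int n = backtrace(frames, MAX_FRAMES);
        /* Writes one line per frame, with the symbol name when the
         * dynamic symbol table provides one. */
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        return 0;
    }
    return atoi(s);
}
```

With GNU ld, linking with -Wl,--wrap=atoi redirects calls to atoi to a function named __wrap_atoi and makes the original reachable as __real_atoi, so the body above could be adapted to that naming. Function names from the main executable generally only show up in the output if it is linked with -rdynamic.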
Comment 8 Nuno Lopes 2007-03-25 16:42:33 UTC
For the record, Julian found the cause of the problem: chroot(). After doing a chroot(), valgrind isn't capable of loading the debug information from dlopen'ed libraries.
Comment 9 Julian Seward 2007-03-26 11:24:31 UTC
My initial observation is that the bogus addresses ...

==14269==    by 0x7BEFB57: ???

etc do not look like the normal code addresses you get on x86-linux,
and from the log valgrind did not load any .so's anywhere near there.

Also looking at the log I see this
--14269-- Reading syms from /cvs/openSDK/openSDK/OPENRloader (0x8048000)
and then
Loaded module: MS/OPEN-R/MW/OBJS/ORLROBOT.BIN (thread id: 184982416::3)
etc

so my guess is your application is using some custom loader of its own
to load these .BIN files.  

Still, it should not crash.  Can you try again with --smc-check=all ?
Comment 10 Julian Seward 2007-03-26 11:27:10 UTC
I agree with Tom's previous analysis re dlclose, but on the other hand
something is not right here.

> I don't know what kind of information may help you.


The complete log of a run, with -v, would be a good start.
Comment 11 Julian Seward 2007-03-26 11:27:39 UTC
> no, no. I'm not doing any special magic. those .BIN files are .so gzipped.
> My application links with zlib to gunzip them at run-time, then writes the
> output to the xx.BIN.tmp file and then calls dlopen() on that temporary
> file.


[scratches head]

Ok, I give up.  It is too hard to debug remotely.  If you can get me a
tarball to easily reproduce this with, I'll have a look.
Comment 12 Nicholas Nethercote 2009-06-30 04:37:18 UTC
I'm closing crashing and similar bugs that are more than two years old.  If 
you still see this problem with Valgrind 3.4.1 please reopen the bug report.
Thanks.
Comment 13 Nuno Lopes 2009-06-30 19:38:28 UTC
The bug is still there.
I did some debugging with Julian and we came to the conclusion that the problem is with chroot() after dlopen'ing the library.
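
A small test case along these lines might look like the skeleton below. It is only a sketch: the function name, parameters and return codes are invented, and because the comments here do not pin down whether the chroot() happens before or after the dlopen(), the order is left as a switch. The library is assumed to export a void(void) function that triggers an error valgrind would report. chroot() needs root privileges.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical reproducer skeleton (all names are illustrative).
 * jail:         directory to chroot() into (needs root privileges)
 * lib:          shared object to dlopen(); resolved inside the jail when
 *               chroot_first is non-zero
 * sym:          name of a void(void) function in lib that triggers an error
 * chroot_first: non-zero to chroot() before dlopen(), zero to do it after
 * Returns 0 on success, non-zero on any failure.
 */
int run_chroot_repro(const char *jail, const char *lib, const char *sym,
                     int chroot_first)
{
    if (chroot_first && (chroot(jail) != 0 || chdir("/") != 0)) {
        perror("chroot");
        return 1;
    }

    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 2;
    }

    if (!chroot_first && (chroot(jail) != 0 || chdir("/") != 0)) {
        perror("chroot");
        dlclose(handle);
        return 1;
    }

    /* POSIX-style conversion of the dlsym() result to a function pointer. */
    void (*fn)(void);
    *(void **)(&fn) = dlsym(handle, sym);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return 3;
    }

    /* Any error valgrind reports inside fn() should show frames from lib. */
    fn();

    /* The handle is deliberately not dlclose'd, to keep the separate
     * dlclose issue mentioned in comment 3 out of the picture. */
    return 0;
}
```

Wrapped in a main() that passes real paths, this could be run as root under valgrind -v with chroot_first set to 0 and then to 1, to see which order loses the debug info. On older glibc versions the program has to be linked with -ldl.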
Comment 14 Tom Hughes 2012-01-12 16:09:09 UTC
*** Bug 291380 has been marked as a duplicate of this bug. ***