Bug 113642 - valgrind crashes when trying to read debug information
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.1 SVN
Platform: Fedora RPMs Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-30 21:09 UTC by Bryan O'Sullivan
Modified: 2005-10-04 18:55 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
vg.out (3.50 KB, text/plain)
2005-09-30 21:21 UTC, Bryan O'Sullivan
vg.out (4.00 KB, text/plain)
2005-09-30 21:26 UTC, Bryan O'Sullivan

Description Bryan O'Sullivan 2005-09-30 21:09:28 UTC
Version:           3.1 SVN (using KDE KDE 3.3.2)
Installed from:    Fedora RPMs
Compiler:          gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4) App compiled with PathScale EKOPath 2.1 compiler
OS:                Linux

I have an app compiled with the PathScale 2.2 compilers, which valgrind is unable to even load successfully.

I am using the latest valgrind SVN trunk rev, x86_64, RHEL4.

I get this crash:

valgrind: the 'impossible' happened:
   Killed by fatal signal
==12103==    at 0x7004BAC6: vgModuleLocal_read_debuginfo_dwarf2 (dwarf.c:924)
==12103==    by 0x7002EB65: read_lib_symbols (symtab.c:1749)
==12103==    by 0x7002ED51: vgPlain_read_seg_symbols (symtab.c:1803)
==12103==    by 0x7002B286: vgPlain_di_notify_mmap (symtab.c:197)
==12103==    by 0x70037954: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:1807)
==12103==    by 0x700483E6: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:1151)
==12103==    by 0x70048A1E: vgPlain_client_syscall (syswrap-main.c:653)
==12103==    by 0x70032FBD: handle_syscall (scheduler.c:618)
==12103==    by 0x700332CF: vgPlain_scheduler (scheduler.c:720)
==12103==    by 0x70053082: vgModuleLocal_thread_wrapper (syswrap-linux.c:82)
==12103==    by 0x70044ADD: run_a_thread_NORETURN (syswrap-amd64-linux.c:117)
Comment 1 Julian Seward 2005-09-30 21:18:46 UTC
> I am using the latest valgrind SVN trunk rev, x86_64, RHEL4.


Does readelf think this exe/.so is OK?

Does it work with 3.0.1, or is this a regression?

Either way .. basically we'll need the object to chase this
down.  Possible?
Comment 2 Bryan O'Sullivan 2005-09-30 21:21:12 UTC
Created attachment 12792
vg.out

Complete valgrind output for the failing run, as collected with -v.
Comment 3 Bryan O'Sullivan 2005-09-30 21:22:35 UTC
Comment on attachment 12792
vg.out

I have a newer output file.
Comment 4 Bryan O'Sullivan 2005-09-30 21:26:25 UTC
Created attachment 12793
vg.out

Here's that newer output file.
Comment 5 Bryan O'Sullivan 2005-09-30 21:36:13 UTC
I can't tell whether the app works with 3.0.1 any longer, as I don't have it any more.  I doubt that it's a regression.
Comment 6 Tom Hughes 2005-10-04 16:28:42 UTC
What does "readelf -S /usr/lib64/libmpichf90nc.so.2.0" say?

Judging by the fact that the fault is at address zero, I suspect we will find that it has a debug_line section but no debug_info section, which is a bit odd.

The symbol table reader in valgrind is currently using the presence of debug_line to indicate DWARF2 and then assuming that debug_info will be present. That, combined with the fact that the loop in ML_(read_debuginfo_dwarf2) will go mad if the size of the debug_info section is less than four bytes, would cause this sort of crash.
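
To illustrate the failure mode described above, here is a minimal standalone sketch in C. It is not valgrind's code: the function name and its parameters are hypothetical, and it assumes 32-bit DWARF, where each compilation unit in debug_info starts with a 4-byte length field stored in host byte order. The point is only that a walk over those units needs a guard for a missing or undersized section before it reads the first length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from valgrind: counts the compilation
   units in a 32-bit DWARF debug_info image. Returns 0 when the section
   is absent or too short to hold a single 4-byte length field. */
size_t count_debug_info_units(const uint8_t *info, size_t info_sz)
{
    size_t off = 0;
    size_t units = 0;

    /* Guard first. With info == NULL and info_sz == 0, an unsigned
       bound such as (info_sz - 4) would wrap to a huge value and the
       loop below would read through a NULL pointer. */
    if (info == NULL || info_sz < 4)
        return 0;

    while (info_sz - off >= 4) {
        uint32_t unit_len;

        /* memcpy avoids an unaligned load; host byte order is assumed. */
        memcpy(&unit_len, info + off, sizeof unit_len);

        /* Stop on a length that would run past the end of the section. */
        if (unit_len > info_sz - off - 4)
            break;

        units++;
        off += 4 + (size_t)unit_len;
    }
    return units;
}
```

With the guard in place, a call such as count_debug_info_units(NULL, 0) returns 0 instead of faulting at address zero.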
Comment 7 Nicholas Nethercote 2005-10-04 16:35:08 UTC
On Tue, 4 Oct 2005, Tom Hughes wrote:

> The symbol table reader in valgrind is currently using the presence of 
> debug_line to indicate DWARF2 and then assuming that debug_info will be 
> present.


So

       if (debug_line) {

should become

       if (debug_line && debug_info) {

?  And maybe add in (debug_info_sz > 4), and possibly check that debug_str 
and debug_abbv are non-NULL.
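
Spelled out as a standalone predicate, the suggestion above might look like the sketch below. The struct and the function name are hypothetical and only mirror the variable names used in this comment; whether debug_str really has to be present is left open here, so that check is included only because it was suggested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the section pointers found in an ELF image.
   A pointer stays NULL (and its size 0) when the section is absent. */
typedef struct {
    const uint8_t *debug_line;  size_t debug_line_sz;
    const uint8_t *debug_info;  size_t debug_info_sz;
    const uint8_t *debug_abbv;  size_t debug_abbv_sz;
    const uint8_t *debug_str;   size_t debug_str_sz;
} DwarfSections;

/* True only when every section the DWARF2 reader is about to touch is
   present and debug_info is larger than one 4-byte length field. */
bool have_usable_dwarf2(const DwarfSections *s)
{
    return s->debug_line != NULL
        && s->debug_info != NULL
        && s->debug_info_sz > 4
        && s->debug_abbv != NULL
        && s->debug_str != NULL;   /* assumption: treated as required */
}
```

A caller would then replace the bare debug_line test with have_usable_dwarf2(&sections) and skip the DWARF2 reader when it returns false.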
Comment 8 Tom Hughes 2005-10-04 16:38:53 UTC
Something like that, yes.
Comment 9 Julian Seward 2005-10-04 17:30:37 UTC
> ?  And maybe add in (debug_info_sz > 4), and possibly check that debug_str
> and debug_abbv are non-NULL.


Or perhaps .. for dealing with potentially explosive pointers like
this, we could inquire with aspacem whether it's safe to dereference
(perhaps by checking that the pointer points into the same
segment that the .so has been transiently mmaped into).  It's
two calls to VG_(am_find_nsegment), but that's not catastrophically
expensive since it's a binary search now.
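
As a rough standalone illustration of that idea, the sketch below does two binary-search lookups over a sorted segment table and accepts a pointer only if it falls in the same segment as the mapping base. It does not call aspacem: the Seg type, find_seg and range_in_mapping are hypothetical stand-ins, and the table is assumed to be sorted by start address with no overlaps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment record: a half-open address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} Seg;

/* Binary search over segments sorted by start and non-overlapping.
   Returns the segment containing addr, or NULL if none does. */
const Seg *find_seg(const Seg *segs, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < segs[mid].start)
            hi = mid;
        else if (addr >= segs[mid].end)
            lo = mid + 1;
        else
            return &segs[mid];
    }
    return NULL;
}

/* True when [addr, addr + len) lies inside the same segment as map_base.
   Two lookups: one for the mapping, one for the pointer being checked. */
bool range_in_mapping(const Seg *segs, size_t n,
                      uintptr_t map_base, uintptr_t addr, size_t len)
{
    const Seg *a = find_seg(segs, n, map_base);
    const Seg *b = find_seg(segs, n, addr);

    return a != NULL && a == b && len <= a->end - addr;
}
```

A zero pointer is rejected by this check as long as no segment in the table covers address 0.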
Comment 10 Tom Hughes 2005-10-04 17:34:33 UTC
In message <20051004153039.9255.qmail@ktown.kde.org>
        Julian Seward <jseward@acm.org> wrote:

> Or perhaps .. for dealing with potentially explosive pointers like
> this, we could inquire with aspacem whether it's safe to dereference
> (perhaps by checking that the pointer points into the same
> segment that the .so has been transiently mmaped into).  It's
> two calls to VG_(am_find_nsegment), but that's not catastrophically
> expensive since it's a binary search now.


Well, the address should be part of memory that we have just mmaped, so
it shouldn't really be bogus...

The problem here is that there was no such section in the file, so the
default value of zero is still in the variable.

Tom
Comment 11 Bryan O'Sullivan 2005-10-04 17:52:51 UTC
By the way, the compiler in question has a history of generating somewhat questionable debug info.  If you think it's a compiler bug, valgrind should probably still handle it (since the compiler is in the wild), but please let me know, and I'll make sure any bug gets fixed.

Thanks.
Comment 12 Tom Hughes 2005-10-04 18:55:45 UTC
I have committed a fix for this as revision 4856.