Bug 305513 - killed by fatal signal: SIGSEGV in readdwarf.c read_unitinfo_dwarf2
Summary: killed by fatal signal: SIGSEGV in readdwarf.c read_unitinfo_dwarf2
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.8.0
Platform: Fedora RPMs Linux
: NOR major
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-20 20:53 UTC by Robin Green
Modified: 2013-04-17 13:53 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Correctly skip block forms (1.36 KB, patch)
2012-08-21 14:54 UTC, Mark Wielaard
Details
make read_unitinfo_dwarf2 more robust (1.33 KB, patch)
2012-08-21 17:45 UTC, Mark Wielaard
Details
make read_unitinfo_dwarf2 more robust (2.60 KB, patch)
2012-08-21 19:49 UTC, Mark Wielaard
Details
GDB trace with those patches (3.71 KB, text/plain)
2012-08-23 08:38 UTC, tv
Details
GDB trace with those patches with more debug info (2.06 KB, text/plain)
2012-08-23 08:40 UTC, tv
Details
valgrind output with s/if (0/if (1/ (40.16 KB, application/force-download)
2012-08-24 13:52 UTC, tv
Details
GDB trace with those patches with more debug info (3.95 KB, text/plain)
2012-08-24 13:54 UTC, tv
Details
GDB trace with those patches with debug info (+vg printff enabled) (4.63 KB, text/plain)
2012-08-27 14:33 UTC, tv
Details
GDB trace with those patches with debug info (vg printff disabled) (2.27 KB, text/plain)
2012-08-27 14:33 UTC, tv
Details
prevent valgrind to segfault (719 bytes, patch)
2012-08-27 15:24 UTC, tv
Details
Simplify read_unitinfo_dwarf2. Only try to read the first DIE. (4.62 KB, patch)
2013-04-12 06:23 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robin Green 2012-08-20 20:53:15 UTC
I got this kind of error with my KDE application, and it occurs with kwrite too:

Reproducible: Always

Steps to Reproduce:
1. valgrind kwrite
Actual Results:  


==5570== Memcheck, a memory error detector
==5570== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==5570== Using Valgrind-3.8.0 and LibVEX; rerun with -h for copyright info
==5570== Command: kwrite
==5570== 
--5570-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--5570-- si_code=1;  Faulting address: 0x403796000;  sp: 0x4030d1018

valgrind: the 'impossible' happened:
   Killed by fatal signal
==5570==    at 0x380D3B00: read_leb128 (readdwarf.c:221)
==5570==    by 0x380D3B96: read_leb128U (readdwarf.c:247)
==5570==    by 0x380D6294: vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:956)
==5570==    by 0x38085FEF: vgModuleLocal_read_elf_debug_info (readelf.c:2682)
==5570==    by 0x3807EED5: vgPlain_di_notify_mmap (debuginfo.c:628)
==5570==    by 0x380A0E68: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2066)
==5570==    by 0x380CA0C4: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:1012)
==5570==    by 0x3809D9B2: vgPlain_client_syscall (syswrap-main.c:1464)
==5570==    by 0x3809A6FF: handle_syscall (scheduler.c:1057)
==5570==    by 0x3809BC36: vgPlain_scheduler (scheduler.c:1335)
==5570==    by 0x380AB739: run_a_thread_NORETURN (syswrap-linux.c:103)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==5570==    at 0x30C8217A0A: mmap (syscall-template.S:81)
==5570==    by 0x30C82068DB: _dl_map_object_from_fd (dl-load.c:1344)
==5570==    by 0x30C82083C2: _dl_map_object (dl-load.c:2359)
==5570==    by 0x30C820CCE1: openaux (dl-deps.c:63)
==5570==    by 0x30C820EDC5: _dl_catch_error (dl-error.c:177)
==5570==    by 0x30C820D3C1: _dl_map_object_deps (dl-deps.c:256)
==5570==    by 0x30C820377B: dl_main (rtld.c:1834)
==5570==    by 0x30C821529A: _dl_sysdep_start (dl-sysdep.c:242)
==5570==    by 0x30C8204FC1: _dl_start (rtld.c:336)
==5570==    by 0x30C8201597: ??? (in /usr/lib64/ld-2.16.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Expected Results:  
No crash
Comment 1 Mark Wielaard 2012-08-21 14:54:18 UTC
Created attachment 73360 [details]
Correctly skip block forms

This bug is also https://bugzilla.redhat.com/show_bug.cgi?id=849783

Block forms aren't correctly skipped. p += read_leb128U( &p ); results in undefined behaviour.

The underlying issue is that with dwz there are partial_units and import_units which valgrind doesn't understand yet, so it keeps looking for a "proper" compile_unit TAG which it never finds and so needs to process all the DIEs. (Which should get its own bug/fix.)
Comment 2 Mark Wielaard 2012-08-21 17:45:43 UTC
Created attachment 73363 [details]
make read_unitinfo_dwarf2 more robust

jseward asked if we could make things a little bit more robust here. This patch makes the loops bound by checking we don't go beyond end_img. There is still some potential for bad things to happen in for example the read_leb128 functions, which don't know how far they can read. But this patch alone would have prevented the crash seen (without the other patch). With only this patch applied the the testcase would spew some warnings/errors, but continue running.
Comment 3 Mark Wielaard 2012-08-21 19:49:02 UTC
Created attachment 73368 [details]
make read_unitinfo_dwarf2 more robust

(In reply to comment #2)
> With only this patch applied the the testcase would spew some warnings/errors, but
> continue running.

Which isn't too surprising since I messed up the check and no abbrev was read anymore... doh.
This new patch checks the correct bounds.
Comment 4 tv 2012-08-23 08:38:37 UTC
Created attachment 73411 [details]
GDB trace with those patches

Even with those patches, "valgrind rpm -q" still segfaults on Mageia Cauldron when rpm-debug is installed (equivalent of rpm-debuginfo on RH/FC)
Comment 5 tv 2012-08-23 08:40:53 UTC
Created attachment 73412 [details]
GDB trace with those patches with more debug info
Comment 6 Mark Wielaard 2012-08-24 08:23:26 UTC
(In reply to comment #4)
> Created attachment 73411 [details]
> GDB trace with those patches
> 
> Even with those patches, "valgrind rpm -q" still segfaults on Mageia
> Cauldron when rpm-debug is installed (equivalent of rpm-debuginfo on RH/FC)

I am not sure that is the same bug, it happens later. I also cannot replicate it (against fedora rpm).

You are still seeing ### unknown abbrev 0x124 messages. That indicates either that you didn't apply patch 1, or there is some other DWARF parsing bug in there.

Could you run with -v to get some more info, rebuild with CFLAGS="-g -O0" to help your gdb a bit (or maybe install a newer gdb) and change the if (0) to if (1) inside read_unitinfo_dwarf2() so the  VG_(printf)s are triggered to get a  bit more info about what is going on.
Comment 7 tv 2012-08-24 13:52:16 UTC
Created attachment 73435 [details]
valgrind output with s/if (0/if (1/

Here's the ouput you asked for
Comment 8 tv 2012-08-24 13:54:12 UTC
Created attachment 73436 [details]
GDB trace with those patches with more debug info

GDB trace with -O0 (& -g3 like previous trace)
Note that I only "unknown abbrev" on the first run, I never saw them anymore in all the next runs
Comment 9 tv 2012-08-24 13:56:30 UTC
BTW, you can see our packages spec file & sources at http://svnweb.mageia.org/packages/cauldron/valgrind/current/
Comment 10 tv 2012-08-27 14:33:00 UTC
Created attachment 73504 [details]
GDB trace with those patches with debug info (+vg printff enabled)

I've upgraded binutils & dwz, rebuild rpm but still, once rpm-debug is installed, valgrind segfaults (same error but it now parses quite a lot more info before crashing).
As before, the last open files are those of rpm-debug
Comment 11 tv 2012-08-27 14:33:43 UTC
Created attachment 73505 [details]
GDB trace with those patches with debug info (vg printff disabled)
Comment 12 tv 2012-08-27 15:24:10 UTC
Created attachment 73507 [details]
prevent valgrind to segfault

For the record, both ui.name, ui.compdir results in segfault.
With that patch, it no more segfaults.
Comment 13 Mark Wielaard 2013-04-11 17:56:09 UTC
Found an issue that would explain why 

It has been fixed in r13367 "read_unitinfo_dwarf2 DW_FORM_ref_addr is address size in DWARF version 2."

The reason the read_unitinfo_dwarf2 DWARF reader is giving some trouble now is that it is only really used to pick up the debug_line info from the compile unit DIE. That used to be simple since it is just the first DIE of a CU. But with DWZ there can also be partial units. Which read_unitinfo_dwarf2 completely scans. So now a lot more of the parser is exercised.

But I wonder if we shouldn't simplify read_unitinfo_dwarf2 to at least skip partial units. Or maybe even just parse the debug_line section directly.
Comment 14 Mark Wielaard 2013-04-12 06:23:38 UTC
Created attachment 78833 [details]
Simplify read_unitinfo_dwarf2. Only try to read the first DIE.

> But I wonder if we shouldn't simplify read_unitinfo_dwarf2 to at least skip
> partial units.

We can do even better by just checking the first DIE tag. It must be DW_TAG_compile_unit. If it isn't then we really don't need to scan through the whole DIE tree. The children will not have the information we need anyway (only the first top-level DW_TAG_compule_unit will have a DW_AT_stmt_list).

Quote DWARF spec: "For each compilation unit compiled with a DWARF producer, a contribution is made to the .debug_info section of the object file. Each such contribution consists of a compilation unit
header (see Section 7.5.1.1) followed by a single DW_TAG_compile_unit or DW_TAG_partial_unit debugging information entry, together with its children."
Comment 15 Mark Wielaard 2013-04-12 06:27:23 UTC
BTW. attachment 73360 [details] (" Correctly skip block forms") was already committed as r12892

    Fix skipping of block forms when this code is compiled by gcc 4.8 -- it
    has always been incorrect, modifying 'p' twice between sequence points.
    Fixes #305513.  (Mark Wielaard, mjw@redhat.com)
Comment 16 Mark Wielaard 2013-04-17 13:48:41 UTC
13369(In reply to comment #14)
> Created attachment 78833 [details]
> Simplify read_unitinfo_dwarf2. Only try to read the first DIE.

After a brief discussion on irc nobody seemed too concerned about this change and I think it simplifies the code. So committed as r13369.

I think we can close this bug since the root cause of SIGSEGV has probably been dealt with. If there are new/other crashes in this area it would be better to get new bug reports about them.
Comment 17 Tom Hughes 2013-04-17 13:53:26 UTC
Fix committed by mjw as r13369.