Bug 338803

Summary: Handling of dwz debug alt files or cross-CU is broken
Product: [Developer tools] valgrind Reporter: Mark Wielaard <mark>
Component: general Assignee: Mark Wielaard <mark>
Status: RESOLVED FIXED    
Severity: normal CC: cpigat242, philippe.waroquiers
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=452058
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Testcase for bug 338803. Handling of dwz debug alt files is broken.
Partial fix
complete patch to fix inline info reading in alternate debug info
Updated Testcase for bug 338803. Handling of dwz debug alt files is broken.
disable dereferencing of cross-CU inlined fn name
Rewrite DWARF inlined subroutine handling to work cross CU

Description Mark Wielaard 2014-09-04 13:01:53 UTC
When reading the debuginfo (DWARF) of a file that uses dwz debug alt files, the handling of DW_FORM_GNU_ref_alt doesn't work as expected when --read-inline-info=yes is used. Errors include:

--27307-- WARNING: Serious error when reading debug info
--27307-- When reading debug info from /opt/local/src/valgrind/inlinfo1:
--27307-- abbv_code not found in ht_abbvs table

and/or

--27254-- WARNING: Serious error when reading debug info
--27254-- When reading debug info from /usr/lib64/liblzma.so.5.0.99:
--27254-- get_inlFnName: absori not a subprogram


Reproducible: Always




This partly depends on solving bug #338791 "alt dwz files can be relative of debug/main file" to see it with non-system binaries/libraries.

Philippe analysed it and the issue has at least two parts:

- the inline function absorigin pointing to alt debug info is wrongly used as being in the normal debug info (passing around DIEs means "cooking/uncooking" them, which get_inlFnName doesn't do).

- then the abbreviation used in the absori is wrongly interpreted as an abbreviation coming from the normal debug info, while it should be looked up in the alt debug info. The fix for this will very probably require two abbrev hash tables in the cc: one for the normal info and one for the alt info.

[Note: "absori" refers to the DIE referenced by the DW_AT_abstract_origin attribute of a DW_TAG_inlined_subroutine DIE.]
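
To illustrate the "cooking/uncooking" mentioned above, here is a minimal sketch, assuming a scheme in which raw DIE offsets from the main .debug_info, the .debug_types and the alt .debug_info are mapped into one index space by adding a per-section bias. All names (SectionKind, SectionSizes, cook_die, uncook_die) are hypothetical and chosen for illustration only; the actual readdwarf3.c code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the section a raw DIE offset belongs to. */
typedef enum { SEC_MAIN_INFO, SEC_TYPES, SEC_ALT_INFO } SectionKind;

/* Hypothetical: section sizes, used as biases. The assumed order is
   main .debug_info, then .debug_types, then alt .debug_info. */
typedef struct {
   uint64_t main_size;
   uint64_t types_size;
   uint64_t alt_size;
} SectionSizes;

/* Map a raw, section-relative DIE offset to a unique "cooked" index. */
static uint64_t cook_die(const SectionSizes *s, SectionKind k, uint64_t raw)
{
   switch (k) {
      case SEC_MAIN_INFO:
         assert(raw < s->main_size);
         return raw;
      case SEC_TYPES:
         assert(raw < s->types_size);
         return s->main_size + raw;
      case SEC_ALT_INFO:
         assert(raw < s->alt_size);
         return s->main_size + s->types_size + raw;
   }
   assert(0);
   return 0;
}

/* Inverse mapping: recover the section and the raw offset. */
static uint64_t uncook_die(const SectionSizes *s, uint64_t cooked,
                           SectionKind *k)
{
   if (cooked < s->main_size) { *k = SEC_MAIN_INFO; return cooked; }
   cooked -= s->main_size;
   if (cooked < s->types_size) { *k = SEC_TYPES; return cooked; }
   cooked -= s->types_size;
   assert(cooked < s->alt_size);
   *k = SEC_ALT_INFO;
   return cooked;
}
```

With such a scheme, a raw offset like 0x5f8 in the alt file and the same raw offset in the main file get different cooked values, so code that receives a cooked value has to uncook it before reading from any section.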
Comment 1 Mark Wielaard 2014-09-04 17:49:17 UTC
Created attachment 88565 [details]
Testcase for bug 338803. Handling of dwz debug alt files is broken.

If we have dwz installed, create inlinfoalt and inlinfoalt.dwz from
the original inlinfo testcase. The expected output should be the same
as from the inlinfo testcase.

Depends on fix of bug #338791. Currently fails.
Comment 2 Mark Wielaard 2014-09-04 17:55:07 UTC
Created attachment 88566 [details]
Partial fix

Partial fix based on code from Philippe. This doesn't produce any warnings anymore with the new inlinfoalt testcase, but the stacktrace is not correct/complete, probably because get_abbv () returns the wrong result in the alt case.
Comment 3 Philippe Waroquiers 2014-09-04 20:30:59 UTC
Created attachment 88567 [details]
complete patch to fix inline info reading in alternate debug info

Solves all known problems.
And contains an ugly kludge.
Comment 4 Mark Wielaard 2014-09-05 14:21:39 UTC
Created attachment 88578 [details]
Updated Testcase for bug 338803. Handling of dwz debug alt files is broken.

If we have dwz installed, create inlinfoalt and inlinfoalt.dwz from
the original inlinfo testcase. The expected output should be the same
as from the inlinfo testcase.

Updated to include the new symlinked exp files in EXTRA_DIST and add stderr_filter_args: inlinfo to inlinfoalt.vgtest so the filters really work as if this really is inlinfo.

This now fails without the proposed fix and passes with it.
Comment 5 Mark Wielaard 2014-09-05 14:36:20 UTC
(In reply to Mark Wielaard from comment #4)
> This now fails without the proposed fix and passes with it.

But I am afraid the proposed fix still isn't completely correct. I can still get the wrong abbrev being handled with larger programs. The issue as far as I can see is that the abbrev cache is only for the "current CU", but a DIE ref can be in a completely different CU (either full or partial). I'll try to get a smaller testcase.
Comment 6 Mark Wielaard 2014-09-05 14:50:53 UTC
To show what seems to go wrong with the larger example: first we see this CU, which contains the subprogram definition:

  Compilation Unit @ offset 0x5e4:
   Length:        41
   Version:       4
   Abbrev Offset: 20884
   Pointer Size:  8
  Adding abbv_code 1 TAG  DW_TAG_formal_parameter [no children] nf 3   [8,0] [4,0] [0,0]
[...]
  Adding abbv_code 98 TAG  DW_TAG_subprogram [has children] nf 7   [11,0] [7,0] [6,0] [5,0] [5,0] [1,0] [0,0] 
[...]
 <0><5ef>: Abbrev Number: 25 (DW_TAG_partial_unit)
     DW_AT_stmt_list   : 921    
     DW_AT_comp_dir    : (indirect alt string, offset: 0x1d77): /usr/src/debug/xz-5.1.2alpha/src/liblzma        
 The Directory Table:
  common
  /usr/include
  ../../src/liblzma/api/lzma

  read_filename_table: 1 fndn_ix 158 common block_util.c
  read_filename_table: 2 fndn_ix 159 common index.h
  read_filename_table: 3 fndn_ix 2 /usr/include stdint.h
  read_filename_table: 4 fndn_ix 148 ../../src/liblzma/api/lzma base.h
  read_filename_table: 5 fndn_ix 155 ../../src/liblzma/api/lzma vli.h
  read_filename_table: 6 fndn_ix 156 ../../src/liblzma/api/lzma check.h
  read_filename_table: 7 fndn_ix 157 ../../src/liblzma/api/lzma filter.h
  read_filename_table: 8 fndn_ix 160 ../../src/liblzma/api/lzma block.h

 <1><5f8>: Abbrev Number: 98 (DW_TAG_subprogram)
     DW_AT_name        : (indirect alt string, offset: 0x774): vli_ceil4        
     DW_AT_decl_file   : 2      
     DW_AT_decl_line   : 39     
     DW_AT_prototyped  : 1      
     DW_AT_type        : 0x30   
     DW_AT_inline      : 3      
    uninteresting DIE -> skipping ...
[...]

Then some time later we see this CU that contains the inlined_subroutine:

  Compilation Unit @ offset 0x3321:
   Length:        384
   Version:       4
   Abbrev Offset: 3753
   Pointer Size:  8
  Adding abbv_code 1 TAG  DW_TAG_formal_parameter [no children] nf 6   [14,0] [10,0] [9,0] [8,0] [4,0] [0,0]
[...]
  Adding abbv_code 98 TAG  DW_TAG_inlined_subroutine [has children] nf 6   [12,2] [8,2] [4294967295,3] [2,0] [1,0] [0,0] 
[...]
 <0><332c>: Abbrev Number: 80 (DW_TAG_compile_unit)
     DW_AT_producer    : (indirect alt string, offset: 0x58a): GNU C 4.8.2 20140120 (Red Hat 4.8.2-12) -m64 -mtune=generic -march=x86-64 -g -O2 -std=gnu99 -fvisibility=hidden -fexceptions -fstack-protector-strong -fPIC --param ssp-buffer-size=4    
     DW_AT_language    : 1      
     DW_AT_name        : (indirect alt string, offset: 0x1f4f): common/block_util.c     
     DW_AT_comp_dir    : (indirect alt string, offset: 0x1d77): /usr/src/debug/xz-5.1.2alpha/src/liblzma        
     DW_AT_low_pc      : 0x3720 
     DW_AT_high_pc     : 268    
     DW_AT_stmt_list   : 921    
 The Directory Table:
  common
  /usr/include
  ../../src/liblzma/api/lzma
[...]
 <2><3475>: Abbrev Number: 87 (DW_TAG_inlined_subroutine)
     DW_AT_abstract_ori: 0x5f8  
     DW_AT_low_pc      : 0x381f 
     DW_AT_high_pc     : 8      
     DW_AT_call_file   : 1      
     DW_AT_call_line   : 87     
     DW_AT_sibling     : <3491> 
 <get_inlFnName><5f8>: Abbrev Number: 98 (DW_TAG_inlined_subroutine)

------ .debug_info reading failed ------
--9061-- WARNING: Serious error when reading debug info
--9061-- When reading debug info from /usr/lib64/liblzma.so.5.0.99:
--9061-- get_inlFnName: absori not a subprogram

Oops, we used the abbrev table cache from this CU, but the subprogram DIE that the abstract_origin points to is in another CU, so we misinterpret Abbrev Number: 98.
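
The misinterpretation can be reproduced in a small sketch, assuming each CU carries its own abbrev table and that a DIE's abbrev code must be looked up in the table of the CU that contains the DIE. The types and helpers (Abbrev, CU, cu_for_offset, tag_for_code) are hypothetical illustrations, not the readdwarf3.c data structures; the two tag values are the ones defined by the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values as defined by the DWARF standard. */
enum { DW_TAG_inlined_subroutine = 0x1d, DW_TAG_subprogram = 0x2e };

/* Hypothetical: one abbrev table entry (code -> tag). */
typedef struct { uint32_t code; uint16_t tag; } Abbrev;

/* Hypothetical: a CU with its own abbrev table. */
typedef struct {
   uint64_t start;        /* offset of the CU header in .debug_info */
   uint64_t end;          /* one past the last byte of the CU */
   const Abbrev *abbrevs; /* abbrevs found at this CU's Abbrev Offset */
   size_t n_abbrevs;
} CU;

/* Return the CU that contains die_off, or NULL if there is none. */
static const CU *cu_for_offset(const CU *cus, size_t n, uint64_t die_off)
{
   for (size_t i = 0; i < n; i++)
      if (die_off >= cus[i].start && die_off < cus[i].end)
         return &cus[i];
   return NULL;
}

/* Look up the tag for an abbrev code in one CU's table; 0 if absent. */
static uint16_t tag_for_code(const CU *cu, uint32_t code)
{
   assert(cu != NULL);
   for (size_t i = 0; i < cu->n_abbrevs; i++)
      if (cu->abbrevs[i].code == code)
         return cu->abbrevs[i].tag;
   return 0;
}
```

Looking up code 98 in the table of the referring CU yields DW_TAG_inlined_subroutine, while looking it up in the table of the CU that actually contains DIE 0x5f8 yields DW_TAG_subprogram, which matches the log above.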
Comment 7 Philippe Waroquiers 2014-09-05 21:54:55 UTC
Created attachment 88584 [details]
disable dereferencing of cross-CU inlined fn name

After more in-depth analysis and discussion of the problematic cases, it became
clear that any cross-CU reference is broken.
For the inline info, this can happen for the inlined function name.
For var info, other problems seem to happen, but these have not been analysed.
The attached patch bypasses the problem for inlined info by detecting cross-CU
references and giving UnknownInlinedFun in this case.
A proper solution might be implemented later.
Note that such cross-CU references are only known to appear when using
an alternate debug dwz file and/or executables whose DWARF info has been optimised by dwz.
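
A minimal sketch of such a cross-CU check, assuming the referenced position only needs to be compared against the bounds of the current CU. The struct CUBounds and the helpers is_cross_cu_ref and inlined_fn_name are hypothetical; local_name stands in for whatever name would be read from a DIE inside the current CU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the bounds of the CU currently being parsed. */
typedef struct {
   uint64_t cu_start_offset;
   uint64_t unit_length;
} CUBounds;

/* True if posn lies outside the current CU. */
static bool is_cross_cu_ref(const CUBounds *cc, uint64_t posn)
{
   assert(cc != NULL);
   return posn < cc->cu_start_offset
          || posn >= cc->cu_start_offset + cc->unit_length;
}

/* Give up on cross-CU references instead of misreading the DIE. */
static const char *inlined_fn_name(const CUBounds *cc, uint64_t absori,
                                   const char *local_name)
{
   if (is_cross_cu_ref(cc, absori))
      return "UnknownInlinedFun";
   return local_name;
}
```

The trade-off is that no DIE is ever decoded with the wrong abbrev table, at the cost of losing the name for every inlined function whose abstract origin lives in another CU.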
Comment 8 Philippe Waroquiers 2014-09-06 00:15:21 UTC
After review by Mark, committed a slightly revised version of the disabling patch
in revision 14476.
That allows inlined info to be read (but inlined function names might be reported as
unknown).
A better solution is needed.
Comment 9 Mark Wielaard 2014-09-07 12:44:25 UTC
I like the workaround/solution, but I am not a fan of the warning which shows up even with -q.
On a system (Fedora) with lots of system libraries having been compressed by DWZ this shows up a lot. Would you be fine with something like this to suppress it with -q?

diff --git a/coregrind/m_debuginfo/readdwarf3.c b/coregrind/m_debuginfo/readdwarf3.c
index 825df53..8453d3d 100644
--- a/coregrind/m_debuginfo/readdwarf3.c
+++ b/coregrind/m_debuginfo/readdwarf3.c
@@ -2558,7 +2558,7 @@ static HChar* get_inlFnName (Int absori, CUConst* cc, Bool td3)
        || posn < cc->cu_start_offset
        || posn >= cc->cu_start_offset + cc->unit_length) {
       static Bool reported = False;
-      if (!reported) {
+      if (!reported && VG_(clo_verbosity) > 0) {
          VG_(message)(Vg_DebugMsg,
                       "Warning: cross-CU LIMITATION: some inlined fn names\n"
                       "might be shown as UnknownInlinedFun\n");
Comment 10 Philippe Waroquiers 2014-09-07 12:47:33 UTC
(In reply to Mark Wielaard from comment #9)
> I like the workaround/solution, but I am not a fan of the warning which
> shows up even with -q.
> On a system (fedora) with lots of system libraries having been compressed by
> DWZ this shows up a lot. Would you be fine with something like this to
> suppress it with -q:
> 
> diff --git a/coregrind/m_debuginfo/readdwarf3.c
> b/coregrind/m_debuginfo/readdwarf3.c
> index 825df53..8453d3d 100644
> --- a/coregrind/m_debuginfo/readdwarf3.c
> +++ b/coregrind/m_debuginfo/readdwarf3.c
> @@ -2558,7 +2558,7 @@ static HChar* get_inlFnName (Int absori, CUConst* cc,
> Bool td3)
>         || posn < cc->cu_start_offset
>         || posn >= cc->cu_start_offset + cc->unit_length) {
>        static Bool reported = False;
> -      if (!reported) {
> +      if (!reported && VG_(clo_verbosity) > 0) {
>           VG_(message)(Vg_DebugMsg,
>                        "Warning: cross-CU LIMITATION: some inlined fn
> names\n"
>                        "might be shown as UnknownInlinedFun\n");

Yes, fine.
Maybe we might even use '> 1' so that it is shown only when the user asks for
non-default verbosity, i.e. when the user specifies -v?
Comment 11 Mark Wielaard 2014-09-08 09:29:16 UTC
(In reply to Philippe Waroquiers from comment #10)
> Yes, fine.
> Maybe we might even use '> 1' to have it shown only when the user asks for
> the non
> default verbosity ?
> user specifies -v

Did that as valgrind svn r14492. Having the warning with -v is actually nice since it will immediately follow the first "Reading syms from ..." message that prints which debug files were considered, which gives the user a hint as to which file contains the problematic/compressed DWARF.
Comment 12 Mark Wielaard 2025-06-16 18:56:17 UTC
Created attachment 182310 [details]
Rewrite DWARF inlined subroutine handling to work cross CU

Rewrite DWARF inlined subroutine handling to work cross CU
https://code.wildebeest.org/git/user/mjw/valgrind/commit/?h=inline-backtrace-post

The readdwarf3 parsers cannot read DIEs across CUs. An inlined
subroutine refers to a subprogram which has a name (or refers to a
declaration of a subprogram that has a name). These subprograms can be
(and often are when dwz has been used to compress the DWARF) in a
different CU. So a lot of inlined subroutines in backtraces are just
called "UnknownInlinedFun".

To work around not being able to read DIEs across CUs directly we
don't try to immediately resolve the name of the inlined subroutine by
following the abstract origin reference to the subprogram, but just
record it in the DiInlLoc. We also record all subprogram indexes while
parsing in a new DiSubprogram structure and whether the subprogram had
a name or had a reference to another subprogram (specification).
    
We have to look under a couple more DIEs. We normally want to skip any
DIE that doesn't have an address range when looking for inlined
subroutines, but there are various other DIEs that can contain a
subprogram (specification).

We also want to walk the DIEs from low to high (cooked DIE) index, so
we first pass over the main .debug_info, then the .debug_types, and
finally the alt .debug_info. That way we can store the DiSubprograms
in an array from low to high index and use a binary search to connect
the inlined subroutines to the subprogram that contains the name.
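
The deferred lookup described above can be sketched as follows, assuming subprogram records are appended in increasing cooked index order and resolved after all DIEs have been read. The record layout and the names SubprogramRec, find_subprogram and resolve_inlined_name are hypothetical and do not reflect the actual DiSubprogram structure; the hop limit is an added assumption to guard against reference cycles in bad DWARF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one subprogram DIE seen while parsing. */
typedef struct {
   uint64_t index;    /* cooked DIE index; the array is sorted on this */
   const char *name;  /* NULL if the DIE had no name */
   uint64_t ref;      /* cooked index of a referenced subprogram
                         (specification), or 0 if there is none */
} SubprogramRec;

/* Binary search over records sorted by increasing cooked index. */
static const SubprogramRec *find_subprogram(const SubprogramRec *v,
                                            size_t n, uint64_t index)
{
   size_t lo = 0, hi = n;
   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      if (v[mid].index == index)
         return &v[mid];
      if (v[mid].index < index)
         lo = mid + 1;
      else
         hi = mid;
   }
   return NULL;
}

/* Follow the recorded abstract origin until a named subprogram is found. */
static const char *resolve_inlined_name(const SubprogramRec *v, size_t n,
                                        uint64_t origin)
{
   assert(v != NULL || n == 0);
   for (int hops = 0; hops < 8; hops++) {
      const SubprogramRec *sp = find_subprogram(v, n, origin);
      if (sp == NULL)
         break;
      if (sp->name != NULL)
         return sp->name;
      if (sp->ref == 0)
         break;
      origin = sp->ref;
   }
   return "UnknownInlinedFun";
}
```

Because only indexes are stored while parsing, the resolver never has to decode a DIE from another CU, so the per-CU abbrev table problem does not arise at lookup time.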

The code also tracks whether the subprogram is artificial. This isn't
used yet, but it should make it possible for a followup patch to
remove artificial inlined subroutines from a backtrace.

Tested against emacs and libreoffice as packaged in Fedora where the
programs and all shared libraries used are processed with dwz. The new
code gives a name to every inlined subroutine, except when the DWARF
produced is bad and the DW_TAG_inlined_subroutine didn't contain a
DW_AT_abstract_origin and so no DW_TAG_subprogram can be found.
Comment 13 Mark Wielaard 2025-06-19 22:02:53 UTC
commit f7dccaab11b8dc1af2bbcd31dea5bb7a50c6f811
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu May 29 23:41:52 2025 +0200

    Rewrite DWARF inlined subroutine handling to work cross CU