Bug 445668

Summary: Inline stack frame generation is broken for Rust binaries
Product: [Developer tools] valgrind Reporter: Nick Nethercote <n.nethercote>
Component: generalAssignee: Mark Wielaard <mark>
Status: RESOLVED FIXED    
Severity: normal CC: mark
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=445916
Latest Commit: Version Fixed In:
Attachments: Rust reproducer
Compiled file
Revert second part of commit 75e3ef0f3b834f75f49333d35b5a040060247b03

Description Nick Nethercote 2021-11-18 00:49:11 UTC
Here is some DHAT output I am getting with Rust code:

> └── PP 1.14/14 {
>       Total:     118,312 bytes (0.05%, 60.74/Minstr) in 11,864 blocks (1.12%, 6.09/Minstr), avg size 9.97 bytes, avg lifetime 60,918.19 instrs (0% of program duration)
>       Max:       40 bytes in 5 blocks, avg size 8 bytes
>       At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
>       At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
>       Reads:     178,976 bytes (0.03%, 91.89/Minstr), 1.51/byte
>       Writes:    130,464 bytes (0.05%, 66.98/Minstr), 1.1/byte
>       Allocated at {
>         #1: 0xA2D0172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (alloc.rs:87)
>         #2: 0xA2C42B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459)
>         #3: 0xA27EA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140)
>       }
>     }

Here's what it should look like:

>    └── PP 1.14/14 {
>       Total:     118,312 bytes (0.05%, 60.81/Minstr) in 11,864 blocks (1.12%, 6.1/Minstr), avg size 9.97 bytes, avg lifetime 60,857.47 instrs (0% of program duration)
>       Max:       40 bytes in 5 blocks, avg size 8 bytes
>       At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
>       At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
>       Reads:     178,976 bytes (0.03%, 91.99/Minstr), 1.51/byte
>       Writes:    130,464 bytes (0.05%, 67.05/Minstr), 1.1/byte
>       Allocated at {
>         #1: 0xA2CD172: alloc (alloc.rs:87)
>         #2: 0xA2CD172: alloc_impl (alloc.rs:169)
>         #3: 0xA2CD172: allocate (alloc.rs:229)
>         #4: 0xA2CD172: exchange_malloc (alloc.rs:318)
>         #5: 0xA2CD172: new<rustc_trait_selection::traits::fulfill::PendingPredicateObligation> (mod.rs:192)
>         #6: 0xA2CD172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (mod.rs:376)
>         #7: 0xA2C12B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459)
>         #8: 0xA27BA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140)
>       }
>     }

I bisected the problem to this commit:

commit 75e3ef0f3b834f75f49333d35b5a040060247b03
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu Sep 16 22:01:47 2021 +0200

    readdwarf3: Skip units without addresses when looking for inlined functions
    
    When a unit doesn't cover any addresses skip it because no actual code
    will be inside. Also use skip_DIE instead of read_DIE when not parsing
    (skipping) children.

Reverting this commit fixed the problem, though the reversion was a bit messy because some things have changed since.

Mark, I don't understand much about dwarf3. Any idea what might be wrong? I was profiling the Rust compiler which is very large and complicated. If necessary, I could try to create a smaller Rust program that reproduces the error.
Comment 1 Mark Wielaard 2021-11-18 12:28:11 UTC
(In reply to Nick Nethercote from comment #0)
> Here is some DHAT output I am getting with Rust code:
> 
> > └── PP 1.14/14 {
> >       Total:     118,312 bytes (0.05%, 60.74/Minstr) in 11,864 blocks (1.12%, 6.09/Minstr), avg size 9.97 bytes, avg lifetime 60,918.19 instrs (0% of program duration)
> >       Max:       40 bytes in 5 blocks, avg size 8 bytes
> >       At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
> >       At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
> >       Reads:     178,976 bytes (0.03%, 91.89/Minstr), 1.51/byte
> >       Writes:    130,464 bytes (0.05%, 66.98/Minstr), 1.1/byte
> >       Allocated at {
> >         #1: 0xA2D0172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (alloc.rs:87)
> >         #2: 0xA2C42B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459)
> >         #3: 0xA27EA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140)
> >       }
> >     }
> 
> Here's what it should look like:
> 
> >    └── PP 1.14/14 {
> >       Total:     118,312 bytes (0.05%, 60.81/Minstr) in 11,864 blocks (1.12%, 6.1/Minstr), avg size 9.97 bytes, avg lifetime 60,857.47 instrs (0% of program duration)
> >       Max:       40 bytes in 5 blocks, avg size 8 bytes
> >       At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
> >       At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
> >       Reads:     178,976 bytes (0.03%, 91.99/Minstr), 1.51/byte
> >       Writes:    130,464 bytes (0.05%, 67.05/Minstr), 1.1/byte
> >       Allocated at {
> >         #1: 0xA2CD172: alloc (alloc.rs:87)
> >         #2: 0xA2CD172: alloc_impl (alloc.rs:169)
> >         #3: 0xA2CD172: allocate (alloc.rs:229)
> >         #4: 0xA2CD172: exchange_malloc (alloc.rs:318)
> >         #5: 0xA2CD172: new<rustc_trait_selection::traits::fulfill::PendingPredicateObligation> (mod.rs:192)
> >         #6: 0xA2CD172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (mod.rs:376)
> >         #7: 0xA2C12B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459)
> >         #8: 0xA27BA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140)
> >       }
> >     }

OK, so we (should have) have 6 frames all (inlined) all at the same address, inside a function (we cannot demangle?)

Inline information comes from the DIE tree, so it makes sense that it is related to this commit:

> I bisected the problem to this commit:
> 
> commit 75e3ef0f3b834f75f49333d35b5a040060247b03
> Author: Mark Wielaard <mark@klomp.org>
> Date:   Thu Sep 16 22:01:47 2021 +0200
> 
>     readdwarf3: Skip units without addresses when looking for inlined
> functions
>     
>     When a unit doesn't cover any addresses skip it because no actual code
>     will be inside. Also use skip_DIE instead of read_DIE when not parsing
>     (skipping) children.
> 
> Reverting this commit fixed the problem, though the reversion was a bit
> messy because some things have changed since.
> 
> Mark, I don't understand much about dwarf3. Any idea what might be wrong? I
> was profiling the Rust compiler which is very large and complicated. If
> necessary, I could try to create a smaller Rust program that reproduces the
> error.

Parsing a DIE tree is somewhat expensive, so what this commit does is skip parsing a CU DIE tree if it doesn't cover an addresses. That is if it doesn't have a DW_AT_low_pc and/or DW_AT_ranges attribute. Such a Compile Unit doesn't (or shouldn't) contain any entries that have addresses associated (it only contains type information for example) and so doesn't need to be read, but can just be skipped.

So either the compiler did not associate any addresses to the unit that contains the function (and the inlined subroutine information) or we are not checking the unit attributes correctly.

If you have a smaller example that would be great. Otherwise could you search for the DIE for "_RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694" (*) and post the unit DIE (the first one with DW_TAG_compile_unit, partial_unit, skeleton_unit, etc.) and if it isn't too large the whole DIE tree for that unit?

You can use readelf (or eu-readelf) --debug-dump=info for that (but the output is probably really large)

(*) BTW. What is this? It looks like a mangled Rust v0 symbol, but it isn't because it (I also checked with c++filt) cannot be demangled.
Comment 2 Nick Nethercote 2021-11-18 23:28:18 UTC
> (*) BTW. What is this? It looks like a mangled Rust v0 symbol, but it isn't because it (I also checked with c++filt) cannot be demangled.

The Rust compiler currently sometimes generates symbols with a `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.' chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a problem that needs to be fixed on the Rust side.
Comment 3 Nick Nethercote 2021-11-18 23:51:07 UTC
Created attachment 143713 [details]
Rust reproducer

I've written a tiny program that reproduces the problem.
- I compiled with Rust 1.56.1 with `rustc -g a.rs`.
- I ran DHAT with `vg-in-place --tool=dhat --dhat-out-file=dhout ./a`.
- I viewed `dhout` with `dh_view.html`.

Trunk Valgrind has a first PP like this:
```
  │       #1: 0x1123BB: alloc::alloc::alloc (alloc.rs:86)
  │       #2: 0x112446: alloc::alloc::Global::alloc_impl (alloc.rs:166)
  │       #3: 0x112B29: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:226)
  │       #4: 0x110991: alloc::raw_vec::finish_grow (raw_vec.rs:510)
  │       #5: 0x110E85: alloc::raw_vec::RawVec<T,A>::grow_amortized (raw_vec.rs:443)
  │       #6: 0x111045: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle (raw_vec.rs:334)
  │       #7: 0x11101B: alloc::raw_vec::RawVec<T,A>::reserve (raw_vec.rs:338)
  │       #8: 0x1135DF: alloc::vec::Vec<T,A>::reserve (mod.rs:805)
  │       #9: 0x1139AF: a::main (a.rs:5)
  │       #10: 0x11206A: core::ops::function::FnOnce::call_once (function.rs:227)
  │       #11: 0x111F5D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
```
With the abovementioned commit reverted, the first PP looks like this:
```
  │       #1: 0x1123BB: alloc::alloc::alloc (alloc.rs:86)
  │       #2: 0x112446: alloc::alloc::Global::alloc_impl (alloc.rs:166)
  │       #3: 0x112B29: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:226)
  │       #4: 0x110991: alloc::raw_vec::finish_grow (raw_vec.rs:510)
  │       #5: 0x110E85: alloc::raw_vec::RawVec<T,A>::grow_amortized (raw_vec.rs:443)
  │       #6: 0x111045: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle (raw_vec.rs:334)
  │       #7: 0x11101B: alloc::raw_vec::RawVec<T,A>::reserve (raw_vec.rs:338)
  │       #8: 0x1135DF: alloc::vec::Vec<T,A>::reserve (mod.rs:805)
  │       #9: 0x1139AF: f4 (a.rs:5)
  │       #10: 0x1139AF: f3 (a.rs:12)
  │       #11: 0x1139AF: f2 (a.rs:17)
  │       #12: 0x1139AF: f1 (a.rs:22)
  │       #13: 0x1139AF: a::main (a.rs:26)
  │       #14: 0x11206A: core::ops::function::FnOnce::call_once (function.rs:227)
  │       #15: 0x111F5D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)

```
Comment 4 Nick Nethercote 2021-11-18 23:52:50 UTC
Created attachment 143715 [details]
Compiled file

Here's the binary produced by Rust 1.56.1 for `rustc -g a.rs` on my x86-64 Linux box.
Comment 5 Mark Wielaard 2021-11-21 00:04:21 UTC
Created attachment 143791 [details]
Revert second part of commit 75e3ef0f3b834f75f49333d35b5a040060247b03

The problem isn't the first part of the patch, but the second part "Also use skip_DIE instead of read_DIE when not parsing (skipping) children". Reverting just that shows the inline frames again. I haven't analyzed why yet.
Comment 6 Mark Wielaard 2021-11-21 12:57:07 UTC
I think I found it. rustc puts concrete function instances in namespaces (which is allowed in DWARF since there is no strict separation between type declarations and program scope entries in a DIE tree), the inline parser didn't expect this and so skipped any DIE under a namespace entry. This wasn't an issue before because "skipping" a DIE tree was done by reading it, so it wasn't actually skipped. But now that we really skip the DIE (sub)tree (which is faster than actually parsing it) some entries were missed in the rustc case. The following patch fixes it:

diff --git a/coregrind/m_debuginfo/readdwarf3.c b/coregrind/m_debuginfo/readdwarf3.c
index 18eecea9f..5489f8d13 100644
--- a/coregrind/m_debuginfo/readdwarf3.c
+++ b/coregrind/m_debuginfo/readdwarf3.c
@@ -3358,7 +3358,7 @@ static Bool parse_inl_DIE (
    // might maybe contain a DW_TAG_inlined_subroutine:
    Bool ret = (unit_has_addrs
                || dtag == DW_TAG_lexical_block || dtag == DW_TAG_subprogram
-               || dtag == DW_TAG_inlined_subroutine);
+               || dtag == DW_TAG_inlined_subroutine || dtag == DW_TAG_namespace);
    return ret;
 
   bad_DIE:

Does that also work for your larger example?
Comment 7 Nick Nethercote 2021-11-21 23:37:00 UTC
The one-line fix works on the big example! Thank you.
Comment 8 Nick Nethercote 2021-11-21 23:42:41 UTC
> The Rust compiler currently sometimes generates symbols with a
> `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.'
> chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a
> problem that needs to be fixed on the Rust side.

Going back to this (and I know it's a tangent), there's now some disagreement about whether this needs to be fixed on the Rust side, or the libiberty/Valgrind side: https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409  :(
Comment 9 Mark Wielaard 2021-11-22 12:00:50 UTC
(In reply to Nick Nethercote from comment #8)
> > The Rust compiler currently sometimes generates symbols with a
> > `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.'
> > chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a
> > problem that needs to be fixed on the Rust side.
> 
> Going back to this (and I know it's a tangent), there's now some
> disagreement about whether this needs to be fixed on the Rust side, or the
> libiberty/Valgrind side:
> https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409  :(

Interesting. I see gcc "fixed" this (for c++ symbols) by translating those suffixes to "[clone .constprop.2]", "[clone .isra.3]" or "[clone ._omp_fn.2]".
Which is nice if you want to know why there is a new (local) symbol for a code range. But I think it would actually be fine to simply cut off anything after a '.' or '$' and just demangle the part before it. For the user the important thing is knowing which (source) function an address is associated with, not which compiler transformation has been applied to it.
Comment 10 Mark Wielaard 2021-11-22 12:16:57 UTC
commit 542447d4708d4418a08e678dcf467af92b90b7ad
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Nov 22 13:07:59 2021 +0100

    readdwarf3.c (parse_inl_DIE) inlined_subroutine can appear in namespaces
    
    This was broken by commit 75e3ef0f3 "readdwarf3: Skip units without
    addresses when looking for inlined functions". Specifically by this
    part: "Also use skip_DIE instead of read_DIE when not parsing
    (skipping) children"
    
    rustc puts concrete function instances in namespaces (which is
    allowed in DWARF since there is no strict separation between type
    declarations and program scope entries in a DIE tree), the inline
    parser didn't expect this and so skipped any DIE under a namespace
    entry. This wasn't an issue before because "skipping" a DIE tree was
    done by reading it, so it wasn't actually skipped. But now that we
    really skip the DIE (sub)tree (which is faster than actually parsing
    it) some entries were missed in the rustc case.
    
    https://bugs.kde.org/show_bug.cgi?id=445668
Comment 11 Mark Wielaard 2021-11-22 12:24:06 UTC
(In reply to Nick Nethercote from comment #8)
> > The Rust compiler currently sometimes generates symbols with a
> > `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.'
> > chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a
> > problem that needs to be fixed on the Rust side.
> 
> Going back to this (and I know it's a tangent), there's now some
> disagreement about whether this needs to be fixed on the Rust side, or the
> libiberty/Valgrind side:
> https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409  :(

I filed a separate bug to track this: https://bugs.kde.org/show_bug.cgi?id=445916