Here is some DHAT output I am getting with Rust code: > └── PP 1.14/14 { > Total: 118,312 bytes (0.05%, 60.74/Minstr) in 11,864 blocks (1.12%, 6.09/Minstr), avg size 9.97 bytes, avg lifetime 60,918.19 instrs (0% of program duration) > Max: 40 bytes in 5 blocks, avg size 8 bytes > At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > Reads: 178,976 bytes (0.03%, 91.89/Minstr), 1.51/byte > Writes: 130,464 bytes (0.05%, 66.98/Minstr), 1.1/byte > Allocated at { > #1: 0xA2D0172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (alloc.rs:87) > #2: 0xA2C42B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459) > #3: 0xA27EA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140) > } > } Here's what it should look like: > └── PP 1.14/14 { > Total: 118,312 bytes (0.05%, 60.81/Minstr) in 11,864 blocks (1.12%, 6.1/Minstr), avg size 9.97 bytes, avg lifetime 60,857.47 instrs (0% of program duration) > Max: 40 bytes in 5 blocks, avg size 8 bytes > At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > Reads: 178,976 bytes (0.03%, 91.99/Minstr), 1.51/byte > Writes: 130,464 bytes (0.05%, 67.05/Minstr), 1.1/byte > Allocated at { > #1: 0xA2CD172: alloc (alloc.rs:87) > #2: 0xA2CD172: alloc_impl (alloc.rs:169) > #3: 0xA2CD172: allocate (alloc.rs:229) > #4: 0xA2CD172: exchange_malloc (alloc.rs:318) > #5: 0xA2CD172: new<rustc_trait_selection::traits::fulfill::PendingPredicateObligation> (mod.rs:192) > #6: 0xA2CD172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (mod.rs:376) > #7: 0xA2C12B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459) > #8: 0xA27BA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140) > } > } I bisected the problem to this commit: commit 75e3ef0f3b834f75f49333d35b5a040060247b03 Author: Mark Wielaard <mark@klomp.org> Date: Thu Sep 16 22:01:47 2021 +0200 readdwarf3: Skip units without addresses when looking for inlined functions When a unit doesn't cover any addresses skip it because no actual code will be inside. Also use skip_DIE instead of read_DIE when not parsing (skipping) children. Reverting this commit fixed the problem, though the reversion was a bit messy because some things have changed since. Mark, I don't understand much about dwarf3. Any idea what might be wrong? I was profiling the Rust compiler which is very large and complicated. If necessary, I could try to create a smaller Rust program that reproduces the error.
(In reply to Nick Nethercote from comment #0) > Here is some DHAT output I am getting with Rust code: > > > └── PP 1.14/14 { > > Total: 118,312 bytes (0.05%, 60.74/Minstr) in 11,864 blocks (1.12%, 6.09/Minstr), avg size 9.97 bytes, avg lifetime 60,918.19 instrs (0% of program duration) > > Max: 40 bytes in 5 blocks, avg size 8 bytes > > At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > > At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > > Reads: 178,976 bytes (0.03%, 91.89/Minstr), 1.51/byte > > Writes: 130,464 bytes (0.05%, 66.98/Minstr), 1.1/byte > > Allocated at { > > #1: 0xA2D0172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (alloc.rs:87) > > #2: 0xA2C42B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459) > > #3: 0xA27EA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140) > > } > > } > > Here's what it should look like: > > > └── PP 1.14/14 { > > Total: 118,312 bytes (0.05%, 60.81/Minstr) in 11,864 blocks (1.12%, 6.1/Minstr), avg size 9.97 bytes, avg lifetime 60,857.47 instrs (0% of program duration) > > Max: 40 bytes in 5 blocks, avg size 8 bytes > > At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > > At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes > > Reads: 178,976 bytes (0.03%, 91.99/Minstr), 1.51/byte > > Writes: 130,464 bytes (0.05%, 67.05/Minstr), 1.1/byte > > Allocated at { > > #1: 0xA2CD172: alloc (alloc.rs:87) > > #2: 0xA2CD172: alloc_impl (alloc.rs:169) > > #3: 0xA2CD172: allocate (alloc.rs:229) > > #4: 0xA2CD172: exchange_malloc (alloc.rs:318) > > #5: 0xA2CD172: new<rustc_trait_selection::traits::fulfill::PendingPredicateObligation> (mod.rs:192) > > #6: 0xA2CD172: _RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 (mod.rs:376) > > #7: 0xA2C12B2: _RINvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB6_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE19process_obligationsNtB1s_16FulfillProcessorINtB6_7OutcomeB1q_NtNtCsgI90OQiJWEs_11rustc_infer6traits20FulfillmentErrorCodeEEB1w_ (mod.rs:459) > > #8: 0xA27BA42: _RNvMs_NtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfillNtB4_18FulfillmentContext6select.llvm.5302840341462405287 (fulfill.rs:140) > > } > > } OK, so we (should have) have 6 frames all (inlined) all at the same address, inside a function (we cannot demangle?) Inline information comes from the DIE tree, so it makes sense that it is related to this commit: > I bisected the problem to this commit: > > commit 75e3ef0f3b834f75f49333d35b5a040060247b03 > Author: Mark Wielaard <mark@klomp.org> > Date: Thu Sep 16 22:01:47 2021 +0200 > > readdwarf3: Skip units without addresses when looking for inlined > functions > > When a unit doesn't cover any addresses skip it because no actual code > will be inside. Also use skip_DIE instead of read_DIE when not parsing > (skipping) children. > > Reverting this commit fixed the problem, though the reversion was a bit > messy because some things have changed since. > > Mark, I don't understand much about dwarf3. Any idea what might be wrong? I > was profiling the Rust compiler which is very large and complicated. If > necessary, I could try to create a smaller Rust program that reproduces the > error. Parsing a DIE tree is somewhat expensive, so what this commit does is skip parsing a CU DIE tree if it doesn't cover an addresses. That is if it doesn't have a DW_AT_low_pc and/or DW_AT_ranges attribute. Such a Compile Unit doesn't (or shouldn't) contain any entries that have addresses associated (it only contains type information for example) and so doesn't need to be read, but can just be skipped. So either the compiler did not associate any addresses to the unit that contains the function (and the inlined subroutine information) or we are not checking the unit attributes correctly. If you have a smaller example that would be great. Otherwise could you search for the DIE for "_RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694" (*) and post the unit DIE (the first one with DW_TAG_compile_unit, partial_unit, skeleton_unit, etc.) and if it isn't too large the whole DIE tree for that unit? You can use readelf (or eu-readelf) --debug-dump=info for that (but the output is probably really large) (*) BTW. What is this? It looks like a mangled Rust v0 symbol, but it isn't because it (I also checked with c++filt) cannot be demangled.
> (*) BTW. What is this? It looks like a mangled Rust v0 symbol, but it isn't because it (I also checked with c++filt) cannot be demangled. The Rust compiler currently sometimes generates symbols with a `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.' chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a problem that needs to be fixed on the Rust side.
Created attachment 143713 [details] Rust reproducer I've written a tiny program that reproduces the problem. - I compiled with Rust 1.56.1 with `rustc -g a.rs`. - I ran DHAT with `vg-in-place --tool=dhat --dhat-out-file=dhout ./a`. - I viewed `dhout` with `dh_view.html`. Trunk Valgrind has a first PP like this: ``` │ #1: 0x1123BB: alloc::alloc::alloc (alloc.rs:86) │ #2: 0x112446: alloc::alloc::Global::alloc_impl (alloc.rs:166) │ #3: 0x112B29: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:226) │ #4: 0x110991: alloc::raw_vec::finish_grow (raw_vec.rs:510) │ #5: 0x110E85: alloc::raw_vec::RawVec<T,A>::grow_amortized (raw_vec.rs:443) │ #6: 0x111045: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle (raw_vec.rs:334) │ #7: 0x11101B: alloc::raw_vec::RawVec<T,A>::reserve (raw_vec.rs:338) │ #8: 0x1135DF: alloc::vec::Vec<T,A>::reserve (mod.rs:805) │ #9: 0x1139AF: a::main (a.rs:5) │ #10: 0x11206A: core::ops::function::FnOnce::call_once (function.rs:227) │ #11: 0x111F5D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125) ``` With the abovementioned commit reverted, the first PP looks like this: ``` │ #1: 0x1123BB: alloc::alloc::alloc (alloc.rs:86) │ #2: 0x112446: alloc::alloc::Global::alloc_impl (alloc.rs:166) │ #3: 0x112B29: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:226) │ #4: 0x110991: alloc::raw_vec::finish_grow (raw_vec.rs:510) │ #5: 0x110E85: alloc::raw_vec::RawVec<T,A>::grow_amortized (raw_vec.rs:443) │ #6: 0x111045: alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle (raw_vec.rs:334) │ #7: 0x11101B: alloc::raw_vec::RawVec<T,A>::reserve (raw_vec.rs:338) │ #8: 0x1135DF: alloc::vec::Vec<T,A>::reserve (mod.rs:805) │ #9: 0x1139AF: f4 (a.rs:5) │ #10: 0x1139AF: f3 (a.rs:12) │ #11: 0x1139AF: f2 (a.rs:17) │ #12: 0x1139AF: f1 (a.rs:22) │ #13: 0x1139AF: a::main (a.rs:26) │ #14: 0x11206A: core::ops::function::FnOnce::call_once (function.rs:227) │ #15: 0x111F5D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125) ```
Created attachment 143715 [details] Compiled file Here's the binary produced by Rust 1.56.1 for `rustc -g a.rs` on my x86-64 Linux box.
Created attachment 143791 [details] Revert second part of commit 75e3ef0f3b834f75f49333d35b5a040060247b03 The problem isn't the first part of the patch, but the second part "Also use skip_DIE instead of read_DIE when not parsing (skipping) children". Reverting just that shows the inline frames again. I haven't analyzed why yet.
I think I found it. rustc puts concrete function instances in namespaces (which is allowed in DWARF since there is no strict separation between type declarations and program scope entries in a DIE tree), the inline parser didn't expect this and so skipped any DIE under a namespace entry. This wasn't an issue before because "skipping" a DIE tree was done by reading it, so it wasn't actually skipped. But now that we really skip the DIE (sub)tree (which is faster than actually parsing it) some entries were missed in the rustc case. The following patch fixes it: diff --git a/coregrind/m_debuginfo/readdwarf3.c b/coregrind/m_debuginfo/readdwarf3.c index 18eecea9f..5489f8d13 100644 --- a/coregrind/m_debuginfo/readdwarf3.c +++ b/coregrind/m_debuginfo/readdwarf3.c @@ -3358,7 +3358,7 @@ static Bool parse_inl_DIE ( // might maybe contain a DW_TAG_inlined_subroutine: Bool ret = (unit_has_addrs || dtag == DW_TAG_lexical_block || dtag == DW_TAG_subprogram - || dtag == DW_TAG_inlined_subroutine); + || dtag == DW_TAG_inlined_subroutine || dtag == DW_TAG_namespace); return ret; bad_DIE: Does that also work for your larger example?
The one-line fix works on the big example! Thank you.
> The Rust compiler currently sometimes generates symbols with a > `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.' > chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a > problem that needs to be fixed on the Rust side. Going back to this (and I know it's a tangent), there's now some disagreement about whether this needs to be fixed on the Rust side, or the libiberty/Valgrind side: https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409 :(
(In reply to Nick Nethercote from comment #8) > > The Rust compiler currently sometimes generates symbols with a > > `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.' > > chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a > > problem that needs to be fixed on the Rust side. > > Going back to this (and I know it's a tangent), there's now some > disagreement about whether this needs to be fixed on the Rust side, or the > libiberty/Valgrind side: > https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409 :( Interesting. I see gcc "fixed" this (for c++ symbols) by translating those suffixes to "[clone .constprop.2]", "[clone .isra.3]" or "[clone ._omp_fn.2]". Which is nice if you want to know why there is a new (local) symbol for a code range. But I think it would actually be fine to simply cut off anything after a '.' or '$' and just demangle the part before it. For the user the important thing is knowing which (source) function an address is associated with, not which compiler transformation has been applied to it.
commit 542447d4708d4418a08e678dcf467af92b90b7ad Author: Mark Wielaard <mark@klomp.org> Date: Mon Nov 22 13:07:59 2021 +0100 readdwarf3.c (parse_inl_DIE) inlined_subroutine can appear in namespaces This was broken by commit 75e3ef0f3 "readdwarf3: Skip units without addresses when looking for inlined functions". Specifically by this part: "Also use skip_DIE instead of read_DIE when not parsing (skipping) children" rustc puts concrete function instances in namespaces (which is allowed in DWARF since there is no strict separation between type declarations and program scope entries in a DIE tree), the inline parser didn't expect this and so skipped any DIE under a namespace entry. This wasn't an issue before because "skipping" a DIE tree was done by reading it, so it wasn't actually skipped. But now that we really skip the DIE (sub)tree (which is faster than actually parsing it) some entries were missed in the rustc case. https://bugs.kde.org/show_bug.cgi?id=445668
(In reply to Nick Nethercote from comment #8) > > The Rust compiler currently sometimes generates symbols with a > > `.llvm.<numbers>` suffix. These violate the v0 spec, which doesn't allow '.' > > chars, and the libiberty/Valgrind demangler doesn't demangle them. It's a > > problem that needs to be fixed on the Rust side. > > Going back to this (and I know it's a tangent), there's now some > disagreement about whether this needs to be fixed on the Rust side, or the > libiberty/Valgrind side: > https://github.com/rust-lang/rust/issues/60705#issuecomment-974011409 :( I filed a separate bug to track this: https://bugs.kde.org/show_bug.cgi?id=445916