Bug 415293 - Incorrect call-graph tracking due to new _dl_runtime_resolve_xsave* functions
Summary: Incorrect call-graph tracking due to new _dl_runtime_resolve_xsave* functions
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: callgrind (show other bugs)
Version: 3.15 SVN
Platform: RedHat Enterprise Linux Linux
: NOR normal
Target Milestone: ---
Assignee: Josef Weidendorfer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-17 16:31 UTC by pcpa
Modified: 2021-04-07 22:30 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
valgrind-3.14.0-rhbz1784541.patch (6.75 KB, text/plain)
2020-03-04 12:17 UTC, pcpa
Details
sample_callgrind.tgz (948 bytes, application/octet-stream)
2020-05-25 14:37 UTC, pcpa
Details
Patch to search/check all possible dl_runtime_resolve variants (8.19 KB, text/plain)
2020-07-27 16:22 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description pcpa 2019-12-17 16:31:25 UTC
Newer glibc have alternate ld.so _ld_runtime_resolve functions.
Namely _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsave'2
Those exist to save the full AVX registers, for applications that somewhat
use an 'extended' x86_64 ABI.

The problem is that applications need to run with LD_BIND_NOW set for
kcachegrind to have some more meaningful usage.
I believe callgrind/fn.c should be extended to handle these new ld.so
symbols.

(BTW, it appears there is a minor typo in the comments of callgrind/fn.c,
that tell about _ld_runtime_resolve, but check for _dl_runtime_resolve).
Comment 1 pcpa 2019-12-18 12:14:40 UTC
As an extra note, there are several different _dl_runtime_resolve{EXT}
variants. I believe only one is used in an application, but there are
multiple names.
A possible fix is to use the logic in callgrind/fn.c so that when a
function matching the initial _dl_runtime_resolve is called, use the
existing code.
A quick grep shows "EXT" as _xsave, _xsavec, _fxsave, _old, _new,
_0, _1, _avx, _avx512, _sse, _sse_vex, _opt... On different versions
of recent (2 or less years) glibc trees.
Comment 2 pcpa 2020-03-04 12:17:19 UTC
Created attachment 126591 [details]
valgrind-3.14.0-rhbz1784541.patch

This is a suggested patch or possible way to handle the issue.

The user that had the issue now has it working on rhel7 using the
attached patch and the command line option

--dl-runtime-resolve-pattern=PATTERN

where PATTERN is either:

amd64-def
amd64-xsavec
amd64-xsave
amd64-fxsave

and the most common should be amd64-xsave, so, testing it before
amd64-xsavec would make it work most times without the option.

This is specific to rhel7 or centos7 and x86_64.

The patch checks for /usr/lib/ld and /usr/lib64/ld patterns, otherwise
it would not even check the patterns, then, checks patterns in the
order above, but because it needs to use only one, the option was
required.
Comment 3 Mark Wielaard 2020-04-03 12:34:59 UTC
What would be a good way to test the different variants?
Do you have a simple example that shows the original issue?
Can we add something to the testsuite?
Comment 4 pcpa 2020-05-25 14:37:25 UTC
Created attachment 128770 [details]
sample_callgrind.tgz

This a reproducer from a Red Hat user.
The repro.sh script works with or without devtoolset installed.

On a example run:

$ ls *_out.out
callgrind_cacheprofiling_7020_out.out  callgrind_cacheprofiling_7021_out.out
$ grep dl_runtime_resolve *.out
callgrind_cacheprofiling_7020_out.out:cfn=(422) _dl_runtime_resolve_xsave
callgrind_cacheprofiling_7020_out.out:fn=(423) _dl_runtime_resolve_xsave'2

only the sample output without LD_BIND_NOW set shows _dl_runtime_resolve*.
Comment 5 pcpa 2020-05-25 14:41:04 UTC
It should be tricky to figure what version is being used.
Could look into the 'the elf_machine_runtime_setup' function in
sysdeps/${ARCH}/dl-machine.h glibc source.

Sample chunk from rhel7 x86_64 source:

...
	  if (HAS_ARCH_FEATURE (AVX512F_Usable))
	    *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
	  else if (HAS_ARCH_FEATURE (AVX_Usable))
	    *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx;
	  else
	    *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_sse;
...

because of check for cpu features, a reproducer environment to validate all
entries, or experiment with automatic detection might need custom VMs.
Comment 6 Mark Wielaard 2020-07-27 16:22:00 UTC
Created attachment 130439 [details]
Patch to search/check all possible dl_runtime_resolve variants

I tweaked your patch a little so we detect all variants, all variants are then, when detected, squashed to the same name ("_dl_runtime_resolve"). This way we don't need the user to supply a variant name on the command line.

Does this work for you?
Comment 7 pcpa 2020-07-29 14:09:26 UTC
Appears to work for me.
I will confirm with the person that pointed out the problem if it works
on their environment, but unlikely to not work.
Comment 8 Mark Wielaard 2020-08-20 10:36:14 UTC
(In reply to Mark Wielaard from comment #6)
> Created attachment 130439 [details]
> Patch to search/check all possible dl_runtime_resolve variants

So this approach seems to work as intended and I would like to push it to git. Any comments/reviews before I do that?
Comment 9 Mark Wielaard 2020-08-25 14:48:10 UTC
commit 86277041a296d3ca34d938b347c246ff4485c3a0
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Jul 27 22:43:28 2020 +0200

    Incorrect call-graph tracking due to new _dl_runtime_resolve_xsave*
    
    Newer glibc have alternate ld.so _ld_runtime_resolve functions.
    Namely _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec
    
    This patch recognizes the xsave, xsvec and fxsave variants and
    changes callgrind so that any variant counts as _dl_runtime_resolve.
    
    Original patch by paulo.cesar.pereira.de.andrade@gmail.com
    https://bugs.kde.org/show_bug.cgi?id=415293
Comment 10 Josef Weidendorfer 2021-04-07 22:30:53 UTC
Original Callgrind author here...
Sorry for not checking my old email address for open-source projects obviously
for almost 1 1/2 years... just see this now, as I was checking out callgrind on
recent Ubuntu and found it not working any more (partly due to this issue).

First: thanks for this very elaborated patch!

Two remarks:
- there is no "_dl_runtime_resolve_xsave'2" - the ending "'2" is
  generated by callgrind notifying "recursion level 2" - ie it's the same
  function, just found for the 2nd time on the call stack. As by default
  the recursion level is max'ed at 2, it probably includes all higher levels

- the complex "check_code" function really only should be required when
  the symbols are stripped from the runtime linker, as last resort to
  still be able to do the special handling...

Trying to make this work on more architectures and recent glibc's, my suggestion
is to add this patch for callgrind/fn.c:

-      if (VG_(strcmp)(fn->name, "_dl_runtime_resolve")==0) {
+      if (VG_(strncmp)(fn->name, "_dl_runtime_resolve", 19)==0) {

While the heuristic that every function symbol starting
with the prefix "_dl_runtime_resolve" as being an entry point
into the runtime linker for resolving a function address may
be a bit rough, I do not think this prefix is used often in
other source code for anything else.

The worst case is a slightly misleading call graph only
visible in a very specific situation: if the wrongly-detected
function does a tail call (ie instead of returning, jumping
to another function), it will be shown as 2 calls in a row
from the original caller.

The whole issue to solve here is that the call chain
A -(call)-> B -(tail call)-> C (with B being "_dl_runtime_resolve*")
should be interpreted as A calling B, returning to A, calling C.
Why? Because "_dl_runtime_resolve*" is some kind of dispatcher function,
with a large number of calls routed through it. And this scenario is
bad for profile analysis when only having a single caller as context.
The replacement shows (1) better information, (2) reduces the probability
of a recursive function cycle - which obscures profiling results, and
(3) is less confusing to users not aware of the runtime linker.

Note: the patch applied as fix can stay, as it allows for a stripped
runtime linker.