Bug 464671

Summary: Default glibc suppression shouldn't use @GLIBC_VERSION@ anymore with glibc >= 2.34
Product: [Developer tools] valgrind Reporter: Romain Geissler <romain.geissler>
Component: memcheckAssignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal CC: mark, pjfloyd, sam
Priority: NOR    
Version: 3.21 GIT   
Target Milestone: ---   
Platform: Other   
OS: Other   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Pach (doesn't work for glibc < 2.34)

Description Romain Geissler 2023-01-23 01:26:05 UTC
Hi,

I have investigated a valgrind false positive I had after upgrading my toolchain. I have a gnu toolchain built from scratch, using glibc 2.37 (current master branch) and a gcc 13, configured with --with-arch-64=x86-64-v2. It looks like the resulting glibc also uses x86-64-v2 by default, and apparently uses some vectorized instruction by default, especially in strcmp.

When running memcheck on a dynamically linked program, valgrind is actually reporting two "cond" errors during the glibc startup, in stack traces like this:

==16638== Conditional jump or move depends on uninitialised value(s)
==16638==    at 0x40217E1: strcmp (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x400AD64: _dl_name_match_p (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x4008353: _dl_map_object (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x4019F94: map_doit (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x4001488: _dl_catch_exception (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x40015AE: _dl_catch_error (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x401A47F: do_preload (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x401B206: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x401E022: dl_main (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x401989E: _dl_sysdep_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x401AFAB: _dl_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==    by 0x4019E77: ??? (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)
==16638==  Uninitialised value was created by a stack allocation
==16638==    at 0x401B1AA: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.4/lib/ld-linux-x86-64.so.2)

So while trying to LD_PRELOAD the valgrind strcmp symbol replacements, we call the glibc strcmp, which causes some expected errors (most likely generated by the vectorization code).

Checking the default suppression file, I can see things like:

{
   dl-hack3-cond-1
   Memcheck:Cond
   obj:*/lib*/ld-2.36*.so*
   obj:*/lib*/ld-2.36*.so*
   obj:*/lib*/ld-2.36*.so*
}

However since glibc 2.34, these files ld-2.36*.so* no long exist. According to glibc NEWS file:

* Previously, glibc installed its various shared objects under versioned
  file names such as libc-2.33.so.  The ABI sonames (e.g., libc.so.6)
  were provided as symbolic links.  Starting with glibc 2.34, the shared
  objects are installed under their ABI sonames directly, without
  symbolic links.  This increases compatibility with distribution
  package managers that delete removed files late during the package
  upgrade or downgrade process.

Indeed comparing a glibc 2.23 vs 2.37, here is what we see for ld.so:

rgeissler@ncerndobedev6097:~> ls -la /remote/tools/Linux/2.6/1A/toolchain/x86_64-2.6.32-v4/lib*|grep ld-
-rwxr-xr-x  1 otfdelde otfdelde   185520 Dec 19 14:36 ld-2.23.so
lrwxrwxrwx  1 otfdelde otfdelde       10 Dec 19 14:36 ld-linux-x86-64.so.2 -> ld-2.23.so
rgeissler@ncerndobedev6097:~> ls -la /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23/lib*|grep ld-
-rwxr-xr-x  1 otfdelde otfdelde   227256 Jan 16 22:15 ld-linux-x86-64.so.2

So the file glibc-2.X.supp.in in valgrind source code https://sourceware.org/git/?p=valgrind.git;a=blob;f=glibc-2.X.supp.in;h=eeefa393519476120f6f0ce1320163b216339354;hb=HEAD seems to be wrong for recent glibc. Would it be possible to edit it to take into account the new glibc file naming convention ? I tried changing all */lib*/ld-2.36*.so* patterns by */lib*/ld-linux*.so* locally, and it worked (ie valgrind didn't report any false positive anymore).

Cheers,
Romain
Comment 1 Romain Geissler 2023-01-23 01:38:07 UTC
Seems related to https://bugs.kde.org/show_bug.cgi?id=439590 which resulted in this commit: https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=a1364805fc74b5690f763033c0c9b43f27613572#patch3

In particular in glibc-2.X-drd.supp.in there was this diff:

@@ -6,7 +8,7 @@
 {
    drd-ld
    drd:ConflictingAccess
-   obj:*/lib*/ld-*.so
+   obj:*/lib*/ld*.so*
 }

Applying similar changes to glibc-2.X.supp.in would fix the problem.
Comment 2 Paul Floyd 2023-01-23 06:37:01 UTC
I need to upgrade to Fedora 37 so that I can test this. I'll do that soon.
Comment 3 Romain Geissler 2023-01-30 21:05:33 UTC
Hi,

On my side I will test this, which isn't good in general, but shall be ok for people like me using glibc >= 2.34. I have replaced libdl by libc as in glibc 2.34 libdl is now an empty library, all its symbols have been moved into libc.

Cheers,
Romain
Comment 4 Romain Geissler 2023-01-30 21:06:45 UTC
Created attachment 155802 [details]
Pach (doesn't work for glibc < 2.34)
Comment 5 Paul Floyd 2023-01-31 07:17:44 UTC
I'd rather leave this to Mark as I only have 1 Fedora machine to test on (with glibc 2.36)
Comment 6 Sam James 2023-03-25 08:26:10 UTC
I'd offer but I don't have access to older glibcs easily either. I'd love for this to get into the next release though. I can probably try with Docker if required.
Comment 7 Mark Wielaard 2023-04-15 00:13:50 UTC
Sorry I didn't look at this earlier.

The original commit that adapted things to the glibc 2.34 naming change did say:

    The same could be done for the glibc-2.X.supp.in file, but hasn't
    yet because it looks like most suppressions in that file are obsolete.

And some of those suppressions are horribly broad.

Is this just for strcmp in ld.so?
In that case the intercept in shared/vg_replace_strmem.c should in theory fix that.
But it looks like it is too early. Which means we might want to create a hardwire for it in coregrind/m_redir.c
Comment 8 Romain Geissler 2023-04-23 11:26:02 UTC
(In reply to Mark Wielaard from comment #7)

> Is this just for strcmp in ld.so?
> In that case the intercept in shared/vg_replace_strmem.c should in theory
> fix that.

Hi,

Yes in my case it happens only for strcmp in ld.so, I tried again just now with an unpatched toolchain on our side, using a unittest binary that doesn't depend on many libraries, and here is the error showed at the end of the report:

==87482== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==87482==
==87482== 1 errors in context 1 of 2:
==87482== Conditional jump or move depends on uninitialised value(s)
==87482==    at 0x4021881: strcmp (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x400ADAB: _dl_name_match_p (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4008383: _dl_map_object (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401A034: map_doit (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4001488: _dl_catch_exception (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x40015AE: _dl_catch_error (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401A51F: do_preload (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401B2A6: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401E0D2: dl_main (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401993E: _dl_sysdep_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401B04B: _dl_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4019F17: ??? (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==  Uninitialised value was created by a stack allocation
==87482==    at 0x401B24A: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:strcmp
   fun:_dl_name_match_p
   fun:_dl_map_object
   fun:map_doit
   fun:_dl_catch_exception
   fun:_dl_catch_error
   fun:do_preload
   fun:handle_preload_list
   fun:dl_main
   fun:_dl_sysdep_start
   fun:_dl_start
   obj:/remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2
}
==87482==
==87482== 1 errors in context 2 of 2:
==87482== Conditional jump or move depends on uninitialised value(s)
==87482==    at 0x4021881: strcmp (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x400AD84: _dl_name_match_p (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4008383: _dl_map_object (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401A034: map_doit (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4001488: _dl_catch_exception (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x40015AE: _dl_catch_error (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401A51F: do_preload (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401B2A6: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401E0D2: dl_main (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401993E: _dl_sysdep_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x401B04B: _dl_start (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==    by 0x4019F17: ??? (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==  Uninitialised value was created by a stack allocation
==87482==    at 0x401B24A: handle_preload_list (in /remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2)
==87482==
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:strcmp
   fun:_dl_name_match_p
   fun:_dl_map_object
   fun:map_doit
   fun:_dl_catch_exception
   fun:_dl_catch_error
   fun:do_preload
   fun:handle_preload_list
   fun:dl_main
   fun:_dl_sysdep_start
   fun:_dl_start
   obj:/remote/tools/Linux/2.6/1A/toolchain/x86_64-v23.0.8/lib/ld-linux-x86-64.so.2
}
==87482== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)



> But it looks like it is too early. Which means we might want to create a
> hardwire for it in coregrind/m_redir.c

Yes this happens very early in the dynamic library loading process, so even the .so shared library provided by valgrind aren't loaded just yet (actually from what I recall from my initial investigation months ago is that the compared string came from the LD_PRELOAD variable set by valgrind to preload the valgrind .so replacement containing strcmp and the likes.

We might want to start from scratch and use a new suppression file containing just this specific case. For now in our toolchain we keep using my patch until an official solution is pushed in the valgrind project.

Cheers,
Romain
Comment 9 Paul Floyd 2024-08-08 06:03:37 UTC
Time to revisit this.