444545 – memchr behaves differently under valgrind --tool=massif

Bug 444545 - memchr behaves differently under valgrind --tool=massif

Summary: memchr behaves differently under valgrind --tool=massif

Status:	RESOLVED NOT A BUG

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	massif (show other bugs)
Version:	unspecified
Platform:	Fedora RPMs Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Nicholas Nethercote

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-10-28 14:23 UTC by Peter Hull
Modified:	2023-05-12 12:32 UTC (History)
CC List:	3 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Test program showing issue (358 bytes, text/plain) 2021-10-28 14:23 UTC, Peter Hull	Details
Updated/simplified test case (369 bytes, text/plain) 2021-11-01 14:34 UTC, Peter Hull	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Peter Hull 2021-10-28 14:23:46 UTC

Created attachment 142965 [details]
Test program showing issue

SUMMARY
Using the massif tool my program was malfunctioning. I reduced this to a simple case where it appears that the behaviour of memchr is different when running with massif to when running with memcheck, or running without valgrid altogether
Using Valgrind 3.17

STEPS TO REPRODUCE
1. Compile program (attached) with 'gcc you.c'
2. Run 'valgrind --tool=massif ./a.out'
3. Run './a.out'
4. Observe output differs

OBSERVED RESULT
$ valgrind --tool=massif ./a.out
==9427== Massif, a heap profiler
==9427== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==9427== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==9427== Command: ./a.out
==9427== 
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11
Found JKL at 12
Found KL at 13
Found L at 14
==9427== 

EXPECTED RESULT
$ ./a.out 
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11

ALSO result with memcheck is correct
$ valgrind ./a.out
==9420== Memcheck, a memory error detector
==9420== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9420== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==9420== Command: ./a.out
==9420== 
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11
==9420== 
==9420== HEAP SUMMARY:
==9420==     in use at exit: 0 bytes in 0 blocks
==9420==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==9420== 
==9420== All heap blocks were freed -- no leaks are possible
==9420== 
==9420== For lists of detected and suppressed errors, rerun with: -s
==9420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

SOFTWARE/OS VERSIONS
Fedora Linux 35 5.14.10-300.fc35.x86_64 

ADDITIONAL INFORMATION
Program:
#include <stdio.h>
#include <string.h>

int main() {
  const char* txt = "ABC/DEF/GHI/JKL";
  int n = strlen(txt);
  const char* ptr = txt;
  while (ptr) {
    const char* f = memchr(ptr, '/', txt + n - ptr);
    if (f) {
      printf("Found %s at %d\n", f, f - txt);
      ptr = f + 1;
    } else {
      ptr = NULL;
    }
  }
  return 0;
}

Comment 1 Paul Floyd 2021-10-29 08:31:55 UTC

I can't reproduce this (GCC 5.3 and 9.2 on RHEL 7.6, Valgrind built from git source).

Comment 2 Peter Hull 2021-10-29 11:50:13 UTC

(I should add I was running in a virtualbox vm if that is significant?)

It is also OK under Debian, which has gcc version 10.2.1 20210110 (Debian 10.2.1-6) and Valgrind-3.16.1 

On Fedora, where it shows the problem, gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) and valgrind-3.17.0

Please advise if I can help further.

Comment 3 Peter Hull 2021-11-01 09:37:14 UTC

I still get the problem on Fedora 35, gcc 11.2.1, valgrind from current (as of today) git (Valgrind-3.19.0.GIT)
Suggests that gcc 11 is miscompiling something?

Comment 4 Julian Seward 2021-11-01 12:28:43 UTC

I can't reproduce this, running on Fedora 35, x86_64, using V from git
and the system gcc, 11.2.1.  This is on an Intel i7-3615QM, in short an
older machine that supports up to avx but not avx2.

Can you post the results of /usr/bin/lscpu on your F35 install?

Comment 5 Peter Hull 2021-11-01 13:38:35 UTC

Sorry this is proving more troublesome than I expected.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  1
  On-line CPU(s) list:   0
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            10
    BogoMIPS:            4416.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                          mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology
                          nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 
                         sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowpre
                         fetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush
                         _l1d
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    256 KiB (1 instance)
  L3:                    9 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, STIBP disabled, RSB filling
  Srbds:                 Unknown: Dependent on hypervisor status
  Tsx async abort:       Not affected

Comment 6 Julian Seward 2021-11-01 13:54:30 UTC

>  Model name:            Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

Hmm, nothing particularly exotic there.

> Sorry this is proving more troublesome than I expected.

It's OK; however; we can't fix it if we can't reproduce it.  Can you
maybe re-check your testing setup?  TBH I seriously doubt that
Massif or the V framework as a whole mishandles memchr, since
I've run huge apps (eg, Firefox) on Massif and I see no such breakage.
Plus there are multiple auto-test runs every night, on various F35
targets, and I'm not aware of such breakage in them.

That said it maybe that you've fallen across some obscure corner 
case that is broken.  But I can't imagine what that could be.

Comment 7 Peter Hull 2021-11-01 13:57:40 UTC

As I'm using VirtualBox I can restart the VM with AVX2 disabled, using the command 
.\vboxmanage setextradata "Fedora" VBoxInternal/CPUM/IsaExts/AVX2 0

Now the problem does not occur.

However, I did report in comment 2 that under Debian the problem didn't occur either, and this was using the same VirtualBox/Host PC, so it isn't simply AVX2 vs not AVX2.

Thanks for your help so far!

The output of lscpu in this case is 
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  1
  On-line CPU(s) list:   0
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            10
    BogoMIPS:            4415.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                          mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology
                          nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 
                         sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowpre
                         fetch invpcid_single pti fsgsbase invpcid rdseed clflushopt md_clear flush_l1d
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    256 KiB (1 instance)
  L3:                    9 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, STIBP disabled, RSB filling
  Srbds:                 Unknown: Dependent on hypervisor status
  Tsx async abort:       Not affected

Comment 8 Peter Hull 2021-11-01 14:34:02 UTC

I did a little more checking. I reduced the test program somewhat (attached)
I tried putting a breakpoint on memchr both standalone and running under valgrind, to see what got called. I saw for 'standalone'

(gdb) run
Starting program: /home/peter/Projects/vgtest/a.out 
...
Breakpoint 1, memchr_ifunc () at ../sysdeps/x86_64/multiarch/memchr.c:29
29      libc_ifunc_redirected (__redirect_memchr, memchr, IFUNC_SELECTOR ());
(gdb) step
__memchr_sse2 () at ../sysdeps/x86_64/multiarch/../memchr.S:34

and under Valgrind

$ valgrind --tool=massif --vgdb-stop-at=startup -v ./a.out
(gdb) target remote | /usr/libexec/valgrind/../../bin/vgdb --pid=1734
...
(gdb) br memchr
Breakpoint 1 at 0x401040 (2 locations)
(gdb) cont
Continuing.

Breakpoint 1, memchr_ifunc () at ../sysdeps/x86_64/multiarch/memchr.c:29
29      libc_ifunc_redirected (__redirect_memchr, memchr, IFUNC_SELECTOR ());
(gdb) step
__memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:60
60      ENTRY (MEMCHR)

Note that in the first case we're calling into __memchr_sse2 and for the second it is __memchr_avx2.

Could this be significant?

Comment 9 Peter Hull 2021-11-01 14:34:51 UTC

Created attachment 143101 [details]
Updated/simplified test case

Comment 10 Julian Seward 2021-11-01 15:09:41 UTC

That is all very strange.  Q: for the failing case, does 
adding `--alignment=32` make any difference?  That changes
the alignment produced by `malloc`.

Comment 11 Peter Hull 2021-11-02 16:40:30 UTC

(In reply to Julian Seward from comment #10)
> Q: for the failing case, does 
> adding `--alignment=32` make any difference?  
That didn't make any difference that I could see.

I found that running it like this:

    GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load valgrind --tool=massif ./a.out

makes it work OK (that is supposed to disable some CPU feature detection, as mentioned in https://stackoverflow.com/a/61048314/231929)

It would be good to rule virtualbox in or out, I wonder if there is anyone with a AVX2-capable CPU running Fedora 35 directly who could test it. Unfortunately I couldn't reproduce it on Debian 11.

I'm running out of ideas now.

Comment 12 Christopher Wecht 2022-12-12 10:00:19 UTC

I think, that I can clarify this issue a bit. From my understanding, this issue is caused by a unfortunate combination of glibc and VirtualBox.

1. VirtualBox has the oddity of providing AVX2 instructions to its clients, but not BMI instructions: https://www.virtualbox.org/ticket/15471
This fact makes VirtualBox a quite weird plattform: at least for Intel CPUs BMI and AVX2 have both been introduced with Haswell. It is actually quite unusual, that a CPU architecture supports AVX2 but _not_ BMI.

2. glibc uses a special implementations of memchr for different instruction sets available. Which one is actually used is decided depending on the available instruction sets: https://github.com/bminor/glibc/blob/glibc-2.36/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L60

3. The AVX2 implementation of memchr is enabled if AVX2 is available: https://github.com/bminor/glibc/blob/glibc-2.36/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L71

4. But it turns out, that the AVX2 implementation of memchr uses not only AVX2 instructions but also BMI instructions, namely TZCNTL: https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/multiarch/memchr-avx2.S#L101

5. My conclusion is now: VirtualBox communicates to its guests, that AVX2 is available but not BMI. This results in glibc using the AVX2 implemntation although it needs the BMI instructions, too. This is probably not an issue if the VBox is executed on a x86 guest, because the needed instructions are actually supported there and with modern CPU fetures for virtualization (https://bugs.kde.org/show_bug.cgi?id=444545) the instructions are more or less directly forwarded to the CPU anyways (AFAIK). It seems now, that massif is confused by running a program which uses BMI instructions on a system which allegidly doesn't support them.

I can backup this claim with the following observations:
- The above provided test programm work with massif on ubuntu 20.04 and 20.04 but not with 21.10, 22.04 and 22.10. The breaking change must have been introduced between 2.33 and 2.34. It turns out, that 2.34 is exactly the version, in which the BMI instructions have been introduced in the "main path" (see https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/x86_64/multiarch/memchr-avx2.S vs https://github.com/bminor/glibc/blob/glibc-2.33/sysdeps/x86_64/multiarch/memchr-avx2.S).

Interestinly in the master branch, there is now a check for BMI2 in place (https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L76) so supposedly, this issue should not occure with the most recent "nigtly" version of glibc, if my reasoning is correct.

Comment 13 Peter Hull 2022-12-12 10:36:28 UTC

Christopher, that's really interesting and it sounds like you have found the cause. From my point of view I have been checking (very infrequently) to see if some update to Fedora or VirtualBox has magically fixed it... maybe the end is actually near, when the BMI2 check becomes part of the main release.
Thanks,
Peter

Comment 14 Peter Hull 2023-05-11 09:07:39 UTC

It does seem to have been fixed for me now, running under VirtualBox 7.0.8:

glibc.x86_64 2.37-4.fc38
gcc version 13.1.1 20230426 (Red Hat 13.1.1-1)
valgrind-3.21.0
Fedora 38 6.2.13-300.fc38.x86_64

I don't know which of the updates was responsible for the change.

Thanks everyone for your help.

Comment 15 Paul Floyd 2023-05-11 10:05:26 UTC

Did the VBox CPU change in any ways (different lscpu flags)?

Not much has changed in Valgrind. I did make one change that affects hwcaps and the CPUID dirtyhelper, but I don't think that is relevant.

Is BMI support configurable with VBox?

Does this work

VBoxManage setextradata "VM name" VBoxInternal/CPUM/BMI 1
?

Comment 16 Peter Hull 2023-05-12 11:17:12 UTC

(In reply to Paul Floyd from comment #15)
> Did the VBox CPU change in any ways (different lscpu flags)?
Yes, it has changed, avx and avx2 have gone, and bmi1 and bmi2 have appeared, together with some other changes (see below) 
> Is BMI support configurable with VBox?
> Does this work
> VBoxManage setextradata "VM name" VBoxInternal/CPUM/BMI 1

Setting it to '1' or '0' doesn't make any difference to the CPU flags.

From my point of view this bug is fixed, and I guess from the CPU flags info it's because of a change in VirtualBox?

Thanks,
Peter

---

Flags previously (my post https://bugs.kde.org/show_bug.cgi?id=444545#c5): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d
Flags now (BMI unset or set to 1): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d

Comment 17 Paul Floyd 2023-05-12 12:32:34 UTC

Yes I agree that this can be closed.