Created attachment 142965 [details]
Test program showing issue

SUMMARY
Using the massif tool my program was malfunctioning. I reduced this to a simple case where it appears that the behaviour of memchr is different when running under massif from when running under memcheck, or when running without valgrind altogether.

Using Valgrind 3.17

STEPS TO REPRODUCE
1. Compile the program (attached) with 'gcc you.c'
2. Run 'valgrind --tool=massif ./a.out'
3. Run './a.out'
4. Observe that the output differs

OBSERVED RESULT
$ valgrind --tool=massif ./a.out
==9427== Massif, a heap profiler
==9427== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==9427== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==9427== Command: ./a.out
==9427==
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11
Found JKL at 12
Found KL at 13
Found L at 14
==9427==

EXPECTED RESULT
$ ./a.out
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11

ALSO: the result with memcheck is correct
$ valgrind ./a.out
==9420== Memcheck, a memory error detector
==9420== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9420== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==9420== Command: ./a.out
==9420==
Found /DEF/GHI/JKL at 3
Found /GHI/JKL at 7
Found /JKL at 11
==9420==
==9420== HEAP SUMMARY:
==9420==     in use at exit: 0 bytes in 0 blocks
==9420==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==9420==
==9420== All heap blocks were freed -- no leaks are possible
==9420==
==9420== For lists of detected and suppressed errors, rerun with: -s
==9420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

SOFTWARE/OS VERSIONS
Fedora Linux 35
5.14.10-300.fc35.x86_64

ADDITIONAL INFORMATION
Program:

#include <stdio.h>
#include <string.h>

int main()
{
    const char* txt = "ABC/DEF/GHI/JKL";
    int n = strlen(txt);
    const char* ptr = txt;
    while (ptr) {
        const char* f = memchr(ptr, '/', txt + n - ptr);
        if (f) {
            printf("Found %s at %d\n", f, (int)(f - txt));
            ptr = f + 1;
        } else {
            ptr = NULL;
        }
    }
    return 0;
}
I can't reproduce this (GCC 5.3 and 9.2 on RHEL 7.6, Valgrind built from git source).
(I should add that I was running in a VirtualBox VM, if that is significant.)

It is also OK under Debian, which has gcc version 10.2.1 20210110 (Debian 10.2.1-6) and Valgrind-3.16.1.

On Fedora, where it shows the problem: gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) and valgrind-3.17.0.

Please advise if I can help further.
I still get the problem on Fedora 35, gcc 11.2.1, valgrind from current (as of today) git (Valgrind-3.19.0.GIT).

Does this suggest that gcc 11 is miscompiling something?
I can't reproduce this, running on Fedora 35, x86_64, using V from git and the system gcc, 11.2.1. This is on an Intel i7-3615QM, in short an older machine that supports up to avx but not avx2. Can you post the results of /usr/bin/lscpu on your F35 install?
Sorry this is proving more troublesome than I expected.

$ lscpu
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           39 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  1
On-line CPU(s) list:     0
Vendor ID:               GenuineIntel
Model name:              Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
CPU family:              6
Model:                   158
Thread(s) per core:      1
Core(s) per socket:      1
Socket(s):               1
Stepping:                10
BogoMIPS:                4416.00
Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d
Virtualization features:
Hypervisor vendor:       KVM
Virtualization type:     full
Caches (sum of all):
L1d:                     32 KiB (1 instance)
L1i:                     32 KiB (1 instance)
L2:                      256 KiB (1 instance)
L3:                      9 MiB (1 instance)
NUMA:
NUMA node(s):            1
NUMA node0 CPU(s):       0
Vulnerabilities:
Itlb multihit:           KVM: Mitigation: VMX unsupported
L1tf:                    Mitigation; PTE Inversion
Mds:                     Mitigation; Clear CPU buffers; SMT Host state unknown
Meltdown:                Mitigation; PTI
Spec store bypass:       Vulnerable
Spectre v1:              Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2:              Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Srbds:                   Unknown: Dependent on hypervisor status
Tsx async abort:         Not affected
> Model name: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

Hmm, nothing particularly exotic there.

> Sorry this is proving more troublesome than I expected.

It's OK; however, we can't fix it if we can't reproduce it. Can you maybe re-check your testing setup?

TBH I seriously doubt that Massif or the V framework as a whole mishandles memchr, since I've run huge apps (eg, Firefox) on Massif and I see no such breakage. Plus there are multiple auto-test runs every night, on various F35 targets, and I'm not aware of such breakage in them.

That said, it may be that you've fallen across some obscure corner case that is broken. But I can't imagine what that could be.
As I'm using VirtualBox I can restart the VM with AVX2 disabled, using the command

.\vboxmanage setextradata "Fedora" VBoxInternal/CPUM/IsaExts/AVX2 0

Now the problem does not occur. However, I did report in comment 2 that under Debian the problem didn't occur either, and this was using the same VirtualBox/host PC, so it isn't simply AVX2 vs not AVX2.

Thanks for your help so far!

The output of lscpu in this case is:

$ lscpu
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           39 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  1
On-line CPU(s) list:     0
Vendor ID:               GenuineIntel
Model name:              Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
CPU family:              6
Model:                   158
Thread(s) per core:      1
Core(s) per socket:      1
Socket(s):               1
Stepping:                10
BogoMIPS:                4415.99
Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase invpcid rdseed clflushopt md_clear flush_l1d
Virtualization features:
Hypervisor vendor:       KVM
Virtualization type:     full
Caches (sum of all):
L1d:                     32 KiB (1 instance)
L1i:                     32 KiB (1 instance)
L2:                      256 KiB (1 instance)
L3:                      9 MiB (1 instance)
NUMA:
NUMA node(s):            1
NUMA node0 CPU(s):       0
Vulnerabilities:
Itlb multihit:           KVM: Mitigation: VMX unsupported
L1tf:                    Mitigation; PTE Inversion
Mds:                     Mitigation; Clear CPU buffers; SMT Host state unknown
Meltdown:                Mitigation; PTI
Spec store bypass:       Vulnerable
Spectre v1:              Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2:              Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Srbds:                   Unknown: Dependent on hypervisor status
Tsx async abort:         Not affected
I did a little more checking. I reduced the test program somewhat (attached).

I tried putting a breakpoint on memchr, both standalone and running under valgrind, to see what got called.

For 'standalone' I saw:

(gdb) run
Starting program: /home/peter/Projects/vgtest/a.out
...
Breakpoint 1, memchr_ifunc () at ../sysdeps/x86_64/multiarch/memchr.c:29
29        libc_ifunc_redirected (__redirect_memchr, memchr, IFUNC_SELECTOR ());
(gdb) step
__memchr_sse2 () at ../sysdeps/x86_64/multiarch/../memchr.S:34

and under Valgrind:

$ valgrind --tool=massif --vgdb-stop-at=startup -v ./a.out

(gdb) target remote | /usr/libexec/valgrind/../../bin/vgdb --pid=1734
...
(gdb) br memchr
Breakpoint 1 at 0x401040 (2 locations)
(gdb) cont
Continuing.
Breakpoint 1, memchr_ifunc () at ../sysdeps/x86_64/multiarch/memchr.c:29
29        libc_ifunc_redirected (__redirect_memchr, memchr, IFUNC_SELECTOR ());
(gdb) step
__memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:60
60      ENTRY (MEMCHR)

Note that in the first case we're calling into __memchr_sse2 and in the second it is __memchr_avx2. Could this be significant?
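(As an aside, not part of the original report: another way to compare what the program believes the CPU supports in each case is a small probe using GCC's __builtin_cpu_supports, run once standalone and once under massif. This is only a rough approximation; glibc's real ifunc selector uses its own cpu-features code and checks more properties than these three.)

#include <stdio.h>

/* Minimal diagnostic sketch (not from the original report): print a few CPU
 * features relevant to this discussion.  Run it both standalone and under
 * "valgrind --tool=massif" to compare what the program sees in each case. */
int main(void)
{
    __builtin_cpu_init();
    printf("avx2: %d\n", __builtin_cpu_supports("avx2"));
    printf("bmi : %d\n", __builtin_cpu_supports("bmi"));
    printf("bmi2: %d\n", __builtin_cpu_supports("bmi2"));
    return 0;
}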
Created attachment 143101 [details]
Updated/simplified test case
That is all very strange. Q: for the failing case, does adding `--alignment=32` make any difference? That changes the alignment produced by `malloc`.
(In reply to Julian Seward from comment #10)
> Q: for the failing case, does adding `--alignment=32` make any difference?

That didn't make any difference that I could see.

I found that running it like this:

GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load valgrind --tool=massif ./a.out

makes it work OK (that is supposed to disable some CPU feature detection, as mentioned in https://stackoverflow.com/a/61048314/231929).

It would be good to rule VirtualBox in or out; I wonder if there is anyone with an AVX2-capable CPU running Fedora 35 directly who could test it. Unfortunately I couldn't reproduce it on Debian 11.

I'm running out of ideas now.
I think I can clarify this issue a bit. From my understanding, it is caused by an unfortunate combination of glibc and VirtualBox.

1. VirtualBox has the oddity of providing AVX2 instructions to its guests, but not BMI instructions: https://www.virtualbox.org/ticket/15471
This makes VirtualBox quite a weird platform: at least for Intel CPUs, BMI and AVX2 were both introduced with Haswell, so it is quite unusual for a CPU to support AVX2 but _not_ BMI.

2. glibc ships several implementations of memchr for different instruction sets. Which one is actually used is decided based on the instruction sets available: https://github.com/bminor/glibc/blob/glibc-2.36/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L60

3. The AVX2 implementation of memchr is enabled if AVX2 is available: https://github.com/bminor/glibc/blob/glibc-2.36/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L71

4. But it turns out that the AVX2 implementation of memchr uses not only AVX2 instructions but also BMI instructions, namely TZCNTL: https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/multiarch/memchr-avx2.S#L101

5. My conclusion is: VirtualBox tells its guests that AVX2 is available but BMI is not. This results in glibc using the AVX2 implementation even though it needs the BMI instructions too. This is probably not an issue when the program runs directly, because VirtualBox executes on an x86 host whose CPU actually supports the needed instructions, and with modern CPU virtualization features (https://bugs.kde.org/show_bug.cgi?id=444545) the instructions are more or less forwarded directly to the host CPU anyway (AFAIK).

It seems that massif is confused by running a program which uses BMI instructions on a system which allegedly doesn't support them. I can back up this claim with the following observations:

- The test program provided above works with massif on Ubuntu 20.04 and 21.04, but not on 21.10, 22.04 and 22.10. The breaking change must therefore have been introduced between glibc 2.33 and 2.34. It turns out that 2.34 is exactly the version in which the BMI instructions were introduced into the "main path" (compare https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/x86_64/multiarch/memchr-avx2.S with https://github.com/bminor/glibc/blob/glibc-2.33/sysdeps/x86_64/multiarch/memchr-avx2.S).

Interestingly, in the master branch there is now a check for BMI2 in place (https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/multiarch/ifunc-impl-list.c#L76), so, if my reasoning is correct, this issue should not occur with the most recent "nightly" version of glibc.
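(To make point 5 concrete, here is a rough sketch of the selection logic described above. This is my own simplification, not glibc's actual ifunc code, which also checks further properties such as AVX_Fast_Unaligned_Load; the "fixed" flag models the master-branch selector that also requires BMI2.)

#include <stdio.h>

/* Simplified model of the memchr ifunc choice described above; NOT glibc's
 * real code.  Pre-fix, AVX2 alone is enough to pick the AVX2 variant even
 * though that routine also executes TZCNT (a BMI instruction). */
static const char *pick_memchr(int has_avx2, int has_bmi2, int fixed)
{
    if (has_avx2 && (!fixed || has_bmi2))
        return "__memchr_avx2";   /* uses TZCNT internally */
    return "__memchr_sse2";
}

int main(void)
{
    __builtin_cpu_init();
    int avx2 = __builtin_cpu_supports("avx2");
    int bmi2 = __builtin_cpu_supports("bmi2");

    /* On a VirtualBox guest that advertises AVX2 but not BMI/BMI2, the old
     * selector still picks the AVX2 variant; the fixed one falls back. */
    printf("old-style selection: %s\n", pick_memchr(avx2, bmi2, 0));
    printf("fixed selection:     %s\n", pick_memchr(avx2, bmi2, 1));
    return 0;
}

(Point 4 could also be confirmed directly by disassembling the installed libc, e.g. with objdump, and looking for tzcnt inside __memchr_avx2; the exact libc path varies by distribution.)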
Christopher, that's really interesting and it sounds like you have found the cause.

From my point of view I have been checking (very infrequently) to see if some update to Fedora or VirtualBox has magically fixed it... maybe the end is actually near, when the BMI2 check becomes part of the main release.

Thanks, Peter
It does seem to have been fixed for me now, running under VirtualBox 7.0.8:

glibc.x86_64 2.37-4.fc38
gcc version 13.1.1 20230426 (Red Hat 13.1.1-1)
valgrind-3.21.0
Fedora 38
6.2.13-300.fc38.x86_64

I don't know which of the updates was responsible for the change. Thanks everyone for your help.
Did the VBox CPU change in any way (different lscpu flags)?

Not much has changed in Valgrind. I did make one change that affects hwcaps and the CPUID dirtyhelper, but I don't think that is relevant.

Is BMI support configurable with VBox? Does this work:

VBoxManage setextradata "VM name" VBoxInternal/CPUM/BMI 1

?
(In reply to Paul Floyd from comment #15)
> Did the VBox CPU change in any way (different lscpu flags)?

Yes, it has changed: avx and avx2 have gone, and bmi1 and bmi2 have appeared, together with some other changes (see below).

> Is BMI support configurable with VBox?
> Does this work
> VBoxManage setextradata "VM name" VBoxInternal/CPUM/BMI 1

Setting it to '1' or '0' doesn't make any difference to the CPU flags.

From my point of view this bug is fixed, and I guess from the CPU flags info it's because of a change in VirtualBox?

Thanks, Peter

---

Flags previously (my post https://bugs.kde.org/show_bug.cgi?id=444545#c5):

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d

Flags now (BMI unset or set to 1):

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d
Yes I agree that this can be closed.