Created attachment 136531 [details] a.c GCC 9.3.0 (OpenEmbedded, Ubuntu 20.04) and GCC 10.2.1 (Debian Bullseye) with -O2 generate x86 and x86-64 code that causes memcheck (v2.16.1, 8b1961511c93962ea2a9b918af8e9c32e3c24d71) to report: ==4112219== Conditional jump or move depends on uninitialised value(s) ==4112219== at 0x10906D: main (a.c:10) STEPS TO REPRODUCE gcc -O3 a.c b.c valgrind ./a.out (the files need to be separate so that GCC can't see the implementation of the fill function when compiling main.) The generated code contains: 1067: 66 83 7c 24 0a 73 cmpw $0x73,0xa(%rsp) 106d: 75 e9 jne 1058 <main+0x18> where the cmpw is comparing a 16-bit word against 's', '\0' when only '\0' was written to that part of the buffer. This appears to be similar to bug 413642 and bug 420780.
Created attachment 136532 [details] b.c
GCC 10.2.1 (Debian Bullseye) AArch64 appears to be doing something similar: 650: 79404be0 ldrh w0, [sp, #36] 654: 7101cc1f cmp w0, #0x73 except it looks like it's a 32-bit comparison this time so three uninitialised bytes are being compared.
Does running with --expensive-definedness-checks=yes make any difference?
(In reply to Julian Seward from comment #3) > Does running with --expensive-definedness-checks=yes make any difference? Yes. It makes the problem go away for all the combinations that I'd previously tested. It ain't half slow on anything more than toy test cases though. :)
(In reply to Mike Crowe from comment #4) > It ain't half slow on anything more than toy test cases though. :) That was a general test to establish whether lack of precise instrumentation was indeed the problem. Now that we've established that, I can enable precise instrumentation just for the case CmpEQ16/NE16 on those two targets, which shouldn't be much of a per hit, unless your program spends all its time doing 16-bit equality comparisons.
Here's my proposed fix. It works for your test case on x86_64 but I didn't test it on arm64. Does it work for you? diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c index 516988bdd..759f00b3a 100644 --- a/memcheck/mc_translate.c +++ b/memcheck/mc_translate.c @@ -8590,12 +8590,14 @@ IRSB* MC_(instrument) ( VgCallbackClosure* closure, # elif defined(VGA_amd64) mce.dlbo.dl_Add32 = DLexpensive; mce.dlbo.dl_Add64 = DLauto; + mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive; mce.dlbo.dl_CmpEQ32_CmpNE32 = DLexpensive; mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # elif defined(VGA_ppc64le) // Needed by (at least) set_AV_CR6() in the front end. mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # elif defined(VGA_arm64) + mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive; mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # endif
(In reply to Julian Seward from comment #6) > # elif defined(VGA_arm64) > + mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive; > mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; > # endif Thanks! This works for me if I use mce.dlbo.dl_CmpEQ32_CmpNE32 instead (it's a 32-bit compare on AArch64.) I shall see if I can test on ARM32 and some other architectures too.
Created attachment 136555 [details] Fix for optimised strcmp on x86, x86_64, arm and AArch64 This patch makes the test case work for me on x86, x86_64, arm and AArch64. I suspect that something similar is required for ppc64 at least.
Committed, 21571252964c4d7860210cbe9f60a49eb6881824. Thanks for the patch (and analysis).
I had a go at reproducing this problem on a ppc64be machine with Valgrind-3.17.0.RC2 too. I could, but it didn't go away with --expensive-definedness-checks=yes or by applying equivalent patches to the same area of the code (with the addition of defined(VGA_ppc64be).) I don't use ppc64 so this isn't important to me, I just happened to have access to such a machine to try it on.