434193 – GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report

Bug 434193 - GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report

Summary: GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialis...

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	unspecified
Platform:	Debian testing Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-03-09 16:40 UTC by Mike Crowe
Modified:	2021-03-17 16:38 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
a.c (143 bytes, text/x-csrc) 2021-03-09 16:40 UTC, Mike Crowe	Details
b.c (70 bytes, text/x-csrc) 2021-03-09 16:41 UTC, Mike Crowe	Details
Fix for optimised strcmp on x86, x86_64, arm and AArch64 (1.46 KB, patch) 2021-03-10 15:54 UTC, Mike Crowe	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Mike Crowe 2021-03-09 16:40:56 UTC

Created attachment 136531 [details]
a.c

GCC 9.3.0 (OpenEmbedded, Ubuntu 20.04) and GCC 10.2.1 (Debian Bullseye) with -O2 generate x86 and x86-64 code that causes memcheck (v2.16.1, 8b1961511c93962ea2a9b918af8e9c32e3c24d71) to report:

==4112219== Conditional jump or move depends on uninitialised value(s)
==4112219==    at 0x10906D: main (a.c:10)

STEPS TO REPRODUCE

 gcc -O3 a.c b.c
 valgrind ./a.out

(the files need to be separate so that GCC can't see the implementation of the fill function when compiling main.)

The generated code contains:

    1067:       66 83 7c 24 0a 73       cmpw   $0x73,0xa(%rsp)
    106d:       75 e9                   jne    1058 <main+0x18>

where the cmpw is comparing a 16-bit word against 's', '\0' when only '\0' was written to that part of the buffer.

This appears to be similar to bug 413642 and bug 420780.

Comment 1 Mike Crowe 2021-03-09 16:41:20 UTC

Created attachment 136532 [details]
b.c

Comment 2 Mike Crowe 2021-03-09 17:59:04 UTC

GCC 10.2.1 (Debian Bullseye) AArch64 appears to be doing something similar:

 650:   79404be0        ldrh    w0, [sp, #36]
 654:   7101cc1f        cmp     w0, #0x73

except it looks like it's a 32-bit comparison this time so three uninitialised bytes are being compared.

Comment 3 Julian Seward 2021-03-09 18:04:38 UTC

Does running with --expensive-definedness-checks=yes make any difference?

Comment 4 Mike Crowe 2021-03-09 18:38:11 UTC

(In reply to Julian Seward from comment #3)
> Does running with --expensive-definedness-checks=yes make any difference?

Yes. It makes the problem go away for all the combinations that I'd previously tested.

It ain't half slow on anything more than toy test cases though. :)

Comment 5 Julian Seward 2021-03-10 07:54:34 UTC

(In reply to Mike Crowe from comment #4)
> It ain't half slow on anything more than toy test cases though. :)

That was a general test to establish whether lack of precise
instrumentation was indeed the problem.  Now that we've established that,
I can enable precise instrumentation just for the case CmpEQ16/NE16 on
those two targets, which shouldn't be much of a per hit, unless your 
program spends all its time doing 16-bit equality comparisons.

Comment 6 Julian Seward 2021-03-10 08:08:47 UTC

Here's my proposed fix.  It works for your test case on x86_64
but I didn't test it on arm64.  Does it work for you?

diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index 516988bdd..759f00b3a 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -8590,12 +8590,14 @@ IRSB* MC_(instrument) ( VgCallbackClosure* closure,
 #     elif defined(VGA_amd64)
       mce.dlbo.dl_Add32           = DLexpensive;
       mce.dlbo.dl_Add64           = DLauto;
+      mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive;
       mce.dlbo.dl_CmpEQ32_CmpNE32 = DLexpensive;
       mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
 #     elif defined(VGA_ppc64le)
       // Needed by (at least) set_AV_CR6() in the front end.
       mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
 #     elif defined(VGA_arm64)
+      mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive;
       mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
 #     endif

Comment 7 Mike Crowe 2021-03-10 12:35:04 UTC

(In reply to Julian Seward from comment #6)
>  #     elif defined(VGA_arm64)
> +      mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive;
>        mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
>  #     endif

Thanks! This works for me if I use mce.dlbo.dl_CmpEQ32_CmpNE32 instead (it's a 32-bit compare on AArch64.)

I shall see if I can test on ARM32 and some other architectures too.

Comment 8 Mike Crowe 2021-03-10 15:54:39 UTC

Created attachment 136555 [details]
Fix for optimised strcmp on x86, x86_64, arm and AArch64

This patch makes the test case work for me on x86, x86_64, arm and AArch64. I suspect that something similar is required for ppc64 at least.

Comment 9 Julian Seward 2021-03-12 17:00:53 UTC

Committed, 21571252964c4d7860210cbe9f60a49eb6881824.
Thanks for the patch (and analysis).

Comment 10 Mike Crowe 2021-03-17 16:38:16 UTC

I had a go at reproducing this problem on a ppc64be machine with Valgrind-3.17.0.RC2 too. I could, but it didn't go away with --expensive-definedness-checks=yes or by applying equivalent patches to the same area of the code (with the addition of defined(VGA_ppc64be).)

I don't use ppc64 so this isn't important to me, I just happened to have access to such a machine to try it on.