Bug 492210

Summary: False positive on x86/amd64 with ZF taken directly from addition
Product: [Developer tools] valgrind
Reporter: Alexander Monakov <amonakov>
Component: memcheck
Assignee: Paul Floyd <pjfloyd>
Status: RESOLVED FIXED
Severity: normal
CC: pjfloyd
Priority: NOR
Version: 3.24 GIT
Target Milestone: ---
Platform: Other
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: tests all add/sub Z/NZ

Description Alexander Monakov 2024-08-26 13:33:19 UTC
This is minimized from Memcheck false positive on GCC bootstrap, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458

Take the following testcase:

static int f(short *s)
{
        return s[0] + (s[1] << 16);
}

int main()
{
        void *p = __builtin_malloc(64);
        *(char *)p = 1;
        asm("" ::: "memory");
        int r = f(p);
        asm("" : "+d"(r));
        if (!r)
                __builtin_abort();
}

Compile with 'gcc -O2 -g1' and run under valgrind. Then remove or comment out

        asm("" : "+d"(r));

and compile/run again. valgrind will complain:

==16441== Conditional jump or move depends on uninitialised value(s)
==16441==    at 0x10908D: main (in /tmp/vg/a.out)

even though the difference in generated code is

--- 1.s 2024-08-24 09:46:24.719252103 +0300
+++ 2.s 2024-08-24 09:46:35.202670418 +0300
@@ -19,7 +19,6 @@
        movswl  (%rax), %eax
        sall    $16, %edx
        addl    %eax, %edx
-       testl   %edx, %edx
        je      .L3
        xorl    %eax, %eax
        addq    $8, %rsp

i.e. ZF is now taken directly from the addition rather than recomputed with the TEST instruction. In other words, propagation of known bits through ADD is powerful enough to prove that the low 8 bits of %edx are known, but something goes awry when deducing ZF.
Comment 1 Paul Floyd 2024-08-27 06:20:41 UTC
I can reproduce similar errors on amd64.

I'm not too sure what the side effects of the inline asm are.

The 4 bytes of r should be U1U1 (where U is uninitialized). --expensive-definedness-checks=yes doesn't seem to help.
Comment 2 Alexander Monakov 2024-08-27 06:33:50 UTC
(In reply to Paul Floyd from comment #1)
> I can reproduce similar errors on amd64.
> 
> I'm not too sure what the side effects of the inline asm is.

It just pretends to modify the variable 'r' to make GCC emit 'test edx, edx' before the branch. Since you've reproduced the issue, and it's the difference between the two variants of machine code that matters, feel free to forget the C testcase.
 
> The 4 bytes of r should be U1U1 (where U is uninitialized).

UUU1 actually, s[1] in f is UU, s[0] is U1.
Comment 3 Alexander Monakov 2024-08-29 15:54:42 UTC
The following patch helps. I'm quite unsure what the numbers in comments (like "8," in the context visible here) are counting.

diff --git a/VEX/priv/guest_amd64_helpers.c b/VEX/priv/guest_amd64_helpers.c
index da1cabc3c..033167037 100644
--- a/VEX/priv/guest_amd64_helpers.c
+++ b/VEX/priv/guest_amd64_helpers.c
@@ -1098,6 +1098,13 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
                            binop(Iop_Add64, cc_dep1, cc_dep2),
                            mkU64(0)));
       }
+      if (isU64(cc_op, AMD64G_CC_OP_ADDL) && isU64(cond, AMD64CondZ)) {
+         /* long add, then Z --> test ((int)(dst+src) == 0) */
+         return unop(Iop_1Uto64,
+                     binop(Iop_CmpEQ32,
+                           unop(Iop_64to32, binop(Iop_Add64, cc_dep1, cc_dep2)),
+                           mkU32(0)));
+      }

       /* 8, */
       if (isU64(cc_op, AMD64G_CC_OP_ADDQ) && isU64(cond, AMD64CondS)) {
Comment 4 Paul Floyd 2024-09-01 05:24:26 UTC
The patch looks ok to me. I’ll check if there should be all 4 addq addl subq and subl variants.

The numbers 0 4 8 etc in the comments are the values of the condition code.
Comment 5 Alexander Monakov 2024-09-01 07:48:35 UTC
(In reply to Paul Floyd from comment #4)
> The patch looks ok to me. I’ll check if there should be all 4 addq addl subq
> and subl variants.

There are also CondZ and CondNZ variants.
Comment 6 Paul Floyd 2024-09-02 05:52:56 UTC
(In reply to Alexander Monakov from comment #5)
> (In reply to Paul Floyd from comment #4)
> > The patch looks ok to me. I’ll check if there should be all 4 addq addl subq
> > and subl variants.
> 
> There are also CondZ and CondNZ variants.

It looks like these have been added as needed, not systematically to cover all cases.
Comment 7 Alexander Monakov 2024-09-02 08:11:19 UTC
(In reply to Paul Floyd from comment #6)
> It looks like these have been added as needed, not systematically to cover
> all cases.

In GCC we cannot leave compiling-under-valgrind broken until a release of Valgrind with the fix for a false positive appears. We apply a workaround (in this case, initializing a few bytes more) and move on. As far as vectorized code in libcpp is concerned, a fix for this particular case is not needed anymore. If all false positives are fixed in this manner — covering just one case on just one architecture, when the original project already worked around it — all it may amount to is accumulation of dead weight.
Comment 8 Paul Floyd 2024-09-09 08:42:11 UTC
(In reply to Alexander Monakov from comment #7)
> (In reply to Paul Floyd from comment #6)
> > It looks like these have been added as needed, not systematically to cover
> > all cases.
> 
> In GCC we cannot leave compiling-under-valgrind broken until a release of
> Valgrind with the fix for a false positive appears. We apply a workaround
> (in this case, initializing a few bytes more) and move on. As far as
> vectorized code in libcpp is concerned, a fix for this particular case is
> not needed anymore. If all false positives are fixed in this manner —
> covering just one case on just one architecture, when the original project
> already worked around it — all it may amount to is accumulation of dead
> weight.

This item is my next priority. I'd like to be sure of getting it into 3.24.
Comment 9 Paul Floyd 2024-09-10 06:48:30 UTC
There are two things that I don't yet understand.

I can't get addb or subb versions to work.

Also I'm not sure what triggers the NZ versions. add and sub followed by jne seem to also trigger the Z version.
Comment 10 Paul Floyd 2024-09-11 07:27:23 UTC
Created attachment 173550 [details]
tests all add/sub Z/NZ

Here's the test I've been using. I don't think that there should be any errors, since the lowest bit of each add/sub result is always defined and equal to 1, so the Z flag should always be defined and zero.

In Valgrind I've added all combinations of sub/add and Z/NZ.

The byte operations still generate errors. So instead of 8 errors I'm still getting 4 errors.
Comment 11 Paul Floyd 2024-09-15 07:07:28 UTC
(In reply to Paul Floyd from comment #10)

> The byte operations still generate errors. So instead of 8 errors I'm still
> getting 4 errors.

I see now. The auto expensiveness checks only apply down to word operations, not byte ones.
Comment 12 Paul Floyd 2024-09-15 08:05:31 UTC
Could you see if my try branch users/paulf/try-bug492210 works OK?
Comment 13 Alexander Monakov 2024-09-15 19:08:26 UTC
(In reply to Paul Floyd from comment #12)
> Could you see if my try branch users/paulf/try-bug492210 works OK?

I briefly tried it with a standalone test for vectorized helpers from libcpp and the false positives are gone.

By the way, I noticed that the repeated comment in your patch will need corrections:

+         /* long add, then Z --> test ((char)(dst+src) == 0) */

It's "long" only in the first instance, for ADDW/ADDB it's word/byte respectively.
Comment 14 Paul Floyd 2024-09-15 19:35:06 UTC
(In reply to Alexander Monakov from comment #13)

> It's "long" only in the first instance, for ADDW/ADDB it's word/byte
> respectively.

I don't think that the comments are very consistent in this respect. They should probably all be long long.

 /* long add, then Z --> test ((char)(dst+src) == 0) */
         return unop(Iop_1Uto64,
                     binop(Iop_CmpEQ8,
                           unop(Iop_64to8, binop(Iop_Add64, cc_dep1, cc_dep2)),
                           mkU8(0)));

binop(Iop_Add64, cc_dep1, cc_dep2) is the long long add

unop(Iop_64to8 ...) is the char cast

binop(Iop_CmpEQ8, ...  mkU8(0)) is the test == 0 and finally

 unop(Iop_1Uto64 ...) is the conversion to 1 bit Z flag.
Comment 15 Alexander Monakov 2024-09-15 19:43:48 UTC
(In reply to Paul Floyd from comment #14)
> (In reply to Alexander Monakov from comment #13)
> 
> > It's "long" only in the first instance, for ADDW/ADDB it's word/byte
> > respectively.
> 
> I don't think that the comments are very consistent in this respect. They
> should probably all be long long.

I think you are misunderstanding what the comments refer to: the part before "-->" refers to what's tested in the "if". Have a look at other such comments in the file.
Comment 16 Paul Floyd 2024-09-15 20:14:24 UTC
(In reply to Alexander Monakov from comment #15)

> I think you are misunderstanding what the comments refer to: the part before
> "-->" refers to what's tested in the "if". Have a look at other such
> comments in the file.

What if? You mean the condition that's being generated? For me that's clear enough from the Q/L/W/B suffixes.

AFAICS it's all mixed up. It should either be all amd64 terminology (quad, long or maybe doubleword, word, byte) or LP64 C (long or long long, int, short, char). Byte and word seem OK but quad/long long and long/int are confusing.
Comment 17 Paul Floyd 2024-09-15 20:14:53 UTC
commit 7e78a9a990c990631adb3f7c7edfd8cda418547f (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sun Sep 15 09:52:56 2024 +0200

    Bug 492210 - False positive on x86/amd64 with ZF taken directly from addition
    
    Also adds similar checks for short and char equivalents to the
    original int reproducer.
    
    Initial fix provided by
       Alexander Monakov <amonakov@gmail.com>
    
    Two versions of the testcase, one with default options and one with
    --expensive-definedness-checks=yes because the byte operations
    subb and addb need the flag turned on explicitly.