432801 – Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals

Bug 432801 - Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals

Summary: Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related ...

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	unspecified
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-02-11 13:03 UTC by Peter Klotz
Modified:	2023-09-23 21:27 UTC (History)
CC List:	5 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
The test program (716 bytes, text/plain) 2021-02-11 13:03 UTC, Peter Klotz	Details
Disassembly of main() when Valgrind shows the warning (8.79 KB, text/plain) 2021-02-11 15:26 UTC, Peter Klotz	Details
Disassembly of main() when Valgrind shows no warning (8.25 KB, text/plain) 2021-02-11 15:33 UTC, Peter Klotz	Details
patch for expensive greater than comparisons (14.30 KB, patch) 2021-03-03 19:47 UTC, Eyal	Details
patch with tests (12.96 KB, patch) 2021-03-04 22:11 UTC, Eyal	Details
patch that supports pcmpgtX for 64/32/16/8 bit (24.57 KB, patch) 2021-03-11 05:16 UTC, Eyal	Details
patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (48.04 KB, patch) 2021-03-12 04:05 UTC, Eyal	Details
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (133.09 KB, patch) 2021-04-29 04:26 UTC, Eyal	Details
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (133.09 KB, patch) 2021-05-07 20:33 UTC, Eyal	Details
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (133.13 KB, patch) 2021-06-06 17:12 UTC, Eyal	Details
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (221.07 KB, patch) 2023-09-04 14:30 UTC, Eyal	Details
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates (221.07 KB, patch) 2023-09-05 14:38 UTC, Eyal	Details
attachment-882777-0.html (1.10 KB, text/html) 2023-09-07 23:11 UTC, Eyal	Details
attachment-395894-0.html (1.32 KB, text/html) 2023-09-14 00:56 UTC, Eyal	Details
attachment-1510033-0.html (1.93 KB, text/html) 2023-09-23 21:27 UTC, Eyal	Details
Show Obsolete (8) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Peter Klotz 2021-02-11 13:03:55 UTC

Created attachment 135595 [details]
The test program

The attached program (extracted from a much larger piece of code) gives us this report:

==2193718== Conditional jump or move depends on uninitialised value(s)
==2193718==    at 0x20199D: main (Standalone.c:33)
==2193718==  Uninitialised value was created by a stack allocation
==2193718==    at 0x201A90: ??? (in a.out)

STEPS TO REPRODUCE
clang -W -Wall -g -O2 Standalone.c && valgrind --track-origins=yes ./a.out

According to objdump the allocation happens in sigaction():
  201a90:       ff 25 ea 21 00 00       jmpq   *0x21ea(%rip)        # 203c80 <sigaction@GLIBC_2.2.5>

The use of the uninitialized memory then occurs in line 33 which reads "if (hs==hp)".
It is unclear how something that sigaction() does could affect those two local variables.

Some strange facts about the issue:
 * It only occurs with clang (tested with 11.0.0 and 11.0.1), not with gcc (8.3.1 and 4.8.5)
 * It only happens when -O2 is specified, not with -O1
 * It occurs under RHEL 8 (glibc 2.28) and Arch Linux (glibc 2.33) but not RHEL 7 (glibc 2.17)
 * The problem goes away if slight, seemingly unrelated modifications are made to the program
  + Skip assignment to variable pattern[0]
  + Change all size_t variables to unsigned int
  + Call sigaction() only once
  + Replace the myLen() call with a constant

We are in no way sure, this is a Valgrind issue, it could be a miscompile by clang or a subtle error in the program we are missing. So our hope is that someone with deeper knowledge can point us into the right direction.

Comment 1 Mark Wielaard 2021-02-11 14:24:28 UTC

The actual instruction that produces the Conditional jump or move warning is relevant. Could you post the complete disassembly of main?
Does that disassembly differ between a working and non-working version?

Comment 2 Tom Hughes 2021-02-11 14:28:22 UTC

I reproduced this - it's the value of hp that valgrind thinks is uninitialised after the loop computing it.

Altering the loop in any way seems to stop it happening.

I didn't look at the assembly in any detail but I think clang is folding that loop that adds up a constant char array into some "clever" calculation.

Comment 3 Peter Klotz 2021-02-11 15:26:07 UTC

Created attachment 135596 [details]
Disassembly of main() when Valgrind shows the warning

Comment 4 Peter Klotz 2021-02-11 15:33:22 UTC

Created attachment 135597 [details]
Disassembly of main() when Valgrind shows no warning

The second sigaction() call was removed. In this case the assembly code of main() looks very similar to the original one.

In all other cases (e.g. removing the pattern[0] assignment or replacing the myLen() call with a constant) the disassembly looks much simpler. It seems the compiler optimization removes most of the code. This is interesting because my assumption was that "volatile" should prevent the compiler from doing so.

Comment 5 Eyal 2021-02-25 06:36:15 UTC

I have been getting a report about "Conditional jump or move depends on uninitialised value(s)" in the project pcb2gcode that only happens on clang and not gcc, like this one.

My bug started when github continuous integration switched from clang version 9 to clang version 10.

On a hunch, I tried this code with both old and new clang.  It's the same here: old clang does not report an error but new clang does.

gcc: no error
clang version 7: no error
clang version 10: error

Not sure if that's useful to you in debugging.  You might compare disassembly of different versions of clang.

Comment 6 Eyal 2021-02-25 08:05:05 UTC

Here's one that also fails but with a different error:

==12859== Syscall param exit_group(status) contains uninitialised byte(s)
==12859==    at 0x492F9D6: _Exit (_exit.c:31)
==12859==    by 0x48A2E89: __run_exit_handlers (exit.c:132)
==12859==    by 0x48A2EB9: exit (exit.c:139)
==12859==    by 0x488D0A1: (below main) (libc-start.c:342)
==12859==  Uninitialised value was created by a stack allocation
==12859==    at 0x401030: ??? (in /tmp/a.out)

// clang -W -Wall -g -O2 Standalone.c && valgrind --track-origins=yes ./a.out

#include <signal.h>
#include <string.h>

int main()
{
    {
        struct sigaction act;
        if (sigaction(SIGTERM, 0, &act) < 0) {
          return 1;
        }
        if (sigaction(SIGTERM, 0, &act) < 0) {
          return 1;
        }
    }

    char pattern[] = "123456786";
    pattern[2] = '4';
    const unsigned long plen = strlen(pattern);
    size_t hp=0;
    for (size_t i = 1; i < plen; ++i)
        hp += pattern[i];
    if (hp)
      return 1;

    return 0;
}

Comment 7 Eyal 2021-02-26 01:35:33 UTC

It seems that the issue is the loop.  When the length of pattern is more than 8, the code runs a routine that is able to sum 8 chars at a time.  It uses xmm for this.  It only enters that code if the number of bytes to sum is at least 8.

I'm not sure why removing the call to sigaction matters.  Maybe it's an alignment issue?

Removing the `pattern[0] = '1'` allows the compiler to figure out that the pattern is a constant and the whole thing gets evaluated at compile-time, as if constexpr.  That explains why the line is necessary to cause the bug.

The routine to sum 8 bytes at once seems really long to me but I guess clang has decided that it's faster than doing extra jumps.  If I can figure out how to have valgrind display the bit-validity values during processing, maybe I can see which instruction is getting instrumented incorrectly.

I feel that this is a valgrind bug, not a clang bug.

Comment 8 Eyal 2021-02-26 05:28:21 UTC

Even vgdb isn't helping me.  Here's the code that I'm using:

#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
  struct sigaction act;
  if (sigaction(SIGTERM, 0, &act) == 1) {
    return 12;
  }
  if (sigaction(SIGTERM, 0, &act) == 1) {
    return 12;
  }

  char pattern[] = "0123456789";
    pattern[8] = 0;
    const unsigned long plen = strlen(pattern);
    size_t hp=0;
    for (size_t i = 0; i < plen; ++i)
        hp += pattern[i];
    volatile int j = 0;
    if (hp==j) {
      j++;
    }
    return 1;
}

If I switch the pattern[8] with pattern[9] then the output is nearly identical but it fails a memcheck test in valgrind.

I used vgdb to step through the code and I'm seeing strange behavior.  In the code, there are two instances of movq to an xmm register:

   0x00000000004011ff <+191>:	je     0x4012c8 <main+392>
   0x0000000000401205 <+197>:	mov    %rbx,%rdi
   0x0000000000401208 <+200>:	sub    %rsi,%rdi
   0x000000000040120b <+203>:	movq   %r8,%xmm0

and

   0x00000000004012c8 <+392>:	movq   %r8,%xmm0


Identical instructions.  Depending on the pattern[] line above, either the jump is taken or not.  In either case, eventually there is a movq from register r8 into xmm0.

In the working case, I see that $r8 is successfully copied into $xmm0 and the vbits are all cleared to 0, as it expected.  But in the broken case, the value is not copied and the vbits are wrong, too!  It might just be wrong when looking at the result in vgdb because the output is right in the end.

If someone can help me debug this, I'm willing to put in some time.

Comment 9 Eyal 2021-02-26 07:18:52 UTC

Oops, I spoke to soon.  It's a bug in clang.  Here's code that you can try:

#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

void foo() {
    // Put the garbage number 123 into eax.
    // It's caller-saved so no problem.
    __asm__ ("lea 123, %eax;");

    // xmm variables are all caller-saved, too.
    // Fill them with garbage.
    __asm__ ("movd %eax, %xmm0");
    __asm__ ("punpcklbw %xmm0, %xmm0");
    __asm__ ("punpcklbw %xmm0, %xmm0");
    __asm__ ("punpcklbw %xmm0, %xmm0");
    __asm__ ("punpcklbw %xmm0, %xmm0");

    __asm__ ("movd %eax, %xmm1");
    __asm__ ("punpcklbw %xmm1, %xmm1");
    __asm__ ("punpcklbw %xmm1, %xmm1");
    __asm__ ("punpcklbw %xmm1, %xmm1");
    __asm__ ("punpcklbw %xmm1, %xmm1");

    __asm__ ("movd %eax, %xmm2");
    __asm__ ("punpcklbw %xmm2, %xmm2");
    __asm__ ("punpcklbw %xmm2, %xmm2");
    __asm__ ("punpcklbw %xmm2, %xmm2");
    __asm__ ("punpcklbw %xmm2, %xmm2");

    __asm__ ("movd %eax, %xmm3");
    __asm__ ("punpcklbw %xmm3, %xmm3");
    __asm__ ("punpcklbw %xmm3, %xmm3");
    __asm__ ("punpcklbw %xmm3, %xmm3");
    __asm__ ("punpcklbw %xmm3, %xmm3");

    __asm__ ("movd %eax, %xmm4");
    __asm__ ("punpcklbw %xmm4, %xmm4");
    __asm__ ("punpcklbw %xmm4, %xmm4");
    __asm__ ("punpcklbw %xmm4, %xmm4");
    __asm__ ("punpcklbw %xmm4, %xmm4");
}

int main() {
  char pattern[] = "0123456789";
  pattern[9] = 0;
  const unsigned long plen = strlen(pattern);
  foo();
  size_t hp=0;
  for (size_t i = 0; i < plen; ++i)
    hp += pattern[i];
  volatile int j = 0;
  if (hp==j) {
    j++;
  }
  printf("%ld\n", hp);
  return 1;
}

Run this with and without foo commented and see that the results are different.  No valgrind need.

Testing on godbolt, I see that the bug is present even through clang 11.0.1

https://godbolt.org/z/6s1rTd

Comment 10 Peter Klotz 2021-02-26 12:19:16 UTC

Hi Eyal

Thank you very much for the analysis of the problem!

Will you open an LLVM issue?

Comment 11 Eyal 2021-02-26 18:18:18 UTC

I'm back to not being sure if this is an llvm issue.

There are a few things at play here.  One is sigaction(), which can foul up the the contents of xmm registers, especially xmm3, which is the only register that isn't specifically cleared out by the loop.

The other issue is clang, which isn't specifically clearing out xmm3 but it sort of is doing thing that happen to clear it out.  I'm still investigating.

And finally, there is valgrind, which might just not know enough about an operation in order to determine that, yes, it's legal.

I feel like I'm making progress.  I'll update with more as I go along.

If I do find that it is a memcheck problem in clang, is there someone that can help me write a software patch for valgrind?

Comment 12 Eyal 2021-02-26 20:37:16 UTC

Okay, now I'm back to thinking that it's a valgrind issue.  But it's nothing that valgrind can fix.  Here's the problematic asm code:

<main+229>     movd   %edx,%xmm2              (1)
<main+233>     punpcklbw %xmm2,%xmm2          (2)
<main+237>     punpcklwd %xmm2,%xmm3          (3)
<main+241>     movzwl 0xa(%rsp,%rsi,1),%edx
<main+246>     movd   %edx,%xmm2              (4)
<main+250>     punpcklbw %xmm2,%xmm2
<main+254>     punpcklwd %xmm2,%xmm2
<main+258>     pxor   %xmm4,%xmm4             (5)
<main+262>     pcmpgtd %xmm3,%xmm4            (6)
<main+266>     psrad  $0x18,%xmm3

This code is some SIMD math that gets made for summing the characters in a string, like in the original code.  Before this code, the calls to sigaction have inadvertently fouled up the contents of the xmm registers.  That's okay, sigaction is allowed to do that because xmm registers are caller-saved.  That means that if the caller wanted them to have valid info, it was up to the caller to save and restore them beforehand.  No problem.

For the below explanation, we'll use letters (ABCD) for known bytes, X for unknown bytes, and 0 for 0 bytes.

(1) is putting a known value into xmm2.  So xmm2 is now well-defined as ABCD0000000000000000

(2) is bytewise interleaving the value in xmm2 with itself.  So xmm2 is now AABBCCDD00000000.

(3) is wordwise (16b) interleaving xmm2 with xmm3.  xmm3 is now AAXXBBXXCCXXDDXX.

(4) notice that xmm2 has been clobbered with a new value.

(5) xmm4 is set to all 0: 0000000000000000

(6) is doing a signed double-word (32-bit) SIMD comparison of xmm3 and xmm4 and putting the result as a 0 or -1 into xmm4.  If the xmm4 value is bigger than the xmm3 value, the xmm4 double-word will be filled with ones.  Otherwise, zero.  The comparison looks like this (MSB-first):

0000 > AAXX ? -1 : 0
0000 > BBXX ? -1 : 0
0000 > CCXX ? -1 : 0
0000 > DDXX ? -1 : 0

Considering the ABCDs:

  * If they are negative then the MSB is a 1 and zero is greater than all negative numbers so we don't need to look any further.
  * If they are positive then the whole number is positive and we don't need to look any further.
  * If they are zero then the number is either 0 or positive.  Either way, 0 is not greater than a non-negative number so we don't need to look any further.

So as far we're considered, the output here is completely defined!

valgrind memcheck doesn't track values, though, only whether or not a value was explicitly defined.  So valgrind is seeing this:

QRST > AAXX ? -1 : 0

All the letters (other than X) are defined but valgrind's mechanism doesn't keep track of whether or not they are zero.  Because it doesn't know that Q is a 0, it might be that QR match AA!  In which case, the only way to know the result is to know how ST compares to XX.  And XX is unknown so the result is unknown.  The "undefinedness" propagates all the way to a branch statement later in the code which valgrind detects and reports.

---

In (4) above, we see that xmm2 is clobbered quickly after it's use in (1,2,3).  So why use it anyway?  Also, lots of the other code doesn't use xmm2 as a scratchpad:

<main+246>     movd   %edx,%xmm2
<main+250>     punpcklbw %xmm2,%xmm2
<main+254>     punpcklwd %xmm2,%xmm2

<main+305>     movd   %edx,%xmm0
<main+309>     punpcklbw %xmm0,%xmm0
<main+313>     punpcklwd %xmm0,%xmm0

<main+322>     movd   %edx,%xmm1
<main+326>     punpcklbw %xmm1,%xmm1
<main+330>     punpcklwd %xmm1,%xmm1

So why not do the same here?

If I write this code and compile it:

void asm_test() {
  __asm__ ("movd %edx, %xmm2");
  __asm__ ("punpcklbw %xmm2, %xmm2");
  __asm__ ("punpcklwd %xmm2, %xmm3");

  __asm__ ("movd %edx, %xmm3");
  __asm__ ("punpcklbw %xmm3, %xmm3");
  __asm__ ("punpcklwd %xmm3, %xmm3");
}

And then objdump -D on it:

  401697:       66 0f 6e d2             movd   %edx,%xmm2
  40169b:       66 0f 60 d2             punpcklbw %xmm2,%xmm2
  40169f:       66 0f 61 da             punpcklwd %xmm2,%xmm3
  4016a3:       66 0f 6e da             movd   %edx,%xmm3
  4016a7:       66 0f 60 db             punpcklbw %xmm3,%xmm3
  4016ab:       66 0f 61 db             punpcklwd %xmm3,%xmm3

I can see that the instruction sizes are the same.  So I can use emacs hexl-mode or xxd or objdump -R to modify the binary and try it.  It only took a moment and it solved the problem.  Valgrind no longer reports any errors.

Comment 13 Eyal 2021-02-27 18:29:54 UTC

https://bugs.llvm.org/show_bug.cgi?id=49372

I hope that LLVM can help us out here!  I think that it would be nice.

Comment 14 Eyal 2021-03-02 07:48:46 UTC

A possible solution in Valgrind: https://github.com/eyal0/valgrind/commit/899ea491e358013579f87e443beff0a30c69e348

This improves Valgrind's check for definedness when doing 32x4 SIMD signed greater than.  It solves the problem for the test case below.  If you can, consider testing if it solves the problem for your original code.

Comment 15 Peter Klotz 2021-03-02 11:20:40 UTC

Hi Eyal

Your patch indeed solves the Valgrind error that occurred in the original program.
Really great work, thanks a lot!

Comment 16 Eyal 2021-03-03 19:47:13 UTC

Created attachment 136346 [details]
patch for expensive greater than comparisons

Comment 17 Julian Seward 2021-03-04 12:10:28 UTC

Interesting analysis, and a plausible patch; thank you for that.  This seems
like a new trick from LLVM.

I'm still struggling to understand what's going on, though.  I can see that

  for (size_t i = 0; i < plen; ++i)
    hp += pattern[i];

could be vectorised as you say, so that it loads 4 bytes at a time, and uses
punpcklbw twice to interleave them as described in comment 12.  But:

* where's the addition instruction that merges the lanes together?  I don't
  see that.

* what is the purpose of the pcmpgtd instruction?  The original sources
  contain a scalar comparison against zero

  if (hp==j) {
    j++;
  }

  Is that related?  If so, how does a scalar 32-bit equality test against zero
  get translated into a vector 32x4 signed-greater-than operation?

---

In the patch, there's mention of biasing:

+      // From here on out, we're dealing with biased integers instead of 2's
+      // complement.

What does that mean, in this context?

Regarding the test:

* you put it in memcheck/tests/x86; "x86" here means 32-bit only.  Is that
  what you intended?  I would have expected it to go in the "amd64" directory.

* because the test is written in C, whether or not it tests what you expect it
  to test depends entirely on the compiler used to compile it.  And most
  likely, it won't be vectorised, or won't be vectorised in the same way.
  This kind of test really needs to be written in assembly (inline assembly)
  so we know what we're testing.

Comment 18 Eyal 2021-03-04 14:48:44 UTC

>Interesting analysis, and a plausible patch; thank you for that.  This seems
>like a new trick from LLVM.

Thanks.  Yes

>* where's the addition instruction that merges the lanes together?  I don't
>  see that.

It's just beyond the part of the asm that I put in this post.  I didn't include it because it wasn't relevant to the bug.  Here you go:

   0x0000000000401225 <+213>:	movd   %edx,%xmm2
   0x0000000000401229 <+217>:	punpcklbw %xmm2,%xmm2
   0x000000000040122d <+221>:	punpcklwd %xmm2,%xmm3
   0x0000000000401231 <+225>:	movzwl 0xa(%rsp,%rsi,1),%edx
   0x0000000000401236 <+230>:	movd   %edx,%xmm2
   0x000000000040123a <+234>:	punpcklbw %xmm2,%xmm2
   0x000000000040123e <+238>:	punpcklwd %xmm2,%xmm2
   0x0000000000401242 <+242>:	pxor   %xmm4,%xmm4
   0x0000000000401246 <+246>:	pcmpgtd %xmm3,%xmm4
   0x000000000040124a <+250>:	psrad  $0x18,%xmm3
   0x000000000040124f <+255>:	punpckldq %xmm4,%xmm3
   0x0000000000401253 <+259>:	paddq  %xmm0,%xmm3

That last line is the add.

>* what is the purpose of the pcmpgtd instruction?  The original sources
>  contain a scalar comparison against zero

For sign-extension.  Because the numbers are 8-bit characters (char) but the output is 32-bit signed (int), we need to sign extend the char.  The comparison of 0>$xmm3 is filling the register with 1s if the number is negative.  The following psrad does an arithmetic shift, sign extending the negative to fill the 24-bit difference.  You can see the paddq below it.  I didn't analyze it too much; I trust the LLVM people know what they're doing.

  if (hp==j) {
    j++;
  }

>  Is that related?  If so, how does a scalar 32-bit equality test against zero
>  get translated into a vector 32x4 signed-greater-than operation?

Later in the code the vector eventually gets collapsed into a single value, like we need.  This comparison is what causes the error because memcheck only reports the error when the value gets used in a branch, not when it's created.

I could have also returned the sum; that would have worked as well.

---

>In the patch, there's mention of biasing:
>
>+      // From here on out, we're dealing with biased integers instead of 2's
>+      // complement.
>
>What does that mean, in this context?

Probably just bad wording on my part.  What I should have just said is that the MSB of a signed number is 0 for positive and 1 for negative so to get the max or min value, we need to invert the MSB with xor.  Or another way to think about it is that we convert the number space from 2's complement, where the 0x00 is 0 and 0xff is -1, to a "biased" space where 0x00 is -128, 0x7f is -1, 0x80 is 0, 0x81 is 1, all the way up to 0xff which is 127.  Each number is stored in bits as it's value +128. Writing numbers that way fixes the problem of 2's complement were a 1 in the MSB is *less* than a 0 in the MSB.

I will remove the confusing explanation and instead just mention the MSB thing.

>* you put it in memcheck/tests/x86; "x86" here means 32-bit only.  Is that
>  what you intended?  I would have expected it to go in the "amd64" directory.

I'll fix that.

>* because the test is written in C, whether or not it tests what you expect it
>  to test depends entirely on the compiler used to compile it.  And most
>  likely, it won't be vectorised, or won't be vectorised in the same way.
>  This kind of test really needs to be written in assembly (inline assembly)
>  so we know what we're testing.

Yes.  For that reason I put the binary in the patch!  That was bad.  I'll use asm instructions.  I had a bad time with them when I was debugging but I'll figure it out.  Also, I'll be able to test more of the edge cases.

There were bugs in the original patch anyway.  It's a delicate bit of code.

Because it's only for Intel SSE2 processors, if I put the test in the amd64 directory, is that enough to ensure that only the SSE2-enabled chips will run the test?

Comment 19 Eyal 2021-03-04 22:09:25 UTC

Comment on attachment 136346 [details]
patch for expensive greater than comparisons

Obsolete now.

Comment 20 Eyal 2021-03-04 22:11:00 UTC

Created attachment 136381 [details]
patch with tests

Comment 21 Eyal 2021-03-09 19:04:07 UTC

Will anyone consider the patch?  There might be a lot of false positives out there now that clang is doing this optimization and it would be nice to catch them before many users waste a lot of time on this.

Comment 22 Julian Seward 2021-03-10 08:26:38 UTC

(In reply to Eyal from comment #21)
> Will anyone consider the patch?

This (problem) is definitely something I want to deal with, and your
patch seems like a good starting place.  However, 3.17 is imminent and
I would prefer to defer till after 3.17 is done.  In part because I'd
like to do more general qualifying runs against gcc-11 and clang-11 
compiled code.  The patch might need to be generalised, so as to make
it handle different vector widths, lane widths and even maybe the scalar
cases.

Regarding the frequency of false positives, this is the first time
I have heard of this problem. By comparison, we have many bug reports
that relate to (eg) clang/gcc doing equality comparisons on partially
defined data.  I'd accept also though, that if this is a new optimisation
in clang, then we might get more such reports in the future.  But right
now it doesn't seem critical enough to put into 3.17.

Comment 23 Eyal 2021-03-10 15:37:16 UTC

Generalizing the patch to handle different bit widths would be easy. In fact, that's why I wrote it using a switch statement with just one case. To generalize it you just need to add cases into the switch statement. Pretty easy.

Another generalization that you didn't mention is covering not just greater-than but also less-than and equality.

I didn't code up all those alternatives because there are no such opcodes in x86 SSE2. So even if I did it, I couldn't test it.

If there are bug reports around vector equality, let me know! I think that I understand this part of the code pretty well by now and also how to test it so I could be helpful there. Send links.

I discovered this bug because the CI system on GitHub upgraded clang to v10 and it broke my testing. As more operating systems upgrade their compiler, there will be more and more of these errors.

Comment 24 Eyal 2021-03-11 05:16:45 UTC

Created attachment 136575 [details]
patch that supports pcmpgtX for 64/32/16/8 bit

This one is more complex but it supports all the different pcmpgt* instructions in x86.  The 8-bit one is a little strange, though.

Comment 25 Eyal 2021-03-12 04:05:05 UTC

Created attachment 136599 [details]
patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

This one uses c++ in the test only.  It is a much more thorough test because it tests all lanes of the SIMD register.  It uses templates so the code is smaller.

Comment 26 Paul Floyd 2021-04-20 09:02:51 UTC

The testcase code needs some cleaning. I get tons ow warnings for the char* arguments that are getting passed const char* literals. Then ere are errors asking for downcasts in the string_to_ functions.

Comment 27 Eyal 2021-04-20 17:22:17 UTC

Maybe that's with the c code.  I wrote a version in c++ which is uses templates to make the test code much shorter.

Let's focus on just one of them so that I don't have to duplicate too much work.  Do we want c++ for the test code or only c?  Either will work but the c++ is shorter.

Comment 28 Paul Floyd 2021-04-21 13:00:25 UTC

C++ is fine. If you are using any features that old compilers might not support then you will need a feture test in configure.ac (this is also true for C functions and assembler).

Comment 29 Eyal 2021-04-21 15:44:00 UTC

Okay.  I don't think that I'm using any special features of c nor c++... How would I know?  Is there some test that I can run?

I didn't find any errors when I compiled with -Wall -Werror so I didn't see that string being converted to a char *.  Maybe that was with an older patch that wasn't the latest c++ one?

Comment 30 Paul Floyd 2021-04-22 08:48:32 UTC

Indeed, I mixed up the C and C++ versions. I just tried the C++ patch on FreeBSD (not yet officially supported). I haven't yet tried on Linux. 

Here are my results:

pcmpgt.cpp:114:18: warning: suggest braces around initialization of subobject
      [-Wmissing-braces]
   V128 vx128 = {0};
                 ^
                 {}   
+ several similar warnings

pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x'
      "pcmpgt%p[op] %[src],%[dst]"
      ^
pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x'
pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x'
pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x'

(with FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2))

GCC on FreeBSD builds OK but the test fails as follows

paulf> cat pcmpgt.stderr.diff
--- pcmpgt.stderr.exp   2021-04-22 10:23:23.515674000 +0200
+++ pcmpgt.stderr.out   2021-04-22 10:38:08.450893000 +0200
@@ -10,26 +10,17 @@
    by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
    by 0x........: main (pcmpgt.cpp:152)
 
-Use of uninitialised value of size 8
-   ...
-   by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
-   by 0x........: main (pcmpgt.cpp:152)
-
 Conditional jump or move depends on uninitialised value(s)
    ...
    by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
    by 0x........: main (pcmpgt.cpp:152)
 
-Conditional jump or move depends on uninitialised value(s)
+Syscall param write(buf) points to uninitialised byte(s)
    ...
    by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
    by 0x........: main (pcmpgt.cpp:152)
+ Address 0x........ is on thread 1's stack
 
-Conditional jump or move depends on uninitialised value(s)
-   ...
-   by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
-   by 0x........: main (pcmpgt.cpp:152)
-
 xxxxxxxxxxxxxxxx > xxxxxxxxxxxxxxxx == 0, completely undefined, error above, 1 == 1
 xxxxxxxxxxxxxxxx > xxxxxxxxxxxxxxxx == 0, completely undefined, error above, 1 == 1
 0000000000000000 > 0000000000000000 == 0, completely defined, 0 == 0
@@ -210,6 +201,16 @@
  Address 0x........ is on thread 1's stack
  in frame #0, created by void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:138)
 
+Conditional jump or move depends on uninitialised value(s)
+   ...
+   by 0x........: void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
+   by 0x........: main (pcmpgt.cpp:186)
+
+Conditional jump or move depends on uninitialised value(s)
+   ...
+   by 0x........: void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147)
+   by 0x........: main (pcmpgt.cpp:186)
+
 xxx0 > f000 == 65535, undefined, error above, 1 == 1
 xxx0 > f000 == 65535, undefined, error above, 1 == 1
 xxx0 > f000 == 65535, undefined, error above, 1 == 1
@@ -519,4 +520,4 @@
 
 Use --track-origins=yes to see where uninitialised values come from
 For lists of detected and suppressed errors, rerun with: -s
-ERROR SUMMARY: 848 errors from 21 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 592 errors from 21 contexts (suppressed: 0 from 0)

Comment 31 Eyal 2021-04-29 04:26:11 UTC

Created attachment 137990 [details]
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

This one compiles with both g++ and clang++, no warnings, no errors, passes tests in both cases

Comment 32 Eyal 2021-05-07 16:53:01 UTC

How does the latest code look?  Are all the warnings gone now?  Passes tests on FreeBSD?

Comment 33 Paul Floyd 2021-05-07 19:30:33 UTC

Compiles cleanly.

The fast fails. The diffs are mainly out-by-one differences in the line numbers. Finally there's also a difference in the allocation summary/.

Comment 34 Eyal 2021-05-07 20:25:18 UTC

Huh.  I don't have any errors on my computer with it.  Can you send an example error from the test?  Maybe the location of the conditional jump depends on the platform...?

Comment 35 Eyal 2021-05-07 20:33:40 UTC

Created attachment 138226 [details]
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

Comment 36 Eyal 2021-05-07 20:34:06 UTC

Oh, I see it now.  I must have forgotten to recompile.  I've sent the new patch.

Comment 37 Eyal 2021-06-03 15:38:52 UTC

Any further comments?  This is fixing a real false positive that users of clang will experience with reasonable optimization levels.

Comment 38 Paul Floyd 2021-06-04 07:57:50 UTC

I'll restest this weekend and see if I can get Julian Seward's opinion.

Comment 39 Paul Floyd 2021-06-06 14:40:06 UTC

The build and the expected now look almost fine on FreeBSD.
Just the one small diff

paulf> diff pcmpgt.stderr.exp pcmpgt.stderr.out         
1598c1598
<   total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
---
>   total heap usage: 0 allocs, 0 frees, 0 bytes allocated

This is probably a libc difference and can be filtered with the filter_allocs script in the parent directory.

Comment 40 Eyal 2021-06-06 17:12:42 UTC

Created attachment 139036 [details]
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

filter allocs for FreeBSD

Comment 41 Eyal 2021-06-06 17:13:17 UTC

Got it done!  Thank you for testing out the code.  Try again?

Comment 42 Paul Floyd 2021-06-08 10:20:00 UTC

(In reply to Eyal from comment #41)
> Got it done!  Thank you for testing out the code.  Try again?

On FreeBSD:

paulf> perl tests/vg_regtest memcheck/tests/amd64/pcmpgt                       
pcmpgt:          valgrind   ./pcmpgt -q 

== 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

Also tested and looks god on Fedora 33,

Comment 43 Julian Seward 2021-07-24 05:35:26 UTC

As a general note, I have been investigating unsigned < and <=
comparisons on scalar values for arm64, and expect to continue doing
so as a background task for the next few weeks.  So, this is not
forgotten.  Ideally we can arrive at a cheap and effective solution
that works for all targets.

Comment 44 Julian Seward 2021-07-24 05:36:19 UTC

I meant, unsigned < and <= scalar comparisons on undefined data.  Duh.

Comment 45 Eyal 2021-07-24 16:54:12 UTC

Sure, let me know if I can be of assistance.

This was an actual issue for me with false positives in boost in newer clang.  Boost is quite popular so I imagine that as more people upgrade to newer clang, maybe more of them will hit these errors.

Comment 46 Peter Klotz 2021-09-20 07:41:55 UTC

Is there a chance that the available fix for this issue will make it into the upcoming Valgrind version (3.18 I assume)?

Comment 47 Eyal 2022-02-10 16:42:58 UTC

Any progress here?  I see the valgrind is getting new commits all the time.

Comment 48 Paul Floyd 2022-06-23 12:51:32 UTC

Sorry but people committing aren't necessarily the ones able to review this patch.

Comment 49 Paul Floyd 2023-08-27 17:07:18 UTC

I'll use a try branch for this patch.

Comment 50 Paul Floyd 2023-09-03 09:10:58 UTC

(In reply to Paul Floyd from comment #49)
> I'll use a try branch for this patch.

I get two failures, both x86 testcases.

== 829 tests, 2 stderr failures, 2 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/x86/insn_sse2             (stdout)
memcheck/tests/x86/insn_sse2             (stderr)
memcheck/tests/x86/sse2_memory           (stdout)
memcheck/tests/x86/sse2_memory           (stderr)

--- sse2_memory.stderr.exp      2020-02-06 13:24:23.300142000 +0100
+++ sse2_memory.stderr.out      2023-09-03 10:43:50.421658000 +0200
@@ -0,0 +1,34 @@
+iselVecExpr (hwcaps = x86-mmxext-sse1-sse2-sse3): can't reduce
+ShlN8x16(V128{0x........},0x........:I8)
+vex: the `impossible' happened:
+   iselVecExpr_wrk
+vex storage: T total 182269004 bytes allocated
+vex storage: P total 496 bytes allocated
+
+valgrind: the 'impossible' happened:
+   LibVEX called failure_exit().
+
+host stacktrace:
+   ...
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Runnable (lwpid 102046)
+   ...
+client stack range: [0x........ 0x........] client SP: 0x........
+valgrind stack range: [0x........ 0x........] top usage: 4280 of 1048

--- insn_sse2.stderr.exp        2020-02-06 13:24:23.837082000 +0100
+++ insn_sse2.stderr.out        2023-09-03 10:43:44.391369000 +0200
@@ -0,0 +1,35 @@
+iselVecExpr (hwcaps = x86-mmxext-sse1-sse2-sse3): can't reduce
+ShlN8x16(V128{0x........},0x........:I8)
+vex: the `impossible' happened:
+   iselVecExpr_wrk
+vex storage: T total 252507324 bytes allocated
+vex storage: P total 496 bytes allocated
+
+valgrind: the 'impossible' happened:
+   LibVEX called failure_exit().
+
+host stacktrace:
+   ...
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Runnable (lwpid 102046)
+   at 0x........: pcmpgtb_1 (insn_sse2.c:7284)
+   by 0x........: main (insn_sse2.c:15347)
+client stack range: [0x........ 0x........] client SP: 0x........
+valgrind stack range: [0x........ 0x........] top usage: 4280 of 1048576

Comment 51 Eyal 2023-09-03 16:22:19 UTC

I'll take a look.

I gotta say, this patch is super annoying.  I've been working on it for 2.5 years now?  Instead of github, emails.  And no CI.  :-(  I've resorted to my own personal branch of valgrind with all my fixes.

I'll make one last push here.

Comment 52 Eyal 2023-09-03 16:34:16 UTC

Y'all need CI, I can't even build the master branch.  :-(

Comment 53 Mark Wielaard 2023-09-03 16:39:37 UTC

(In reply to Eyal from comment #52)
> Y'all need CI, I can't even build the master branch.  :-(

We do have CI against various distros and arches:
https://builder.sourceware.org/buildbot/#/builders?tags=valgrind
And although some tests are failing it does seem to build everywhere.
What is wrong with the build for you? How does it fail?

Comment 54 Eyal 2023-09-03 16:41:15 UTC

(In reply to Mark Wielaard from comment #53)
> (In reply to Eyal from comment #52)
> > Y'all need CI, I can't even build the master branch.  :-(
> 
> We do have CI against various distros and arches:
> https://builder.sourceware.org/buildbot/#/builders?tags=valgrind
> And although some tests are failing it does seem to build everywhere.
> What is wrong with the build for you? How does it fail?

Good to know that there is CI.

I'm bisecting the build failure now...

Comment 55 Eyal 2023-09-03 17:24:04 UTC

(In reply to Eyal from comment #54)
> (In reply to Mark Wielaard from comment #53)
> > (In reply to Eyal from comment #52)
> > > Y'all need CI, I can't even build the master branch.  :-(
> > 
> > We do have CI against various distros and arches:
> > https://builder.sourceware.org/buildbot/#/builders?tags=valgrind
> > And although some tests are failing it does seem to build everywhere.
> > What is wrong with the build for you? How does it fail?
> 
> Good to know that there is CI.
> 
> I'm bisecting the build failure now...

Alright, it was just that I forgot to run autogen.sh.  It's fine now.

I can't get those failing sse tests to even build.  Is it possible that my reasonably-modern computer doesn't support sse2?  When I run make in the directory memcheck/tests/x86, nothing gets built!  How do I fix this?

Comment 56 Eyal 2023-09-03 18:08:33 UTC

I was able to get the tests to build and pass on master by running `sudo apt-get install libc6-dbg:i386` and `sudo apt-get remove libc6-dbg`.  I guess that this is an x86 and not amd64 problem only?  

I'm able to reproduce the problem now and I'll try to figure out what's going on!

Comment 57 Eyal 2023-09-03 19:57:09 UTC

I've identified the problem and I hope to have a solution, soon.  It is based on a similar solution for AMD64.  Both platforms to not have an xmm shift with lane size of 8.  I'll update the patch and also add test cases for x86.

Comment 58 Eyal 2023-09-04 14:30:12 UTC

Created attachment 161402 [details]
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

I fixed support for x86.  To test for x86 on an amd64 computer, run these:

sudo apt-get install gcc-multilib g++-multilib
sudo apt-get install libc6-dbg:i386           
sudo apt-get remove libc6-dbg                

And then recompile and run tests.

Comment 59 Paul Floyd 2023-09-05 06:07:04 UTC

You forgot to update the expecteds (I saw that just the line numbers have changed), and I had to convert the text .cpp files into symlinks (does Linux / GCC accept a plain text .cpp file that just contains a path?).

On FreeBSD 13.2 amd64 with clang 14 I get

== 830 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

Comment 60 Eyal 2023-09-05 14:38:11 UTC

Created attachment 161418 [details]
c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates

Re-made the amd64 expected output.

Comment 61 Eyal 2023-09-05 14:40:12 UTC

(In reply to Paul Floyd from comment #59)
> You forgot to update the expecteds (I saw that just the line numbers have
> changed), and I had to convert the text .cpp files into symlinks (does Linux
> / GCC accept a plain text .cpp file that just contains a path?).
> 
> On FreeBSD 13.2 amd64 with clang 14 I get
> 
> == 830 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0
> stdoutB failures, 0 post failures ==

Right you are!  I made the patch again: https://bugs.kde.org/attachment.cgi?id=161418

It is normal for git to store a symlink as a small file with just a path in it.  Did you apply the patch using `git am` or using `patch`?  I created the patch using `git format-patch` and if you apply it with `git am` then it should properly handle the symlink.

Comment 62 Paul Floyd 2023-09-07 05:39:46 UTC

OK didn't know that (or had forgotten). I think I tried git patch which didn't like the formatted patch and then my usual fallback 'patch -p1 < patchfile'

Comment 63 Eyal 2023-09-07 23:11:42 UTC

Created attachment 161501 [details]
attachment-882777-0.html

I think that you could have used 'git apply' also. Anyway, I hope that the
tests are all passing now and perhaps we can make progress on the patch?

Eyal

On Wed, Sep 6, 2023, 23:39 Paul Floyd <bugzilla_noreply@kde.org> wrote:

> https://bugs.kde.org/show_bug.cgi?id=432801
>
> --- Comment #62 from Paul Floyd <pjfloyd@wanadoo.fr> ---
> OK didn't know that (or had forgotten). I think I tried git patch which
> didn't
> like the formatted patch and then my usual fallback 'patch -p1 < patchfile'
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 64 Paul Floyd 2023-09-10 16:02:41 UTC

> I think that you could have used 'git apply' also. Anyway, I hope that the
> tests are all passing now and perhaps we can make progress on the patch?

It's next on my list!

Been very busy this weekend updating one of the FreeBSD packages and fixing an issue with aligned_alloc and glibc 2.38.

Comment 65 Eyal 2023-09-14 00:56:06 UTC

Created attachment 161619 [details]
attachment-395894-0.html

Yeah, I saw that you have been busy with that!

Do you prefer that I go through the process of pushing a branch for CI? I
need to get added with some permissions?

Eyal

On Sun, Sep 10, 2023, 10:02 Paul Floyd <bugzilla_noreply@kde.org> wrote:

> https://bugs.kde.org/show_bug.cgi?id=432801
>
> --- Comment #64 from Paul Floyd <pjfloyd@wanadoo.fr> ---
>
> > I think that you could have used 'git apply' also. Anyway, I hope that
> the
> > tests are all passing now and perhaps we can make progress on the patch?
>
> It's next on my list!
>
> Been very busy this weekend updating one of the FreeBSD packages and
> fixing an
> issue with aligned_alloc and glibc 2.38.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 66 Paul Floyd 2023-09-15 05:15:48 UTC

I've run the CI tests, all OK. I'll change the commit message to match typical bugzilla fixes.

Do you want me to keep
Author: eyal0 <109809+eyal0@users.noreply.github.com>
?

If you would like to contribute more to Valgrind then let us know. I think Mark and Julian will also agree.

Comment 67 Paul Floyd 2023-09-19 17:33:21 UTC

Many thanks!

commit 26a3abd27db2ef63dafea1ecab00e8239f341f0f (HEAD -> master, origin/master, origin/HEAD)
Author: Eyal Soha <eyalsoha@gmail.com>
Date:   Thu Feb 10 09:52:54 2022 -0700

    Bug 432801 - Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals
    
    Add support for precise computation of SIMD greater-than on
    amd64 and x86.
    
    This adds support for 64bit, 16bit, and 8bit to the existing 32bit
    support.

Comment 68 Eyal 2023-09-23 21:27:11 UTC

Created attachment 161833 [details]
attachment-1510033-0.html

Thanks for doing this!  Now this one:
https://bugs.kde.org/show_bug.cgi?id=474160 ?

Eyal

On Tue, Sep 19, 2023 at 11:33 AM Paul Floyd <bugzilla_noreply@kde.org>
wrote:

> https://bugs.kde.org/show_bug.cgi?id=432801
>
> Paul Floyd <pjfloyd@wanadoo.fr> changed:
>
>            What    |Removed                     |Added
>
> ----------------------------------------------------------------------------
>          Resolution|---                         |FIXED
>              Status|REPORTED                    |RESOLVED
>
> --- Comment #67 from Paul Floyd <pjfloyd@wanadoo.fr> ---
> Many thanks!
>
> commit 26a3abd27db2ef63dafea1ecab00e8239f341f0f (HEAD -> master,
> origin/master,
> origin/HEAD)
> Author: Eyal Soha <eyalsoha@gmail.com>
> Date:   Thu Feb 10 09:52:54 2022 -0700
>
>     Bug 432801 - Valgrind 3.16.1 reports a jump based on uninitialized
> memory
> somehow related to clang and signals
>
>     Add support for precise computation of SIMD greater-than on
>     amd64 and x86.
>
>     This adds support for 64bit, 16bit, and 8bit to the existing 32bit
>     support.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.