Created attachment 135595 [details] The test program The attached program (extracted from a much larger piece of code) gives us this report: ==2193718== Conditional jump or move depends on uninitialised value(s) ==2193718== at 0x20199D: main (Standalone.c:33) ==2193718== Uninitialised value was created by a stack allocation ==2193718== at 0x201A90: ??? (in a.out) STEPS TO REPRODUCE clang -W -Wall -g -O2 Standalone.c && valgrind --track-origins=yes ./a.out According to objdump the allocation happens in sigaction(): 201a90: ff 25 ea 21 00 00 jmpq *0x21ea(%rip) # 203c80 <sigaction@GLIBC_2.2.5> The use of the uninitialized memory then occurs in line 33 which reads "if (hs==hp)". It is unclear how something that sigaction() does could affect those two local variables. Some strange facts about the issue: * It only occurs with clang (tested with 11.0.0 and 11.0.1), not with gcc (8.3.1 and 4.8.5) * It only happens when -O2 is specified, not with -O1 * It occurs under RHEL 8 (glibc 2.28) and Arch Linux (glibc 2.33) but not RHEL 7 (glibc 2.17) * The problem goes away if slight, seemingly unrelated modifications are made to the program + Skip assignment to variable pattern[0] + Change all size_t variables to unsigned int + Call sigaction() only once + Replace the myLen() call with a constant We are in no way sure, this is a Valgrind issue, it could be a miscompile by clang or a subtle error in the program we are missing. So our hope is that someone with deeper knowledge can point us into the right direction.
The actual instruction that produces the Conditional jump or move warning is relevant. Could you post the complete disassembly of main? Does that disassembly differ between a working and non-working version?
I reproduced this - it's the value of hp that valgrind thinks is uninitialised after the loop computing it. Altering the loop in any way seems to stop it happening. I didn't look at the assembly in any detail but I think clang is folding that loop that adds up a constant char array into some "clever" calculation.
Created attachment 135596 [details] Disassembly of main() when Valgrind shows the warning
Created attachment 135597 [details] Disassembly of main() when Valgrind shows no warning The second sigaction() call was removed. In this case the assembly code of main() looks very similar to the original one. In all other cases (e.g. removing the pattern[0] assignment or replacing the myLen() call with a constant) the disassembly looks much simpler. It seems the compiler optimization removes most of the code. This is interesting because my assumption was that "volatile" should prevent the compiler from doing so.
I have been getting a report about "Conditional jump or move depends on uninitialised value(s)" in the project pcb2gcode that only happens on clang and not gcc, like this one. My bug started when github continuous integration switched from clang version 9 to clang version 10. On a hunch, I tried this code with both old and new clang. It's the same here: old clang does not report an error but new clang does. gcc: no error clang version 7: no error clang version 10: error Not sure if that's useful to you in debugging. You might compare disassembly of different versions of clang.
Here's one that also fails but with a different error: ==12859== Syscall param exit_group(status) contains uninitialised byte(s) ==12859== at 0x492F9D6: _Exit (_exit.c:31) ==12859== by 0x48A2E89: __run_exit_handlers (exit.c:132) ==12859== by 0x48A2EB9: exit (exit.c:139) ==12859== by 0x488D0A1: (below main) (libc-start.c:342) ==12859== Uninitialised value was created by a stack allocation ==12859== at 0x401030: ??? (in /tmp/a.out) // clang -W -Wall -g -O2 Standalone.c && valgrind --track-origins=yes ./a.out #include <signal.h> #include <string.h> int main() { { struct sigaction act; if (sigaction(SIGTERM, 0, &act) < 0) { return 1; } if (sigaction(SIGTERM, 0, &act) < 0) { return 1; } } char pattern[] = "123456786"; pattern[2] = '4'; const unsigned long plen = strlen(pattern); size_t hp=0; for (size_t i = 1; i < plen; ++i) hp += pattern[i]; if (hp) return 1; return 0; }
It seems that the issue is the loop. When the length of pattern is more than 8, the code runs a routine that is able to sum 8 chars at a time. It uses xmm for this. It only enters that code if the number of bytes to sum is at least 8. I'm not sure why removing the call to sigaction matters. Maybe it's an alignment issue? Removing the `pattern[0] = '1'` allows the compiler to figure out that the pattern is a constant and the whole thing gets evaluated at compile-time, as if constexpr. That explains why the line is necessary to cause the bug. The routine to sum 8 bytes at once seems really long to me but I guess clang has decided that it's faster than doing extra jumps. If I can figure out how to have valgrind display the bit-validity values during processing, maybe I can see which instruction is getting instrumented incorrectly. I feel that this is a valgrind bug, not a clang bug.
Even vgdb isn't helping me. Here's the code that I'm using: #include <signal.h> #include <string.h> #include <stdio.h> #include <stdlib.h> int main() { struct sigaction act; if (sigaction(SIGTERM, 0, &act) == 1) { return 12; } if (sigaction(SIGTERM, 0, &act) == 1) { return 12; } char pattern[] = "0123456789"; pattern[8] = 0; const unsigned long plen = strlen(pattern); size_t hp=0; for (size_t i = 0; i < plen; ++i) hp += pattern[i]; volatile int j = 0; if (hp==j) { j++; } return 1; } If I switch the pattern[8] with pattern[9] then the output is nearly identical but it fails a memcheck test in valgrind. I used vgdb to step through the code and I'm seeing strange behavior. In the code, there are two instances of movq to an xmm register: 0x00000000004011ff <+191>: je 0x4012c8 <main+392> 0x0000000000401205 <+197>: mov %rbx,%rdi 0x0000000000401208 <+200>: sub %rsi,%rdi 0x000000000040120b <+203>: movq %r8,%xmm0 and 0x00000000004012c8 <+392>: movq %r8,%xmm0 Identical instructions. Depending on the pattern[] line above, either the jump is taken or not. In either case, eventually there is a movq from register r8 into xmm0. In the working case, I see that $r8 is successfully copied into $xmm0 and the vbits are all cleared to 0, as it expected. But in the broken case, the value is not copied and the vbits are wrong, too! It might just be wrong when looking at the result in vgdb because the output is right in the end. If someone can help me debug this, I'm willing to put in some time.
Oops, I spoke to soon. It's a bug in clang. Here's code that you can try: #include <signal.h> #include <string.h> #include <stdio.h> #include <stdlib.h> void foo() { // Put the garbage number 123 into eax. // It's caller-saved so no problem. __asm__ ("lea 123, %eax;"); // xmm variables are all caller-saved, too. // Fill them with garbage. __asm__ ("movd %eax, %xmm0"); __asm__ ("punpcklbw %xmm0, %xmm0"); __asm__ ("punpcklbw %xmm0, %xmm0"); __asm__ ("punpcklbw %xmm0, %xmm0"); __asm__ ("punpcklbw %xmm0, %xmm0"); __asm__ ("movd %eax, %xmm1"); __asm__ ("punpcklbw %xmm1, %xmm1"); __asm__ ("punpcklbw %xmm1, %xmm1"); __asm__ ("punpcklbw %xmm1, %xmm1"); __asm__ ("punpcklbw %xmm1, %xmm1"); __asm__ ("movd %eax, %xmm2"); __asm__ ("punpcklbw %xmm2, %xmm2"); __asm__ ("punpcklbw %xmm2, %xmm2"); __asm__ ("punpcklbw %xmm2, %xmm2"); __asm__ ("punpcklbw %xmm2, %xmm2"); __asm__ ("movd %eax, %xmm3"); __asm__ ("punpcklbw %xmm3, %xmm3"); __asm__ ("punpcklbw %xmm3, %xmm3"); __asm__ ("punpcklbw %xmm3, %xmm3"); __asm__ ("punpcklbw %xmm3, %xmm3"); __asm__ ("movd %eax, %xmm4"); __asm__ ("punpcklbw %xmm4, %xmm4"); __asm__ ("punpcklbw %xmm4, %xmm4"); __asm__ ("punpcklbw %xmm4, %xmm4"); __asm__ ("punpcklbw %xmm4, %xmm4"); } int main() { char pattern[] = "0123456789"; pattern[9] = 0; const unsigned long plen = strlen(pattern); foo(); size_t hp=0; for (size_t i = 0; i < plen; ++i) hp += pattern[i]; volatile int j = 0; if (hp==j) { j++; } printf("%ld\n", hp); return 1; } Run this with and without foo commented and see that the results are different. No valgrind need. Testing on godbolt, I see that the bug is present even through clang 11.0.1 https://godbolt.org/z/6s1rTd
Hi Eyal Thank you very much for the analysis of the problem! Will you open an LLVM issue?
I'm back to not being sure if this is an llvm issue. There are a few things at play here. One is sigaction(), which can foul up the the contents of xmm registers, especially xmm3, which is the only register that isn't specifically cleared out by the loop. The other issue is clang, which isn't specifically clearing out xmm3 but it sort of is doing thing that happen to clear it out. I'm still investigating. And finally, there is valgrind, which might just not know enough about an operation in order to determine that, yes, it's legal. I feel like I'm making progress. I'll update with more as I go along. If I do find that it is a memcheck problem in clang, is there someone that can help me write a software patch for valgrind?
Okay, now I'm back to thinking that it's a valgrind issue. But it's nothing that valgrind can fix. Here's the problematic asm code: <main+229> movd %edx,%xmm2 (1) <main+233> punpcklbw %xmm2,%xmm2 (2) <main+237> punpcklwd %xmm2,%xmm3 (3) <main+241> movzwl 0xa(%rsp,%rsi,1),%edx <main+246> movd %edx,%xmm2 (4) <main+250> punpcklbw %xmm2,%xmm2 <main+254> punpcklwd %xmm2,%xmm2 <main+258> pxor %xmm4,%xmm4 (5) <main+262> pcmpgtd %xmm3,%xmm4 (6) <main+266> psrad $0x18,%xmm3 This code is some SIMD math that gets made for summing the characters in a string, like in the original code. Before this code, the calls to sigaction have inadvertently fouled up the contents of the xmm registers. That's okay, sigaction is allowed to do that because xmm registers are caller-saved. That means that if the caller wanted them to have valid info, it was up to the caller to save and restore them beforehand. No problem. For the below explanation, we'll use letters (ABCD) for known bytes, X for unknown bytes, and 0 for 0 bytes. (1) is putting a known value into xmm2. So xmm2 is now well-defined as ABCD0000000000000000 (2) is bytewise interleaving the value in xmm2 with itself. So xmm2 is now AABBCCDD00000000. (3) is wordwise (16b) interleaving xmm2 with xmm3. xmm3 is now AAXXBBXXCCXXDDXX. (4) notice that xmm2 has been clobbered with a new value. (5) xmm4 is set to all 0: 0000000000000000 (6) is doing a signed double-word (32-bit) SIMD comparison of xmm3 and xmm4 and putting the result as a 0 or -1 into xmm4. If the xmm4 value is bigger than the xmm3 value, the xmm4 double-word will be filled with ones. Otherwise, zero. The comparison looks like this (MSB-first): 0000 > AAXX ? -1 : 0 0000 > BBXX ? -1 : 0 0000 > CCXX ? -1 : 0 0000 > DDXX ? -1 : 0 Considering the ABCDs: * If they are negative then the MSB is a 1 and zero is greater than all negative numbers so we don't need to look any further. * If they are positive then the whole number is positive and we don't need to look any further. * If they are zero then the number is either 0 or positive. Either way, 0 is not greater than a non-negative number so we don't need to look any further. So as far we're considered, the output here is completely defined! valgrind memcheck doesn't track values, though, only whether or not a value was explicitly defined. So valgrind is seeing this: QRST > AAXX ? -1 : 0 All the letters (other than X) are defined but valgrind's mechanism doesn't keep track of whether or not they are zero. Because it doesn't know that Q is a 0, it might be that QR match AA! In which case, the only way to know the result is to know how ST compares to XX. And XX is unknown so the result is unknown. The "undefinedness" propagates all the way to a branch statement later in the code which valgrind detects and reports. --- In (4) above, we see that xmm2 is clobbered quickly after it's use in (1,2,3). So why use it anyway? Also, lots of the other code doesn't use xmm2 as a scratchpad: <main+246> movd %edx,%xmm2 <main+250> punpcklbw %xmm2,%xmm2 <main+254> punpcklwd %xmm2,%xmm2 <main+305> movd %edx,%xmm0 <main+309> punpcklbw %xmm0,%xmm0 <main+313> punpcklwd %xmm0,%xmm0 <main+322> movd %edx,%xmm1 <main+326> punpcklbw %xmm1,%xmm1 <main+330> punpcklwd %xmm1,%xmm1 So why not do the same here? If I write this code and compile it: void asm_test() { __asm__ ("movd %edx, %xmm2"); __asm__ ("punpcklbw %xmm2, %xmm2"); __asm__ ("punpcklwd %xmm2, %xmm3"); __asm__ ("movd %edx, %xmm3"); __asm__ ("punpcklbw %xmm3, %xmm3"); __asm__ ("punpcklwd %xmm3, %xmm3"); } And then objdump -D on it: 401697: 66 0f 6e d2 movd %edx,%xmm2 40169b: 66 0f 60 d2 punpcklbw %xmm2,%xmm2 40169f: 66 0f 61 da punpcklwd %xmm2,%xmm3 4016a3: 66 0f 6e da movd %edx,%xmm3 4016a7: 66 0f 60 db punpcklbw %xmm3,%xmm3 4016ab: 66 0f 61 db punpcklwd %xmm3,%xmm3 I can see that the instruction sizes are the same. So I can use emacs hexl-mode or xxd or objdump -R to modify the binary and try it. It only took a moment and it solved the problem. Valgrind no longer reports any errors.
https://bugs.llvm.org/show_bug.cgi?id=49372 I hope that LLVM can help us out here! I think that it would be nice.
A possible solution in Valgrind: https://github.com/eyal0/valgrind/commit/899ea491e358013579f87e443beff0a30c69e348 This improves Valgrind's check for definedness when doing 32x4 SIMD signed greater than. It solves the problem for the test case below. If you can, consider testing if it solves the problem for your original code.
Hi Eyal Your patch indeed solves the Valgrind error that occurred in the original program. Really great work, thanks a lot!
Created attachment 136346 [details] patch for expensive greater than comparisons
Interesting analysis, and a plausible patch; thank you for that. This seems like a new trick from LLVM. I'm still struggling to understand what's going on, though. I can see that for (size_t i = 0; i < plen; ++i) hp += pattern[i]; could be vectorised as you say, so that it loads 4 bytes at a time, and uses punpcklbw twice to interleave them as described in comment 12. But: * where's the addition instruction that merges the lanes together? I don't see that. * what is the purpose of the pcmpgtd instruction? The original sources contain a scalar comparison against zero if (hp==j) { j++; } Is that related? If so, how does a scalar 32-bit equality test against zero get translated into a vector 32x4 signed-greater-than operation? --- In the patch, there's mention of biasing: + // From here on out, we're dealing with biased integers instead of 2's + // complement. What does that mean, in this context? Regarding the test: * you put it in memcheck/tests/x86; "x86" here means 32-bit only. Is that what you intended? I would have expected it to go in the "amd64" directory. * because the test is written in C, whether or not it tests what you expect it to test depends entirely on the compiler used to compile it. And most likely, it won't be vectorised, or won't be vectorised in the same way. This kind of test really needs to be written in assembly (inline assembly) so we know what we're testing.
>Interesting analysis, and a plausible patch; thank you for that. This seems >like a new trick from LLVM. Thanks. Yes >* where's the addition instruction that merges the lanes together? I don't > see that. It's just beyond the part of the asm that I put in this post. I didn't include it because it wasn't relevant to the bug. Here you go: 0x0000000000401225 <+213>: movd %edx,%xmm2 0x0000000000401229 <+217>: punpcklbw %xmm2,%xmm2 0x000000000040122d <+221>: punpcklwd %xmm2,%xmm3 0x0000000000401231 <+225>: movzwl 0xa(%rsp,%rsi,1),%edx 0x0000000000401236 <+230>: movd %edx,%xmm2 0x000000000040123a <+234>: punpcklbw %xmm2,%xmm2 0x000000000040123e <+238>: punpcklwd %xmm2,%xmm2 0x0000000000401242 <+242>: pxor %xmm4,%xmm4 0x0000000000401246 <+246>: pcmpgtd %xmm3,%xmm4 0x000000000040124a <+250>: psrad $0x18,%xmm3 0x000000000040124f <+255>: punpckldq %xmm4,%xmm3 0x0000000000401253 <+259>: paddq %xmm0,%xmm3 That last line is the add. >* what is the purpose of the pcmpgtd instruction? The original sources > contain a scalar comparison against zero For sign-extension. Because the numbers are 8-bit characters (char) but the output is 32-bit signed (int), we need to sign extend the char. The comparison of 0>$xmm3 is filling the register with 1s if the number is negative. The following psrad does an arithmetic shift, sign extending the negative to fill the 24-bit difference. You can see the paddq below it. I didn't analyze it too much; I trust the LLVM people know what they're doing. if (hp==j) { j++; } > Is that related? If so, how does a scalar 32-bit equality test against zero > get translated into a vector 32x4 signed-greater-than operation? Later in the code the vector eventually gets collapsed into a single value, like we need. This comparison is what causes the error because memcheck only reports the error when the value gets used in a branch, not when it's created. I could have also returned the sum; that would have worked as well. --- >In the patch, there's mention of biasing: > >+ // From here on out, we're dealing with biased integers instead of 2's >+ // complement. > >What does that mean, in this context? Probably just bad wording on my part. What I should have just said is that the MSB of a signed number is 0 for positive and 1 for negative so to get the max or min value, we need to invert the MSB with xor. Or another way to think about it is that we convert the number space from 2's complement, where the 0x00 is 0 and 0xff is -1, to a "biased" space where 0x00 is -128, 0x7f is -1, 0x80 is 0, 0x81 is 1, all the way up to 0xff which is 127. Each number is stored in bits as it's value +128. Writing numbers that way fixes the problem of 2's complement were a 1 in the MSB is *less* than a 0 in the MSB. I will remove the confusing explanation and instead just mention the MSB thing. >* you put it in memcheck/tests/x86; "x86" here means 32-bit only. Is that > what you intended? I would have expected it to go in the "amd64" directory. I'll fix that. >* because the test is written in C, whether or not it tests what you expect it > to test depends entirely on the compiler used to compile it. And most > likely, it won't be vectorised, or won't be vectorised in the same way. > This kind of test really needs to be written in assembly (inline assembly) > so we know what we're testing. Yes. For that reason I put the binary in the patch! That was bad. I'll use asm instructions. I had a bad time with them when I was debugging but I'll figure it out. Also, I'll be able to test more of the edge cases. There were bugs in the original patch anyway. It's a delicate bit of code. Because it's only for Intel SSE2 processors, if I put the test in the amd64 directory, is that enough to ensure that only the SSE2-enabled chips will run the test?
Comment on attachment 136346 [details] patch for expensive greater than comparisons Obsolete now.
Created attachment 136381 [details] patch with tests
Will anyone consider the patch? There might be a lot of false positives out there now that clang is doing this optimization and it would be nice to catch them before many users waste a lot of time on this.
(In reply to Eyal from comment #21) > Will anyone consider the patch? This (problem) is definitely something I want to deal with, and your patch seems like a good starting place. However, 3.17 is imminent and I would prefer to defer till after 3.17 is done. In part because I'd like to do more general qualifying runs against gcc-11 and clang-11 compiled code. The patch might need to be generalised, so as to make it handle different vector widths, lane widths and even maybe the scalar cases. Regarding the frequency of false positives, this is the first time I have heard of this problem. By comparison, we have many bug reports that relate to (eg) clang/gcc doing equality comparisons on partially defined data. I'd accept also though, that if this is a new optimisation in clang, then we might get more such reports in the future. But right now it doesn't seem critical enough to put into 3.17.
Generalizing the patch to handle different bit widths would be easy. In fact, that's why I wrote it using a switch statement with just one case. To generalize it you just need to add cases into the switch statement. Pretty easy. Another generalization that you didn't mention is covering not just greater-than but also less-than and equality. I didn't code up all those alternatives because there are no such opcodes in x86 SSE2. So even if I did it, I couldn't test it. If there are bug reports around vector equality, let me know! I think that I understand this part of the code pretty well by now and also how to test it so I could be helpful there. Send links. I discovered this bug because the CI system on GitHub upgraded clang to v10 and it broke my testing. As more operating systems upgrade their compiler, there will be more and more of these errors.
Created attachment 136575 [details] patch that supports pcmpgtX for 64/32/16/8 bit This one is more complex but it supports all the different pcmpgt* instructions in x86. The 8-bit one is a little strange, though.
Created attachment 136599 [details] patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates This one uses c++ in the test only. It is a much more thorough test because it tests all lanes of the SIMD register. It uses templates so the code is smaller.
The testcase code needs some cleaning. I get tons ow warnings for the char* arguments that are getting passed const char* literals. Then ere are errors asking for downcasts in the string_to_ functions.
Maybe that's with the c code. I wrote a version in c++ which is uses templates to make the test code much shorter. Let's focus on just one of them so that I don't have to duplicate too much work. Do we want c++ for the test code or only c? Either will work but the c++ is shorter.
C++ is fine. If you are using any features that old compilers might not support then you will need a feture test in configure.ac (this is also true for C functions and assembler).
Okay. I don't think that I'm using any special features of c nor c++... How would I know? Is there some test that I can run? I didn't find any errors when I compiled with -Wall -Werror so I didn't see that string being converted to a char *. Maybe that was with an older patch that wasn't the latest c++ one?
Indeed, I mixed up the C and C++ versions. I just tried the C++ patch on FreeBSD (not yet officially supported). I haven't yet tried on Linux. Here are my results: pcmpgt.cpp:114:18: warning: suggest braces around initialization of subobject [-Wmissing-braces] V128 vx128 = {0}; ^ {} + several similar warnings pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x' "pcmpgt%p[op] %[src],%[dst]" ^ pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x' pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x' pcmpgt.cpp:69:7: error: couldn't allocate output register for constraint 'x' (with FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)) GCC on FreeBSD builds OK but the test fails as follows paulf> cat pcmpgt.stderr.diff --- pcmpgt.stderr.exp 2021-04-22 10:23:23.515674000 +0200 +++ pcmpgt.stderr.out 2021-04-22 10:38:08.450893000 +0200 @@ -10,26 +10,17 @@ by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) by 0x........: main (pcmpgt.cpp:152) -Use of uninitialised value of size 8 - ... - by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) - by 0x........: main (pcmpgt.cpp:152) - Conditional jump or move depends on uninitialised value(s) ... by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) by 0x........: main (pcmpgt.cpp:152) -Conditional jump or move depends on uninitialised value(s) +Syscall param write(buf) points to uninitialised byte(s) ... by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) by 0x........: main (pcmpgt.cpp:152) + Address 0x........ is on thread 1's stack -Conditional jump or move depends on uninitialised value(s) - ... - by 0x........: void doit<unsigned long>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) - by 0x........: main (pcmpgt.cpp:152) - xxxxxxxxxxxxxxxx > xxxxxxxxxxxxxxxx == 0, completely undefined, error above, 1 == 1 xxxxxxxxxxxxxxxx > xxxxxxxxxxxxxxxx == 0, completely undefined, error above, 1 == 1 0000000000000000 > 0000000000000000 == 0, completely defined, 0 == 0 @@ -210,6 +201,16 @@ Address 0x........ is on thread 1's stack in frame #0, created by void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:138) +Conditional jump or move depends on uninitialised value(s) + ... + by 0x........: void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) + by 0x........: main (pcmpgt.cpp:186) + +Conditional jump or move depends on uninitialised value(s) + ... + by 0x........: void doit<unsigned short>(char const*, char const*, bool, char const*) (pcmpgt.cpp:147) + by 0x........: main (pcmpgt.cpp:186) + xxx0 > f000 == 65535, undefined, error above, 1 == 1 xxx0 > f000 == 65535, undefined, error above, 1 == 1 xxx0 > f000 == 65535, undefined, error above, 1 == 1 @@ -519,4 +520,4 @@ Use --track-origins=yes to see where uninitialised values come from For lists of detected and suppressed errors, rerun with: -s -ERROR SUMMARY: 848 errors from 21 contexts (suppressed: 0 from 0) +ERROR SUMMARY: 592 errors from 21 contexts (suppressed: 0 from 0)
Created attachment 137990 [details] c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates This one compiles with both g++ and clang++, no warnings, no errors, passes tests in both cases
How does the latest code look? Are all the warnings gone now? Passes tests on FreeBSD?
Compiles cleanly. The fast fails. The diffs are mainly out-by-one differences in the line numbers. Finally there's also a difference in the allocation summary/.
Huh. I don't have any errors on my computer with it. Can you send an example error from the test? Maybe the location of the conditional jump depends on the platform...?
Created attachment 138226 [details] c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates
Oh, I see it now. I must have forgotten to recompile. I've sent the new patch.
Any further comments? This is fixing a real false positive that users of clang will experience with reasonable optimization levels.
I'll restest this weekend and see if I can get Julian Seward's opinion.
The build and the expected now look almost fine on FreeBSD. Just the one small diff paulf> diff pcmpgt.stderr.exp pcmpgt.stderr.out 1598c1598 < total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated --- > total heap usage: 0 allocs, 0 frees, 0 bytes allocated This is probably a libc difference and can be filtered with the filter_allocs script in the parent directory.
Created attachment 139036 [details] c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates filter allocs for FreeBSD
Got it done! Thank you for testing out the code. Try again?
(In reply to Eyal from comment #41) > Got it done! Thank you for testing out the code. Try again? On FreeBSD: paulf> perl tests/vg_regtest memcheck/tests/amd64/pcmpgt pcmpgt: valgrind ./pcmpgt -q == 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == Also tested and looks god on Fedora 33,
As a general note, I have been investigating unsigned < and <= comparisons on scalar values for arm64, and expect to continue doing so as a background task for the next few weeks. So, this is not forgotten. Ideally we can arrive at a cheap and effective solution that works for all targets.
I meant, unsigned < and <= scalar comparisons on undefined data. Duh.
Sure, let me know if I can be of assistance. This was an actual issue for me with false positives in boost in newer clang. Boost is quite popular so I imagine that as more people upgrade to newer clang, maybe more of them will hit these errors.
Is there a chance that the available fix for this issue will make it into the upcoming Valgrind version (3.18 I assume)?
Any progress here? I see the valgrind is getting new commits all the time.
Sorry but people committing aren't necessarily the ones able to review this patch.
I'll use a try branch for this patch.
(In reply to Paul Floyd from comment #49) > I'll use a try branch for this patch. I get two failures, both x86 testcases. == 829 tests, 2 stderr failures, 2 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == memcheck/tests/x86/insn_sse2 (stdout) memcheck/tests/x86/insn_sse2 (stderr) memcheck/tests/x86/sse2_memory (stdout) memcheck/tests/x86/sse2_memory (stderr) --- sse2_memory.stderr.exp 2020-02-06 13:24:23.300142000 +0100 +++ sse2_memory.stderr.out 2023-09-03 10:43:50.421658000 +0200 @@ -0,0 +1,34 @@ +iselVecExpr (hwcaps = x86-mmxext-sse1-sse2-sse3): can't reduce +ShlN8x16(V128{0x........},0x........:I8) +vex: the `impossible' happened: + iselVecExpr_wrk +vex storage: T total 182269004 bytes allocated +vex storage: P total 496 bytes allocated + +valgrind: the 'impossible' happened: + LibVEX called failure_exit(). + +host stacktrace: + ... + +sched status: + running_tid=1 + +Thread 1: status = VgTs_Runnable (lwpid 102046) + ... +client stack range: [0x........ 0x........] client SP: 0x........ +valgrind stack range: [0x........ 0x........] top usage: 4280 of 1048 --- insn_sse2.stderr.exp 2020-02-06 13:24:23.837082000 +0100 +++ insn_sse2.stderr.out 2023-09-03 10:43:44.391369000 +0200 @@ -0,0 +1,35 @@ +iselVecExpr (hwcaps = x86-mmxext-sse1-sse2-sse3): can't reduce +ShlN8x16(V128{0x........},0x........:I8) +vex: the `impossible' happened: + iselVecExpr_wrk +vex storage: T total 252507324 bytes allocated +vex storage: P total 496 bytes allocated + +valgrind: the 'impossible' happened: + LibVEX called failure_exit(). + +host stacktrace: + ... + +sched status: + running_tid=1 + +Thread 1: status = VgTs_Runnable (lwpid 102046) + at 0x........: pcmpgtb_1 (insn_sse2.c:7284) + by 0x........: main (insn_sse2.c:15347) +client stack range: [0x........ 0x........] client SP: 0x........ +valgrind stack range: [0x........ 0x........] top usage: 4280 of 1048576
I'll take a look. I gotta say, this patch is super annoying. I've been working on it for 2.5 years now? Instead of github, emails. And no CI. :-( I've resorted to my own personal branch of valgrind with all my fixes. I'll make one last push here.
Y'all need CI, I can't even build the master branch. :-(
(In reply to Eyal from comment #52) > Y'all need CI, I can't even build the master branch. :-( We do have CI against various distros and arches: https://builder.sourceware.org/buildbot/#/builders?tags=valgrind And although some tests are failing it does seem to build everywhere. What is wrong with the build for you? How does it fail?
(In reply to Mark Wielaard from comment #53) > (In reply to Eyal from comment #52) > > Y'all need CI, I can't even build the master branch. :-( > > We do have CI against various distros and arches: > https://builder.sourceware.org/buildbot/#/builders?tags=valgrind > And although some tests are failing it does seem to build everywhere. > What is wrong with the build for you? How does it fail? Good to know that there is CI. I'm bisecting the build failure now...
(In reply to Eyal from comment #54) > (In reply to Mark Wielaard from comment #53) > > (In reply to Eyal from comment #52) > > > Y'all need CI, I can't even build the master branch. :-( > > > > We do have CI against various distros and arches: > > https://builder.sourceware.org/buildbot/#/builders?tags=valgrind > > And although some tests are failing it does seem to build everywhere. > > What is wrong with the build for you? How does it fail? > > Good to know that there is CI. > > I'm bisecting the build failure now... Alright, it was just that I forgot to run autogen.sh. It's fine now. I can't get those failing sse tests to even build. Is it possible that my reasonably-modern computer doesn't support sse2? When I run make in the directory memcheck/tests/x86, nothing gets built! How do I fix this?
I was able to get the tests to build and pass on master by running `sudo apt-get install libc6-dbg:i386` and `sudo apt-get remove libc6-dbg`. I guess that this is an x86 and not amd64 problem only? I'm able to reproduce the problem now and I'll try to figure out what's going on!
I've identified the problem and I hope to have a solution, soon. It is based on a similar solution for AMD64. Both platforms to not have an xmm shift with lane size of 8. I'll update the patch and also add test cases for x86.
Created attachment 161402 [details] c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates I fixed support for x86. To test for x86 on an amd64 computer, run these: sudo apt-get install gcc-multilib g++-multilib sudo apt-get install libc6-dbg:i386 sudo apt-get remove libc6-dbg And then recompile and run tests.
You forgot to update the expecteds (I saw that just the line numbers have changed), and I had to convert the text .cpp files into symlinks (does Linux / GCC accept a plain text .cpp file that just contains a path?). On FreeBSD 13.2 amd64 with clang 14 I get == 830 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
Created attachment 161418 [details] c++ patch that supports pcmpgtX for 64/32/16/8 bit testing all lanes and using templates Re-made the amd64 expected output.
(In reply to Paul Floyd from comment #59) > You forgot to update the expecteds (I saw that just the line numbers have > changed), and I had to convert the text .cpp files into symlinks (does Linux > / GCC accept a plain text .cpp file that just contains a path?). > > On FreeBSD 13.2 amd64 with clang 14 I get > > == 830 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 > stdoutB failures, 0 post failures == Right you are! I made the patch again: https://bugs.kde.org/attachment.cgi?id=161418 It is normal for git to store a symlink as a small file with just a path in it. Did you apply the patch using `git am` or using `patch`? I created the patch using `git format-patch` and if you apply it with `git am` then it should properly handle the symlink.
OK didn't know that (or had forgotten). I think I tried git patch which didn't like the formatted patch and then my usual fallback 'patch -p1 < patchfile'
Created attachment 161501 [details] attachment-882777-0.html I think that you could have used 'git apply' also. Anyway, I hope that the tests are all passing now and perhaps we can make progress on the patch? Eyal On Wed, Sep 6, 2023, 23:39 Paul Floyd <bugzilla_noreply@kde.org> wrote: > https://bugs.kde.org/show_bug.cgi?id=432801 > > --- Comment #62 from Paul Floyd <pjfloyd@wanadoo.fr> --- > OK didn't know that (or had forgotten). I think I tried git patch which > didn't > like the formatted patch and then my usual fallback 'patch -p1 < patchfile' > > -- > You are receiving this mail because: > You are on the CC list for the bug.
> I think that you could have used 'git apply' also. Anyway, I hope that the > tests are all passing now and perhaps we can make progress on the patch? It's next on my list! Been very busy this weekend updating one of the FreeBSD packages and fixing an issue with aligned_alloc and glibc 2.38.
Created attachment 161619 [details] attachment-395894-0.html Yeah, I saw that you have been busy with that! Do you prefer that I go through the process of pushing a branch for CI? I need to get added with some permissions? Eyal On Sun, Sep 10, 2023, 10:02 Paul Floyd <bugzilla_noreply@kde.org> wrote: > https://bugs.kde.org/show_bug.cgi?id=432801 > > --- Comment #64 from Paul Floyd <pjfloyd@wanadoo.fr> --- > > > I think that you could have used 'git apply' also. Anyway, I hope that > the > > tests are all passing now and perhaps we can make progress on the patch? > > It's next on my list! > > Been very busy this weekend updating one of the FreeBSD packages and > fixing an > issue with aligned_alloc and glibc 2.38. > > -- > You are receiving this mail because: > You are on the CC list for the bug.
I've run the CI tests, all OK. I'll change the commit message to match typical bugzilla fixes. Do you want me to keep Author: eyal0 <109809+eyal0@users.noreply.github.com> ? If you would like to contribute more to Valgrind then let us know. I think Mark and Julian will also agree.
Many thanks! commit 26a3abd27db2ef63dafea1ecab00e8239f341f0f (HEAD -> master, origin/master, origin/HEAD) Author: Eyal Soha <eyalsoha@gmail.com> Date: Thu Feb 10 09:52:54 2022 -0700 Bug 432801 - Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals Add support for precise computation of SIMD greater-than on amd64 and x86. This adds support for 64bit, 16bit, and 8bit to the existing 32bit support.
Created attachment 161833 [details] attachment-1510033-0.html Thanks for doing this! Now this one: https://bugs.kde.org/show_bug.cgi?id=474160 ? Eyal On Tue, Sep 19, 2023 at 11:33 AM Paul Floyd <bugzilla_noreply@kde.org> wrote: > https://bugs.kde.org/show_bug.cgi?id=432801 > > Paul Floyd <pjfloyd@wanadoo.fr> changed: > > What |Removed |Added > > ---------------------------------------------------------------------------- > Resolution|--- |FIXED > Status|REPORTED |RESOLVED > > --- Comment #67 from Paul Floyd <pjfloyd@wanadoo.fr> --- > Many thanks! > > commit 26a3abd27db2ef63dafea1ecab00e8239f341f0f (HEAD -> master, > origin/master, > origin/HEAD) > Author: Eyal Soha <eyalsoha@gmail.com> > Date: Thu Feb 10 09:52:54 2022 -0700 > > Bug 432801 - Valgrind 3.16.1 reports a jump based on uninitialized > memory > somehow related to clang and signals > > Add support for precise computation of SIMD greater-than on > amd64 and x86. > > This adds support for 64bit, 16bit, and 8bit to the existing 32bit > support. > > -- > You are receiving this mail because: > You are on the CC list for the bug.