valgrind reports a false-positive conditional jump dependence on an uninitialized value in the function ._ZN4llvm7APFloat6assignERKS0_. I've simplified and annotated some assembly code below, but first, I think it is easier to understand the underlying problem in terms of the LLVM IR transformation being performed (which I've also annotated).

As part of the optimization pipeline, LLVM transforms this function:

define linkonce_odr hidden zeroext i1 @_ZNK4llvm7APFloat10isInfinityEv(%"class.llvm::APFloat"* %this) #0 align 2 {
entry:
  %category = getelementptr inbounds %"class.llvm::APFloat"* %this, i32 0, i32 3
  %bf.load = load i8* %category, align 2 ; load some byte (when executed, the lower 4 bits are undefined)
  %bf.lshr = lshr i8 %bf.load, 5         ; shift right by 5 (to isolate the upper three bits)
  %bf.cast = zext i8 %bf.lshr to i32     ; zero extend the result
  %cmp = icmp eq i32 %bf.cast, 0         ; compare to zero
  ret i1 %cmp
}

into this function:

define linkonce_odr hidden zeroext i1 @_ZNK4llvm7APFloat10isInfinityEv(%"class.llvm::APFloat"* %this) #0 align 2 {
entry:
  %category = getelementptr inbounds %"class.llvm::APFloat"* %this, i64 0, i32 3
  %bf.load = load i8* %category, align 2 ; load some byte (when executed, the lower 4 bits are undefined)
  %cmp = icmp ult i8 %bf.load, 32        ; equivalent to asking whether the upper three bits are zero
  ret i1 %cmp
}

This code isolates a bit field, stored as the upper three bits of some byte, and compares it to zero. The optimization eliminates the shift by instead comparing the whole byte value to 32. Even though the lower 4 bits are undefined, their value does not affect the result of the comparison, because only the upper, defined bits control whether the value is less than 32 (a standalone check at the end of this report verifies the equivalence exhaustively).

In assembly, in a context similar to that in the real code, this looks like:

# BB#0:                                 # %entry
	lbz 5, 18(4)
	lbz 6, 18(3)
	rlwinm 5, 5, 0, 27, 27  <--- isolate rhs sign bit
	rlwinm 6, 6, 0, 28, 26  <--- zero out this sign bit
	rlwimi 6, 5, 0, 27, 27  <--- insert rhs sign bit into this
	stb 6, 18(3)            <--- store result
	rlwinm 5, 6, 0, 27, 31  <--- isolate bits [27,31] of this ([24,27] are defined, [28,31] are undefined)
	lbz 12, 18(4)
	rlwinm 4, 12, 0, 0, 26  <--- isolate bits [0,26] from rhs, all are defined (only [24,26] matter, == category)
	rlwimi 4, 5, 0, 27, 31  <--- insert bits [27,31] of this (of which [28,31] are undefined) into above
	rlwinm 5, 4, 0, 24, 31  <--- zero out [0,23] for comparison, this leaves bits [24,31] of which [28,31] are undefined
	stb 4, 18(3)
	cmplwi 0, 5, 32         <--- valgrind incorrectly complains about this jump!
	blt 0, .LBB0_2

In the annotations, I'm using IBM's convention for bit numbering. Even though there are undefined bits in r5 when 'cmplwi 0, 5, 32' is executed, not *all* of the resulting condition-register bits are undefined, and the 'blt 0, .LBB0_2' does not depend on any of those undefined bits.

Reproducible: Always

Steps to Reproduce:
1. Download and compile trunk LLVM/Clang with itself using -O3 -mcpu=ppc64
2. Run the APFloat unit test under valgrind (APFloatTest.next in particular)

Actual Results:
==2394== Conditional jump or move depends on uninitialised value(s)
==2394==    at 0x10290070: llvm::APFloat::assign(llvm::APFloat const&) (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
==2394==    by 0x1029048B: llvm::APFloat::operator=(llvm::APFloat const&) (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
==2394==    by 0x1008CC6F: (anonymous namespace)::APFloatTest_next_Test::TestBody() (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
...
Expected Results:
This is a false positive, as explained above.
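To make the equivalence concrete, here is a minimal standalone check (my own sketch, not part of the LLVM test suite) that exhaustively verifies the transformed comparison over every byte value:

#include <assert.h>
#include <stdint.h>

int main(void) {
  /* For every possible byte, (x >> 5) == 0 holds exactly when x < 32,
     so the low five bits (which include the four undefined ones above)
     cannot influence the branch. */
  for (unsigned x = 0; x < 256; ++x)
    assert((((uint8_t)x >> 5) == 0) == ((uint8_t)x < 32));
  return 0;
}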
It had been suggested to me that this problem might not be ppc-specific; I have two comments on that:

1. The same code will not have the same problem on x86, because on x86 the bit fields are laid out low-bit first (and so the undefined bits will be the high bits, not the low bits as on ppc).

2. I wrote a simple test in an attempt to reproduce the problem on x86: valgrind does not report a false positive on the following code when compiled with Clang trunk on x86 (even though the optimizers perform the same optimization that proves problematic for valgrind on ppc):

#include <stdio.h>

struct s {
#if defined(__ppc__) || defined(__ppc64__)
  unsigned h : 3;
  unsigned l : 5;
#else
  unsigned l : 5;
  unsigned h : 3;
#endif
};

void __attribute__ ((noinline)) init(struct s *v) {
  v->h = 0;
}

void __attribute__ ((noinline)) foo(struct s *v) {
  if (v->h == 0) {
    printf("yep, it is zero\n");
  }
}

int main() {
  struct s v;
  init(&v);
  foo(&v);
  return 0;
}

On the other hand, when run on ppc64, valgrind does (incorrectly) complain:

==45300== Conditional jump or move depends on uninitialised value(s)
==45300==    at 0x10000578: foo (bf2.c:18)
==45300==    by 0x100005D7: main (bf2.c:26)
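For reference, a small layout probe (hypothetical, not from the report; 'struct probe' is my own name) shows why the #if above is needed: with the same declaration order, h lands in the high bits of the first byte on big-endian ppc64, but in the low bits on x86_64:

#include <stdio.h>
#include <string.h>

struct probe { unsigned h : 3; unsigned l : 5; };

int main(void) {
  struct probe v;
  unsigned char b;
  memset(&v, 0, sizeof v);
  v.h = 7;                         /* all-ones in the 3-bit field */
  memcpy(&b, &v, 1);               /* first byte of the storage unit */
  printf("raw byte: 0x%02X\n", b); /* expect 0xE0 on ppc64, 0x07 on x86_64 */
  return 0;
}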
Julian, would you (or someone else knowledgeable about memcheck) have any pointers on where to start looking at this? Thanks.
> 2. I wrote a simple test in an attempt to reproduce the problem on x86:
> valgrind does not report a false positive on the following code when
> compiled with Clang trunk on x86 (even though the optimizers do the same
> optimization that proves problematic to valgrind on ppc):

On x86_64 it generates a test instruction (AND, set the flags, throw away the result of the AND) which tests just the bits in question. That's why.

0000000000400580 <foo>:
  400580:	f6 07 e0             	testb  $0xe0,(%rdi)
  400583:	74 01                	je     400586 <foo+0x6>
  400585:	c3                   	retq
  400586:	bf 70 06 40 00       	mov    $0x400670,%edi
  40058b:	e9 c0 fe ff ff       	jmpq   400450 <puts@plt>
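Roughly, a simplified model of why the masking form is benign under memcheck's definedness propagation (the real rule in mc_translate.c is more involved; the function name here is mine):

#include <assert.h>
#include <stdint.h>

/* Model of AND-with-constant: output bits masked out by a zero become
   defined zeros, so only the V (undefinedness) bits under the mask
   survive into the flags computation. */
static uint8_t and_const_vbits(uint8_t vbits_in, uint8_t mask) {
  return vbits_in & mask;
}

int main(void) {
  /* testb $0xe0 with only the low five bits undefined (V = 0x1F):
     no undefinedness reaches the flags, hence no report. */
  assert(and_const_vbits(0x1F, 0xE0) == 0);
  return 0;
}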
gcc-4.7.2, as installed on gcc110.fsffrance.org, generates this:

0000000010000610 <.foo>:
    10000610:	81 23 00 00 	lwz     r9,0(r3)
    10000614:	75 2a 07 00 	andis.  r10,r9,1792
    10000618:	4c 82 00 20 	bnelr

which is the same scheme as clang uses for x86_64. Hence no complaint. I don't have access to a ppc box with clang installed, so I can't repro the problem. Hmm.
Changing foo thusly does trigger it on ppc64, though:

void __attribute__ ((noinline)) foo(struct s *v) {
  unsigned char* p = (unsigned char*)v;
  if (*p < 32) {
    printf("yep, it is zero\n");
  }
}

giving

0000000010000610 <.foo>:
    10000610:	89 23 00 00 	lbz     r9,0(r3)
    10000614:	2b 89 00 1f 	cmplwi  cr7,r9,31
    10000618:	4d 9d 00 20 	bgtlr   cr7

Post-cleanup, the uninstrumented IR looks like this:

   # t2 holds value of R3
   ------ IMark(0x10000610, 4, 0) ------
   t20 = 8Uto64(LDbe:I8(t2))   # t20 = *(unsigned char*)(R3)
   PUT(88) = t20               # R9 = t20
   ------ IMark(0x10000614, 4, 0) ------
   t22 = 32to8(CmpORD32U(64to32(t20),0x1F:I32))   # t22<3:1> = "t20 <u 31"

In memcheck/mc_translate.c, there is already special-casing in the definedness-flow instrumentation for CmpORD{32,64}{U,S}, which is how V internally represents the PPC comparison operations. It is special-cased for the 2nd argument being zero. Look for this:

  /* --------- Semi-accurate interpretation of CmpORD. --------- */

It might be possible to extend the special-casing to situations where the 2nd argument is exactly a power of 2. That would cover this case.

Note that we have repeatedly said that Memcheck's definedness tracking may be confused by highly optimised code, even if in practice it does pretty well. Official advice is not to use it on code generated at -O2 or above.
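A sketch of the arithmetic behind the suggested extension (the function name and V-bit convention are my own, not memcheck's actual code): for an unsigned comparison x <u 2^n, only bits n and above of x matter, so the "less than" result is defined whenever no undefinedness sits at those positions, regardless of the low bits:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x <u 2^n holds iff all bits of x at position n or above are zero,
   so the comparison result is defined iff no V (undefinedness) bit
   is set at position n or above. 1 = undefined, as in memcheck. */
static bool ult_pow2_defined(uint32_t vbits, unsigned n) {
  uint32_t relevant = ~((1u << n) - 1u);  /* mask of bits >= n */
  return (vbits & relevant) == 0;
}

int main(void) {
  /* The 'cmplwi 0, 5, 32' case from the report: the low four bits are
     undefined (V = 0x0F) and 32 == 2^5, so the result is fully defined. */
  assert(ult_pow2_defined(0x0F, 5));
  /* If an undefined bit sat at or above bit 5, it would not be. */
  assert(!ult_pow2_defined(0x20, 5));
  return 0;
}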
(In reply to comment #3)
> > 2. I wrote a simple test in an attempt to reproduce the problem on x86:
> > valgrind does not report a false positive on the following code when
> > compiled with Clang trunk on x86 (even though the optimizers do the same
> > optimization that proves problematic to valgrind on ppc):
>
> On x86_64 it generates a test instruction (AND, set the flags, throw
> away the result of the AND) which tests just the bits in question.
> That's why.
>
> 0000000000400580 <foo>:
>   400580:	f6 07 e0             	testb  $0xe0,(%rdi)
>   400583:	74 01                	je     400586 <foo+0x6>
>   400585:	c3                   	retq
>   400586:	bf 70 06 40 00       	mov    $0x400670,%edi
>   40058b:	e9 c0 fe ff ff       	jmpq   400450 <puts@plt>

When I compile the test on x86_64 using clang trunk r200705, I get this:

init:                                   # @init
	.cfi_startproc
# BB#0:                                 # %entry
	andb	$31, (%rdi)
	retq
...
foo:                                    # @foo
	.cfi_startproc
# BB#0:                                 # %entry
	movzbl	(%rdi), %eax
	cmpl	$31, %eax
	ja	.LBB1_1
# BB#2:                                 # %if.then
	movl	$.Lstr, %edi
	jmp	puts                    # TAILCALL
.LBB1_1:                                # %if.end
	retq

which, as best I can tell, is really the same thing that clang is doing on ppc (except that valgrind is fine with the x86_64 version).
> Note that we have repeatedly said that Memcheck's definedness tracking
> may be confused by highly optimised code, even if in practice it does
> pretty well. Official advice is not to use it on code generated at
> -O2 or above.

As far as I can tell, Clang/LLVM performs this transformation at all optimization levels (-O1 does the same thing, on both ppc and x86_64).
Properly fixing bug 352364 (https://bugs.kde.org/show_bug.cgi?id=352364) will go a long way to fixing this one too. They are not exact duplicates, though.