valgrind reports a false-positive conditional jump dependence on an uninitialized value in the function ._ZN4llvm7APFloat6assignERKS0_. I've simplified and annotated some assembly code below, but first, I think it is easier to understand the underlying problem in terms of the LLVM IR transformation being performed (which I've also annotated).

As part of the optimization pipeline, LLVM transforms this function:

define linkonce_odr hidden zeroext i1 @_ZNK4llvm7APFloat10isInfinityEv(%"class.llvm::APFloat"* %this) #0 align 2 {
entry:
  %category = getelementptr inbounds %"class.llvm::APFloat"* %this, i32 0, i32 3
  %bf.load = load i8* %category, align 2 ; load some byte (when executed, the lower 4 bits are undefined)
  %bf.lshr = lshr i8 %bf.load, 5         ; shift right by 5 (to isolate the upper three bits)
  %bf.cast = zext i8 %bf.lshr to i32     ; zero extend the result
  %cmp = icmp eq i32 %bf.cast, 0         ; compare to zero
  ret i1 %cmp
}

into this function:

define linkonce_odr hidden zeroext i1 @_ZNK4llvm7APFloat10isInfinityEv(%"class.llvm::APFloat"* %this) #0 align 2 {
entry:
  %category = getelementptr inbounds %"class.llvm::APFloat"* %this, i64 0, i32 3
  %bf.load = load i8* %category, align 2 ; load some byte (when executed, the lower 4 bits are undefined)
  %cmp = icmp ult i8 %bf.load, 32        ; equivalent to asking whether the upper three bits are zero
  ret i1 %cmp
}

This code isolates a bit field, stored as the upper three bits of some byte, and compares it to zero. The optimization eliminates the shift by instead comparing the whole byte value to 32. Even though the lower 4 bits are undefined, their value does not affect the result of the comparison, because only the upper, defined bits control whether the value is less than 32 (a standalone check at the end of this report verifies the equivalence exhaustively).

In assembly, in a context similar to that in the real code, this looks like:

# BB#0:                                 # %entry
	lbz 5, 18(4)
	lbz 6, 18(3)
	rlwinm 5, 5, 0, 27, 27  <--- isolate rhs sign bit
	rlwinm 6, 6, 0, 28, 26  <--- zero out this sign bit
	rlwimi 6, 5, 0, 27, 27  <--- insert rhs sign bit into this
	stb 6, 18(3)            <--- store result
	rlwinm 5, 6, 0, 27, 31  <--- isolate bits [27,31] of this ([24,27] are defined, [28,31] are undefined)
	lbz 12, 18(4)
	rlwinm 4, 12, 0, 0, 26  <--- isolate bits [0,26] from rhs, all are defined (only [24,26] matter, == category)
	rlwimi 4, 5, 0, 27, 31  <--- insert bits [27,31] of this (of which [28,31] are undefined) into above
	rlwinm 5, 4, 0, 24, 31  <--- zero out [0,23] for comparison, this leaves bits [24,31] of which [28,31] are undefined
	stb 4, 18(3)
	cmplwi 0, 5, 32         <--- valgrind incorrectly complains about this jump!
	blt 0, .LBB0_2

In the annotations, I'm using IBM's convention for bit numbering. Even though there are undefined bits in r5 when 'cmplwi 0, 5, 32' is executed, not *all* of the resulting condition-register bits are undefined, and the 'blt 0, .LBB0_2' does not depend on any of those undefined bits.

Reproducible: Always

Steps to Reproduce:
1. Download and compile trunk LLVM/Clang with itself using -O3 -mcpu=ppc64
2. Run the APFloat unit test under valgrind (APFloatTest.next in particular)

Actual Results:
==2394== Conditional jump or move depends on uninitialised value(s)
==2394==    at 0x10290070: llvm::APFloat::assign(llvm::APFloat const&) (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
==2394==    by 0x1029048B: llvm::APFloat::operator=(llvm::APFloat const&) (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
==2394==    by 0x1008CC6F: (anonymous namespace)::APFloatTest_next_Test::TestBody() (in build.stage2/unittests/ADT/Release+Asserts/ADTTests)
...
Expected Results:
This is a false positive, as explained above.
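To make the equivalence concrete, here is a minimal standalone check (my own sketch, not part of the LLVM test suite) that exhaustively verifies the transformed comparison over every byte value:

#include <assert.h>
#include <stdint.h>

int main(void) {
  /* For every possible byte, (x >> 5) == 0 holds exactly when x < 32,
     so the low five bits (which include the four undefined ones above)
     cannot influence the branch. */
  for (unsigned x = 0; x < 256; ++x)
    assert((((uint8_t)x >> 5) == 0) == ((uint8_t)x < 32));
  return 0;
}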
It had been suggested to me that this problem might not be ppc-specific; I have two comments on that:

1. The same code will not have the same problem on x86, because on x86 the bit fields are laid out low-bit first (and so the undefined bits will be the high bits, not the low bits as on ppc).

2. I wrote a simple test in an attempt to reproduce the problem on x86: valgrind does not report a false positive on the following code when compiled with Clang trunk on x86 (even though the optimizers perform the same optimization that proves problematic for valgrind on ppc):

#include <stdio.h>

struct s {
#if defined(__ppc__) || defined(__ppc64__)
  unsigned h : 3;
  unsigned l : 5;
#else
  unsigned l : 5;
  unsigned h : 3;
#endif
};

void __attribute__ ((noinline)) init(struct s *v) {
  v->h = 0;
}

void __attribute__ ((noinline)) foo(struct s *v) {
  if (v->h == 0) {
    printf("yep, it is zero\n");
  }
}

int main() {
  struct s v;
  init(&v);
  foo(&v);
  return 0;
}

On the other hand, when run on ppc64, valgrind does (incorrectly) complain:

==45300== Conditional jump or move depends on uninitialised value(s)
==45300==    at 0x10000578: foo (bf2.c:18)
==45300==    by 0x100005D7: main (bf2.c:26)
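For reference, a small layout probe (hypothetical, not from the report; 'struct probe' is my own name) shows why the #if above is needed: with the same declaration order, h lands in the high bits of the first byte on big-endian ppc64, but in the low bits on x86_64:

#include <stdio.h>
#include <string.h>

struct probe { unsigned h : 3; unsigned l : 5; };

int main(void) {
  struct probe v;
  unsigned char b;
  memset(&v, 0, sizeof v);
  v.h = 7;                         /* all-ones in the 3-bit field */
  memcpy(&b, &v, 1);               /* first byte of the storage unit */
  printf("raw byte: 0x%02X\n", b); /* expect 0xE0 on ppc64, 0x07 on x86_64 */
  return 0;
}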
Julian, would you (or someone else knowledgeable about memcheck) have any pointers on where to start looking at this? Thanks.
> 2. I wrote a simple test in an attempt to reproduce the problem on x86:
> valgrind does not report a false positive on the following code when
> compiled with Clang trunk on x86 (even though the optimizers do the same
> optimization that proves problematic to valgrind on ppc):

On x86_64 it generates a test instruction (AND, set the flags, throw away the result of the AND) which tests just the bits in question. That's why.

0000000000400580 <foo>:
  400580:	f6 07 e0             	testb  $0xe0,(%rdi)
  400583:	74 01                	je     400586 <foo+0x6>
  400585:	c3                   	retq
  400586:	bf 70 06 40 00       	mov    $0x400670,%edi
  40058b:	e9 c0 fe ff ff       	jmpq   400450 <puts@plt>
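Roughly, a simplified model of why the masking form is benign under memcheck's definedness propagation (the real rule in mc_translate.c is more involved; the function name here is mine):

#include <assert.h>
#include <stdint.h>

/* Model of AND-with-constant: output bits masked out by a zero become
   defined zeros, so only the V (undefinedness) bits under the mask
   survive into the flags computation. */
static uint8_t and_const_vbits(uint8_t vbits_in, uint8_t mask) {
  return vbits_in & mask;
}

int main(void) {
  /* testb $0xe0 with only the low five bits undefined (V = 0x1F):
     no undefinedness reaches the flags, hence no report. */
  assert(and_const_vbits(0x1F, 0xE0) == 0);
  return 0;
}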
gcc-4.7.2, as installed on gcc110.fsffrance.org, generates this:

0000000010000610 <.foo>:
    10000610:	81 23 00 00 	lwz     r9,0(r3)
    10000614:	75 2a 07 00 	andis.  r10,r9,1792
    10000618:	4c 82 00 20 	bnelr

which is the same scheme as clang uses for x86_64. Hence no complaint. I don't have access to a ppc box with clang installed, so I can't repro the problem. Hmm.
Changing foo thusly does trigger it on ppc64, though:

void __attribute__ ((noinline)) foo(struct s *v) {
  unsigned char* p = (unsigned char*)v;
  if (*p < 32) {
    printf("yep, it is zero\n");
  }
}

giving

0000000010000610 <.foo>:
    10000610:	89 23 00 00 	lbz     r9,0(r3)
    10000614:	2b 89 00 1f 	cmplwi  cr7,r9,31
    10000618:	4d 9d 00 20 	bgtlr   cr7

Post-cleanup, the uninstrumented IR looks like this:

   # t2 holds value of R3
   ------ IMark(0x10000610, 4, 0) ------
   t20 = 8Uto64(LDbe:I8(t2))   # t20 = *(unsigned char*)(R3)
   PUT(88) = t20               # R9 = t20
   ------ IMark(0x10000614, 4, 0) ------
   t22 = 32to8(CmpORD32U(64to32(t20),0x1F:I32))   # t22<3:1> = "t20 <u 31"

In memcheck/mc_translate.c, there is already special-casing in the definedness-flow instrumentation for CmpORD{32,64}{U,S}, which is how V internally represents the PPC comparison operations. It is special-cased for the 2nd argument being zero. Look for this:

  /* --------- Semi-accurate interpretation of CmpORD. --------- */

It might be possible to extend the special-casing to situations where the 2nd argument is exactly a power of 2. That would cover this case.

Note that we have repeatedly said that Memcheck's definedness tracking may be confused by highly optimised code, even if in practice it does pretty well. Official advice is not to use it on code generated at -O2 or above.
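A sketch of the arithmetic behind the suggested extension (the function name and V-bit convention are my own, not memcheck's actual code): for an unsigned comparison x <u 2^n, only bits n and above of x matter, so the "less than" result is defined whenever no undefinedness sits at those positions, regardless of the low bits:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x <u 2^n holds iff all bits of x at position n or above are zero,
   so the comparison result is defined iff no V (undefinedness) bit
   is set at position n or above. 1 = undefined, as in memcheck. */
static bool ult_pow2_defined(uint32_t vbits, unsigned n) {
  uint32_t relevant = ~((1u << n) - 1u);  /* mask of bits >= n */
  return (vbits & relevant) == 0;
}

int main(void) {
  /* The 'cmplwi 0, 5, 32' case from the report: the low four bits are
     undefined (V = 0x0F) and 32 == 2^5, so the result is fully defined. */
  assert(ult_pow2_defined(0x0F, 5));
  /* If an undefined bit sat at or above bit 5, it would not be. */
  assert(!ult_pow2_defined(0x20, 5));
  return 0;
}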
(In reply to comment #3)
> > 2. I wrote a simple test in an attempt to reproduce the problem on x86:
> > valgrind does not report a false positive on the following code when
> > compiled with Clang trunk on x86 (even though the optimizers do the same
> > optimization that proves problematic to valgrind on ppc):
>
> On x86_64 it generates a test instruction (AND, set the flags, throw
> away the result of the AND) which tests just the bits in question.
> That's why.
>
> 0000000000400580 <foo>:
>   400580:	f6 07 e0             	testb  $0xe0,(%rdi)
>   400583:	74 01                	je     400586 <foo+0x6>
>   400585:	c3                   	retq
>   400586:	bf 70 06 40 00       	mov    $0x400670,%edi
>   40058b:	e9 c0 fe ff ff       	jmpq   400450 <puts@plt>

When I compile the test on x86_64 using clang trunk r200705, I get this:

init:                                   # @init
	.cfi_startproc
# BB#0:                                 # %entry
	andb	$31, (%rdi)
	retq
...
foo:                                    # @foo
	.cfi_startproc
# BB#0:                                 # %entry
	movzbl	(%rdi), %eax
	cmpl	$31, %eax
	ja	.LBB1_1
# BB#2:                                 # %if.then
	movl	$.Lstr, %edi
	jmp	puts                    # TAILCALL
.LBB1_1:                                # %if.end
	retq

which, as best I can tell, is really the same thing that clang is doing on ppc (except that valgrind is fine with the x86_64 version).
> Note that we have repeatedly said that Memcheck's definedness tracking
> may be confused by highly optimised code, even if in practice it does
> pretty well. Official advice is not to use it on code generated at
> -O2 or above.

As far as I can tell, Clang/LLVM performs this transformation at all optimization levels (-O1 does the same thing, on both ppc and x86_64).
Properly fixing bug 352364 (https://bugs.kde.org/show_bug.cgi?id=352364) will go a long way to fixing this one too. They are not exact duplicates, though.