Bug 242137

Summary: Valgrind on LLVM compiled code reports tons of 'Conditional jump or move depends on uninitialised' false positives
Product: [Developer tools] valgrind Reporter: clattner
Component: generalAssignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal CC: daniel, sbergman
Priority: NOR    
Version: 3.4 SVN   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: macOS   
Latest Commit: Version Fixed In:
Attachments: Clang output for APFloat derived test case.
Clang i386 output for APFloat derived example (eliminates EH cruft)

Description clattner 2010-06-19 08:17:06 UTC
Version:           3.4 SVN (using Devel) 
OS:                OS X

The LLVM code generator will emit logical or operations as adds when it can tell that it is safe to do so.  This is useful because adds can often be folded into addressing modes, turned into lea to avoid a copy etc.  It is safe to turn an 'or' into an 'add' when the code generator knows that intersection of the bits that might be set on the LHS and RHS of the or are guaranteed to be empty (no common bits).  This is really really common in bitfield code.

The problem is that valgrind is very precise about reasoning about uninitialized bits flowing through an 'or' but assumes that any uninitialized bits coming into an 'add' make the entire result undefined.  This results in a ton of false positives in LLVM compiled code.  Here is a silly example:

void f2(DeclaratorChunk::FunctionTypeInfo &IFun, bool hasProto, unsigned TypeQuals) {
...
  IFun.TypeQuals        = TypeQuals;
}

This bitfield insertion (of 3 bits) compiles to this x86 machine code:

	movl	12(%esp), %ecx
	shll	$2, %ecx
	andl	$28, %ecx
	movl	(%eax), %edx
	andl	$-29, %edx
	addl	%ecx, %edx
	movl	%edx, (%eax)

Note the use of 'addl' and that the left hand and right hand of the add obviously cannot share common bits.  The problem with this is that it will produce tons of false positives from valgrind, swamping the (extremely) useful output from the tool.

Is it possible to enhance valgrind to accurately track uninitialized bits through an add?


Reproducible: Always

Steps to Reproduce:
Here's a silly little example that illustrates the problem.  Please let me know if you need anything else:

$ cat t.c 
#include <stdio.h>

struct foo {
  int x : 4;
  int y : 2;
};

void set(struct foo *F, int x) {
  F->y = x;
}

int main(int argc, char **argv) { 
  struct foo A;
  set(&A, 42);
  printf("hello %d\n", A.y);
}


$ clang t.c -O1 -m32 -S -o -
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_set
	.align	4, 0x90
_set:                                   ## @set
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %eax
	shll	$4, %eax
	andl	$48, %eax
	movl	8(%ebp), %ecx
	movl	(%ecx), %edx
	andl	$-49, %edx
	addl	%eax, %edx
	movl	%edx, (%ecx)
	popl	%ebp
	ret

	.globl	_main
	.align	4, 0x90
_main:                                  ## @main
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%esi
	subl	$20, %esp
	call	L1$pb
L1$pb:
	popl	%esi
	leal	-8(%ebp), %eax
	movl	%eax, (%esp)
	movl	$42, 4(%esp)
	call	_set
	movl	-8(%ebp), %eax
	shll	$26, %eax
	sarl	$30, %eax
	movl	%eax, 4(%esp)
	leal	L_.str-L1$pb(%esi), %eax
	movl	%eax, (%esp)
	call	_printf
	xorl	%eax, %eax
	addl	$20, %esp
	popl	%esi
	popl	%ebp
	ret

	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	 "hello %d\n"


.subsections_via_symbols

$ clang t.c -O1 -m32 
$ valgrind ./a.out
==59755== Memcheck, a memory error detector.
...
==59755== 
==59755== ERROR
==59755== 
==59755== Conditional jump or move depends on uninitialised value(s)
==59755==    at 0x380E3: __vfprintf+3241 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Uninitialised value was created by a stack allocation
==59755==    at 0x1F34: main+4 (in ./a.out)
==59755== 
==59755== 
==59755== 
==59755== ERROR
==59755== 
==59755== Conditional jump or move depends on uninitialised value(s)
==59755==    at 0x394DC: __vfprintf+8354 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Uninitialised value was created by a stack allocation
==59755==    at 0x1F34: main+4 (in ./a.out)
==59755== 
==59755== 
==59755== 
==59755== ERROR
==59755== 
==59755== Conditional jump or move depends on uninitialised value(s)
==59755==    at 0x110AAC: __ultoa+60 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x39549: __vfprintf+8463 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Uninitialised value was created by a stack allocation
==59755==    at 0x1F34: main+4 (in ./a.out)
==59755== 
==59755== 
==59755== 
==59755== ERROR
==59755== 
==59755== Conditional jump or move depends on uninitialised value(s)
==59755==    at 0x12266: memchr+24 (in /Network/Servers/kenny/homes/chef/public/lib/valgrind/x86-darwin/vgpreload_memcheck.so)
==59755==    by 0x3CEC8: __sfvwrite+591 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x3C8FD: __vfprintf+21699 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Uninitialised value was created by a stack allocation
==59755==    at 0x1F34: main+4 (in ./a.out)
==59755== 
==59755== 
==59755== 
==59755== ERROR
==59755== 
==59755== Syscall param write(buf) points to uninitialised byte(s)
==59755==    at 0x3E44E: write$NOCANCEL$UNIX2003+10 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x3D1DC: __sflush+64 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x3CF7A: __sfvwrite+769 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x3C952: __vfprintf+21784 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Address 0x255367 is 7 bytes inside a block of size 4,096 alloc'd
==59755==    at 0xEDCA: malloc+211 (in /Network/Servers/kenny/homes/chef/public/lib/valgrind/x86-darwin/vgpreload_memcheck.so)
==59755==    by 0x55560: __smakebuf+113 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x554C0: __swsetup+133 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x374C3: __vfprintf+137 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x5541A: vfprintf_l+99 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x9F68E: printf+78 (in /usr/lib/libSystem.B.dylib)
==59755==    by 0x1F6A: main+58 (in ./a.out)
==59755==  Uninitialised value was created by a stack allocation
==59755==    at 0x1F34: main+4 (in ./a.out)
hello -2
==59755== 
==59755== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)



Expected Results:  
No errors should be flagged here.
Comment 1 Julian Seward 2010-06-19 11:05:54 UTC
Memcheck already has an optional mode to do exactly precise tracking
of bits through adds.  The reason it is not turned on by default is 
that it is much more expensive than the default handling for adds,
gcc generated code it is not necessary, and we I don't want to burden
instrumentation of all add operations with expense which is not necessary
for 99.9% of the dynamically encountered adds in code.  It would be best
if llvm didn't generate adds that rely on the behavior of partially
initialised values.
Comment 2 clattner 2010-06-19 19:38:04 UTC
The problem here is that the compiler doesn't know the adds are of partially undefined values.  Compiling the 'set' function, for example, the compiler doesn't know the state of the other fields.

How do I enable precise tracking of adds?  Note that all llvm needs in this case is to know for an undefined bit in the output to not propagate "up" the register if the bit it is being added to is known-zero.  I don't know if that simplifies things or not.
Comment 3 Julian Seward 2010-06-21 11:37:29 UTC
(In reply to comment #2)
> How do I enable precise tracking of adds?

memcheck/mc_translate.c: find this

      case Iop_Add32:
         if (mce->bogusLiterals)
            return expensiveAddSub(mce,True,Ity_I32, 
                                   vatom1,vatom2, atom1,atom2);
         else
            goto cheap_AddSub32;

change that to "if (1)."  Ditto for Iop_Add64 if you want the
same on a 64-bit platform.  I would be interested to know if that
helps.
Comment 4 Julian Seward 2010-07-01 10:10:18 UTC
> I would be interested to know if that helps.

Chris, ping ..
Comment 5 clattner 2010-07-01 18:48:29 UTC
Thanks Julian.  I haven't had a chance to try rebuilding valgrind (lame excuse I know).  Is there any way to get mce->bogusLiterals to be true on the command line?
Comment 6 clattner 2010-07-01 18:51:30 UTC
Oh also: I haven't had a chance to work on this either, but I plan to change the llvm code generator to generate *fewer* adds in these cases.  Right now, an early phase in the code generator unconditionally turns any 'or' into an 'add' if it can, because it doesn't know if register allocator will want to turn it into an LEA or something else (e.g. if both inputs are still live after the or).

When I get back to this, I hope to figure out some way to have a late phase change raw adds back into ors.  This will mean that only leas and address-mode foldings will have this, which should reduce the impact of it somewhat, making the valgrind output less noisy.
Comment 7 Julian Seward 2010-07-01 19:25:03 UTC
(In reply to comment #5)
> Is there any way to get mce->bogusLiterals to be true on the command line?

Not currently.  That was however the purpose of my question -- I'm trying
to establish whether it would be worth adding such a flag.
Comment 8 Julian Seward 2010-07-01 19:38:08 UTC
(In reply to comment #6)

> When I get back to this, I hope to figure out some way to have a late phase
> change raw adds back into ors.  This will mean that only leas and address-mode
> foldings will have this, which should reduce the impact of it somewhat, making
> the valgrind output less noisy.

Well, realistically, I don't think this is worth it.  Since only 
an effectively zero false error rate is acceptable, "somewhat reducing"
the impact is not going to make any difference: we will still have to
deploy the expensive Add/Sub interpretation everywhere.  What _would_
be extremely useful is if you could figure out some way that Valgrind
can know which specific adds/leas need to be handled expensively
(or be able to identify a superset of them).  But I can't see how that
could be done.

Or (from my point of view, better, but probably doesn't fit with
your mission goals :-) get rid of the transformation entirely.

Thinking about this more ..

AIUI, the only cases that need to be expensively handled are
addq, addl, leaq and leal.  No need for expensive handling of
subq/l nor for adds which happen as part of effective-address
computations in memory referencing instructions.  Is that correct?
Comment 9 clattner 2010-07-12 01:12:32 UTC
Right, this only kicks in for add, and addressing modes (in lea, load, store etc).  It does not kick in for subtracts or other operations.

I agree with you that 'reducing' the false positive rate isn't really interesting.  To me at least, valgrind is incredibly useful because it is *right*. :-)
Comment 10 clattner 2010-07-12 01:13:03 UTC
BTW, in case I didn't say it before, thank you for making such an amazing tool!
Comment 11 Julian Seward 2010-07-12 16:29:23 UTC
FTR, this is going to remain unresolved until such time as someone
tells me whether the proposals in comment #3 and comment #7 make
any difference.
Comment 12 Daniel Dunbar 2010-09-28 16:32:50 UTC
Hi Julian,

I tried the approach in #3 and it didn't cover all the problems. However, I didn't have a small test case at hand. At some point I'm hoping to be able to spend some time on this and (a) find a small test case, (b) determine if this particular issue is the source of all the valgrind false positives on LLVM code.

And as Chris says, thanks for an awesome tool and especially for getting the SnowLeopard work in!
Comment 13 Julian Seward 2010-09-28 18:48:23 UTC
> I tried the approach in #3 and it didn't cover all the problems. However, I
> didn't have a small test case at hand. At some point I'm hoping to be able to
> spend some time on this and (a) find a small test case, (b) determine if this
> particular issue is the source of all the valgrind false positives on LLVM
> code.

Please do.  It would be cool if that could happen in the next week or
so, since we are arriving rapidly at code freeze for Valgrind 3.6, and
if there is an easy fix that makes Memcheck happy(er) with LLVM code,
then I'd like to be able to ship that in 3.6.
Comment 14 Daniel Dunbar 2010-10-01 04:28:39 UTC
Here is a reduced test case, this is derived from our APFloat.cpp in the compiler itself. I haven't tried reducing the .s output yet...

--
ddunbar@ozzy:vg$ cat t.c
#include <assert.h>
#include <stdlib.h>

struct T {
  unsigned semantics;
  short exponent;
  unsigned category: 3;
  unsigned sign: 1;
};

int main() {
  struct T A, B;

  A.semantics = 10;
  A.category = 0;
  A.sign = 0;

  B.semantics = A.semantics;
  assert(A.semantics == B.semantics);
  B.sign = A.sign;
  B.category = A.category;
  if (A.category != B.category)
    exit(1);

  return 0;
}
ddunbar@ozzy:vg$ gcc -m32 t.c && valgrind -q ./a.out
ddunbar@ozzy:vg$ clang -m32 t.c && valgrind -q ./a.out
==82596== Conditional jump or move depends on uninitialised value(s)
==82596==    at 0x1F1F: main (in ./a.out)
==82596== 

ddunbar@ozzy:vg$ clang -v
clang version 2.9 (trunk 115188)
Target: x86_64-apple-darwin10
Thread model: posix

ddunbar@ozzy:vg$ clang -O0 -S -o t.s t.c
ddunbar@ozzy:vg$ 
--
Comment 15 Daniel Dunbar 2010-10-01 04:30:05 UTC
Created attachment 52133 [details]
Clang output for APFloat derived test case.
Comment 16 Daniel Dunbar 2010-10-01 04:32:09 UTC
Created attachment 52134 [details]
Clang i386 output for APFloat derived example (eliminates EH cruft)
Comment 17 Julian Seward 2010-10-04 22:22:12 UTC
(In reply to comment #16)
> Created an attachment (id=52134) [details]
> Clang i386 output for APFloat derived example (eliminates EH cruft)

This unfortunately isn't much use, because I have no way to 
connect the allegedly offending code address ("at 0x1F1F: main (in ./a.out)")
to any particular instruction in the assembly.  Can you attach
a disassembly of the offending a.out so I can make the connection?
Comment 18 Daniel Dunbar 2010-10-05 01:18:13 UTC
Here is the disassembly:
--
ddunbar@ozzy:tmp$ gcc -m32 t.i386.s 
ddunbar@ozzy:tmp$ valgrind ./a.out
==3073== Memcheck, a memory error detector
==3073== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==3073== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==3073== Command: ./a.out
==3073== 
--3073-- ./a.out:
--3073-- dSYM directory is missing; consider using --dsymutil=yes
==3073== Conditional jump or move depends on uninitialised value(s)
==3073==    at 0x1F22: main (in ./a.out)
==3073== 
==3073== 
==3073== HEAP SUMMARY:
==3073==     in use at exit: 320 bytes in 7 blocks
==3073==   total heap usage: 7 allocs, 0 frees, 320 bytes allocated
==3073== 
==3073== LEAK SUMMARY:
==3073==    definitely lost: 0 bytes in 0 blocks
==3073==    indirectly lost: 0 bytes in 0 blocks
==3073==      possibly lost: 0 bytes in 0 blocks
==3073==    still reachable: 320 bytes in 7 blocks
==3073==         suppressed: 0 bytes in 0 blocks
==3073== Rerun with --leak-check=full to see details of leaked memory
==3073== 
==3073== For counts of detected and suppressed errors, rerun with: -v
==3073== Use --track-origins=yes to see where uninitialised values come from
==3073== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
ddunbar@ozzy:tmp$ otool -tvr ./a.out
./a.out:
(__TEXT,__text) section
start:
00001e10	pushl	$0x00
00001e12	movl	%esp,%ebp
00001e14	andl	$0xf0,%esp
00001e17	subl	$0x10,%esp
00001e1a	movl	0x04(%ebp),%ebx
00001e1d	movl	%ebx,(%esp)
00001e20	leal	0x08(%ebp),%ecx
00001e23	movl	%ecx,0x04(%esp)
00001e27	addl	$0x01,%ebx
00001e2a	shll	$0x02,%ebx
00001e2d	addl	%ecx,%ebx
00001e2f	movl	%ebx,0x08(%esp)
00001e33	movl	(%ebx),%eax
00001e35	addl	$0x04,%ebx
00001e38	testl	%eax,%eax
00001e3a	jne	0x100001e33
00001e3c	movl	%ebx,0x0c(%esp)
00001e40	calll	0x00001e50
00001e45	movl	%eax,(%esp)
00001e48	calll	0x00001f82
00001e4d	hlt
00001e4e	nop
00001e4f	nop
_main:
00001e50	pushl	%ebp
00001e51	movl	%esp,%ebp
00001e53	pushl	%edi
00001e54	pushl	%esi
00001e55	subl	$0x40,%esp
00001e58	calll	0x00001e5d
00001e5d	popl	%eax
00001e5e	movl	$0x00000000,0xf4(%ebp)
00001e65	movl	$0x0000000a,0xe8(%ebp)
00001e6c	movl	0xec(%ebp),%ecx
00001e6f	andl	$0xfff8ffff,%ecx
00001e75	movl	%ecx,0xec(%ebp)
00001e78	movl	0xec(%ebp),%ecx
00001e7b	andl	$0xfff7ffff,%ecx
00001e81	movl	%ecx,0xec(%ebp)
00001e84	movl	0xe8(%ebp),%ecx
00001e87	movl	%ecx,0xe0(%ebp)
00001e8a	movl	0xe8(%ebp),%ecx
00001e8d	cmpl	0xe0(%ebp),%ecx
00001e90	sete	%dl
00001e93	xorb	$0x01,%dl
00001e96	testb	%dl,%dl
00001e98	movl	%eax,0xdc(%ebp)
00001e9b	jne	0x00001e9f
00001e9d	jmp	0x00001ed4
00001e9f	movl	0xdc(%ebp),%eax
00001ea2	leal	0x000000f3(%eax),%ecx
00001ea8	leal	0x000000f8(%eax),%edx
00001eae	movl	$0x00000013,%esi
00001eb3	leal	0x00000103(%eax),%edi
00001eb9	movl	%ecx,(%esp)
00001ebc	movl	%edx,0x04(%esp)
00001ec0	movl	$0x00000013,0x08(%esp)
00001ec8	movl	%edi,0x0c(%esp)
00001ecc	movl	%esi,0xd8(%ebp)
00001ecf	calll	0x00001f7c
00001ed4	movl	$0x00000010,%eax
00001ed9	movl	0xec(%ebp),%ecx
00001edc	leal	0xe0(%ebp),%edx
00001edf	movl	0xe4(%ebp),%esi
00001ee2	andl	$0xfff7ffff,%esi
00001ee8	andl	$0x00080000,%ecx
00001eee	addl	%esi,%ecx
00001ef0	movl	%ecx,0xe4(%ebp)
00001ef3	andl	$0xfff8ffff,%ecx
00001ef9	movzwl	0xee(%ebp),%esi
00001efd	shll	$0x10,%esi
00001f00	andl	$0x00070000,%esi
00001f06	addl	%ecx,%esi
00001f08	movl	%esi,0xe4(%ebp)
00001f0b	movzwl	0xee(%ebp),%ecx
00001f0f	andl	$0x07,%ecx
00001f12	addl	$0x04,%edx
00001f15	movl	(%edx),%edx
00001f17	shrl	$0x10,%edx
00001f1a	andl	$0x07,%edx
00001f1d	cmpl	%edx,%ecx
00001f1f	movl	%eax,0xd4(%ebp)
00001f22	je	0x00001f38
00001f24	movl	$0x00000001,%eax
00001f29	movl	$0x00000001,(%esp)
00001f30	movl	%eax,0xd0(%ebp)
00001f33	calll	0x00001f82
00001f38	movl	$0x00000000,%eax
00001f3d	addl	$0x40,%esp
00001f40	popl	%esi
00001f41	popl	%edi
00001f42	popl	%ebp
00001f43	ret
ddunbar@ozzy:tmp$ 
--
Comment 19 Julian Seward 2010-10-07 18:17:32 UTC
I looked at this just now, but the data flow is too gnarly
to follow with my puny human brain (evidently Memcheck does
better :-)  Is there a zero-hassle way I can install LLVM 2.8 on
my 10.6 box and try to make a further reduced test case?
Comment 20 Daniel Dunbar 2010-10-07 21:39:49 UTC
Yes, you should be able to reproduce this with Clang/LLVM 2.8, available here:
  http://llvm.org/releases/

Zero hassle way:
--
$ cd /tmp
$ curl -O http://llvm.org/releases/2.8/clang+llvm-2.8-x86_64-apple-darwin10.tar.gz
$ tar zxf clang+llvm-2.8-x86_64-apple-darwin10.tar.gz
$ cat t.c
#include <assert.h>
#include <stdlib.h>

struct T {
  unsigned semantics;
  short exponent;
  unsigned category: 3;
  unsigned sign: 1;
};

int main() {
  struct T A, B;

  A.semantics = 10;
  A.category = 0;
  A.sign = 0;

  B.semantics = A.semantics;
  assert(A.semantics == B.semantics);
  B.sign = A.sign;
  B.category = A.category;
  if (A.category != B.category)
    exit(1);

  return 0;
}
$ clang+llvm-2.8-x86_64-apple-darwin10/bin/clang -m32 t.c
$ valgrind ./a.out
==42550== Memcheck, a memory error detector
==42550== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==42550== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==42550== Command: ./a.out
==42550== 
--42550-- ./a.out:
--42550-- dSYM directory is missing; consider using --dsymutil=yes
==42550== Conditional jump or move depends on uninitialised value(s)
==42550==    at 0x1F22: main (in ./a.out)
==42550== 
==42550== 
==42550== HEAP SUMMARY:
==42550==     in use at exit: 320 bytes in 7 blocks
==42550==   total heap usage: 7 allocs, 0 frees, 320 bytes allocated
==42550== 
==42550== LEAK SUMMARY:
==42550==    definitely lost: 0 bytes in 0 blocks
==42550==    indirectly lost: 0 bytes in 0 blocks
==42550==      possibly lost: 0 bytes in 0 blocks
==42550==    still reachable: 320 bytes in 7 blocks
==42550==         suppressed: 0 bytes in 0 blocks
==42550== Rerun with --leak-check=full to see details of leaked memory
==42550== 
==42550== For counts of detected and suppressed errors, rerun with: -v
==42550== Use --track-origins=yes to see where uninitialised values come from
==42550== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
$
--
Comment 21 Daniel Dunbar 2010-10-08 02:44:48 UTC
Here is another test case from Chris, this one is important because here we are actually using the ability to turn the value into an leal:
--
$ cat t.c
int test(int a, int b) {
  a &= 6;
  b &= 48;
  return a|b;
}

$ clang -fomit-frame-pointer -m64 -Os -S -o - t.c
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test
_test:
Leh_func_begin0:
	andl	$6, %edi
	andl	$48, %esi
	leal	(%rsi,%rdi), %eax
	ret
...
--
Comment 22 clattner 2010-10-08 06:19:21 UTC
I just committed as series of patches to LLVM mainline (will be in LLVM 2.9 and later) that change LLVM to stop emitting 'add' instructions when 'or' could do in its place.  However, we still do generate LEA instructions when doing so is advantageous (when one LEA replaces an add + copy + reg clobber).  This means that the original testcase and the testcase from comment #20 no longer generate the fpos from valgrind, but that other more complex ones still do.

It would still be great for Valgrind to handle the tests from comment #20 and the original one, but it is less critical long term than handling the LEA case daniel pasted in comment #21.
Comment 23 Julian Seward 2010-10-08 11:18:50 UTC
(In reply to comment #14)
> Here is a reduced test case, this is derived from our APFloat.cpp in the
> compiler itself. I haven't tried reducing the .s output yet...

Daniel, are you sure you conducted the experiment in comment #3
correctly?  For me, it causes memcheck to stop complaining about
the test program in this comment.
Comment 24 Daniel Dunbar 2010-10-08 16:20:28 UTC
> Daniel, are you sure you conducted the experiment in comment #3
> correctly?  For me, it causes memcheck to stop complaining about
> the test program in this comment.

Julian, 

No, I'm not.

I didn't actually have a small test program at that point, what I did was change the valgrind code & recompile, and then try running on a full self hosted Clang compiler (which is what I was trying to debug at the time). I still got lots of false positives, so I assumed it didn't work (or there was more to the issue).

 - Daniel
Comment 25 Julian Seward 2010-10-09 02:48:10 UTC
On x86_64-Linux I was able to build enough of qt-4.5.2 to be able
to run a convincing size C++ test case.  This is with clang++-2.8
for x86_64-linux, -g -O.

Vanilla valgrind spews out thousands of errors when running one
of the examples, but the patch below makes it run completely
clean.

btw, when asked to read the full Dwarf3 generated by LLVM (instead
of just the line number info), using --read-var-info=yes, V did not
take kindly to what it found.  I have not investigated:


--17503-- WARNING: Serious error when reading debug info
--17503-- When reading debug info from /home/sewardj/Tools/qt-x11-opensource-src-4.5.2/lib/libQtGui.so.4.5.2:
--17503-- negative range in .debug_loc section
--17503-- WARNING: Serious error when reading debug info
--17503-- When reading debug info from /home/sewardj/Tools/qt-x11-opensource-src-4.5.2/lib/libQtCore.so.4.5.2:
--17503-- negative range in .debug_loc section




Index: memcheck/mc_translate.c
===================================================================
--- memcheck/mc_translate.c     (revision 11417)
+++ memcheck/mc_translate.c     (working copy)
@@ -2929,7 +2929,7 @@
          return mkLazy2(mce, Ity_I64, vatom1, vatom2);
 
       case Iop_Add32:
-         if (mce->bogusLiterals)
+         if (1||mce->bogusLiterals)
             return expensiveAddSub(mce,True,Ity_I32, 
                                    vatom1,vatom2, atom1,atom2);
          else
@@ -2952,7 +2952,7 @@
          return doCmpORD(mce, op, vatom1,vatom2, atom1,atom2);
 
       case Iop_Add64:
-         if (mce->bogusLiterals)
+         if (1||mce->bogusLiterals)
             return expensiveAddSub(mce,True,Ity_I64, 
                                    vatom1,vatom2, atom1,atom2);
          else
Comment 26 Julian Seward 2010-10-09 02:51:47 UTC
(In reply to comment #22)
> I just committed as series of patches to LLVM mainline (will be in LLVM 2.9 and
> later) that change LLVM to stop emitting 'add' instructions when 'or' could do
> in its place.

As per comment #8, I reiterate that this is pointless from the
perspective of keeping Memcheck happy.  If you have some other reasons
for it (lower latency, lower power use, shorter encoding, whatever)
then fine.
Comment 27 Julian Seward 2010-10-09 11:14:11 UTC
(In reply to comment #25)
> On x86_64-Linux I was able to build enough of qt-4.5.2 to be able
> to run a convincing size C++ test case.  This is with clang++-2.8
> for x86_64-linux, -g -O.
> 
> Vanilla valgrind spews out thousands of errors when running one
> of the examples, but the patch below makes it run completely
> clean.

Patch also works ok for qt-4.5.2 build -g -O2.
Comment 28 Julian Seward 2010-10-12 02:47:18 UTC
Deferring to Valgrind 3.7.  Anybody needing an emergency fix in
3.6 can use the patch in comment 25.
Comment 29 Julian Seward 2011-04-10 12:59:31 UTC
Is this still a problem with LLVM 2.9 ?  I can't reproduce it with 2.9.
Comment 30 Julian Seward 2011-10-12 10:04:17 UTC
Let's revisit this once LLVM 3.0 is out.
Comment 31 Julian Seward 2012-03-27 10:21:24 UTC
r12647 turns on by default, the expensive interpretation for Add32/Add64
on MacOSX.  So, providing you're using LLVM on MacOSX, this problem
will simply disappear.  Users of LLVM on Linux still have to contend with
it.
Comment 32 Stephan Bergmann 2016-04-13 06:17:07 UTC
Modifying memcheck/mc_translate.c to set mce.useLLVMworkarounds = True is still the only way to enable this on Linux, right?
(I'm routinely building with Clang on Linux and spent quite some time now tracking down a false Memcheck:Cond caused by this issue. So it might be nice if this feature were more easily discoverable by users, say if it were controlled at valgrind runtime by a documented command line switch.)
Comment 33 Julian Seward 2016-04-13 11:36:50 UTC
Those checks are enabled now by --expensive-definedness-checks=yes, which
was introduced in 3.11.0.

    --expensive-definedness-checks=no|yes
                                     Use extra-precise definedness tracking [no]

There's still a problem with LLVM optimised code though, and to some extent
also with gcc.  Both of these compilers will now sometimes compile

   if (a && b) ...

as if it was written

  if (b && a)

in the case that they can prove that a is always false whenever b is undefined.
The transformation is correct but it means that there is a branch on uninitialised
data and so Memcheck complains (wrongly.)