Summary:          Valgrind on LLVM compiled code reports tons of 'Conditional jump or move depends on uninitialised' false positives
Product:          [Developer tools] valgrind
Component:        general
Status:           REPORTED
Severity:         normal
Priority:         NOR
Version:          3.4 SVN
Target Milestone: ---
Platform:         Unlisted Binaries
OS:               macOS
Reporter:         clattner
Assignee:         Julian Seward <jseward>
CC:               daniel, sbergman
Attachments:      Clang output for APFloat derived test case.
                  Clang i386 output for APFloat derived example (eliminates EH cruft)
Description (clattner, 2010-06-19 08:17:06 UTC)

Memcheck already has an optional mode to do exactly precise tracking of bits through adds. The reason it is not turned on by default is that it is much more expensive than the default handling for adds, it is not necessary for gcc-generated code, and I don't want to burden the instrumentation of all add operations with expense which is unnecessary for 99.9% of the dynamically encountered adds in code. It would be best if LLVM didn't generate adds that rely on the behavior of partially initialised values.

The problem here is that the compiler doesn't know the adds are of partially undefined values. Compiling the 'set' function, for example, the compiler doesn't know the state of the other fields. How do I enable precise tracking of adds?

Note that all LLVM needs in this case is for an undefined bit in the output not to propagate "up" the register if the bit it is being added to is known-zero. I don't know if that simplifies things or not.

(In reply to comment #2)
> How do I enable precise tracking of adds?

memcheck/mc_translate.c: find this

      case Iop_Add32:
         if (mce->bogusLiterals)
            return expensiveAddSub(mce,True,Ity_I32,
                                   vatom1,vatom2, atom1,atom2);
         else
            goto cheap_AddSub32;

change that to "if (1)". Ditto for Iop_Add64 if you want the same on a 64-bit platform. I would be interested to know if that helps.
> I would be interested to know if that helps.
Chris, ping ..
Thanks Julian. I haven't had a chance to try rebuilding valgrind (lame excuse, I know). Is there any way to get mce->bogusLiterals to be true on the command line?

Oh, also: I haven't had a chance to work on this either, but I plan to change the LLVM code generator to generate *fewer* adds in these cases. Right now, an early phase in the code generator unconditionally turns any 'or' into an 'add' if it can, because it doesn't know if the register allocator will want to turn it into an LEA or something else (e.g. if both inputs are still live after the or). When I get back to this, I hope to figure out some way to have a late phase change raw adds back into ors. This will mean that only leas and address-mode foldings will have this, which should reduce the impact of it somewhat, making the valgrind output less noisy.

(In reply to comment #5)
> Is there any way to get mce->bogusLiterals to be true on the command line?

Not currently. That was however the purpose of my question -- I'm trying to establish whether it would be worth adding such a flag.

(In reply to comment #6)
> When I get back to this, I hope to figure out some way to have a late phase
> change raw adds back into ors. This will mean that only leas and address-mode
> foldings will have this, which should reduce the impact of it somewhat, making
> the valgrind output less noisy.

Well, realistically, I don't think this is worth it. Since only an effectively zero false error rate is acceptable, "somewhat reducing" the impact is not going to make any difference: we will still have to deploy the expensive Add/Sub interpretation everywhere. What _would_ be extremely useful is if you could figure out some way that Valgrind can know which specific adds/leas need to be handled expensively (or be able to identify a superset of them). But I can't see how that could be done. Or (from my point of view better, but probably doesn't fit with your mission goals :-) get rid of the transformation entirely.
Thinking about this more .. AIUI, the only cases that need to be expensively handled are addq, addl, leaq and leal. No need for expensive handling of subq/l nor for adds which happen as part of effective-address computations in memory referencing instructions. Is that correct?

Right, this only kicks in for add, and addressing modes (in lea, load, store etc). It does not kick in for subtracts or other operations. I agree with you that 'reducing' the false positive rate isn't really interesting. To me at least, valgrind is incredibly useful because it is *right*. :-) BTW, in case I didn't say it before, thank you for making such an amazing tool!

FTR, this is going to remain unresolved until such time as someone tells me whether the proposals in comment #3 and comment #7 make any difference.

Hi Julian, I tried the approach in #3 and it didn't cover all the problems. However, I didn't have a small test case at hand. At some point I'm hoping to be able to spend some time on this and (a) find a small test case, (b) determine if this particular issue is the source of all the valgrind false positives on LLVM code. And as Chris says, thanks for an awesome tool and especially for getting the SnowLeopard work in!
> I tried the approach in #3 and it didn't cover all the problems. However, I
> didn't have a small test case at hand. At some point I'm hoping to be able to
> spend some time on this and (a) find a small test case, (b) determine if this
> particular issue is the source of all the valgrind false positives on LLVM
> code.
Please do. It would be cool if that could happen in the next week or
so, since we are arriving rapidly at code freeze for Valgrind 3.6, and
if there is an easy fix that makes Memcheck happy(er) with LLVM code,
then I'd like to be able to ship that in 3.6.
Here is a reduced test case; this is derived from our APFloat.cpp in the compiler itself. I haven't tried reducing the .s output yet...

--
ddunbar@ozzy:vg$ cat t.c
#include <assert.h>
#include <stdlib.h>

struct T {
  unsigned semantics;
  short exponent;
  unsigned category: 3;
  unsigned sign: 1;
};

int main() {
  struct T A, B;

  A.semantics = 10;
  A.category = 0;
  A.sign = 0;

  B.semantics = A.semantics;
  assert(A.semantics == B.semantics);
  B.sign = A.sign;
  B.category = A.category;
  if (A.category != B.category)
    exit(1);

  return 0;
}
ddunbar@ozzy:vg$ gcc -m32 t.c && valgrind -q ./a.out
ddunbar@ozzy:vg$ clang -m32 t.c && valgrind -q ./a.out
==82596== Conditional jump or move depends on uninitialised value(s)
==82596==    at 0x1F1F: main (in ./a.out)
==82596==
ddunbar@ozzy:vg$ clang -v
clang version 2.9 (trunk 115188)
Target: x86_64-apple-darwin10
Thread model: posix
ddunbar@ozzy:vg$ clang -O0 -S -o t.s t.c
ddunbar@ozzy:vg$
--

Created attachment 52133 [details]
Clang output for APFloat derived test case.
Created attachment 52134 [details]
Clang i386 output for APFloat derived example (eliminates EH cruft)
(In reply to comment #16)
> Created an attachment (id=52134) [details]
> Clang i386 output for APFloat derived example (eliminates EH cruft)

This unfortunately isn't much use, because I have no way to connect the allegedly offending code address ("at 0x1F1F: main (in ./a.out)") to any particular instruction in the assembly. Can you attach a disassembly of the offending a.out so I can make the connection?

Here is the disassembly:

--
ddunbar@ozzy:tmp$ gcc -m32 t.i386.s
ddunbar@ozzy:tmp$ valgrind ./a.out
==3073== Memcheck, a memory error detector
==3073== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==3073== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==3073== Command: ./a.out
==3073==
--3073-- ./a.out:
--3073-- dSYM directory is missing; consider using --dsymutil=yes
==3073== Conditional jump or move depends on uninitialised value(s)
==3073==    at 0x1F22: main (in ./a.out)
==3073==
==3073==
==3073== HEAP SUMMARY:
==3073==     in use at exit: 320 bytes in 7 blocks
==3073==   total heap usage: 7 allocs, 0 frees, 320 bytes allocated
==3073==
==3073== LEAK SUMMARY:
==3073==    definitely lost: 0 bytes in 0 blocks
==3073==    indirectly lost: 0 bytes in 0 blocks
==3073==      possibly lost: 0 bytes in 0 blocks
==3073==    still reachable: 320 bytes in 7 blocks
==3073==         suppressed: 0 bytes in 0 blocks
==3073== Rerun with --leak-check=full to see details of leaked memory
==3073==
==3073== For counts of detected and suppressed errors, rerun with: -v
==3073== Use --track-origins=yes to see where uninitialised values come from
==3073== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
ddunbar@ozzy:tmp$ otool -tvr ./a.out
./a.out:
(__TEXT,__text) section
start:
00001e10 pushl $0x00
00001e12 movl %esp,%ebp
00001e14 andl $0xf0,%esp
00001e17 subl $0x10,%esp
00001e1a movl 0x04(%ebp),%ebx
00001e1d movl %ebx,(%esp)
00001e20 leal 0x08(%ebp),%ecx
00001e23 movl %ecx,0x04(%esp)
00001e27 addl $0x01,%ebx
00001e2a shll $0x02,%ebx
00001e2d addl %ecx,%ebx
00001e2f movl %ebx,0x08(%esp)
00001e33 movl (%ebx),%eax
00001e35 addl $0x04,%ebx
00001e38 testl %eax,%eax
00001e3a jne 0x100001e33
00001e3c movl %ebx,0x0c(%esp)
00001e40 calll 0x00001e50
00001e45 movl %eax,(%esp)
00001e48 calll 0x00001f82
00001e4d hlt
00001e4e nop
00001e4f nop
_main:
00001e50 pushl %ebp
00001e51 movl %esp,%ebp
00001e53 pushl %edi
00001e54 pushl %esi
00001e55 subl $0x40,%esp
00001e58 calll 0x00001e5d
00001e5d popl %eax
00001e5e movl $0x00000000,0xf4(%ebp)
00001e65 movl $0x0000000a,0xe8(%ebp)
00001e6c movl 0xec(%ebp),%ecx
00001e6f andl $0xfff8ffff,%ecx
00001e75 movl %ecx,0xec(%ebp)
00001e78 movl 0xec(%ebp),%ecx
00001e7b andl $0xfff7ffff,%ecx
00001e81 movl %ecx,0xec(%ebp)
00001e84 movl 0xe8(%ebp),%ecx
00001e87 movl %ecx,0xe0(%ebp)
00001e8a movl 0xe8(%ebp),%ecx
00001e8d cmpl 0xe0(%ebp),%ecx
00001e90 sete %dl
00001e93 xorb $0x01,%dl
00001e96 testb %dl,%dl
00001e98 movl %eax,0xdc(%ebp)
00001e9b jne 0x00001e9f
00001e9d jmp 0x00001ed4
00001e9f movl 0xdc(%ebp),%eax
00001ea2 leal 0x000000f3(%eax),%ecx
00001ea8 leal 0x000000f8(%eax),%edx
00001eae movl $0x00000013,%esi
00001eb3 leal 0x00000103(%eax),%edi
00001eb9 movl %ecx,(%esp)
00001ebc movl %edx,0x04(%esp)
00001ec0 movl $0x00000013,0x08(%esp)
00001ec8 movl %edi,0x0c(%esp)
00001ecc movl %esi,0xd8(%ebp)
00001ecf calll 0x00001f7c
00001ed4 movl $0x00000010,%eax
00001ed9 movl 0xec(%ebp),%ecx
00001edc leal 0xe0(%ebp),%edx
00001edf movl 0xe4(%ebp),%esi
00001ee2 andl $0xfff7ffff,%esi
00001ee8 andl $0x00080000,%ecx
00001eee addl %esi,%ecx
00001ef0 movl %ecx,0xe4(%ebp)
00001ef3 andl $0xfff8ffff,%ecx
00001ef9 movzwl 0xee(%ebp),%esi
00001efd shll $0x10,%esi
00001f00 andl $0x00070000,%esi
00001f06 addl %ecx,%esi
00001f08 movl %esi,0xe4(%ebp)
00001f0b movzwl 0xee(%ebp),%ecx
00001f0f andl $0x07,%ecx
00001f12 addl $0x04,%edx
00001f15 movl (%edx),%edx
00001f17 shrl $0x10,%edx
00001f1a andl $0x07,%edx
00001f1d cmpl %edx,%ecx
00001f1f movl %eax,0xd4(%ebp)
00001f22 je 0x00001f38
00001f24 movl $0x00000001,%eax
00001f29 movl $0x00000001,(%esp)
00001f30 movl %eax,0xd0(%ebp)
00001f33 calll 0x00001f82
00001f38 movl $0x00000000,%eax
00001f3d addl $0x40,%esp
00001f40 popl %esi
00001f41 popl %edi
00001f42 popl %ebp
00001f43 ret
ddunbar@ozzy:tmp$
--

I looked at this just now, but the data flow is too gnarly to follow with my puny human brain (evidently Memcheck does better :-) Is there a zero-hassle way I can install LLVM 2.8 on my 10.6 box and try to make a further reduced test case?

Yes, you should be able to reproduce this with Clang/LLVM 2.8, available here: http://llvm.org/releases/

Zero hassle way:

--
$ cd /tmp
$ curl -O http://llvm.org/releases/2.8/clang+llvm-2.8-x86_64-apple-darwin10.tar.gz
$ tar zxf clang+llvm-2.8-x86_64-apple-darwin10.tar.gz
$ cat t.c
#include <assert.h>
#include <stdlib.h>

struct T {
  unsigned semantics;
  short exponent;
  unsigned category: 3;
  unsigned sign: 1;
};

int main() {
  struct T A, B;

  A.semantics = 10;
  A.category = 0;
  A.sign = 0;

  B.semantics = A.semantics;
  assert(A.semantics == B.semantics);
  B.sign = A.sign;
  B.category = A.category;
  if (A.category != B.category)
    exit(1);

  return 0;
}
$ clang+llvm-2.8-x86_64-apple-darwin10/bin/clang -m32 t.c
$ valgrind ./a.out
==42550== Memcheck, a memory error detector
==42550== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==42550== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==42550== Command: ./a.out
==42550==
--42550-- ./a.out:
--42550-- dSYM directory is missing; consider using --dsymutil=yes
==42550== Conditional jump or move depends on uninitialised value(s)
==42550==    at 0x1F22: main (in ./a.out)
==42550==
==42550==
==42550== HEAP SUMMARY:
==42550==     in use at exit: 320 bytes in 7 blocks
==42550==   total heap usage: 7 allocs, 0 frees, 320 bytes allocated
==42550==
==42550== LEAK SUMMARY:
==42550==    definitely lost: 0 bytes in 0 blocks
==42550==    indirectly lost: 0 bytes in 0 blocks
==42550==      possibly lost: 0 bytes in 0 blocks
==42550==    still reachable: 320 bytes in 7 blocks
==42550==         suppressed: 0 bytes in 0 blocks
==42550== Rerun with --leak-check=full to see details of leaked memory
==42550==
==42550== For counts of detected and suppressed errors, rerun with: -v
==42550== Use --track-origins=yes to see where uninitialised values come from
==42550== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
$
--

Here is another test case from Chris; this one is important because here we are actually using the ability to turn the value into an leal:

--
$ cat t.c
int test(int a, int b) {
  a &= 6;
  b &= 48;
  return a|b;
}
$ clang -fomit-frame-pointer -m64 -Os -S -o - t.c
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test
_test:
Leh_func_begin0:
	andl	$6, %edi
	andl	$48, %esi
	leal	(%rsi,%rdi), %eax
	ret
...
--

I just committed a series of patches to LLVM mainline (will be in LLVM 2.9 and later) that change LLVM to stop emitting 'add' instructions when 'or' could do in its place. However, we still do generate LEA instructions when doing so is advantageous (when one LEA replaces an add + copy + reg clobber). This means that the original testcase and the testcase from comment #20 no longer generate the fpos from valgrind, but other more complex ones still do.
It would still be great for Valgrind to handle the tests from comment #20 and the original one, but it is less critical long term than handling the LEA case daniel pasted in comment #21.

(In reply to comment #14)
> Here is a reduced test case, this is derived from our APFloat.cpp in the
> compiler itself. I haven't tried reducing the .s output yet...

Daniel, are you sure you conducted the experiment in comment #3 correctly? For me, it causes memcheck to stop complaining about the test program in this comment.

> Daniel, are you sure you conducted the experiment in comment #3
> correctly? For me, it causes memcheck to stop complaining about
> the test program in this comment.
Julian,
No, I'm not.
I didn't actually have a small test program at that point; what I did was change the valgrind code and recompile, and then try running on a fully self-hosted Clang compiler (which is what I was trying to debug at the time). I still got lots of false positives, so I assumed it didn't work (or there was more to the issue).
- Daniel
On x86_64-Linux I was able to build enough of qt-4.5.2 to be able to run a convincing-size C++ test case. This is with clang++-2.8 for x86_64-linux, -g -O. Vanilla valgrind spews out thousands of errors when running one of the examples, but the patch below makes it run completely clean.

btw, when asked to read the full Dwarf3 generated by LLVM (instead of just the line number info), using --read-var-info=yes, V did not take kindly to what it found. I have not investigated:

--17503-- WARNING: Serious error when reading debug info
--17503-- When reading debug info from /home/sewardj/Tools/qt-x11-opensource-src-4.5.2/lib/libQtGui.so.4.5.2:
--17503-- negative range in .debug_loc section
--17503-- WARNING: Serious error when reading debug info
--17503-- When reading debug info from /home/sewardj/Tools/qt-x11-opensource-src-4.5.2/lib/libQtCore.so.4.5.2:
--17503-- negative range in .debug_loc section

Index: memcheck/mc_translate.c
===================================================================
--- memcheck/mc_translate.c	(revision 11417)
+++ memcheck/mc_translate.c	(working copy)
@@ -2929,7 +2929,7 @@
          return mkLazy2(mce, Ity_I64, vatom1, vatom2);
 
       case Iop_Add32:
-         if (mce->bogusLiterals)
+         if (1||mce->bogusLiterals)
             return expensiveAddSub(mce,True,Ity_I32,
                                    vatom1,vatom2, atom1,atom2);
          else
@@ -2952,7 +2952,7 @@
          return doCmpORD(mce, op, vatom1,vatom2, atom1,atom2);
 
       case Iop_Add64:
-         if (mce->bogusLiterals)
+         if (1||mce->bogusLiterals)
             return expensiveAddSub(mce,True,Ity_I64,
                                    vatom1,vatom2, atom1,atom2);
          else

(In reply to comment #22)
> I just committed a series of patches to LLVM mainline (will be in LLVM 2.9
> and later) that change LLVM to stop emitting 'add' instructions when 'or'
> could do in its place.

As per comment #8, I reiterate that this is pointless from the perspective of keeping Memcheck happy. If you have some other reasons for it (lower latency, lower power use, shorter encoding, whatever) then fine.
(In reply to comment #25)
> On x86_64-Linux I was able to build enough of qt-4.5.2 to be able
> to run a convincing size C++ test case. This is with clang++-2.8
> for x86_64-linux, -g -O.
>
> Vanilla valgrind spews out thousands of errors when running one
> of the examples, but the patch below makes it run completely
> clean.

Patch also works ok for qt-4.5.2 built -g -O2.

Deferring to Valgrind 3.7. Anybody needing an emergency fix in 3.6 can use the patch in comment 25.

Is this still a problem with LLVM 2.9?

I can't reproduce it with 2.9. Let's revisit this once LLVM 3.0 is out.

r12647 turns on by default the expensive interpretation for Add32/Add64 on MacOSX. So, provided you're using LLVM on MacOSX, this problem will simply disappear. Users of LLVM on Linux still have to contend with it.

Modifying memcheck/mc_translate.c to set mce.useLLVMworkarounds = True is still the only way to enable this on Linux, right? (I'm routinely building with Clang on Linux and spent quite some time now tracking down a false Memcheck:Cond caused by this issue. So it might be nice if this feature were more easily discoverable by users, say if it were controlled at valgrind runtime by a documented command line switch.)

Those checks are enabled now by --expensive-definedness-checks=yes, which was introduced in 3.11.0.

    --expensive-definedness-checks=no|yes
                              Use extra-precise definedness tracking [no]

There's still a problem with LLVM-optimised code, though, and to some extent also with gcc. Both of these compilers will now sometimes compile

    if (a && b) ...

as if it was written

    if (b && a)

in the case that they can prove that a is always false whenever b is undefined. The transformation is correct, but it means that there is a branch on uninitialised data, and so Memcheck complains (wrongly).