Bug 266389

Summary: False Memcheck reports on gcc -O1 optimizations around CMP instructions.
Product: [Developer tools] valgrind Reporter: Timur Iskhodzhanov <timurrrr>
Component: memcheckAssignee: Julian Seward <jseward>
Status: RESOLVED WORKSFORME    
Severity: normal CC: pjfloyd
Priority: NOR    
Version: 3.7 SVN   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: experimental patch to enable expensive checks via command

Description Timur Iskhodzhanov 2011-02-15 17:53:27 UTC
[this is a copy-n-paste from my e-mail to valgrind-developers, just want to have a bug issue with a permalink]

I'm Valgrinding Chromium code and recently I've found a number of
false positives in memcheck on x86_64:
http://code.google.com/p/chromium/issues/detail?id=64918
http://code.google.com/p/chromium/issues/detail?id=64894
(please note the stack traces are different and spread across multiple modules)

After some investigation, it revelead to be a false positive which was
caused by gcc -O1 optimization.
http://codereview.chromium.org/5599004
There was a line like "bool cond = (a == 1 && b == 2);" and "a != 1"
and "b" was uninitialized.

I wrote the following reproducer (works on x86 too):
<<<<<<<<<<<<<<<<<<<<<
#include <stdio.h>

struct S {
  int x;  // for alignment.
  short a, b;
} S;

bool foo(struct S *s) {
  return s->a == 0x1234 && s->b == 0x5678;
}

int main() {
  struct S s;
  s.a = 0;
  // s.b is undefined but shouldn't be read since s.a != 0x1234
  if (foo(&s))
    printf("ASD\n");
  return 0;
}
>>>>>>>>>>>>>>>>>>>>>
$ g++ -g opt_demo.cpp -o o0 && valgrind ./o0
... ERROR SUMMARY: 0 errors from 0 contexts

$ g++ -g -O1 opt_demo.cpp -o o1 && valgrind ./o1
...
Conditional jump or move depends on uninitialised value(s)
 0x400604: main (opt_demo.cpp:16)
...

$ objdump -d o0 | grep "foo.*:" --after-context=17
00000000004005b4 <_Z3fooP1S>:
  4005b4:       55                      push   %rbp
  4005b5:       48 89 e5                mov    %rsp,%rbp
  4005b8:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
  4005bc:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4005c0:       0f b7 40 04             movzwl 0x4(%rax),%eax
  4005c4:       66 3d 34 12             cmp    $0x1234,%ax
  4005c8:       75 15                   jne    4005df <_Z3fooP1S+0x2b>
  4005ca:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4005ce:       0f b7 40 06             movzwl 0x6(%rax),%eax
  4005d2:       66 3d 78 56             cmp    $0x5678,%ax
  4005d6:       75 07                   jne    4005df <_Z3fooP1S+0x2b>
  4005d8:       b8 01 00 00 00          mov    $0x1,%eax
  4005dd:       eb 05                   jmp    4005e4 <_Z3fooP1S+0x30>
  4005df:       b8 00 00 00 00          mov    $0x0,%eax
  4005e4:       c9                      leaveq
  4005e5:       c3                      retq

$ objdump -d o1 | grep "foo.*:" --after-context=4
00000000004005e4 <_Z3fooP1S>:
 4005e4:       81 7f 04 34 12 78 56    cmpl   $0x56781234,0x4(%rdi)
 4005eb:       0f 94 c0                sete   %al
 4005ee:       c3                      retq

-O1 has put "a" and "b" into a single 32-bit value and compares it
against 0x56781234.
As a result, Valgrind thinks that the foo() return value depends on
s->b hence is uninitialized.
This barks on the "if (foo(...))" line.

The same happens on x86_64 when "a" and "b" are int and "x" is a
64-bit value (this was the case for that Chromium report)

There are two options how to get rid of these warnings:
1) Use less optimizations when compiling tests for Valgrind.
We've been using "g++ -g -O1 -fno-inline -fno-omit-frame-pointer
-fno-builtin" for more than a year on x86 and it worked fine.
(these -fno-... flags don't help in this case)
The CMP issue was found when we've started running Valgrind on x86_64
and two adjacent enums were merged into a single 64-bit integer
similar to the code above.
I don't know a flag to disable these CMP-related optimizations.
Also, one can do such an optimization "manually" and this will make
Valgrind sad even in -O0.

2) Set EQ/NE flags as "defined" when the CMP EQ/NE results can be
calculated based on the defined parts of the operands.
(e.g 0xXXXX0000 != 0x56781234 no matter what XXXX is)
Looks like the "expensiveCmpEQorNE" function in
memcheck/mc_translate.c does the trick but it is not called on this
reproducer.

The reports gone away once I applied this patch:
<<<<<<<<<<<<<<<<<<<<<<<
Index: memcheck/mc_translate.c
===================================================================
--- memcheck/mc_translate.c     (revision 11483)
+++ memcheck/mc_translate.c     (working copy)
@@ -2980,7 +2980,7 @@

      case Iop_CmpEQ64:
      case Iop_CmpNE64:
-         if (mce->bogusLiterals)
+         if (1 || mce->bogusLiterals)
            return expensiveCmpEQorNE(mce,Ity_I64, vatom1,vatom2,
atom1,atom2 );
         else
            goto cheap_cmp64;
@@ -2991,7 +2991,7 @@

      case Iop_CmpEQ32:
      case Iop_CmpNE32:
-         if (mce->bogusLiterals)
+         if (1 || mce->bogusLiterals)
            return expensiveCmpEQorNE(mce,Ity_I32, vatom1,vatom2,
atom1,atom2 );
         else
            goto cheap_cmp32;
>>>>>>>>>>>>>>>>>>>>>>>

Is it OK that these literals are not considered "bogus"?
With this patch my Chromium tests run about ~10% slower.
I believe it's possible to do something which is still fast enough yet correct.
Comment 1 Julian Seward 2011-02-15 18:27:22 UTC
> Is it OK that these literals are not considered "bogus"?

It's always OK to use the expensive cmpEQ/NE versions, if that's what
you mean.

> With this patch my Chromium tests run about ~10% slower.

10% loss is a lot to pay for a (very) small gain in accuracy.

> I believe it's possible to do something which is still fast enough yet correct.

Well, that would be nice.  Have you got any details?
Comment 2 Timur Iskhodzhanov 2011-02-15 18:44:18 UTC
> It's always OK to use the expensive cmpEQ/NE versions, if that's what
> you mean.
Yes.

>> With this patch my Chromium tests run about ~10% slower.
>> I believe it's possible to do something which is still
>> fast enough yet correct.
> 10% loss is a lot to pay for a (very) small gain in accuracy.
> Well, that would be nice.
Absolutely, 10% is not worth it.
But if it's possible to do it with 1% - I want it.

> Have you got any details?
Nope, I'm not that familiar with the Memcheck code.
Maybe you can find something out?
Comment 3 Christian Borntraeger 2011-02-15 19:12:11 UTC
Created attachment 57279 [details]
experimental patch to enable expensive checks via command

During the port of s390x I was experimenting with this patch for fix partiallydefineq. I finally fixed it differently, but this patch might be another way to solve your problem
Comment 4 Paul Floyd 2023-02-02 14:10:13 UTC
I'm not sure what changed or when but I get
gcc (GCC) 12.0.1 20220420 (experimental) [was at or close to the 12.0.1 release version]

objdump --disassemble=foo bug266389

0000000000401130 <foo>:
  401130:	81 7f 04 34 12 78 56 	cmpl   $0x56781234,0x4(%rdi)
  401137:	0f 94 c0             	sete   %al
  40113a:	c3                   	ret  

No change there

~/tools/valgrind/bin/valgrind -q ./bug266389
[no output]