Bug 245925

Summary: x86-64 red zone handling problem
Product: [Developer tools] valgrind Reporter: Jakub Jelinek <jakub>
Component: generalAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 3.5.0   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
URL: http://bugzilla.redhat.com/618360
Latest Commit: Version Fixed In:
Attachments: valgr.s - assembly for the testcase
valgr.c

Description Jakub Jelinek 2010-07-27 18:32:02 UTC
Created attachment 49541 [details]
valgr.s - assembly for the testcase

On the attached testcase valgrind reports a segfault, but there isn't and shouldn't be any.

The code in a leaf function loads a value from memory:
movq    unit_final_absence_set_table(%rip), %rax
then after a few insns stores it into the red zone:
movq    %rax, -8(%rsp)
and a few insns later, without changing -8(%rsp), but with setting %rax:
        movq    -8(%rsp), %rdx
        cltq
        movq    (%rdx,%rax,8), %r8
On this last insn it reports segfault, %rax is the expected 2, but %rdx is 12 instead of the expected pointer (pointer returned from malloc).  With --db-attach=yes the debugger shows that -8(%rsp) has that bogus value of 12, which is different from the expected value.  The value at -16(%rsp) looks correct though.  The function is leaf and doesn't touch -8(%rsp) once it is stored.
Comment 1 Jakub Jelinek 2010-07-27 18:33:12 UTC
Created attachment 49542 [details]
valgr.c

Corresponding C source.  Assembly attached mainly because other compilers will likely generate different code, which wouldn't trigger this.
Comment 2 Julian Seward 2010-07-27 22:13:05 UTC
This happened because of an error in valgrind's emulation of:

  400588:	4d 0f a3 cb          	bt     %r9,%r11

In the x86 emulation, 'bt reg,reg' is done with a horrible kludge
where we push the operand on the stack, and then use the same handling
as for the 'bt mem,reg' case.  (Of course removing the operand from
the stack afterwards).  The x86_64 code was derived from this, and I
forgot about the red zone, so of course this kludge trashes -8(%rsp)
as you observed.  (Duh).

Here is a temporary fix.

Index: priv/guest_amd64_toIR.c
===================================================================
--- priv/guest_amd64_toIR.c	(revision 1989)
+++ priv/guest_amd64_toIR.c	(working copy)
@@ -7337,7 +7337,10 @@
       t_rsp = newTemp(Ity_I64);
       t_addr0 = newTemp(Ity_I64);
 
-      assign( t_rsp, binop(Iop_Sub64, getIReg64(R_RSP), mkU64(sz)) );
+      vassert(vbi->guest_stack_redzone_size == 128);
+      assign( t_rsp, binop(Iop_Sub64,
+                           getIReg64(R_RSP),
+                           mkU64(sz + vbi->guest_stack_redzone_size)) );
       putIReg64(R_RSP, mkexpr(t_rsp));
 
       storeLE( mkexpr(t_rsp), getIRegE(sz, pfx, modrm) );
@@ -7436,7 +7439,9 @@
          standard zero-extend rule */
       if (op != BtOpNone)
          putIRegE(sz, pfx, modrm, loadLE(szToITy(sz), mkexpr(t_rsp)) );
-      putIReg64(R_RSP, binop(Iop_Add64, mkexpr(t_rsp), mkU64(sz)) );
+      putIReg64(R_RSP, binop(Iop_Add64,
+                             mkexpr(t_rsp),
+                             mkU64(sz + vbi->guest_stack_redzone_size)) );
    }
 
    DIP("bt%s%c %s, %s\n",
Comment 3 Julian Seward 2010-07-29 20:18:49 UTC
Fixed (vex r1997).

Jakub, thank you very much for the small test case.  I had suspected
since about Feb this year that there was a bug in the x86_64 integer
simulation, but I could not figure out what it was, despite
considerable effort.  The smallest test case I managed to find was
"firefox, built w/ gcc-4.4 -O2, running on V, crashes when loading
http://en.wikipedia.org", so until your report I still had no idea
what the problem was.

This is really a critical emulation fix that should be pushed via the
distros into shipping 3.5.0s or 3.6.0.SVNs.  After a bit more testing,
perhaps.