Created attachment 49541 [details] valgr.s - assembly for the testcase On the attached testcase valgrind reports a segfault, but there isn't and shouldn't be any. The code in a leaf function loads a value from memory: movq unit_final_absence_set_table(%rip), %rax then after a few insns stores it into the red zone: movq %rax, -8(%rsp) and a few insns later, without changing -8(%rsp), but with setting %rax: movq -8(%rsp), %rdx cltq movq (%rdx,%rax,8), %r8 On this last insn it reports segfault, %rax is the expected 2, but %rdx is 12 instead of the expected pointer (pointer returned from malloc). With --db-attach=yes the debugger shows that -8(%rsp) has that bogus value of 12, which is different from the expected value. The value at -16(%rsp) looks correct though. The function is leaf and doesn't touch -8(%rsp) once it is stored.
Created attachment 49542 [details] valgr.c Corresponding C source. Assembly attached mainly because other compilers will likely generate different code, which wouldn't trigger this.
This happened because of an error in valgrind's emulation of: 400588: 4d 0f a3 cb bt %r9,%r11 In the x86 emulation, 'bt reg,reg' is done with a horrible kludge where we push the operand on the stack, and then use the same handling as for the 'bt mem,reg' case. (Of course removing the operand from the stack afterwards). The x86_64 code was derived from this, and I forgot about the red zone, so of course this kludge trashes -8(%rsp) as you observed. (Duh). Here is a temporary fix. Index: priv/guest_amd64_toIR.c =================================================================== --- priv/guest_amd64_toIR.c (revision 1989) +++ priv/guest_amd64_toIR.c (working copy) @@ -7337,7 +7337,10 @@ t_rsp = newTemp(Ity_I64); t_addr0 = newTemp(Ity_I64); - assign( t_rsp, binop(Iop_Sub64, getIReg64(R_RSP), mkU64(sz)) ); + vassert(vbi->guest_stack_redzone_size == 128); + assign( t_rsp, binop(Iop_Sub64, + getIReg64(R_RSP), + mkU64(sz + vbi->guest_stack_redzone_size)) ); putIReg64(R_RSP, mkexpr(t_rsp)); storeLE( mkexpr(t_rsp), getIRegE(sz, pfx, modrm) ); @@ -7436,7 +7439,9 @@ standard zero-extend rule */ if (op != BtOpNone) putIRegE(sz, pfx, modrm, loadLE(szToITy(sz), mkexpr(t_rsp)) ); - putIReg64(R_RSP, binop(Iop_Add64, mkexpr(t_rsp), mkU64(sz)) ); + putIReg64(R_RSP, binop(Iop_Add64, + mkexpr(t_rsp), + mkU64(sz + vbi->guest_stack_redzone_size)) ); } DIP("bt%s%c %s, %s\n",
Fixed (vex r1997). Jakub, thank you very much for the small test case. I had suspected since about Feb this year that there was a bug in the x86_64 integer simulation, but I could not figure out what it was, despite considerable effort. The smallest test case I managed to find was "firefox, built w/ gcc-4.4 -O2, running on V, crashes when loading http://en.wikipedia.org", so until your report I still had no idea what the problem was. This is really a critical emulation fix that should be pushed via the distros into shipping 3.5.0s or 3.6.0.SVNs. After a bit more testing, perhaps.