Platform: Ubuntu 12.04, AMD64, 3.2.0-53-generic

See "Steps to Reproduce" for valgrind's output.

Reproducible: Always

Steps to Reproduce:

1. Consider the code below:

----------------- code ------------------------
.text
.globl _start
_start:
    movq $8, %rbx
    movq $12, %rdx
    btq %rbx, %rdx
    #btc %rbx, %rdx    #uncomment to reproduce
    #btr %rbx, %rdx    #uncomment to reproduce
    #bts %rbx, %rdx    #uncomment to reproduce
    # exit
    movq $60, %rax     # use the _exit syscall
    movq $0, %rdi      # error code 0
    syscall            # make syscall
----------------- end code -------------------------------

2. Compile the code as follows:

$ as -o test.o test.s
$ ld -o test test.o

3. Run Valgrind as follows:

$ valgrind --vex-guest-chase-thresh=0 --trace-flags=10000000 --trace-notabove=999 ./test

The output is as follows.

------------------------ Output ------------------------
0x4000B0: movq $8, %rbx
    ------ IMark(0x4000B0, 7, 0) ------
    PUT(40) = 0x8:I64
    PUT(184) = 0x4000B7:I64

0x4000B7: movq $12, %rdx
    ------ IMark(0x4000B7, 7, 0) ------
    PUT(32) = 0xC:I64
    PUT(184) = 0x4000BE:I64

0x4000BE: btq %rbx, %rdx
    ------ IMark(0x4000BE, 4, 0) ------
    t2 = GET:I64(40)
    t6 = Sub64(GET:I64(48),0x120:I64)
    PUT(48) = t6
    STle(t6) = GET:I64(32)
    t7 = t6
    t3 = And64(t2,0x3F:I64)
    t5 = Add64(t7,Sar64(t3,0x3:I8))
    t4 = 64to8(And64(t3,0x7:I64))
    t0 = LDle:I8(t5)
    PUT(144) = 0x0:I64
    PUT(160) = 0x0:I64
    PUT(152) = And64(Shr64(8Uto64(t0),t4),0x1:I64)
    PUT(168) = 0x0:I64
    PUT(48) = Add64(t6,0x120:I64)
    PUT(184) = 0x4000C2:I64
[...]
---------------------- end output -----------------------

Actual Results:

The VEX translation of bt, btc, btr, and bts seems incorrect. See below.

The x86-64 documentation says: the bt, btc, btr and bts instructions test one bit of their first operand, whose index is given by the second operand, and store the value of that bit into the carry flag.

However, as can be seen in the valgrind output, two things don't add up:

1. The line t6 = Sub64(GET:I64(48),0x120:I64) means the code is reading register 48. However, this register is never set prior to reading it. Maybe it is assumed to be 0 on startup?

2. The code does some erroneous calculations on certain registers that it interprets as memory addresses. These memory locations are then read (t0 = LDle:I8(t5)) and written (STle(t6) = GET:I64(32)). This appears to be wrong, because the bt instruction should not update its operands, let alone memory locations.

The btc, btr and bts instructions do update their operands. However, we found that when they are translated by VEX, the resulting IR accesses memory locations even if the operands are registers holding constant values (see our test.s, un-commenting the commented lines). When memory addresses are used as operands for the btc, btr and bts instructions, the memory accesses also appear to be incorrect.
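For reference, the register-operand behaviour quoted from the documentation can be written down as a small model. This is only a sketch in Python, not Valgrind or CPU code; the helper name bit_test is made up for illustration. It assumes the bit index of a register operand is taken modulo the operand width, so 64 for btq.

```python
def bit_test(op, value, index, width=64):
    """Toy model of bt/bts/btr/btc on a register operand.

    Returns (new_value, carry). The name and signature are hypothetical;
    this only restates the documented semantics in executable form.
    """
    index %= width                    # register form: index wraps at the width
    mask = 1 << index
    carry = (value >> index) & 1      # the tested bit goes to the carry flag
    if op == "bt":
        result = value                # bt leaves its operand unchanged
    elif op == "bts":
        result = value | mask         # set the bit
    elif op == "btr":
        result = value & ~mask        # reset (clear) the bit
    elif op == "btc":
        result = value ^ mask         # complement the bit
    else:
        raise ValueError("unknown instruction: " + op)
    return result & ((1 << width) - 1), carry


# Values from test.s: %rbx = 8 (bit index), %rdx = 12 (tested value).
bt_result = bit_test("bt", 12, 8)     # bit 8 of 0b1100 is clear
bts_result = bit_test("bts", 12, 8)
```

With the values from test.s the tested bit is 0, so bt should leave %rdx at 12 with the carry flag clear, and no memory access is needed to express any of the four instructions.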
What you're seeing is the result of a kludge, in which btq for a register operand is implemented by pushing the argument on the (guest) stack temporarily, and then executing the same IR as for btq with a memory operand. Have a look at the relevant bits of guest_amd64_toIR.c. I'm sure it's documented there.
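To make the kludge concrete, the IR statements from the trace can be replayed on a toy guest state. This is a rough Python sketch, not code from guest_amd64_toIR.c. The offsets 32, 40, 48 and 152 and the constant 0x120 are copied from the trace; reading offset 48 as the guest stack pointer is an assumption based on the explanation above. The function name and the initial stack pointer value are made up.

```python
MASK64 = (1 << 64) - 1


def replay_bt_register_ir(guest, memory):
    """Replay the IR printed for 'btq %rbx, %rdx' (hypothetical helper).

    guest maps guest-state offsets to 64-bit values; memory maps byte
    addresses to byte values.
    """
    t2 = guest[40]                        # t2 = GET:I64(40), the bit index
    t6 = (guest[48] - 0x120) & MASK64     # t6 = Sub64(GET:I64(48),0x120)
    guest[48] = t6                        # PUT(48) = t6
    # STle(t6) = GET:I64(32): spill the tested value, little-endian
    for i, byte in enumerate(guest[32].to_bytes(8, "little")):
        memory[t6 + i] = byte
    t3 = t2 & 0x3F                        # t3 = And64(t2,0x3F)
    t5 = (t6 + (t3 >> 3)) & MASK64        # address of the byte holding the bit
    t4 = t3 & 0x7                         # bit position inside that byte
    t0 = memory[t5]                       # t0 = LDle:I8(t5)
    guest[152] = (t0 >> t4) & 1           # PUT(152) = tested bit
    guest[48] = (t6 + 0x120) & MASK64     # PUT(48) = Add64(t6,0x120)
    return guest[152]


# Values from test.s; the initial value at offset 48 is an arbitrary example.
guest = {32: 12, 40: 8, 48: 0x7FFF0000}
memory = {}
bit = replay_bt_register_ir(guest, memory)
```

In this model the tested bit comes out the same as a direct register test and offset 48 is restored at the end, but eight bytes below the original value of offset 48 have been written along the way.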
Honestly .. do you think any large program would actually run properly on Valgrind if these instructions had really been misimplemented?
Thanks for your reply. The implementation surely works, but the workaround adds side effects that the original program would not have, e.g. when doing taint tracking, which is what we were doing.