SUMMARY
The RISC-V (RV64I) ISA specification says that for the register-register shift operations:
- sra, srl, sll use the lowest 6 bits of rs2
- sraw, srlw, sllw use the lowest 5 bits of rs2
as the amount to shift by. (See the RISC-V unprivileged manual, 4.2.2. Integer Register-Register Operations.)

In `VEX/priv/guest_riscv64_toIR.c`, lines 1534-1579 and lines 1713-1740, these shift instructions are decoded, but the lowest 8 bits of rs2 are taken as the shift amount. This can result in shifts by more than 63 for sra, srl, sll and by more than 31 for sraw, srlw, sllw.

This was found while using angr (with pyvex for lifting), so there is no crash.

STEPS TO REPRODUCE
angr reproducer: https://github.com/Cskorpion/angr-shift-reproducer

OBSERVED RESULT
z3 formula for register x7 when executing the riscv instruction 0x41b5d3b3 (sra x7, x11, x27) with angr:

  reg_x11_9_64 >> Concat(0, Extract(7, 0, reg_x27_8_64))

EXPECTED RESULT

  reg_x11_9_64 >> Concat(0, Extract(5, 0, reg_x27_8_64))

ADDITIONAL INFORMATION
Fix this issue by replacing `unop(Iop_64to8, getIReg64(rs2))` in the corresponding lines with
- `binop(Iop_And8, mkU8(0b00111111), unop(Iop_64to8, getIReg64(rs2)))` for sra, srl, sll
- `binop(Iop_And8, mkU8(0b00011111), unop(Iop_64to8, getIReg64(rs2)))` for sraw, srlw, sllw
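To make the expected behaviour concrete, here is a minimal C model of the shift-amount handling as I read the spec (a sketch only, not lifter code; srl/srlw stand in for the whole group of instructions):

    #include <stdint.h>

    /* Sketch of the RV64 register-register shift semantics per the spec:
     * the 64-bit forms use the low 6 bits of rs2 as the shift amount,
     * the *W forms use the low 5 bits, operate on the low 32 bits of rs1
     * and sign-extend the 32-bit result to 64 bits. */
    uint64_t model_srl(uint64_t rs1, uint64_t rs2)
    {
        return rs1 >> (rs2 & 0x3F);                 /* low 6 bits only */
    }

    uint64_t model_srlw(uint64_t rs1, uint64_t rs2)
    {
        uint32_t r = (uint32_t)rs1 >> (rs2 & 0x1F); /* low 5 bits only */
        return (uint64_t)(int64_t)(int32_t)r;       /* sign-extend to 64 bits */
    }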
Do you have a non-Python reproducer?
We (Christoph and I are collaborating on this) have thought about this a bit more. There is no Valgrind-level reproducer, and probably no real bug in Valgrind. The problem is more a lack of documentation/clarity about the precise semantics of the VEX IR shift operations, which the angr project misinterpreted.

Why there is no Valgrind-level reproducer, using RISC-V sra as an example: sra gets translated into the VEX IR expression Sar64(rs1, 64to8(rs2)). Sar64 then gets translated back into an sra instruction. If we execute the untranslated and translated variants with a value of rs2 >= 64, then both forms perform a mask, ignoring all the higher bits. In the translated form the mask is applied in two steps: first the 64to8 operation masks off the highest 56 bits, then the host sra masks off two further bits before doing the actual shift. The resulting values are the same in both cases.

The real "problem" is the vagueness of the semantics of the VEX shift operations like Shl64/Shr64/Sar64 (and the other bit widths too). How should they behave if the shift amount is greater than or equal to the width of the first argument? Is the shift amount implicitly masked and the upper bits ignored (this is what VEX RISC-V instruction selection does right now)? Or should the result be e.g. 0/-1 in the case of an arithmetic right shift with a too-large shift amount (this is how angr interprets the semantics)? Maybe the Valgrind project could give some kind of pronouncement/add a comment to the header files clarifying this point.

(In my personal opinion Valgrind should probably decide that shifts perform an implicit mask. It's what much of the code seems to assume already anyway. It's my superficial impression that some parts of the Valgrind code might also benefit from clarity here. E.g. ir_opt.c doesn't constant-fold shifts with too-large shift amounts, and there's a FIXME about shift amounts on ARM for vector x vector shifts.)
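For what it's worth, the two readings can be contrasted in plain C (a sketch, not Valgrind code; the helper names are made up). With an 8-bit amount of 65, the implicit-mask reading shifts by 1, while the reading angr uses treats all value bits as shifted out:

    #include <stdint.h>
    #include <stdio.h>

    /* Reading 1: the shift amount is implicitly masked, as the RISC-V
     * host insn (and the current instruction selection) does.  Assumes
     * ">>" on a negative value is an arithmetic shift, as on the usual
     * compilers/targets. */
    int64_t sar64_masked(int64_t rs1, uint8_t amt8)
    {
        return rs1 >> (amt8 & 0x3F);
    }

    /* Reading 2: an amount >= 64 shifts all value bits out, leaving
     * only sign bits; this matches how angr interprets Sar64. */
    int64_t sar64_saturating(int64_t rs1, uint8_t amt8)
    {
        if (amt8 >= 64)
            return rs1 < 0 ? -1 : 0;
        return rs1 >> amt8;
    }

    int main(void)
    {
        int64_t rs1  = -1000;
        uint8_t amt8 = 65;   /* what 64to8 keeps of an rs2 value of 65 */
        printf("masked:     %lld\n", (long long)sar64_masked(rs1, amt8));     /* -500 */
        printf("saturating: %lld\n", (long long)sar64_saturating(rs1, amt8)); /* -1   */
        return 0;
    }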
Florian, since you are cleaning up the VEX constant folding and algebraic simplifications, you might have an opinion on this?
Looks like a bug in IR generation to me. Section 4.2.2. Integer Register-Register Operations of the ISA spec says:

  "SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are considered for the shift amount."

guest_riscv64_toIR.c does, beginning on line 1551:

  case 0b001:
     expr = binop(Iop_Shl64, getIReg64(rs1),
                  unop(Iop_64to8, getIReg64(rs2)));
     break;

thereby taking the lowest 8 bits of the register as the shift amount, not the lowest 6. It should be something like:

  expr = binop(Iop_Shl64, getIReg64(rs1),
               unop(Iop_64to8,
                    binop(Iop_And64, getIReg64(rs2), mkU64(0x3F))));

As for the semantics of "shifting by a too large amount": the VEX semantics don't say anything specific about this, because what happens in that case may be architecture dependent. Adding a comment to that effect:

  "If the value of the right operand is greater than or equal to the width of the left operand, the behaviour is undefined."

would be good. So ideally, if an architecture defines a behaviour for too-large shift amounts, then it is the job of IR generation to implement it in order to avoid the undefined behaviour in the VEX IROp. In practice that might not happen, though, because typically a shift insn in the guest code will be implemented by the same insn in the jitted code.
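In plain C terms (a sketch under the assumption that the IROp is treated like C's shift operator, i.e. undefined for out-of-range amounts; the function names are made up), the division of labour would look like this:

    #include <stdint.h>

    /* "IROp" level: like C's <<, undefined if amt >= 64, so the caller
     * has to guarantee an in-range amount. */
    uint64_t iop_shl64(uint64_t x, uint8_t amt)
    {
        return x << amt;   /* precondition: amt < 64 */
    }

    /* "IR generation" level: implements the guest-defined behaviour,
     * here the RV64 rule that only the low 6 bits of rs2 count. */
    uint64_t guest_sll(uint64_t rs1, uint64_t rs2)
    {
        return iop_shl64(rs1, (uint8_t)(rs2 & 0x3F));
    }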
Thanks for clarifying, Florian! Indeed, looking at aarch64 and amd64, they both explicitly introduce AND instructions to mask off the higher bits. We can prepare a patch to introduce the same masks in the RISC-V code, and maybe also add a comment about the intended semantics of VEX shifts. However, I am not sure what kind of test we could usefully add.
Patches are always welcome. The procedure is to open a bug here and attach the patch to it. As always: make regtest should pass. And you might want to check that angr now produces the result you expect to see.

As for a testcase... I understood there is no observable difference in a regular Valgrind run. You can observe the generated IR by adding --trace-notbelow=0 --trace-flags=10000000 to the command line. But figuring out which superblock we are interested in would be difficult to do in a robust manner, and we would have to do that for a testcase. So that's effectively a no-go. However, if you're keen to add a testcase, my recommendation would be to make a riscv version of auxprogs/s390-runone. That would be the prerequisite to concocting a testcase.
Created attachment 185240
patch for masking the shift amounts in VEX/priv/guest_riscv64_toIR.c

The fix is attached as a patch. angr works as expected when the shift amounts are masked.

When running make regtest on riscv64 QEMU with openSUSE Tumbleweed, some tests fail. All of the following tests fail both with and without the patch:

== 760 tests, 17 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdoutB failure, 0 post failures ==

gdbserver_tests/hgtls                     (stdoutB)
memcheck/tests/descr_belowsp              (stderr)
memcheck/tests/fwrite                     (stderr)
memcheck/tests/linux/stack_changes        (stderr)
memcheck/tests/linux/sys-copy_file_range  (stderr)
memcheck/tests/linux/sys-preadv_pwritev   (stderr)
memcheck/tests/origin2-not-quite          (stderr)
memcheck/tests/sendmsg                    (stderr)
memcheck/tests/writev1                    (stderr)
helgrind/tests/tc22_exit_w_lock           (stderr)
drd/tests/fork-parallel                   (stderr)
drd/tests/fork-serial                     (stderr)
drd/tests/std_thread2                     (stderr)
drd/tests/threaded-fork-vcs               (stderr)
drd/tests/threaded-fork                   (stderr)
none/tests/track_bad                      (stderr)
none/tests/track_new                      (stderr)
none/tests/use_after_close                (stderr)

Additionally, drd/tests/tls_threads seems to be flaky and fails sometimes.

Are all tests supposed to pass on riscv64?
Thanks for the patch. Applied in 97831bbbc208f3c574095770aff9b19e5a2c6aae with some formatting changes.

I don't have access to a riscv machine, and there are no nightly regression runs AFAICT, so I do not know what the status is on that platform. Thanks for the before/after regression run.

Yes, there are some DRD and Helgrind tests that fail occasionally. It has been like that for at least a decade... :-(