| Summary: | riscv64: shift instructions can behave wrong | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Christoph Jung <christoph.j27> |
| Component: | vex | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | cfbolz, flo2030, mark, pjfloyd |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Ubuntu | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed In: | ||
| Sentry Crash Report: | |||
| Attachments: | patch for masking the shift amounts in VEX/priv/guest_riscv64_toIR.c | ||
|
Description
Christoph Jung
2025-09-05 15:06:59 UTC
Do you have a non-python reproducer? We (Christoph and I are collaborating on this) have thought about this a bit more. There is no Valgrind-level reproducer, and probably no real bug in Valgrind. The problem is more lack of documentation/clarity of the precise semantics of the VEX IR shift instructions, which the angr project misinterpreted. Why there is no Valgrind-level reproducer, using RISC-V sra as an example: sra gets translated into VEX IR Sar64(rs1, 64to8(rs2)). Sar64 then gets translated into an sra instruction again. If we execute the untranslated and translated variants with a value of rs2 >= 64, then both the untranslated and the translated form will perform a mask, ignoring all the higher bits. The mask of the translated form is performed in two steps, first masking of the highest 56 bits by the 64to8 instruction, then the sar will mask off two further bits before doing the actual shift. The resulting values in both situations are the same. The real "problem" is a vagueness of the semantics of the VEX shift instruction like Shl64/Shr64/Sar64 (and the other bitwidths too). How should they behave if the shift amount is larger than log2 of the first argument size? Is the shift amount implicitly masked and the upper bits ignored (this is what VEX RISC-V instruction selection does right now)? Or should the result be e.g. 0/-1 in the case of an arithmetic right shift with too large shift amount (this is how angr interprets the semantics)? Maybe the Valgrind project could give some kind of pronouncement/add a comment to the header files clarifying this point. (In my personal opinion Valgrind should probably decide that shifts perform an implicit mask. It's what much of the code seems to assume already anyway. It's my superficial impression that some parts of the Valgrind code might also benefit from clarity here. E.g. ir_opt.h doesn't constant-fold shifts with too large shift amounts, and there's a fixme about shift amounts on ARM for vector x vector shifts.) Florian, since you are cleaning up the VEX constant folding and algebraic simplifications you might have an opinion on this? Looks like a bug in IR generation to me.
4.2.2. Integer Register-Register Operations of the ISA spec says:
SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register
rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are considered for the
shift amount.
guest_riscv64_toIR.c does beginning on line 1551:
case 0b001:
expr = binop(Iop_Shl64, getIReg64(rs1),
unop(Iop_64to8, getIReg64(rs2)));
break;
Thereby taking the lowest 8 and not 6 bits of the register as shift amount.
Should be something like:
expr = binop(Iop_Shl64, getIReg64(rs1),
unop(Iop_64to8, binop(Iop_And64, getIReg64(rs2), mkU64(0x3F))));
As for the semantics of "shifting by a too large amount":
VEX semantics doesn't say anything specific about this because what happens in that case
may be architecture dependent.
Adding a comment to that effect:
If the value of the right operand is greater than or equal to the width of the left operand,
the behavior is undefined.
would be good.
So ideally, if an architecture defines a behaviour for too large shift amounts, then it is the
job of IR generation to implement it in order to avoid the undefined behaviour in the VEX IROp.
In practice that might not happen, though, because typically a shift insn in the guest code
will be implemented by the same insn in the jitted code.
Thanks for clarifying, Florian! Indeed, looking at aarch64 and amd64, they both explicitly introduce and-instructions to mask off the higher bits. We can prepare a patch to introduce the same masks in the RISC-V code, and maybe also add a comment about the intended semantics of VEX shifts. However, I am not sure what kind of test we should usefully add? Patches are always welcome. The procedure is to open a bug here and attach the patch to it. As always: make regtest should pass. And you might want to check that angr now produces the result you expect to see. As for a testcase.... I understood there is no observable difference in a regular valgrind run. You can observe the generated IR by adding --trace-notbelow=0 --trace-flags=10000000 to the command line. But figuring out the superblock we're interested to see would be difficult to do in a robust manner which we would have to do for a testcase. So that's effectively a no-go. However, if you're keen to do add a testcase my recommendation would be to make a riscv version of auxprogs/s390-runone. That would the prerequisite to concocting a testcase. Created attachment 185240 [details]
patch for masking the shift amounts in VEX/priv/guest_riscv64_toIR.c
The fix is attached as patch.
angr works as expected when the shift amounts are masked.
When running make regtest on riscv64 qemu with openSUSE Tumbleweed, some tests fail.
All of the following tests fail both with and without the patch:
== 760 tests, 17 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdoutB failure, 0 post failures ==
gdbserver_tests/hgtls (stdoutB)
memcheck/tests/descr_belowsp (stderr)
memcheck/tests/fwrite (stderr)
memcheck/tests/linux/stack_changes (stderr)
memcheck/tests/linux/sys-copy_file_range (stderr)
memcheck/tests/linux/sys-preadv_pwritev (stderr)
memcheck/tests/origin2-not-quite (stderr)
memcheck/tests/sendmsg (stderr)
memcheck/tests/writev1 (stderr)
helgrind/tests/tc22_exit_w_lock (stderr)
drd/tests/fork-parallel (stderr)
drd/tests/fork-serial (stderr)
drd/tests/std_thread2 (stderr)
drd/tests/threaded-fork-vcs (stderr)
drd/tests/threaded-fork (stderr)
none/tests/track_bad (stderr)
none/tests/track_new (stderr)
none/tests/use_after_close (stderr)
Additionally drd/tests/tls_threads seems to be flaky and fails sometimes.
Are all tests supposed to pass on riscv64?
Thanks for the patch. Applied in 97831bbbc208f3c574095770aff9b19e5a2c6aae with some formatting changes. I don't have access to a riscv machine and there are no nightly regression runs AFAICT. So I do not know what the status is on that platform. Thanks for the before/after regression run. Yes, there are some DRD and helgrind tests that fail occasionally. It has been like that for at least a decade.. :-( |