Bug 509157 - riscv64: shift instructions can behave wrong
Summary: riscv64: shift instructions can behave wrong
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version First Reported In: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-09-05 15:06 UTC by Christoph Jung
Modified: 2025-09-30 20:24 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
patch for masking the shift amounts in VEX/priv/guest_riscv64_toIR.c (2.29 KB, patch)
2025-09-25 08:42 UTC, Christoph Jung

Description Christoph Jung 2025-09-05 15:06:59 UTC
SUMMARY

The RISCV64 ISA specification says that the shift operations:
 - sra, srl, sll use the lowest 6 bits of rs2
 - sraw, srlw, sllw use the lowest 5 bits of rs2
as the amount to shift by.
(This can be found in the RISC-V unprivileged manual, chapter 4.2.1, Integer Register-Immediate Instructions.)

In `VEX/priv/guest_riscv64_toIR.c` lines 1534-1579 and lines 1713-1740 these shift instructions are decoded, but the lowest 8 bits of rs2 are taken as the shift amount.
This can result in shifts by more than 63 for sra, srl, sll and more than 31 for sraw, srlw, sllw.

This was found while using angr (with pyvex for lifting), so there is no crash.

STEPS TO REPRODUCE

angr reproducer: https://github.com/Cskorpion/angr-shift-reproducer

OBSERVED RESULT

z3 formula for register x7 when executing the riscv instruction 0x41b5d3b3 (sra x7, x11, x27) with angr:

reg_x11_9_64 >> Concat(0, Extract(7, 0, reg_x27_8_64))

EXPECTED RESULT

reg_x11_9_64 >> Concat(0, Extract(5, 0, reg_x27_8_64))

ADDITIONAL INFORMATION

Fix this issue by replacing `unop(Iop_64to8, getIReg64(rs2))` in the corresponding lines with
- `binop(Iop_And8, mkU8(0b00111111), unop(Iop_64to8, getIReg64(rs2)))` for sra, srl, sll
- `binop(Iop_And8, mkU8(0b00011111), unop(Iop_64to8, getIReg64(rs2)))` for sraw, srlw, sllw
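
For illustration, with the mask applied the IR built for sra would then look roughly like this (a minimal sketch only; the surrounding decoder structure in guest_riscv64_toIR.c is assumed rather than quoted):

   /* sra: only the low 6 bits of rs2 are used as the shift amount */
   expr = binop(Iop_Sar64, getIReg64(rs1),
                binop(Iop_And8, mkU8(0x3f) /* 0b00111111 */,
                      unop(Iop_64to8, getIReg64(rs2))));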
Comment 1 Paul Floyd 2025-09-08 12:52:21 UTC
Do you have a non-python reproducer?
Comment 2 CF Bolz-Tereick 2025-09-09 11:13:05 UTC
We (Christoph and I are collaborating on this) have thought about this a bit more. There is no Valgrind-level reproducer, and probably no real bug in Valgrind. The problem is more a lack of documentation/clarity about the precise semantics of the VEX IR shift operations, which the angr project misinterpreted.

Why there is no Valgrind-level reproducer, using RISC-V sra as an example:
sra gets translated into VEX IR Sar64(rs1, 64to8(rs2)). Sar64 then gets translated into an sra instruction again. If we execute the untranslated and translated variants with a value of rs2 >= 64, then both forms perform a mask, ignoring all the higher bits. The mask in the translated form happens in two steps: first the 64to8 operation masks off the highest 56 bits, then the native sra masks off two further bits before doing the actual shift. The resulting values are the same in both cases.

The real "problem" is a vagueness of the semantics of the VEX shift instruction like Shl64/Shr64/Sar64 (and the other bitwidths too). How should they behave if the shift amount is larger than log2 of the first argument size? Is the shift amount implicitly masked and the upper bits ignored (this is what VEX RISC-V instruction selection does right now)? Or should the result be e.g. 0/-1 in the case of an arithmetic right shift with too large shift amount (this is how angr interprets the semantics)?

Maybe the Valgrind project could give some kind of pronouncement/add a comment to the header files clarifying this point.

(In my personal opinion Valgrind should probably decide that shifts perform an implicit mask. It's what much of the code seems to assume already anyway. It's my superficial impression that some parts of the Valgrind code might also benefit from clarity here. E.g. ir_opt.c doesn't constant-fold shifts with too-large shift amounts, and there's a fixme about shift amounts on ARM for vector x vector shifts.)
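
To make the two readings concrete, here is a small stand-alone C sketch (hypothetical helper functions, not Valgrind code) contrasting the "implicitly masked" interpretation with the "saturating" interpretation of a 64-bit arithmetic right shift:

   #include <stdint.h>

   /* Interpretation 1: the shift amount is implicitly masked to its low
      6 bits, as RISC-V and x86 hardware do (and as the VEX RISC-V
      instruction selection effectively assumes).  Assumes the compiler
      implements >> on signed values as an arithmetic shift. */
   static int64_t sar64_masked(int64_t x, uint8_t amt)
   {
      return x >> (amt & 63);
   }

   /* Interpretation 2: a too-large amount shifts out all bits, so the
      arithmetic right shift saturates to 0 or -1 depending on the sign
      (this is how angr read Iop_Sar64). */
   static int64_t sar64_saturating(int64_t x, uint8_t amt)
   {
      if (amt >= 64)
         return x < 0 ? -1 : 0;
      return x >> amt;
   }

For amt == 65 the first function returns x >> 1 while the second returns 0 or -1, which illustrates the discrepancy discussed above.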
Comment 3 Mark Wielaard 2025-09-12 14:35:22 UTC
Florian, since you are cleaning up the VEX constant folding and algebraic simplifications you might have an opinion on this?
Comment 4 Florian Krohm 2025-09-13 08:58:47 UTC
Looks like a bug in IR generation to me.

Section 4.2.2 (Integer Register-Register Operations) of the ISA spec says:

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register
rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are considered for the
shift amount.

guest_riscv64_toIR.c does the following, beginning on line 1551:

  case 0b001:
     expr = binop(Iop_Shl64, getIReg64(rs1),
                             unop(Iop_64to8, getIReg64(rs2)));
     break;

Thereby taking the lowest 8 and not 6 bits of the register as the shift amount.
It should be something like:

    expr = binop(Iop_Shl64, getIReg64(rs1),   
                 unop(Iop_64to8, binop(Iop_And64, getIReg64(rs2), mkU64(0x3F))));
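
The W-form shifts (sllw, srlw, sraw) would need the analogous treatment with a 0x1F mask. Purely as a sketch, assuming the W-form decode narrows to 32 bits and sign-extends the result (the exact shape of the existing code is not quoted here):

    expr = unop(Iop_32Sto64,
                binop(Iop_Shl32, unop(Iop_64to32, getIReg64(rs1)),
                      unop(Iop_64to8,
                           binop(Iop_And64, getIReg64(rs2), mkU64(0x1F)))));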

As for the semantics of "shifting by a too large amount":
VEX semantics doesn't say anything specific about this because what happens in that case
may be architecture dependent. 
Adding a comment to that effect:

If the value of the right operand is greater than or equal to the width of the left operand, 
the behavior is undefined.

would be good.
So ideally, if an architecture defines a behaviour for too large shift amounts, then it is the
job of IR generation to implement it in order to avoid the undefined behaviour in the VEX IROp.
In practice that might not happen, though, because typically a shift insn in the guest code
will be implemented by the same insn in the jitted code.
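
As an illustration of that last point, a front end for a hypothetical guest where shifting left by 64 or more is architecturally defined to produce 0 could encode the semantics explicitly, e.g. (sketch only, using the standard IR constructors):

   /* Hypothetical guest: shl by >= 64 is defined to yield 0, so say so
      in the IR instead of relying on what Iop_Shl64 happens to do.
      The Shl64 arm is only selected when amt < 64, so the out-of-range
      case never reaches the IROp. */
   IRExpr* amt = getIReg64(rs2);
   expr = IRExpr_ITE(binop(Iop_CmpLT64U, amt, mkU64(64)),
                     binop(Iop_Shl64, getIReg64(rs1),
                           unop(Iop_64to8, amt)),
                     mkU64(0));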
Comment 5 CF Bolz-Tereick 2025-09-13 12:50:53 UTC
Thanks for clarifying, Florian! Indeed, looking at aarch64 and amd64, they both explicitly introduce and-instructions to mask off the higher bits. We can prepare a patch to introduce the same masks in the RISC-V code, and maybe also add a comment about the intended semantics of VEX shifts. However, I am not sure what kind of test we should usefully add?
Comment 6 Florian Krohm 2025-09-13 19:57:12 UTC
Patches are always welcome. The procedure is to open a bug here and attach the patch to it.
As always: make regtest should pass. And you might want to check that angr now produces
the result you expect to see.

As for a testcase.... I understood there is no observable difference in a regular valgrind run.
You can observe the generated IR by adding --trace-notbelow=0 --trace-flags=10000000
to the command line. But figuring out the superblock we're interested in would be difficult
to do in a robust manner, which is what we would have to do for a testcase. So that's effectively a no-go.
However, if you're keen to add a testcase my recommendation would be to make a riscv
version of auxprogs/s390-runone. That would be the prerequisite to concocting a testcase.
Comment 7 Christoph Jung 2025-09-25 08:42:15 UTC
Created attachment 185240 [details]
patch for masking the shift amounts in VEX/priv/guest_riscv64_toIR.c

The fix is attached as patch.

angr works as expected when the shift amounts are masked.

When running make regtest on riscv64 qemu with openSUSE Tumbleweed, some tests fail.
All of the following tests fail both with and without the patch:

== 760 tests, 17 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdoutB failure, 0 post failures ==
gdbserver_tests/hgtls                    (stdoutB)
memcheck/tests/descr_belowsp             (stderr)
memcheck/tests/fwrite                    (stderr)
memcheck/tests/linux/stack_changes       (stderr)
memcheck/tests/linux/sys-copy_file_range (stderr)
memcheck/tests/linux/sys-preadv_pwritev  (stderr)
memcheck/tests/origin2-not-quite         (stderr)
memcheck/tests/sendmsg                   (stderr)
memcheck/tests/writev1                   (stderr)
helgrind/tests/tc22_exit_w_lock          (stderr)
drd/tests/fork-parallel                  (stderr)
drd/tests/fork-serial                    (stderr)
drd/tests/std_thread2                    (stderr)
drd/tests/threaded-fork-vcs              (stderr)
drd/tests/threaded-fork                  (stderr)
none/tests/track_bad                     (stderr)
none/tests/track_new                     (stderr)
none/tests/use_after_close               (stderr) 

Additionally drd/tests/tls_threads seems to be flaky and fails sometimes.

Are all tests supposed to pass on riscv64?
Comment 8 Florian Krohm 2025-09-30 20:24:41 UTC
Thanks for the patch. Applied in 97831bbbc208f3c574095770aff9b19e5a2c6aae with some formatting changes.

I don't have access to a riscv machine and there are no nightly regression runs AFAICT.
So I do not know what the status is on that platform. Thanks for the before/after regression run.

Yes, there are some DRD and helgrind tests that fail occasionally. It has been like that for at least a decade... :-(