Bug 511931

Summary: vex: improve register allocator when non-virtual registers are assigned in insn selection
Product: [Developer tools] valgrind Reporter: Florian Krohm <flo2030>
Component: vex Assignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Florian Krohm 2025-11-10 20:45:26 UTC
It seems that the register allocator does not cope well when insn selection assigns
non-virtual (real) registers. I observe unneeded register-to-register copies in that case.
This happens on s390, where 128-bit BFP and DFP values need to be located in
a register pair (p,q) such that regno(q) == regno(p) + 2.
The register allocator does not know about that requirement, so non-virtual register
pairs are assigned during insn selection to avoid SIGILLs.
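
For illustration only, the pairing requirement can be written down as a small check. This is a sketch, not VEX code: the function names is_valid_fpr_pair and fpr_pair_partner are hypothetical. The test q == p + 2 comes from the description above; the extra restriction on which registers may start a pair (0, 1, 4, 5, 8, 9, 12, 13) is my understanding of the architecture and should be treated as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of VEX.
   Returns true if FPRs p and q can together hold a 128-bit value. */
bool is_valid_fpr_pair(unsigned p, unsigned q)
{
   if (p > 15 || q > 15)
      return false;               /* s390 has FPRs f0..f15 */
   if (q != p + 2)
      return false;               /* requirement stated in this report */
   /* Assumption: only f0, f1, f4, f5, f8, f9, f12, f13 may be the
      first register of a pair, i.e. bit 1 of regno(p) is clear. */
   return (p & 2) == 0;
}

/* Hypothetical helper: given the first register of a pair, return its
   partner. The caller must pass a valid first register. */
unsigned fpr_pair_partner(unsigned p)
{
   assert(p <= 13 && (p & 2) == 0);
   return p + 2;
}
```

Under these assumptions (f12,f14) and (f13,f15) are accepted, which matches the pairs seen in the trace below, while a pair such as (f6,f7) is rejected.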
To demonstrate:

auxprogs/s390-runone -t -i "sqxbr %f0,%f1" > foo.c
auxprogs/s390-runone -b foo.c
vg-in-place --tool=none --trace-notbelow=0 --trace-flags=00000110 ./foo >& foo.trace

The register-allocated code contains this:

v-load   %f7,80(%r13) 
v-load   %f6,112(%r13) 
v-move   %f13,%f7
v-move   %f15,%f6
v-fsqrt  %f12,%f13 
v-move   %f7,%f12
v-move   %f6,%f14
v-store  %f7,64(%r13) 
v-store  %f6,96(%r13) 

wasting two registers (f6, f7) just to move things around.
It should be:

v-load   %f13,80(%r13) 
v-load   %f15,112(%r13)
v-fsqrt  %f12,%f13 
v-store  %f12,64(%r13) 
v-store  %f14,96(%r13) 

It gets significantly worse when computations get a bit more complex, e.g. try adding 3 numbers.