It seems that the register allocator does not cope well when insn selection assigns non-virtual (real) registers: I observe unneeded register-to-register copies in that case.

This happens on s390, where 128-bit values for BFP and DFP need to be located in a register pair (p,q) such that regno(q) == regno(p) + 2. The register allocator does not know about that requirement, so non-virtual register pairs are assigned during insn selection to avoid SIGILLs.

To demonstrate:

   auxprogs/s390-runone -t -i "sqxbr %f0,%f1" > foo.c
   auxprogs/s390-runone -b foo.c
   vg-in-place --tool=none --trace-notbelow=0 --trace-flags=00000110 ./foo >& foo.trace

The register-allocated code has this:

   v-load   %f7,80(%r13)
   v-load   %f6,112(%r13)
   v-move   %f13,%f7
   v-move   %f15,%f6
   v-fsqrt  %f12,%f13
   v-move   %f7,%f12
   v-move   %f6,%f14
   v-store  %f7,64(%r13)
   v-store  %f6,96(%r13)

This wastes two registers (f6, f7) and four v-move insns just to move things around. It should be:

   v-load   %f13,80(%r13)
   v-load   %f15,112(%r13)
   v-fsqrt  %f12,%f13
   v-store  %f12,64(%r13)
   v-store  %f14,96(%r13)

It gets significantly worse when computations get a bit more complex. E.g. try adding 3 numbers.
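For reference, the pairing requirement described above can be sketched in C roughly as follows. This is only an illustration, not code from the VEX s390 backend: the function names are hypothetical, and the set of valid first registers (0, 1, 4, 5, 8, 9, 12, 13) is my reading of the architecture's rule for extended-format operands rather than something stated in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of VEX.
   Returns true if FPR number p may serve as the first register of a
   128-bit pair. Assumption: the valid first registers are
   0, 1, 4, 5, 8, 9, 12, 13, i.e. those with bit 1 of the number clear. */
static bool fpr_pair_first_is_valid(unsigned p)
{
   return p < 16 && (p & 2) == 0;
}

/* Hypothetical helper, not part of VEX.
   Returns the second register q of the pair (p,q), enforcing
   regno(q) == regno(p) + 2 as described in the report. */
static unsigned fpr_pair_second(unsigned p)
{
   assert(fpr_pair_first_is_valid(p));
   return p + 2;
}
```

With these definitions the pairs seen in the trace, (f12,f14) and (f13,f15), are valid, whereas the allocator's scratch choice (f6,f7) is not a pair at all, which is consistent with those registers only being used for copies.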