For a particular architecture, VEX structure RRegUniverse describes real registers available to the register allocator and also those used only internally by the backend. For many architectures, such as ARM32, ARM64, MIPS, the callee saved registers are listed first in RRegUniverse's allocatable section. However AMD64, X86 and PPC have caller saved registers first. A small performance improvement for both VEX register allocator v2 and v3 has been observed if the callee saved registers are listed first. Register allocator is more likely to pick the callee saved register first and therefore does not need to spill them when calling helper functions. Measuring Memcheck on perf/bz2, instruction count: amd64 (Intel i5): ~~~~~~~~~~~~~~~~~ v2 baseline: 45,192 M total; 208 M register allocator v2 reorder: 45,183 M total; 205 M register allocator v3 baseline: 45,112 M total; 167 M register allocator v3 reorder: 45,094 M total; 165 M register allocator ppc (power8 le): ~~~~~~~~~~~~~~~~ v2 baseline: 61,983 M total; 397 M register allocator v2 reorder: 61,970 M total; 376 M register allocator v3 baseline: 61,939 M total; 348.6 M register allocator v3 reorder: 61,909 M total; 347.9 M register allocator It can be seen that v3 benefits more than v2 from the reordering.
Created attachment 107794 [details] patch
Mostly looks OK to me, except for the bit below: * why is it in this patch? It's not related to reordering the allocatable registers for amd64. * even if necessary, is it really correct? It removes RAX, RCX, RDX and R11 from the list of caller-saved regs, which will surely cause the allocator not to preserve them across calls. Maybe I misunderstand? @@ -1460,18 +1460,12 @@ void getRegUsage_AMD64Instr ( HRegUsage* u, const AMD64Instr* i, Bool mode64 ) /* This is a bit subtle. */ /* First off, claim it trashes all the caller-saved regs which fall within the register allocator's jurisdiction. - These I believe to be: rax rcx rdx rsi rdi r8 r9 r10 r11 - and all the xmm registers. - */ - addHRegUse(u, HRmWrite, hregAMD64_RAX()); - addHRegUse(u, HRmWrite, hregAMD64_RCX()); - addHRegUse(u, HRmWrite, hregAMD64_RDX()); - addHRegUse(u, HRmWrite, hregAMD64_RSI()); + These are: rdi rsi r8 r9 r10 and all the xmm registers. */ addHRegUse(u, HRmWrite, hregAMD64_RDI()); + addHRegUse(u, HRmWrite, hregAMD64_RSI()); addHRegUse(u, HRmWrite, hregAMD64_R8()); addHRegUse(u, HRmWrite, hregAMD64_R9()); addHRegUse(u, HRmWrite, hregAMD64_R10()); - addHRegUse(u, HRmWrite, hregAMD64_R11()); addHRegUse(u, HRmWrite, hregAMD64_XMM0()); addHRegUse(u, HRmWrite, hregAMD64_XMM1()); addHRegUse(u, HRmWrite, hregAMD64_XMM3());
(In reply to Julian Seward from comment #2) > Mostly looks OK to me, except for the bit below: [..] Following irc chat with Ivo, it seems I misunderstood this part of the patch. Sorry about that. So -- to me it looks fine now.
The part with %rax, %rcx, %rdx, %r11 removal from getRegUsage_AMD64Instr() will be removed from the patch. The reason is explained in greater detail in: Bug 384676 - VEX AMD64 backend should list more real registers as available for the register allocator https://bugs.kde.org/show_bug.cgi?id=384676
Committed as 00d4667295a821fef9eb198abcb0c942dffb6045. https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=00d4667295a821fef9eb198abcb0c942dffb6045
At the end only %r11 was removed from the list in getRegUsage_AMD64Instr() because it's used overall as a temporary register for destination address. %rax, %rcx, %rdx could in theory be used as allocatable but performance measurements did not prove it worthwhile (see bug 384676).