Bug 384584 - Callee saved registers listed first for AMD64, X86, and PPC architectures
Summary: Callee saved registers listed first for AMD64, X86, and PPC architectures
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.14 SVN
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Ivo Raisr
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-11 08:10 UTC by Ivo Raisr
Modified: 2017-09-14 12:53 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
patch (14.38 KB, patch)
2017-09-11 08:14 UTC, Ivo Raisr
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ivo Raisr 2017-09-11 08:10:16 UTC
For a particular architecture, VEX structure RRegUniverse describes real registers available to the register allocator and also those used only internally by the backend.

For many architectures, such as ARM32, ARM64, MIPS, the callee saved registers are listed first in RRegUniverse's allocatable section.
However AMD64, X86 and PPC have caller saved registers first.

A small performance improvement for both VEX register allocator v2 and v3 has been observed if the callee saved registers are listed first.
Register allocator is more likely to pick the callee saved register first and therefore does not need to spill them when calling helper functions.


Measuring Memcheck on perf/bz2, instruction count:

amd64 (Intel i5):
~~~~~~~~~~~~~~~~~
v2 baseline: 45,192 M total; 208 M register allocator
v2 reorder:  45,183 M total; 205 M register allocator
v3 baseline: 45,112 M total; 167 M register allocator
v3 reorder:  45,094 M total; 165 M register allocator

ppc (power8 le):
~~~~~~~~~~~~~~~~
v2 baseline: 61,983 M total; 397 M register allocator
v2 reorder:  61,970 M total; 376 M register allocator
v3 baseline: 61,939 M total; 348.6 M register allocator
v3 reorder:  61,909 M total; 347.9 M register allocator


It can be seen that v3 benefits more than v2 from the reordering.
Comment 1 Ivo Raisr 2017-09-11 08:14:38 UTC
Created attachment 107794 [details]
patch
Comment 2 Julian Seward 2017-09-12 19:16:45 UTC
Mostly looks OK to me, except for the bit below:

* why is it in this patch?  It's not related to reordering the allocatable
  registers for amd64.

* even if necessary, is it really correct?  It removes RAX, RCX, RDX and R11
  from the list of caller-saved regs, which will surely cause the allocator
  not to preserve them across calls.  Maybe I misunderstand?

@@ -1460,18 +1460,12 @@ void getRegUsage_AMD64Instr ( HRegUsage* u, const AMD64Instr* i, Bool mode64 )
          /* This is a bit subtle. */
          /* First off, claim it trashes all the caller-saved regs
             which fall within the register allocator's jurisdiction.
-            These I believe to be: rax rcx rdx rsi rdi r8 r9 r10 r11 
-            and all the xmm registers.
-         */
-         addHRegUse(u, HRmWrite, hregAMD64_RAX());
-         addHRegUse(u, HRmWrite, hregAMD64_RCX());
-         addHRegUse(u, HRmWrite, hregAMD64_RDX());
-         addHRegUse(u, HRmWrite, hregAMD64_RSI());
+            These are: rdi rsi r8 r9 r10 and all the xmm registers. */
          addHRegUse(u, HRmWrite, hregAMD64_RDI());
+         addHRegUse(u, HRmWrite, hregAMD64_RSI());
          addHRegUse(u, HRmWrite, hregAMD64_R8());
          addHRegUse(u, HRmWrite, hregAMD64_R9());
          addHRegUse(u, HRmWrite, hregAMD64_R10());
-         addHRegUse(u, HRmWrite, hregAMD64_R11());
          addHRegUse(u, HRmWrite, hregAMD64_XMM0());
          addHRegUse(u, HRmWrite, hregAMD64_XMM1());
          addHRegUse(u, HRmWrite, hregAMD64_XMM3());
Comment 3 Julian Seward 2017-09-13 16:54:46 UTC
(In reply to Julian Seward from comment #2)
> Mostly looks OK to me, except for the bit below: [..]

Following irc chat with Ivo, it seems I misunderstood this part
of the patch.  Sorry about that.  So -- to me it looks fine now.
Comment 4 Ivo Raisr 2017-09-13 21:47:58 UTC
The part with %rax, %rcx, %rdx, %r11 removal from getRegUsage_AMD64Instr() will be removed from the patch.

The reason is explained in greater detail in:
Bug 384676 - VEX AMD64 backend should list more real registers as available for the register allocator
https://bugs.kde.org/show_bug.cgi?id=384676
Comment 5 Ivo Raisr 2017-09-14 12:51:50 UTC
Committed as 00d4667295a821fef9eb198abcb0c942dffb6045.
https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=00d4667295a821fef9eb198abcb0c942dffb6045
Comment 6 Ivo Raisr 2017-09-14 12:53:57 UTC
At the end only %r11 was removed from the list in getRegUsage_AMD64Instr() because it's used overall as a temporary register for destination address.

%rax, %rcx, %rdx could in theory be used as allocatable but performance measurements did not prove it worthwhile (see bug 384676).