Bug 384526

Summary:	reduce number of spill instructions generated by VEX register allocator v3
Product:	[Developer tools] valgrind	Reporter:	Ivo Raisr <ivosh>
Component:	vex	Assignee:	Ivo Raisr <ivosh>
Status:	RESOLVED FIXED
Severity:	normal	CC:	ivosh
Priority:	NOR
Version:	3.14 SVN
Target Milestone:	---
Platform:	Compiled Sources
OS:	Linux
See Also:	https://bugs.kde.org/show_bug.cgi?id=384337
Latest Commit:		Version Fixed In:
Attachments:	proposed patch

Description Ivo Raisr 2017-09-09 15:32:09 UTC

VEX register allocator v2 has a clever optimization which reduces a number of spill instructions generated.

Let's have the following vcode:
...
 97: orq %vR250,%vR570
 98: movslq %vR570d,%vR569
 99: movq %vR329,%vR571
100: orq %vR569,%vR571
101: callnz[0,RLPri_None]

and the following vreg ranges:
%vR250: [ 12, 214)
%vR329: [ 72, 162)
%vR569: [ 98, 101)
%vR570: [ 95,  99)
%vR571: [ 99, 101)

Now reg. alloc. v2 produces the following rcode:
 82   orq %rbx,%r8
 83   movslq %r8d,%rdi
 84   movq 0xB28(%rbp),%r8
 85   movq %r8,%rsi
 86   orq %rdi,%rsi
 87   movq %r9,0xAE0(%rbp)
 88   callnz[0,RLPri_None] 0x5805E8E0

However current regalloc v3 produces more rcode:
 82   orq %rbx,%r8
 83   movslq %r8d,%rdi
 84   movq 0xB28(%rbp),%r8
 85   movq %r8,%rsi
 86   orq %rdi,%rsi
 87   movq %r9,0xAE0(%rbp)
 88   movq %r8,0xB28(%rbp)
 89   callnz[0,RLPri_None] 0x5805E8E0

It can be seen that spilling %r8 (instruction #88) is useless because its value is still equal to that one in the spill slot.

Comment 1 Ivo Raisr 2017-09-10 06:30:29 UTC

Created attachment 107778 [details]
proposed patch

Comment 2 Ivo Raisr 2017-09-10 06:41:40 UTC

We can see the following performance improvement:

running Memcheck on perf/bz2:
baseline: 45,132 M instructions total; 161 M instructions doRegisterAllocation_v3
patched:  45,107 M instructions total; 168 M instructions doRegisterAllocation_v3

running Memcheck on /bin/true:
baseline: 3,511 M instructions total; 59 M instructions doRegisterAllocation_v3
patched:  3,508 M instructions total; 61 M instructions doRegisterAllocation_v3

So it can be seen that even when running the simplest program (/bin/true), small additional cost of register allocation is compensated by much higher gain in terms of total instruction cost.

Comment 3 Ivo Raisr 2017-09-10 09:39:00 UTC

Fixed by changeset 3117cd9637a843cbab5de302fb30e22153fbfc1c:

https://sourceware.org/git/?p=valgrind.git;a=commit;h=3117cd9637a843cbab5de302fb30e22153fbfc1c