Bug 126348

Summary: (rdpmc) vex x86->IR: unhandled instruction bytes: 0xF 0x33 0x89 0x45
Product: [Developer tools] valgrind Reporter: Thomas Kühne <thomas-dloop>
Component: vexAssignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: crash CC: akruppa, mark, vincent.weaver
Priority: NOR    
Version: 3.2 SVN   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Bug Depends on:    
Bug Blocks: 256630    
Attachments: patch to enable AMD64 rdpmc instruction support

Description Thomas Kühne 2006-04-27 13:37:23 UTC
b9 00 00 00 00          mov    ecx,0x0
0f 33                   rdpmc  
89 45 f8                mov    DWORD PTR [ebp-8],eax
Comment 1 Julian Seward 2006-05-03 16:34:21 UTC
Not an easy instruction to simulate.
Comment 2 akruppa 2016-09-22 15:31:11 UTC
The RDTSC/RDTSCP instructions are supported - is RDPMC really that much more difficult?

Would it be sufficient to pass the RDPMC instruction through to the physical cpu, with the same ECX value? If that produces SIGILL (because PCE in CR4 is off), kill the client program, otherwise return in EDX:EAX whatever the native RDPMC returned. Event counts will be inflated, yes, but imo that is a well-understood effect of running programs under Valgrind.
Comment 3 Vince Weaver 2019-02-25 16:26:42 UTC
Created attachment 118357 [details]
patch to enable AMD64 rdpmc instruction support

Patch to provide rdpmc support.

This has been tested with recent versions of the PAPI performance measurement library which use rdpmc
Comment 4 Julian Seward 2019-02-26 09:04:47 UTC
Vince, thanks for the patch.  I have to say I am a little concerned
about landing it as-is, due to capability-check issues.

IIUC (and correct me if I am wrong): rdpmc support is something that
is either enabled or disabled at a process granularity (or larger), but
doesn't change whilst a process runs.

My concern here is that if we execute (as a guest insn, that is,
simulate) a rdpmc, then with your patch we'll simply pass that through
to the host.  If the host doesn't support that then we'll get a SIGILL
on the host, which isn't what we want -- we need to deliver a SIGILL in
the guest context.

What this amounts to is that V needs to test at startup whether the 
host can do rdpmc, and if it can't, then we simply decline to decode
the insn in the normal way and synthesise a SIGILL for the guest.

Is the above a correct understanding?  If so I can try to hack up the
relevant hwcaps checking stuff.  A framework for hwcaps is already in
place, so adding this one feature shouldn't be too much hassle.
Comment 5 Vince Weaver 2019-02-26 17:31:27 UTC
(In reply to Julian Seward from comment #4) 
> IIUC (and correct me if I am wrong): rdpmc support is something that
> is either enabled or disabled at a process granularity (or larger), but
> doesn't change whilst a process runs.

It's a bit more complicated than that :(

If you try to do a rdpmc and rdpmc is disabled in the CR4 register, you actually get a GPF:

[419335.853268] traps: rdpmc_invalid[31138] general protection fault ip:56259dd2d056 sp:7fffed404ac0 error:0 in rdpmc_invalid[56259dd2d000+1000]

The CR4 register status can change while the process is running.  There's actually 3 states (I think) that happen on Linux, and this is configurable via /sys/devices/cpu/rdpmc

0. rdpmc disabled for everyone
1. rdpmc only enabled if a perf_event_open() enabled hardware event is active
2. rdpmc globally enabled for everyone

I think "1" is currently the default on modern Linux kernels.  With this setting, the CR4 bit value might change while the program is running, it is enabled at perf_event_open() and disabled after close() of that event.
Comment 6 Vince Weaver 2019-02-26 17:46:47 UTC
a further note, had to re-verify some of this because it's been a while since I dealt with the kernel side.

The CR4 value is context switched, so when rdpmc is auto-enabled it is only done so for the process using perf_event.

Also running perf_event_open() isn't enough to enable it, you also have to mmap() the fd returned by perf_event_open().  Only then will the PCE bit in CR4 be set.

Linux was actually buggy with this code for a while and the PCE bit state could get out-of-sync related to mmap() status, but that was eventually fixed.
Comment 7 Vince Weaver 2019-02-26 17:54:20 UTC
If you want some tests to test this out
Comment 8 Vince Weaver 2019-02-26 17:56:15 UTC
(In reply to Vince Weaver from comment #7)
> If you want some tests to test this out

sorry about that, hit enter at the wrong location in the browser.

If you want tests to test this out, see:
    https://github.com/deater/perf_event_tests
in the tests/rdpmc directory

specifically
  rdpmc_validation
and
  rdpmc_invalid
Comment 9 Mark Wielaard 2019-08-09 10:37:11 UTC
So do we need two dirty helpers here? One that check CR4 to see if PCE is enabled and one to do the actual rdpmc. Then we can generate the SIGILL if the first one fails. But we might also have to check the ECX value to see whether the counter is actually valid. That might be trickier?