b9 00 00 00 00 mov ecx,0x0
0f 33 rdpmc
89 45 f8 mov DWORD PTR [ebp-8],eax
Not an easy instruction to simulate.
The RDTSC/RDTSCP instructions are supported - is RDPMC really that much more difficult?
Would it be sufficient to pass the RDPMC instruction through to the physical cpu, with the same ECX value? If that produces SIGILL (because PCE in CR4 is off), kill the client program, otherwise return in EDX:EAX whatever the native RDPMC returned. Event counts will be inflated, yes, but imo that is a well-understood effect of running programs under Valgrind.
Created attachment 118357 [details]
patch to enable AMD64 rdpmc instruction support
Patch to provide rdpmc support.
This has been tested with recent versions of the PAPI performance measurement library which use rdpmc
Vince, thanks for the patch. I have to say I am a little concerned
about landing it as-is, due to capability-check issues.
IIUC (and correct me if I am wrong): rdpmc support is something that
is either enabled or disabled at a process granularity (or larger), but
doesn't change whilst a process runs.
My concern here is that if we execute (as a guest insn, that is,
simulate) a rdpmc, then with your patch we'll simply pass that through
to the host. If the host doesn't support that then we'll get a SIGILL
on the host, which isn't what we want -- we need to deliver a SIGILL in
the guest context.
What this amounts to is that V needs to test at startup whether the
host can do rdpmc, and if it can't, then we simply decline to decode
the insn in the normal way and synthesise a SIGILL for the guest.
Is the above a correct understanding? If so I can try to hack up the
relevant hwcaps checking stuff. A framework for hwcaps is already in
place, so adding this one feature shouldn't be too much hassle.
(In reply to Julian Seward from comment #4)
> IIUC (and correct me if I am wrong): rdpmc support is something that
> is either enabled or disabled at a process granularity (or larger), but
> doesn't change whilst a process runs.
It's a bit more complicated than that :(
If you try to do a rdpmc and rdpmc is disabled in the CR4 register, you actually get a GPF:
[419335.853268] traps: rdpmc_invalid general protection fault ip:56259dd2d056 sp:7fffed404ac0 error:0 in rdpmc_invalid[56259dd2d000+1000]
The CR4 register status can change while the process is running. There's actually 3 states (I think) that happen on Linux, and this is configurable via /sys/devices/cpu/rdpmc
0. rdpmc disabled for everyone
1. rdpmc only enabled if a perf_event_open() enabled hardware event is active
2. rdpmc globally enabled for everyone
I think "1" is currently the default on modern Linux kernels. With this setting, the CR4 bit value might change while the program is running, it is enabled at perf_event_open() and disabled after close() of that event.
a further note, had to re-verify some of this because it's been a while since I dealt with the kernel side.
The CR4 value is context switched, so when rdpmc is auto-enabled it is only done so for the process using perf_event.
Also running perf_event_open() isn't enough to enable it, you also have to mmap() the fd returned by perf_event_open(). Only then will the PCE bit in CR4 be set.
Linux was actually buggy with this code for a while and the PCE bit state could get out-of-sync related to mmap() status, but that was eventually fixed.
If you want some tests to test this out
(In reply to Vince Weaver from comment #7)
> If you want some tests to test this out
sorry about that, hit enter at the wrong location in the browser.
If you want tests to test this out, see:
in the tests/rdpmc directory
So do we need two dirty helpers here? One that check CR4 to see if PCE is enabled and one to do the actual rdpmc. Then we can generate the SIGILL if the first one fails. But we might also have to check the ECX value to see whether the counter is actually valid. That might be trickier?