Bug 374596 - inconsistent RDTSCP support on x86_64
Summary: inconsistent RDTSCP support on x86_64
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (other bugs)
Version First Reported In: 3.12.0
Platform: RedHat Enterprise Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-05 14:31 UTC by bugzilla
Modified: 2023-04-18 09:26 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
g++ source for testing RDTSCP support (407 bytes, text/x-csrc)
2017-01-05 14:31 UTC, bugzilla
modified with RDTCSP in separate non-inlined function (467 bytes, text/x-csrc)
2017-01-06 21:21 UTC, bugzilla
valgrind -v --tool=memcheck ./rdtscp2 (12.34 KB, text/plain)
2023-02-08 18:23 UTC, wojnilowicz
/proc/cpuinfo of Intel Core2Duo (1.86 KB, text/plain)
2023-02-08 18:24 UTC, wojnilowicz
[PATCH] Don't use SSE4.2 on Core2Duo (763 bytes, patch)
2023-02-08 19:59 UTC, wojnilowicz

Description bugzilla 2017-01-05 14:31:35 UTC
Created attachment 103212 [details]
g++ source for testing RDTSCP support

The attached test program attempts to determine support for, and then use, the RDTSCP instruction. On a CPU that does not support RDTSCP, it correctly reports that the instruction is not supported and does not execute it. Otherwise it executes the instruction without error.

Under valgrind, on a CPU that does not support RDTSCP the opcode is reported as unsupported even though the program never executes it:

vex amd64->IR: unhandled instruction bytes: 0xF 0x1 0xF9 0xBE 0xD8 0x9 0x40 0x0 0xBF 0x80

Strangely (due to VEX simulating a different CPU stepping) if we comment out the code that executes RDTSCP, the program under valgrind then reports the instruction as being supported.

Expected behavior: under valgrind on a CPU that does not support RDTSCP the program should not crash. valgrind (vex) should simulate the instruction successfully since it advertises support for it.
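
For readers without access to the attachment, the check-then-use pattern described above can be sketched roughly as follows. This is not the attached program: the function names are hypothetical, and it assumes GCC or Clang on x86 (it uses <cpuid.h> and inline assembly). RDTSCP support is advertised in CPUID leaf 0x80000001, EDX bit 27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP. */
#define RDTSCP_EDX_BIT (1u << 27)

/* Pure helper: does this EDX value advertise RDTSCP? */
static int edx_has_rdtscp(uint32_t edx)
{
    return (edx & RDTSCP_EDX_BIT) != 0;
}

/* Ask the (possibly emulated) CPU whether RDTSCP is available. */
static int cpu_has_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is out of range. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_rdtscp(edx);
#else
    return 0;   /* not an x86 build: never attempt RDTSCP */
#endif
}

/* Hypothetical wrapper: executes RDTSCP only after the check passes.
   Returns 1 and stores the counter on success, 0 when unsupported. */
static int read_tsc_checked(uint64_t *tsc)
{
    assert(tsc != NULL);
    if (!cpu_has_rdtscp())
        return 0;
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi, aux;
    /* RDTSCP writes EDX:EAX (counter) and ECX (IA32_TSC_AUX). */
    __asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    (void)aux;
    *tsc = ((uint64_t)hi << 32) | lo;
    return 1;
#else
    return 0;
#endif
}
```

A caller would print "RDTSCP not supported" when read_tsc_checked() returns 0 and the counter value otherwise.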
Comment 1 Tom Hughes 2017-01-05 14:42:49 UTC
As far as I can see RDTSCP was implemented in VEX r2701 for BZ#251569.

Are you trying to use it in 32 bit code?
Comment 2 bugzilla 2017-01-05 20:17:37 UTC
No, this is on a 12-core 64-bit system, apparently running under libvirt.

/etc/redhat-release = CentOS release 6.6

/proc/cpuinfo = 

processor  : 0
vendor_id  : GenuineIntel
cpu family  : 6
model  : 13
model name  : QEMU Virtual CPU version (cpu64-rhel6)
stepping  : 3
microcode  : 1
cpu MHz  : 2933.436
cache size  : 4096 KB
physical id  : 0
siblings  : 1
core id  : 0
cpu cores  : 1
apicid  : 0
initial apicid  : 0
fpu  : yes
fpu_exception  : yes
cpuid level  : 4
wp  : yes
flags  : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm unfair_spinlock pni cx16 hypervisor lahf_lm
bogomips  : 5866.87
clflush size  : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management:

...etc.
Comment 3 Tom Hughes 2017-01-05 20:21:39 UTC
Sure, but is the program compiled as 64 bit or 32 bit? It's using 32 bit register names in the assembly but that might be normal for RDTSCP which is why I asked how you were compiling it.
Comment 4 Tom Hughes 2017-01-05 20:30:47 UTC
Ah sorry I misunderstood your original report...

You're saying that valgrind aborts on the instruction even though you don't try and execute it. My guess is that it's happening because that will be reported at translation time and valgrind translates instructions in blocks so it may translate an instruction that never gets executed.
Comment 5 bugzilla 2017-01-06 03:06:46 UTC
(In reply to Tom Hughes from comment #3)
> Sure, but is the program compiled as 64 bit or 32 bit? It's using 32 bit
> register names in the assembly but that might be normal for RDTSCP which is
> why I asked how you were compiling it.

64 bit platform and tools.
The register names are only specifying "clobbers" to the assembler template.
The 'e' prefix for CPUID is appropriate (CPUID clobbers ecx).
The 'r' prefix on a register name indicates 64 bit.
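
To illustrate the point about clobbers, here is a minimal sketch of GCC-style extended inline assembly for CPUID. It is not taken from the attachment; the function name is hypothetical and the code assumes GCC or Clang targeting x86_64. The names in the third colon-separated section only tell the compiler which registers the instruction overwrites; they do not select an operand size for the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the highest extended CPUID leaf, or 0 on non-x86_64 builds. */
static uint32_t cpuid_max_extended_leaf(void)
{
#if defined(__x86_64__)
    uint32_t eax = 0x80000000u;   /* input: leaf number; output: max leaf */
    __asm__ volatile(
        "cpuid"
        : "+a"(eax)               /* EAX is both read and written */
        :                         /* no input-only operands */
        : "ebx", "ecx", "edx"     /* clobbers: CPUID also overwrites these */
    );
    return eax;
#else
    return 0;
#endif
}
```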
Comment 6 bugzilla 2017-01-06 03:32:52 UTC
(In reply to Tom Hughes from comment #4)
> Ah sorry I misunderstood your original report...
> 
> You're saying that valgrind aborts on the instruction even though you don't
> try and execute it. My guess is that it's happening because that will be
> reported at translation time and valgrind translates instructions in blocks
> so it may translate an instruction that never gets executed.

This is contrary to how the processor works.

A program can have potentially any number of regions in the code segment that do not contain valid opcodes and are never executed (despite routinely making their way into the processor's prefetch/decode queue.) An illegal instruction exception only arises from an actual attempted execution.
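
The difference between the two behaviours under discussion can be shown with a toy model. The instruction set below is invented for the sketch and nothing here reflects how Valgrind's translator is actually structured; it only contrasts "reject the block if any byte fails to decode" with "fault only when control flow reaches the bad opcode".

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, invented for this sketch. */
enum { OP_NOP = 0, OP_SKIP2 = 1, OP_RET = 2, OP_UNKNOWN = 0xFF };

enum { RESULT_OK = 0, RESULT_ILLEGAL = 1 };

/* Eager model: decode every byte in the block and reject the whole
   block as soon as one opcode is unknown, reachable or not. */
static int translate_eager(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (code[i] == OP_UNKNOWN)
            return RESULT_ILLEGAL;
    }
    return RESULT_OK;
}

/* Lazy model: fault only when execution actually reaches an
   unknown opcode, as a hardware CPU would. */
static int execute_lazy(const unsigned char *code, size_t len)
{
    size_t pc = 0;
    while (pc < len) {
        switch (code[pc]) {
        case OP_NOP:   pc += 1; break;
        case OP_SKIP2: pc += 2; break;   /* jump over the next byte */
        case OP_RET:   return RESULT_OK;
        default:       return RESULT_ILLEGAL;
        }
    }
    return RESULT_OK;
}
```

For the byte sequence { OP_NOP, OP_SKIP2, OP_UNKNOWN, OP_RET }, the eager model reports an illegal instruction while the lazy model completes normally, because the unknown byte is jumped over.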

But let's suppose your guess about valgrind's behavior is correct. How would one rewrite this test program to ensure that the inclusion (but not execution) of the RDTSCP opcode would not provoke this problem under valgrind?
Comment 7 Tom Hughes 2017-01-06 07:12:05 UTC
I'm not saying it isn't a bug, just explaining what I think is causing it.

What I do know is it's not likely to be easy to fix, but it probably needs Julian to comment in more detail about whether it might be fixable and whether there is any way to work around it.

I would guess that putting the RDTSCP in a separate function from the check might work, so long as the compiler doesn't optimise them back together...
Comment 8 bugzilla 2017-01-06 21:20:47 UTC
(In reply to Tom Hughes from comment #7)
> I'm not saying it isn't a bug, just explaining what I think is causing it.
> 
> What I do know is it's not likely to be easy to fix, but it probably needs
> Julian to comment in more detail about whether it might be fixable and
> whether there is any way to work around it.
> 
> I would guess that putting the RDTSCP in a separate function from the check
> might work, so long as the compiler doesn't optimise them back together...

In the second attachment rdtscp2.cpp, the instruction is relegated to a separate function rdtscp(), with inlining disabled. Execution proceeds through the negative flow path (the output "RDTSCP not supported" proves this), meaning we never call that function. But we still get the SIGILL from valgrind:

RDTSCP not supported
3:28pm mrec-build2.812 ~/dev% ~/bin/valgrind ./a.out
==14720== Memcheck, a memory error detector
==14720== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14720== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==14720== Command: ./a.out
==14720==
vex amd64->IR: unhandled instruction bytes: 0xF 0x1 0xF9 0xC9 0xC3 0x55 0x48 0x89 0xE5 0x48
vex amd64->IR:  REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:  VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:  PFX.66=0 PFX.F2=0 PFX.F3=0
==14720== valgrind: Unrecognised instruction at address 0x400818.
==14720==  at 0x400818: rdtscp() (in /home/joev/dev/a.out)
==14720==  by 0x400848: main (in /home/joev/dev/a.out)
==14720== Your program just tried to execute an instruction that Valgrind
==14720== did not recognise.  There are two possible reasons for this.
==14720== 1. Your program has a bug and erroneously jumped to a non-code
==14720==  location.  If you are running Memcheck and you just saw a
==14720==  warning about a bad jump, it's probably your program's fault.
==14720== 2. The instruction is legitimate but Valgrind doesn't handle it,
==14720==  i.e. it's Valgrind's fault.  If you think this is the case or
==14720==  you are not sure, please let us know and we'll try to fix it.
==14720== Either way, Valgrind will now raise a SIGILL signal which will
==14720== probably kill your program.
==14720==
==14720== Process terminating with default action of signal 4 (SIGILL)
==14720==  Illegal opcode at address 0x400818
==14720==  at 0x400818: rdtscp() (in /home/joev/dev/a.out)
==14720==  by 0x400848: main (in /home/joev/dev/a.out)
==14720==
==14720== HEAP SUMMARY:
==14720==  in use at exit: 0 bytes in 0 blocks
==14720==  total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==14720==
==14720== All heap blocks were freed -- no leaks are possible
==14720==
==14720== For counts of detected and suppressed errors, rerun with: -v
==14720== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
Illegal instruction
Comment 9 bugzilla 2017-01-06 21:21:53 UTC
Created attachment 103244 [details]
modified with RDTCSP in separate non-inlined function
Comment 10 Philippe Waroquiers 2017-01-07 17:10:07 UTC
I guess that the problem is because VEX (somewhat) examines the 
cpu it is running on, to advertise to the guest program another model of
cpu, chosen in a limited nr of predefined models : see guest_amd64_toIR.c
handling of the CPUID instruction.
I am however wondering what VEX advertises on this qemu cpu.
According to the VEX code, in your case, it should advertise a basic cpu
that has no RDTSCP.

Can you run
valgrind --trace-flags=10000000 --trace-notbelow=1 --tool=none cpuid|&grep -i 'dirty.*cpuid'
and see what this gives ?

I am also wondering if m_machine.c sets have_rdtscp to True.
Can you also do:
valgrind --tool=none -v -v -v -d -d -d date|&grep 'arch ='


(for me, these 2 commands give:
DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_avx_and_cx16{0x3817bd10}(BBPTR)

--7119:1:    main ... arch = AMD64, hwcaps = amd64-cx16-lzcnt-rdtscp-sse3-avx-avx2-bmi

I find the above bizarre: the reported arch has sse3/cx16/avx2 but the called dirty helper
is amd64g_dirtyhelper_CPUID_avx_and_cx16, while I was expecting amd64g_dirtyhelper_CPUID_avx2
Comment 11 Tom Hughes 2017-01-07 17:42:59 UTC
No that's not the problem at all. Yes we may sometimes advertise different flags from the real CPU but the issue here is that we advertise that we don't support an instruction and the client program acts on that but valgrind still tries to translate the instruction (because it is translating a whole block) and faults on translating it because it thinks it is emulating a CPU that doesn't have it.

So the issue is that valgrind is translating (and faulting on) an instruction that is never going to be executed. At least that is the conclusion I came to.
Comment 12 Philippe Waroquiers 2017-01-07 17:51:28 UTC
(In reply to Tom Hughes from comment #11)
> No that's not the problem at all. Yes we may sometimes advertise different
> flags from the real CPU but the issue here is that we advertise that we
> don't support an instruction
Do we ?
See the extract below from the description:
> Strangely (due to VEX simulating a different CPU stepping) if we comment out 
> the code that executes RDTSCP, the program under valgrind then reports the 
> instruction as being supported.
So, I am wondering what Valgrind really detects and reports.

Maybe there is something strange there (as in my case, even if my 
strange case is avx2 related, not rdtscp related: for me, it reports
an avx2 flag, but calls a non avx2 dirty helper).
Comment 13 bugzilla 2017-01-11 21:33:28 UTC
(In reply to Philippe Waroquiers from comment #10)
> I guess that the problem is because VEX (somewhat) examines the 
> cpu it is running on, to advertise to the guest program another model of
> cpu, chosen in a limited nr of predefined models : see guest_amd64_toIR.c
> handling of the CPUID instruction.
> I am however wondering what VEX advertises on this qemu cpu.
> According to the VEX code, in your case, it should advertise a basic cpu
> that has no RDTSCP.
> 
> Can you run
> valgrind --trace-flags=10000000 --trace-notbelow=1 --tool=none cpuid|&grep
> -i 'dirty.*cpuid'
> and see what this gives ?
> 
> I am also wondering if m_machine.c sets have_rdtscp to True.
> Can you also do:
> valgrind --tool=none -v -v -v -d -d -d date|&grep 'arch ='
> 
> 
> (for me, these 2 commands give:
> DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) :::
> amd64g_dirtyhelper_CPUID_avx_and_cx16{0x3817bd10}(BBPTR)
> 
> --7119:1:    main ... arch = AMD64, hwcaps =
> amd64-cx16-lzcnt-rdtscp-sse3-avx-avx2-bmi
> 
> I find the above bizarre: the reported arch has sse3/cx16/avx2 but the
> called dirty helper
> is amd64g_dirtyhelper_CPUID_avx_and_cx16, while I was expecting
> amd64g_dirtyhelper_CPUID_avx2

There was no cpuid utility available on our host, so we substituted an internal 'procinfo' utility that emits similar details; I hope that it gives you the information you wanted for that case:

% ./valgrind --trace-flags=10000000 --trace-notbelow=1 --tool=none procinfo |& grep -i 'dirty.*cpuid'
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
              DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8) WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
Illegal instruction
4:24pm mrec-build2.873 ~/bin% ./valgrind --tool=none -v -v -v -d -d -d date | & grep 'arch ='
--28512:1:    main ... arch = AMD64, hwcaps = amd64-cx16-sse3
% ./valgrind --version
valgrind-3.12.0
% lsb_release -i
Distributor ID: CentOS
% lsb_release -r
Release:        6.6
% uname -a
Linux mrec-build2 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Comment 14 Philippe Waroquiers 2017-01-11 23:05:12 UTC
(In reply to bugzilla from comment #13)
> (In reply to Philippe Waroquiers from comment #10)

> There was no cpuid utility available on our host, so we substituted an
> internal 'procinfo' utility that emits similar details; I hope that it gives
> you the information you wanted for that case:
> 
> % ./valgrind --trace-flags=10000000 --trace-notbelow=1 --tool=none procinfo
> |& grep -i 'dirty.*cpuid'
>               DIRTY 1:I1 MoFX-gst(16,8) WrFX-gst(40,8) MoFX-gst(24,8)
> WrFX-gst(32,8) ::: amd64g_dirtyhelper_CPUID_sse42_and_cx16{0x380eea60}(BBPTR)
So, valgrind pretends to your program to be an sse42/cx16 machine, having RDTSCP


> 4:24pm mrec-build2.873 ~/bin% ./valgrind --tool=none -v -v -v -d -d -d date
> | & grep 'arch ='
> --28512:1:    main ... arch = AMD64, hwcaps = amd64-cx16-sse3
But has not detected RDTSCP on the 'real cpu' hwcaps.
And when it decodes the instruction, it examines the hwcaps, and not what
it has pretended to be to the guest application.

In other words, when your application calls the CPUID instruction,
valgrind executes amd64g_dirtyhelper_CPUID_sse42_and_cx16, which reports that RDTSCP is available.
Then your application (correctly) assumes it can call RDTSCP, but then Valgrind refuses to
decode it, because the hwcaps it has derived from its own cpuid call
indicate there is no RDTSCP (which is the case:
your QEMU simulated cpu does not have RDTSCP).
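
The mismatch can be modelled in a few lines. This is only a sketch: the CAP_* bits and function names are hypothetical stand-ins (the real VEX_HWCAPS_AMD64_* values differ), and it assumes, as this comment states, that the sse42_and_cx16 helper is chosen whenever SSE3 and CX16 are present and that this helper advertises RDTSCP.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical capability bits; not the real VEX_HWCAPS_AMD64_* values. */
enum {
    CAP_SSE3   = 1u << 0,
    CAP_CX16   = 1u << 1,
    CAP_RDTSCP = 1u << 2
};

/* Guest-visible side: which CPUID model is presented to the program.
   Mirrors the selection rule described in this comment. */
static const char *cpuid_model_before_fix(unsigned hwcaps)
{
    if ((hwcaps & CAP_SSE3) && (hwcaps & CAP_CX16))
        return "sse42_and_cx16";
    return "baseline";
}

/* In this sketch only the sse42_and_cx16 model advertises RDTSCP. */
static int guest_is_told_rdtscp(unsigned hwcaps)
{
    return strcmp(cpuid_model_before_fix(hwcaps), "sse42_and_cx16") == 0;
}

/* Decoder side: RDTSCP is accepted only if the host hwcaps has it. */
static int decoder_accepts_rdtscp(unsigned hwcaps)
{
    return (hwcaps & CAP_RDTSCP) != 0;
}

/* The bug is any hwcaps value where the two sides disagree. */
static int is_inconsistent(unsigned hwcaps)
{
    return guest_is_told_rdtscp(hwcaps) && !decoder_accepts_rdtscp(hwcaps);
}
```

With hwcaps = amd64-cx16-sse3, as reported above, is_inconsistent() is true: the guest is told RDTSCP exists, but the decoder rejects it.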

What I still do not understand is that valgrind calls 
amd64g_dirtyhelper_CPUID_sse42_and_cx16
only if hwcaps contains SSE3 and CX16.
These 2 flags are reported by the '.... | grep 'arch =' command.
However, your cat /proc/cpuinfo shows a cx16 flag but does not show an sse3 flag.

So, I am wondering by which miracle m_machine.c has found the sse3 indicator by calling
cpuid. Maybe there is a bug in QEMU cpuid instruction ? 
What is your procinfo procedure giving ?
Does this report the same flags as cat /proc/cpuinfo ?
In particular, does it tell that sse3 is available ?


It would be nice if you could install the cpuid rpm : as far as I can see,
it should be available under centos.
Then we can check the consistency between
    cat /proc/cpuinfo  (no sse3 found)
    valgrind (that seems to find sse3)
    your procinfo program : ????
    cpuid : ....


If the (wrong) detection of sse3 is really the root cause of wrongly pretending to support RDTSCP,
you might bypass the problem in m_machine.c by assigning False to have_sse3,
rather than deriving it from ecx.
So, in the amd64 section, replace
     have_sse3 = (ecx & (1<<0)) != 0;  /* True => have sse3 insns */
by
     have_sse3 = False;

If this solves the problem, then we can be reasonably sure the decoding is not the problem,
but that it is purely related to the cpu model.
If after that patch, we still have a decoding problem, then we might have both
some cpu model problem and/or a basic problem that valgrind decodes an instruction
that it will not execute.
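
As a reference for the have_sse3 line quoted above, here is a small sketch that decodes the relevant feature bits from ECX of CPUID leaf 1. The struct and function names are hypothetical; the bit positions are the architectural ones. Note that Linux lists SSE3 in /proc/cpuinfo under the name "pni", a flag that appears in the listing in comment 2, which may account for the apparent disagreement between /proc/cpuinfo and valgrind's hwcaps.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in ECX of CPUID leaf 1. */
#define ECX_SSE3   (1u << 0)    /* shown as "pni" in Linux /proc/cpuinfo */
#define ECX_SSSE3  (1u << 9)
#define ECX_CX16   (1u << 13)   /* CMPXCHG16B */
#define ECX_SSE42  (1u << 20)

/* Hypothetical container for the decoded flags. */
struct cpu_features {
    int sse3;
    int ssse3;
    int cx16;
    int sse42;
};

/* Decode a raw ECX value into individual feature flags. */
static struct cpu_features decode_leaf1_ecx(uint32_t ecx)
{
    struct cpu_features f;
    f.sse3  = (ecx & ECX_SSE3)  != 0;
    f.ssse3 = (ecx & ECX_SSSE3) != 0;
    f.cx16  = (ecx & ECX_CX16)  != 0;
    f.sse42 = (ecx & ECX_SSE42) != 0;
    return f;
}
```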
Comment 15 Christopher Yeleighton 2022-10-19 23:46:24 UTC
In order to reproduce: { valgrind kontact; }
vex amd64->IR: unhandled instruction bytes: 0xF 0x1 0xF9 0x48 0xC1 0xE2 0x20 0x48 0x9 0xD0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==28292== valgrind: Unrecognised instruction at address 0x389653d2.
==28292==    at 0x389653D2: hwy::platform::TimerResolution() (in /usr/lib64/libhwy.so.1.0.1)

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping	: 11
microcode	: 0xba
cpu MHz		: 2394.207
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti dtherm
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 4788.41
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
Comment 16 wojnilowicz 2023-02-08 18:23:34 UTC
Created attachment 156080 [details]
valgrind -v --tool=memcheck ./rdtscp2

Code from comment #9 compiled with command:
"gcc -lstdc++ rdtscp2.cpp -o rdtscp2"
where:
"gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)"
gives:
"vex amd64->IR: unhandled instruction bytes: 0xF 0x1 0xF9 0x90 0x5D 0xC3 0x55 0x48 0x89 0xE5"
when executed by:
"valgrind -v --tool=memcheck ./rdtscp2"
where:
valgrind-3.20.0
on:
"Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz"

Terminal log attached. Please fix it.
Comment 17 wojnilowicz 2023-02-08 18:24:53 UTC
Created attachment 156081 [details]
/proc/cpuinfo of Intel Core2Duo
Comment 18 wojnilowicz 2023-02-08 19:59:56 UTC
Created attachment 156085 [details]
[PATCH] Don't use SSE4.2 on Core2Duo

Attached patch fixes the bug. Please commit it.
Comment 19 Paul Floyd 2023-04-15 06:52:05 UTC
Shouldn't this be something like

      else if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSSE3) &&
               (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16) &&
               (archinfo->hwcaps & VEX_HWCAPS_AMD64_RDTSCP)) {
         fName = "amd64g_dirtyhelper_CPUID_sse42_and_cx16";
         fAddr = &amd64g_dirtyhelper_CPUID_sse42_and_cx16;
      }
      else if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSSE3) &&
               (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16)) {
         fName = "amd64g_dirtyhelper_CPUID_sse3_and_cx16";
         fAddr = &amd64g_dirtyhelper_CPUID_sse3_and_cx16;
      }
      else {

As it stands the patch drops sse3 && cx16 && !rdtscp from amd64g_dirtyhelper_CPUID_sse42_and_cx16 to baseline.
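
The three-tier selection suggested above can be exercised in isolation. This is a sketch of the proposal, not the committed code: the CAP_* bits and select_cpuid_helper() are hypothetical stand-ins for the VEX_HWCAPS_AMD64_* flags and the selection chain in guest_amd64_toIR.c, and "baseline" is just a label for the final else branch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical capability bits standing in for VEX_HWCAPS_AMD64_*. */
enum {
    CAP_SSSE3  = 1u << 0,
    CAP_CX16   = 1u << 1,
    CAP_RDTSCP = 1u << 2
};

/* Pick the CPUID helper name for a given set of host capabilities. */
static const char *select_cpuid_helper(unsigned hwcaps)
{
    int sse_and_cx16 = (hwcaps & CAP_SSSE3) && (hwcaps & CAP_CX16);

    /* Only present the model that advertises RDTSCP if the host has it. */
    if (sse_and_cx16 && (hwcaps & CAP_RDTSCP))
        return "amd64g_dirtyhelper_CPUID_sse42_and_cx16";

    /* Same host without RDTSCP: fall back one tier, not to baseline. */
    if (sse_and_cx16)
        return "amd64g_dirtyhelper_CPUID_sse3_and_cx16";

    return "baseline";
}
```

The middle tier is the point of the suggestion: a host with the SSE level and CX16 but no RDTSCP gets the sse3_and_cx16 model rather than dropping to baseline.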
Comment 20 wojnilowicz 2023-04-15 08:18:55 UTC
I guess so. Originally I didn't dig deeper to find out that amd64g_dirtyhelper_CPUID_sse3_and_cx16 exists.
Comment 21 Paul Floyd 2023-04-18 09:26:34 UTC
commit 54982ab5c5325a02304eccb0e16a51ad6ef9a0e3 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Mon Apr 17 22:57:39 2023 +0200

    Forgot to add the modified file for 374596

and

commit 41a7f59a8838a042813ac20fe1472e55e9bd5697
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Mon Apr 17 21:53:23 2023 +0200

    Bug 374596 - inconsistent RDTSCP support on x86_64