Created attachment 81851 [details] see3 check Hi, I have a small test app (mostly taken from Qt5) which checks for sse3. If I run it without valgrind: $ ./qSSE3Check Yay sse3!!! but with valgrind: $ valgrind ./qSSE3Check no sse3 for you, get out! this only seem to happens on amd cpus without "real" sse3 (only sse4a), but always works on intel cpus, with and without valgrind NOTE: testcase might only work on 64bit cpus checked with valgrind 3.8.1 and current 3.9.0-svn BACKGROUND: A my cpu has "sse3"(sse4a) and I build Qt5 it declares (automagically) sse3 as min requiered cpu feature. When I now run a Qt5 gui app and sse3 is not available it just fails. "Incompatible processor. This Qt build requires the following features: sse3" //----------------- #include <iostream> # define PICreg "%%rbx" static void cpuidFeatures01(unsigned &ecx, unsigned &edx) { void* tmp1; asm ("xchg " PICreg", %2\n" "cpuid\n" "xchg " PICreg", %2\n" : "=&c" (ecx), "=&d" (edx), "=&r" (tmp1) : "a" (1)); } int main(int argc, char** argv) { unsigned cpuid01ECX = 0, cpuid01EDX = 0; cpuidFeatures01(cpuid01ECX, cpuid01EDX); if (cpuid01ECX & (1u)) std::cout << "Yay sse3!!!" << std::endl; else std::cout << "no sse3 for you, get out!" << std::endl; return 0; } //---------------------------------------- Linux kuehlschrank 3.10.9-aufs #1 SMP PREEMPT Wed Aug 21 20:44:02 CEST 2013 x86_64 AMD Phenom(tm) II X4 945 Processor AuthenticAMD GNU/Linux (and 4 times) processor : 0 vendor_id : AuthenticAMD cpu family : 16 model : 4 model name : AMD Phenom(tm) II X4 945 Processor stepping : 3 microcode : 0x10000c8 cpu MHz : 800.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save bogomips : 6018.61 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate
Created attachment 81876 [details] amd sse3 check vex IR patch Here a patch which makes it work for me, but I am not sure if this might break other things. Someone who knows what he is doing should carefully check it ;>
The cpuid support looks messy in general. We have the following VEX hardware capabilities defined in pub/libvex.h: /* amd64: baseline capability is SSE2, with cmpxchg8b but not cmpxchg16b. */ #define VEX_HWCAPS_AMD64_SSE3 (1<<5) /* SSE3 support */ #define VEX_HWCAPS_AMD64_CX16 (1<<6) /* cmpxchg16b support */ #define VEX_HWCAPS_AMD64_LZCNT (1<<7) /* SSE4a LZCNT insn */ #define VEX_HWCAPS_AMD64_AVX (1<<8) /* AVX instructions */ #define VEX_HWCAPS_AMD64_RDTSCP (1<<9) /* RDTSCP instruction */ #define VEX_HWCAPS_AMD64_BMI (1<<10) /* BMI1 instructions */ #define VEX_HWCAPS_AMD64_AVX2 (1<<11) /* AVX2 instructions */ And the following CPUID helpers in priv/guest_amd64_helpers.c: extern void amd64g_dirtyhelper_CPUID_baseline ( VexGuestAMD64State* st ); extern void amd64g_dirtyhelper_CPUID_sse3_and_cx16 ( VexGuestAMD64State* st ); extern void amd64g_dirtyhelper_CPUID_sse42_and_cx16 ( VexGuestAMD64State* st ); extern void amd64g_dirtyhelper_CPUID_avx_and_cx16 ( VexGuestAMD64State* st ); And in priv/guest_amd64_toIR.c we map them as follows: case 0xA2: { /* CPUID */ /* Uses dirty helper: void amd64g_dirtyhelper_CPUID ( VexGuestAMD64State* ) declared to mod rax, wr rbx, rcx, rdx */ IRDirty* d = NULL; const HChar* fName = NULL; void* fAddr = NULL; if (haveF2orF3(pfx)) goto decode_failure; if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3 |VEX_HWCAPS_AMD64_CX16 |VEX_HWCAPS_AMD64_AVX)) { fName = "amd64g_dirtyhelper_CPUID_avx_and_cx16"; fAddr = &amd64g_dirtyhelper_CPUID_avx_and_cx16; /* This is a Core-i5-2300-like machine */ } else if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3 |VEX_HWCAPS_AMD64_CX16)) { fName = "amd64g_dirtyhelper_CPUID_sse42_and_cx16"; fAddr = &amd64g_dirtyhelper_CPUID_sse42_and_cx16; /* This is a Core-i5-670-like machine */ } else { /* Give a CPUID for at least a baseline machine, SSE2 only, and no CX16 */ fName = "amd64g_dirtyhelper_CPUID_baseline"; fAddr = &amd64g_dirtyhelper_CPUID_baseline; } Note how we don't ever use the amd64g_dirtyhelper_CPUID_sse3_and_cx16. I think the suggested patch is correct and we do want to claim at least sse3 and cx16 support when the underlying hardware claims to support it. But in general the mapping seems "wrong". Note for example how we support AVX2 but never claim it through a CPUID helper. And it looks like we support AVX2 just fine even if the underlying hardware doesn't have support for it. So the general logic should be mapped to the capabilities of VEX and not to the capabilities of the underlying hardware it seems.
I think the idea of the patch is correct, we should at least check the minimum CPU capabilities, not an exact match. I have changed the current code slightly to do that. Currently testing: diff --git a/priv/guest_amd64_toIR.c b/priv/guest_amd64_toIR.c index 38c2556..522bfbe 100644 --- a/priv/guest_amd64_toIR.c +++ b/priv/guest_amd64_toIR.c @@ -20937,15 +20937,17 @@ Long dis_ESC_0F ( const HChar* fName = NULL; void* fAddr = NULL; if (haveF2orF3(pfx)) goto decode_failure; - if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3 - |VEX_HWCAPS_AMD64_CX16 - |VEX_HWCAPS_AMD64_AVX)) { + /* This isn't entirely correct, CPUID should depend on the VEX + capabilities, not on the underlying CPU. See bug #324882. */ + if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSE3) && + (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16) && + (archinfo->hwcaps & VEX_HWCAPS_AMD64_AVX)) { fName = "amd64g_dirtyhelper_CPUID_avx_and_cx16"; fAddr = &amd64g_dirtyhelper_CPUID_avx_and_cx16; /* This is a Core-i5-2300-like machine */ } - else if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3 - |VEX_HWCAPS_AMD64_CX16)) { + else if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSE3) && + (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16)) { fName = "amd64g_dirtyhelper_CPUID_sse42_and_cx16"; fAddr = &amd64g_dirtyhelper_CPUID_sse42_and_cx16; /* This is a Core-i5-670-like machine */ That should prevent downgrading to the baseline in more cases if the underlying CPU has more capabilities then just those needed for emulating a specific machine.
Committed patch from comment #3 as VEX r2761. Please close this bug if that solves the issue for you.
Works fine, thx :)