Bug 323893

Summary: SSE3 not available on amd cpus in valgrind
Product: [Developer tools] valgrind Reporter: Bernd Buschinski <b.buschinski>
Component: vexAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal CC: mark
Priority: NOR    
Version: 3.9.0.SVN   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: see3 check
amd sse3 check vex IR patch

Description Bernd Buschinski 2013-08-22 13:15:54 UTC
Created attachment 81851 [details]
see3 check

Hi,
I have a small test app (mostly taken from Qt5) which checks for sse3.

If I run it without valgrind:
$ ./qSSE3Check
Yay sse3!!!

but with valgrind:
$ valgrind ./qSSE3Check
no sse3 for you, get out!

this only seem to happens on amd cpus without "real" sse3 (only sse4a), but always works on intel cpus, with and without valgrind
NOTE: testcase might only work on 64bit cpus

checked with valgrind 3.8.1 and current 3.9.0-svn

BACKGROUND:
A my cpu has "sse3"(sse4a) and I build Qt5 it declares (automagically) sse3 as min requiered cpu feature. When I now run a Qt5 gui app and sse3 is not available it just fails.
"Incompatible processor. This Qt build requires the following features: sse3"

//-----------------

#include <iostream>

# define PICreg "%%rbx"

static void cpuidFeatures01(unsigned &ecx, unsigned &edx)
{
    void* tmp1;
    asm ("xchg " PICreg", %2\n"
         "cpuid\n"
         "xchg " PICreg", %2\n"
        : "=&c" (ecx), "=&d" (edx), "=&r" (tmp1)
        : "a" (1));
}

int main(int argc, char** argv)
{
    unsigned cpuid01ECX = 0, cpuid01EDX = 0;
    cpuidFeatures01(cpuid01ECX, cpuid01EDX);

    if (cpuid01ECX & (1u))
        std::cout << "Yay sse3!!!" << std::endl;
    else
        std::cout << "no sse3 for you, get out!" << std::endl;

    return 0;
}

//----------------------------------------

Linux kuehlschrank 3.10.9-aufs #1 SMP PREEMPT Wed Aug 21 20:44:02 CEST 2013 x86_64 AMD Phenom(tm) II X4 945 Processor AuthenticAMD GNU/Linux

(and 4 times)
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : AMD Phenom(tm) II X4 945 Processor
stepping        : 3
microcode       : 0x10000c8
cpu MHz         : 800.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
bogomips        : 6018.61
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
Comment 1 Bernd Buschinski 2013-08-23 16:14:00 UTC
Created attachment 81876 [details]
amd sse3 check vex IR patch

Here a patch which makes it work for me, but I am not sure if this might break other things.
Someone who knows what he is doing should carefully check it ;>
Comment 2 Mark Wielaard 2013-09-13 11:22:22 UTC
The cpuid support looks messy in general.

We have the following VEX hardware capabilities defined in pub/libvex.h:

/* amd64: baseline capability is SSE2, with cmpxchg8b but not
   cmpxchg16b. */
#define VEX_HWCAPS_AMD64_SSE3   (1<<5)  /* SSE3 support */
#define VEX_HWCAPS_AMD64_CX16   (1<<6)  /* cmpxchg16b support */
#define VEX_HWCAPS_AMD64_LZCNT  (1<<7)  /* SSE4a LZCNT insn */
#define VEX_HWCAPS_AMD64_AVX    (1<<8)  /* AVX instructions */
#define VEX_HWCAPS_AMD64_RDTSCP (1<<9)  /* RDTSCP instruction */
#define VEX_HWCAPS_AMD64_BMI    (1<<10) /* BMI1 instructions */
#define VEX_HWCAPS_AMD64_AVX2   (1<<11) /* AVX2 instructions */

And the following CPUID helpers in priv/guest_amd64_helpers.c:

extern void  amd64g_dirtyhelper_CPUID_baseline ( VexGuestAMD64State* st );
extern void  amd64g_dirtyhelper_CPUID_sse3_and_cx16 ( VexGuestAMD64State* st );
extern void  amd64g_dirtyhelper_CPUID_sse42_and_cx16 ( VexGuestAMD64State* st );
extern void  amd64g_dirtyhelper_CPUID_avx_and_cx16 ( VexGuestAMD64State* st );

And in priv/guest_amd64_toIR.c we map them as follows:

   case 0xA2: { /* CPUID */
      /* Uses dirty helper: 
            void amd64g_dirtyhelper_CPUID ( VexGuestAMD64State* )
         declared to mod rax, wr rbx, rcx, rdx
      */
      IRDirty* d     = NULL;
      const HChar*   fName = NULL;
      void*    fAddr = NULL;
      if (haveF2orF3(pfx)) goto decode_failure;
      if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3
                               |VEX_HWCAPS_AMD64_CX16
                               |VEX_HWCAPS_AMD64_AVX)) {
         fName = "amd64g_dirtyhelper_CPUID_avx_and_cx16";
         fAddr = &amd64g_dirtyhelper_CPUID_avx_and_cx16;
         /* This is a Core-i5-2300-like machine */
      }
      else if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3
                                    |VEX_HWCAPS_AMD64_CX16)) {
         fName = "amd64g_dirtyhelper_CPUID_sse42_and_cx16";
         fAddr = &amd64g_dirtyhelper_CPUID_sse42_and_cx16;
         /* This is a Core-i5-670-like machine */
      }
      else {
         /* Give a CPUID for at least a baseline machine, SSE2
            only, and no CX16 */
         fName = "amd64g_dirtyhelper_CPUID_baseline";
         fAddr = &amd64g_dirtyhelper_CPUID_baseline;
      }

Note how we don't ever use the amd64g_dirtyhelper_CPUID_sse3_and_cx16.

I think the suggested patch is correct and we do want to claim at least sse3 and cx16 support when the underlying hardware claims to support it.

But in general the mapping seems "wrong". Note for example how we support AVX2 but never claim it through a CPUID helper. And it looks like we support AVX2 just fine even if the underlying hardware doesn't have support for it. So the general logic should be mapped to the capabilities of VEX and not to the capabilities of the underlying hardware it seems.
Comment 3 Mark Wielaard 2013-09-13 12:19:32 UTC
I think the idea of the patch is correct, we should at least check the minimum CPU capabilities, not an exact match. I have changed the current code slightly to do that. Currently testing:

diff --git a/priv/guest_amd64_toIR.c b/priv/guest_amd64_toIR.c
index 38c2556..522bfbe 100644
--- a/priv/guest_amd64_toIR.c
+++ b/priv/guest_amd64_toIR.c
@@ -20937,15 +20937,17 @@ Long dis_ESC_0F (
       const HChar*   fName = NULL;
       void*    fAddr = NULL;
       if (haveF2orF3(pfx)) goto decode_failure;
-      if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3
-                               |VEX_HWCAPS_AMD64_CX16 
-                               |VEX_HWCAPS_AMD64_AVX)) {
+      /* This isn't entirely correct, CPUID should depend on the VEX
+         capabilities, not on the underlying CPU. See bug #324882. */
+      if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSE3) &&
+          (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16) &&
+          (archinfo->hwcaps & VEX_HWCAPS_AMD64_AVX)) {
          fName = "amd64g_dirtyhelper_CPUID_avx_and_cx16";
          fAddr = &amd64g_dirtyhelper_CPUID_avx_and_cx16;
          /* This is a Core-i5-2300-like machine */
       }
-      else if (archinfo->hwcaps == (VEX_HWCAPS_AMD64_SSE3
-                                    |VEX_HWCAPS_AMD64_CX16)) {
+      else if ((archinfo->hwcaps & VEX_HWCAPS_AMD64_SSE3) &&
+               (archinfo->hwcaps & VEX_HWCAPS_AMD64_CX16)) {
          fName = "amd64g_dirtyhelper_CPUID_sse42_and_cx16";
          fAddr = &amd64g_dirtyhelper_CPUID_sse42_and_cx16;
          /* This is a Core-i5-670-like machine */

That should prevent downgrading to the baseline in more cases if the underlying CPU has more capabilities then just those needed for emulating a specific machine.
Comment 4 Mark Wielaard 2013-09-13 13:31:45 UTC
Committed patch from comment #3 as VEX r2761.

Please close this bug if that solves the issue for you.
Comment 5 Bernd Buschinski 2013-09-14 11:32:52 UTC
Works fine, thx :)