Bug 369053 - AMD64 fma4 instructions missing 256 bit support
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version: 3.9.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on: 369000
Blocks: 339596
Reported: 2016-09-19 14:37 UTC by Mark Wielaard
Modified: 2016-09-19 14:37 UTC (History)
CC List: 7 users

See Also:
Latest Commit:
Version Fixed In:


Description Mark Wielaard 2016-09-19 14:37:11 UTC
Let's clone this once again. We now have fma4 support, but only for the 128-bit/xmm cases. There are already tests (in none/tests/amd64/fma4.c) for the 256-bit/ymm cases, but they are disabled for now because those cases aren't implemented yet. Implementing them would need to check getVexL(pfx) and extend the operations to the full 256 bits.
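
As a rough illustration of what "extend the operations to the full 256 bits" means, here is a small reference model in plain C. It is only a sketch under stated assumptions: the `Ymm` type and `model_vfmaddps` are hypothetical names made up for this example, not Valgrind code, and the `vex_l` argument merely stands in for whatever getVexL(pfx) reports. It models the lane selection only (4 single-precision lanes when VEX.L=0, 8 when VEX.L=1, with the upper half zeroed in the 128-bit form), not the single rounding of a real fused multiply-add.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 256-bit register as 8 single-precision lanes. */
typedef struct { float f[8]; } Ymm;

/* Sketch of vfmaddps semantics: dst = a * b + c per lane.
   vex_l == 0 models the 128-bit/xmm form, vex_l == 1 the 256-bit/ymm form.
   Assumption: rounding is NOT modelled; real FMA4 rounds once per lane. */
static Ymm model_vfmaddps(int vex_l, Ymm a, Ymm b, Ymm c)
{
    assert(vex_l == 0 || vex_l == 1);

    Ymm r;
    /* VEX-encoded 128-bit operations zero bits 255:128 of the destination,
       so start from an all-zero result. */
    memset(&r, 0, sizeof r);

    int lanes = vex_l ? 8 : 4;
    for (int i = 0; i < lanes; i++)
        r.f[i] = a.f[i] * b.f[i] + c.f[i];
    return r;
}
```

Calling `model_vfmaddps(0, a, b, c)` leaves lanes 4 to 7 at zero, while `model_vfmaddps(1, a, b, c)` computes all eight; the disabled ymm tests would exercise the second case.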

+++ This bug was initially created as a clone of Bug #369000 +++

Let's split the fma4 and xop instructions into separate bugs & patches.
(I have already looked at the fma4 ones, but haven't had time for the xop instructions.)

+++ This bug was initially created as a clone of Bug #339596 +++

When running valgrind, I immediately hit an illegal instruction at program startup. My first action was to try the latest source from SVN; this, however, did not help. Below are a few extra details on the instruction in question, my system, and anything else I can think of.

vex amd64->IR: unhandled instruction bytes: 0x8F 0xE8 0x78 0xCD 0xC1 0x4 0xC5 0xF9
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==16432== valgrind: Unrecognised instruction at address 0x5adb623.
==16432==    at 0x5ADB623: findChar(QChar const*, int, QChar, int, Qt::CaseSensitivity) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5AEC0C9: QString::split(QChar, QString::SplitBehavior, Qt::CaseSensitivity) const (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5BC8E61: QStandardPaths::standardLocations(QStandardPaths::StandardLocation) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5B7D9B7: QStandardPaths::locate(QStandardPaths::StandardLocation, QString const&, QFlags<QStandardPaths::LocateOption>) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5BB6028: QLoggingRegistry::init() (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5C2BC8A: QCoreApplication::init() (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5C2BEF5: QCoreApplication::QCoreApplication(QCoreApplicationPrivate&) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5560718: QGuiApplication::QGuiApplication(QGuiApplicationPrivate&) (in /usr/lib64/libQt5Gui.so.5.4.0)
==16432==    by 0x4F8F23C: QApplication::QApplication(int&, char**, int) (in /usr/lib64/libQt5Widgets.so.5.4.0)
==16432==    by 0x40D62D: main (main.cpp:37)

I've run it through GDB and grabbed a disassembly output (More can be provided if requested):
   0x0000000005adb619 <+329>:	mov    %r8,%rax
   0x0000000005adb61c <+332>:	vmovups (%rax),%xmm0
   0x0000000005adb620 <+336>:	mov    %rcx,%r8
=> 0x0000000005adb623 <+339>:	vpcomw $0x4,%xmm1,%xmm0,%xmm0
   0x0000000005adb629 <+345>:	vpmovmskb %xmm0,%esi
   0x0000000005adb62d <+349>:	test   %si,%si
   0x0000000005adb630 <+352>:	je     0x5adb610 <_ZL8findCharPK5QChariS_iN2Qt15CaseSensitivityE+320>
   0x0000000005adb632 <+354>:	bsf    %esi,%esi
   0x0000000005adb635 <+357>:	sub    %rdi,%rax
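
For reference, the reported bytes can be matched against the disassembly by decoding the 3-byte XOP prefix by hand. The sketch below assumes the AMD XOP layout (escape byte 0x8F, then inverted R/X/B bits plus a 5-bit map_select, then W, inverted vvvv, L and pp); the `XopPrefix` struct and `decode_xop` function are hypothetical names for this example and are not how Valgrind represents prefixes. The addresses above show the instruction is 6 bytes long, so the trailing 0xC5 0xF9 in the dump already belong to the following vpmovmskb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 3-byte XOP prefix (hypothetical struct, illustration only). */
typedef struct {
    unsigned map_select; /* opcode map; 8 or higher for XOP */
    unsigned R, X, B;    /* register-extension bits, un-inverted */
    unsigned W;
    unsigned vvvv;       /* extra source register number, un-inverted */
    unsigned L;          /* 0 = 128-bit (xmm), 1 = 256-bit (ymm) */
    unsigned pp;         /* implied legacy prefix */
    uint8_t  opcode;     /* opcode byte following the prefix */
} XopPrefix;

/* Decode bytes[0..3]. Returns 1 on success, 0 if this is not an XOP encoding. */
static int decode_xop(const uint8_t *bytes, XopPrefix *out)
{
    assert(bytes != NULL && out != NULL);

    if (bytes[0] != 0x8F)
        return 0;

    unsigned map = bytes[1] & 0x1F;
    if (map < 8)            /* 0x8F with map_select < 8 is POP r/m, not XOP */
        return 0;

    out->map_select = map;
    out->R    = ((bytes[1] >> 7) & 1) ^ 1;
    out->X    = ((bytes[1] >> 6) & 1) ^ 1;
    out->B    = ((bytes[1] >> 5) & 1) ^ 1;
    out->W    = (bytes[2] >> 7) & 1;
    out->vvvv = (unsigned)(~(bytes[2] >> 3)) & 0xF;
    out->L    = (bytes[2] >> 2) & 1;
    out->pp   = bytes[2] & 3;
    out->opcode = bytes[3];
    return 1;
}
```

Applied to 0x8F 0xE8 0x78 0xCD, this gives map_select 8, vvvv 0 (xmm0), L 0 (the 128-bit form) and opcode 0xCD, which is consistent with the vpcomw shown by GDB; the following 0xC1 is the ModRM byte (reg xmm0, rm xmm1) and 0x04 is the immediate.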

My system CPU is an AMD FX-8150, and Qt (where the instruction seems to originate) is compiled with GCC 4.8. I am running a Gentoo-based system and have used -march=native, -O2, and -fomit-frame-pointer as my only three default CFLAGS.

If it helps, I can set up a VM with SSH access temporarily to aid in testing/debugging of the problem. However, for sanity's sake I will only do this for Valgrind developers.

My final observation is that the XOP and FMA4 instructions introduced in the Bulldozer generation of AMD processors seem to cause the most trouble. But that may be beyond the scope of this bug report.

Reproducible: Always