Bug 369000 - AMD64 fma4 instructions unsupported. vex amd64->IR: unhandled instruction bytes: 0x8F 0xE8 0x78 0xCD 0xC1 0x4 0xC5 0xF9

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (other bugs)
Version First Reported In:	3.9.0
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Duplicates (1):	316382
Depends on:
Blocks:	339596 369053

Reported:	2016-09-18 17:04 UTC by Mark Wielaard
Modified:	2021-02-28 18:34 UTC
CC List:	8 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
Implement AMD FMA4 instructions. (5.67 KB, patch) 2016-09-18 17:16 UTC, Mark Wielaard
Testcases for fma4 instructions. (214.01 KB, patch) 2016-09-18 17:28 UTC, Mark Wielaard

Description Mark Wielaard 2016-09-18 17:04:07 UTC

Let's split the FMA4 and XOP instructions into separate bugs and patches.
(I have already looked at the FMA4 ones, but haven't had time for the XOP instructions.)

+++ This bug was initially created as a clone of Bug #339596 +++

When I run my program under valgrind, it hits an illegal instruction immediately at startup. My first step was to try the latest source from SVN, but that did not help. Below are a few extra details on the instruction in question, my system, and anything else I can think of.

vex amd64->IR: unhandled instruction bytes: 0x8F 0xE8 0x78 0xCD 0xC1 0x4 0xC5 0xF9
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==16432== valgrind: Unrecognised instruction at address 0x5adb623.
==16432==    at 0x5ADB623: findChar(QChar const*, int, QChar, int, Qt::CaseSensitivity) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5AEC0C9: QString::split(QChar, QString::SplitBehavior, Qt::CaseSensitivity) const (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5BC8E61: QStandardPaths::standardLocations(QStandardPaths::StandardLocation) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5B7D9B7: QStandardPaths::locate(QStandardPaths::StandardLocation, QString const&, QFlags<QStandardPaths::LocateOption>) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5BB6028: QLoggingRegistry::init() (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5C2BC8A: QCoreApplication::init() (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5C2BEF5: QCoreApplication::QCoreApplication(QCoreApplicationPrivate&) (in /usr/lib64/libQt5Core.so.5.4.0)
==16432==    by 0x5560718: QGuiApplication::QGuiApplication(QGuiApplicationPrivate&) (in /usr/lib64/libQt5Gui.so.5.4.0)
==16432==    by 0x4F8F23C: QApplication::QApplication(int&, char**, int) (in /usr/lib64/libQt5Widgets.so.5.4.0)
==16432==    by 0x40D62D: main (main.cpp:37)

I've run it through GDB and grabbed a disassembly output (More can be provided if requested):
   0x0000000005adb619 <+329>:	mov    %r8,%rax
   0x0000000005adb61c <+332>:	vmovups (%rax),%xmm0
   0x0000000005adb620 <+336>:	mov    %rcx,%r8
=> 0x0000000005adb623 <+339>:	vpcomw $0x4,%xmm1,%xmm0,%xmm0
   0x0000000005adb629 <+345>:	vpmovmskb %xmm0,%esi
   0x0000000005adb62d <+349>:	test   %si,%si
   0x0000000005adb630 <+352>:	je     0x5adb610 <_ZL8findCharPK5QChariS_iN2Qt15CaseSensitivityE+320>
   0x0000000005adb632 <+354>:	bsf    %esi,%esi
   0x0000000005adb635 <+357>:	sub    %rdi,%rax
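
The log above reports VEX=0 because the faulting instruction starts with the XOP escape byte 0x8F rather than a VEX prefix (0xC4/0xC5). As a rough illustration only, the following sketch decodes the three-byte XOP prefix following the layout in AMD's published encoding (0x8F, then inverted R/X/B plus a 5-bit map select, then W, inverted vvvv, L and pp). The struct and function names are hypothetical and are not taken from VEX.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a 3-byte XOP prefix.
   R, X, B and vvvv are stored inverted in the encoding; the values
   here are already un-inverted. */
typedef struct {
    unsigned r, x, b;  /* register-extension bits */
    unsigned map;      /* opcode map select (8 or above for XOP) */
    unsigned w;        /* operand-size / operand-order bit */
    unsigned vvvv;     /* extra source register number */
    unsigned l;        /* vector length: 0 = 128-bit, 1 = 256-bit */
    unsigned pp;       /* implied legacy prefix selector */
} xop_prefix;

/* Returns 1 and fills *out if p[0..2] looks like an XOP prefix, else 0. */
static int decode_xop_prefix(const uint8_t *p, xop_prefix *out)
{
    if (p[0] != 0x8F)
        return 0;
    unsigned map = p[1] & 0x1Fu;
    if (map < 8)
        return 0;  /* 0x8F with a map value below 8 is the ordinary POP encoding */

    out->r    = ((p[1] >> 7) & 1u) ^ 1u;
    out->x    = ((p[1] >> 6) & 1u) ^ 1u;
    out->b    = ((p[1] >> 5) & 1u) ^ 1u;
    out->map  = map;
    out->w    = (p[2] >> 7) & 1u;
    out->vvvv = (~(unsigned)(p[2] >> 3)) & 0xFu;
    out->l    = (p[2] >> 2) & 1u;
    out->pp   = p[2] & 3u;
    return 1;
}
```

Applied to the reported bytes 0x8F 0xE8 0x78, this yields map 8, L=0 (128-bit) and vvvv=0 (xmm0), which is consistent with the vpcomw disassembly; the following 0xCD is the opcode, 0xC1 the ModRM byte and 0x04 the immediate.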

My system CPU is an AMD FX-8150, and Qt (where the instruction seems to originate) is compiled with GCC 4.8. I am running a Gentoo-based system and use -march=native, -O2, and -fomit-frame-pointer as my only three default CFLAGS.

If it helps, I can temporarily set up a VM with SSH access to aid in testing and debugging the problem. However, for sanity's sake I will only do this for Valgrind developers.

My final observation is that the XOP and FMA4 instructions introduced in the Bulldozer generation of AMD processors seem to cause the most trouble. But that may be beyond the scope of this bug report.

Reproducible: Always

Comment 1 Mark Wielaard 2016-09-18 17:16:40 UTC

Created attachment 101169
Implement AMD FMA4 instructions.

This is just the FMA4 part of the patch proposed in bug #339596. The original patch is by p4plus2@gmail.com, with some minor formatting cleanups and an explicit clearing of the upper 128 bits of the YMM registers these instructions operate on. It only handles the 128-bit variants (that is the explicit 0==getVexL(pfx) check in the patch).
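
To make the intended register behaviour concrete, here is a small scalar model of a 128-bit packed-single multiply-add that also zeroes the upper half of the destination. It is an illustration only, not code from the patch: the ymm_model type and the function name are hypothetical, and the arithmetic is done in double rather than as a true fused operation.

```c
#include <string.h>

/* Hypothetical model of a 256-bit YMM register as eight float lanes. */
typedef struct { float lane[8]; } ymm_model;

/* Model of the 128-bit (VEX.L == 0) form of a packed-single multiply-add:
   dst = a*b + c on the low four lanes, upper four lanes cleared. */
static ymm_model model_fmadd_ps_128(ymm_model a, ymm_model b, ymm_model c)
{
    ymm_model dst;
    memset(&dst, 0, sizeof dst);  /* upper 128 bits (lanes 4..7) stay zero */
    for (int i = 0; i < 4; i++) {
        /* A float*float product is exact in double; the add and the
           narrowing both round, so this is close to, but not guaranteed
           bit-identical with, a real fused multiply-add. */
        double wide = (double)a.lane[i] * (double)b.lane[i]
                      + (double)c.lane[i];
        dst.lane[i] = (float)wide;
    }
    return dst;
}
```

With every input lane set to 2, 3 and 1 respectively, lanes 0..3 of the result are 7 and lanes 4..7 are 0, regardless of what the inputs held in their upper halves.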

Comment 2 Mark Wielaard 2016-09-18 17:28:25 UTC

Created attachment 101170
Testcases for fma4 instructions.

This is an extension of the testcases for the fma4 instructions I wrote for bug #339596.

There are tests for the 128-bit and 256-bit instructions, but the 256-bit variants are disabled for now since they aren't yet implemented.

I looked at the testcase (none/tests/arm64/fp_and_simd.c) Julian pointed out and took the various zero/inf/nan/subnormal cases from there. But I don't filter them out of the tests. Instead I explicitly add them and print them out as +/-ZERO, +/-INF, NAN and SUBNORMAL. To do this correctly I had to split the tests into ones that work on floats (ending in S) and those that work on doubles (ending in D).
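
A minimal sketch of that kind of labelling, assuming the standard fpclassify() and signbit() macros from math.h. The function names and exact label strings are hypothetical and not taken from the attached testcase. It also shows why the float and double paths have to be separate: the same value can be subnormal as a float but normal as a double.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: label for a special double value,
   or NULL for an ordinary (normal) number. */
static const char *special_label_d(double v)
{
    switch (fpclassify(v)) {
    case FP_NAN:       return "NAN";  /* sign bit deliberately ignored */
    case FP_INFINITE:  return signbit(v) ? "-INF" : "+INF";
    case FP_ZERO:      return signbit(v) ? "-ZERO" : "+ZERO";
    case FP_SUBNORMAL: return signbit(v) ? "-SUBNORMAL" : "+SUBNORMAL";
    default:           return NULL;
    }
}

/* Same classification, but performed on the float representation. */
static const char *special_label_s(float v)
{
    switch (fpclassify(v)) {
    case FP_NAN:       return "NAN";
    case FP_INFINITE:  return signbit(v) ? "-INF" : "+INF";
    case FP_ZERO:      return signbit(v) ? "-ZERO" : "+ZERO";
    case FP_SUBNORMAL: return signbit(v) ? "-SUBNORMAL" : "+SUBNORMAL";
    default:           return NULL;
    }
}
```

For example, 1e-40f is below the smallest normal float, so special_label_s() reports it as subnormal, while the same value widened to double is an ordinary normal number and special_label_d() returns NULL.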

There is a little bit of duplication because you have to indicate whether the instruction works on floats or doubles both when generating and when calling the testcase (someone with stronger preprocessor-fu might be able to "fix" that).

Although the testcase tries to generate positive and negative NAN values, the test does not check for the sign (I don't believe NANs can be positive/negative). If we tested for that, the testcase would fail, since we sometimes generate a different representation for NAN that is positive/negative according to signbit().
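
One way to express a sign-insensitive NAN comparison is sketched below. This is a hypothetical helper, not the comparison the testcase actually performs (the testcase prints labels and relies on the expected output file). The assumption here is that two NANs always match, while every other value must agree in both value and sign.

```c
#include <math.h>

/* Hypothetical comparison: NaNs match each other regardless of sign bit
   or payload; every other value must compare equal and carry the same
   sign, so that +0.0 and -0.0 are told apart. */
static int results_match_d(double expected, double actual)
{
    if (isnan(expected) || isnan(actual))
        return isnan(expected) && isnan(actual);
    return expected == actual
           && !signbit(expected) == !signbit(actual);
}
```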

The testcase does produce and test for positive/negative SUBNORMALs. I am not 100% sure that makes sense, but it doesn't impact the result.

Comment 3 Mark Wielaard 2016-09-19 13:15:14 UTC

Fixed in VEX svn r3249 and valgrind svn r15961.

Comment 4 Mark Wielaard 2021-02-28 18:34:00 UTC

*** Bug 316382 has been marked as a duplicate of this bug. ***