Created attachment 171003
Minimal reproducer for the issue

SUMMARY

I am working on code that uses half-precision floating point numbers as a storage format. However, when I run Valgrind on the application, it fails with "Unrecognised instruction" on the instructions that do the float to half conversion.

STEPS TO REPRODUCE

1. Save the attached reproducer `repro.c`
2. Compile with `gcc repro.c -march=haswell`
3. Run `valgrind ./a.out`

OBSERVED RESULT

(With Valgrind 3.16 but have confirmed the same issue exists with 3.19):

$ valgrind ./a.out
==1257== Memcheck, a memory error detector
==1257== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1257== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1257== Command: ./a.out
==1257==
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE3 0x7D 0x1D 0xC0 0x0 0xC5 0xF9 0x7F 0x44
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=1 VEX.L=1 VEX.nVVVV=0x0 ESC=0F3A
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==1257== valgrind: Unrecognised instruction at address 0x109182.
==1257==    at 0x109182: FloatToHalf (in /data/vagrant/valgrind/a.out)
==1257==    by 0x10922A: main (in /data/vagrant/valgrind/a.out)
==1257== Your program just tried to execute an instruction that Valgrind
==1257== did not recognise.  There are two possible reasons for this.
==1257== 1. Your program has a bug and erroneously jumped to a non-code
==1257==    location.  If you are running Memcheck and you just saw a
==1257==    warning about a bad jump, it's probably your program's fault.
==1257== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1257==    i.e. it's Valgrind's fault.  If you think this is the case or
==1257==    you are not sure, please let us know and we'll try to fix it.
==1257== Either way, Valgrind will now raise a SIGILL signal which will
==1257== probably kill your program.
==1257==
==1257== Process terminating with default action of signal 4 (SIGILL)
==1257==  Illegal opcode at address 0x109182
==1257==    at 0x109182: FloatToHalf (in /data/vagrant/valgrind/a.out)
==1257==    by 0x10922A: main (in /data/vagrant/valgrind/a.out)
==1257==
==1257== HEAP SUMMARY:
==1257==     in use at exit: 0 bytes in 0 blocks
==1257==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1257==
==1257== All heap blocks were freed -- no leaks are possible
==1257==
==1257== For lists of detected and suppressed errors, rerun with: -s
==1257== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction

EXPECTED RESULT

The instruction should be correctly handled.
The m256 variant is part of AVX512F
It's a dupe of 383010 then.

*** This bug has been marked as a duplicate of bug 383010 ***
According to Intel, this is not part of AVX512F. The repro targets Haswell, which is over 10 years old and does not support any AVX512 at all but still supports this as it is part of F16C:

Synopsis
  __m128i _mm256_cvtps_ph (__m256 a, int imm8)
  #include <immintrin.h>
  Instruction: vcvtps2ph xmm, ymm, imm8
  CPUID Flags: F16C

Description
  Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
  Rounding is done according to the imm8[2:0] parameter, which can be one of:
    _MM_FROUND_TO_NEAREST_INT // round to nearest
    _MM_FROUND_TO_NEG_INF     // round down
    _MM_FROUND_TO_POS_INF     // round up
    _MM_FROUND_TO_ZERO        // truncate
    _MM_FROUND_CUR_DIRECTION  // use MXCSR.RC; see _MM_SET_ROUNDING_MODE

Hence, I am not convinced that it is a duplicate of bug 383010 as that seems to be AVX512-specific? Of course, if it gets done, I don't really mind that much!
Sorry, looks like you are right. I confused it with the m512 variant, which is AVX512F. The m256 variant is part of F16C.
This was implemented in:

commit 472b067e39a11a47ae3fa7cd7d3142558f78969d
Author: Julian Seward <jseward@acm.org>
Date:   Sun Mar 17 21:41:42 2019 +0100

    amd64: Implement RDRAND, VCVTPH2PS and VCVTPS2PH.

    Bug 398870 - Please add support for instruction vcvtps2ph
    Bug 353370 - RDRAND amd64->IR: unhandled instruction bytes: 0x48 0xF 0xC7 0xF0

    This commit implements:

    * amd64 RDRAND instruction, on hosts that have it.

    * amd64 VCVTPH2PS and VCVTPS2PH, on hosts that have it.

      The presence/absence of these on the host is now reflected in the CPUID
      results returned to the guest.  So code that tests for these features in
      CPUID and acts accordingly should "just work".

    * New test cases, none/tests/amd64/rdrand and none/tests/amd64/f16c.  These
      are built if the host's assembler can handle them, in the usual way.

And on my machine the reproducer works just fine under valgrind. If it still doesn't work for you with the latest valgrind, could you run with valgrind -v and check that the hwcaps detect f16c:

--411301-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand-rdseed-fma

(and check that your machine does actually have f16c set in /proc/cpuinfo)
Argh! You are right. I am running inside a VirtualBox VM and, although the instructions *seem* to work just fine, the feature flag is not set. It looks to be due to this longstanding feature request to add F16C (and FMA/BMI) support to VBox: https://www.virtualbox.org/ticket/15471 Thanks for your help, Steve.
Is it OK to close this item then?
Yes, from my side this looks like a VirtualBox issue, not a Valgrind issue, so can be closed here. Thanks!