Bug 456948 - Unrecognized instruction CLFLUSHOPT in Intel oneAPI MPI 2021.6 library
Summary: Unrecognized instruction CLFLUSHOPT in Intel oneAPI MPI 2021.6 library
Status: RESOLVED DUPLICATE of bug 424248
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.19.0
Platform: RedHat Enterprise Linux Other
Importance: NOR critical
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-20 16:13 UTC by f.roeser@magmasoft.de
Modified: 2022-08-03 09:47 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description f.roeser@magmasoft.de 2022-07-20 16:13:00 UTC
Hi,

With the new Intel oneAPI MPI library 2021.6 (release mode), we run into the following problem:

vex amd64->IR: unhandled instruction bytes: 0x66 0xF 0xAE 0x3B 0x49 0x81 0xC6 0x0 0xFC 0xFF
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==2714078== valgrind: Unrecognised instruction at address 0xf24853f.
==2714078==    at 0xF24853F: I_MPI_memcpy_multipage_flush_src_avx2 (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF707593: ??? (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF7036F5: ??? (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF700DFC: ??? (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF034940: MPID_Progress_wait (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF5C8B7D: MPIR_Wait_impl (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
==2714078==    by 0xF4CA613: PMPI_Recv (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)
....
==2714078== Your program just tried to execute an instruction that Valgrind
==2714078== did not recognise.  There are two possible reasons for this.
==2714078== 1. Your program has a bug and erroneously jumped to a non-code
==2714078==    location.  If you are running Memcheck and you just saw a
==2714078==    warning about a bad jump, it's probably your program's fault.
==2714078== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2714078==    i.e. it's Valgrind's fault.  If you think this is the case or
==2714078==    you are not sure, please let us know and we'll try to fix it.
==2714078== Either way, Valgrind will now raise a SIGILL signal which will
==2714078== probably kill your program.

Platform and valgrind version
RHEL7.9 (devtoolset 7)
valgrind 3.19

Best regards
Frank
Comment 1 Mark Wielaard 2022-07-20 18:47:02 UTC
This is CLFLUSHOPT which valgrind indeed doesn't support.
But valgrind also makes sure the cpuid CLFLUSHOPT bit isn't set.
So the program really shouldn't use CLFLUSHOPT without first checking that cpuid reports it as supported.
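
For anyone wanting to confirm the identification from the reported bytes, here is a minimal sketch in Python. It assumes the Intel SDM encodings for the 0F AE group (66 0F AE /7 is CLFLUSHOPT, 66 0F AE /6 is CLWB, plain 0F AE /7 is CLFLUSH) and only handles those memory forms. The helper name classify_0f_ae is made up for illustration and is not part of valgrind.

```python
# Bytes copied from the "unhandled instruction bytes" line in the report.
RAW = [0x66, 0x0F, 0xAE, 0x3B, 0x49, 0x81, 0xC6, 0x00, 0xFC, 0xFF]

# ModRM.reg meanings for opcode 0F AE with a memory operand.
# Only the encodings relevant to this report are listed.
GROUP_WITH_66 = {6: "CLWB", 7: "CLFLUSHOPT"}
GROUP_NO_PREFIX = {7: "CLFLUSH"}


def classify_0f_ae(raw):
    """Return the mnemonic for a (66) 0F AE /r memory-form instruction, or None."""
    has_66 = len(raw) > 0 and raw[0] == 0x66
    body = raw[1:] if has_66 else raw
    if len(body) < 3 or body[0] != 0x0F or body[1] != 0xAE:
        return None
    modrm = body[2]
    mod = modrm >> 6          # top two bits: addressing mode
    reg = (modrm >> 3) & 0x7  # middle three bits: opcode extension (/r)
    if mod == 3:
        # Register forms of 0F AE encode other instructions (fences etc.).
        return None
    table = GROUP_WITH_66 if has_66 else GROUP_NO_PREFIX
    return table.get(reg)


print(classify_0f_ae(RAW))  # prints "CLFLUSHOPT"
```

The ModRM byte 0x3B has mod=00 and reg=7, which together with the 0x66 prefix (PFX.66=1 in the vex output) selects CLFLUSHOPT. Dropping the prefix gives plain CLFLUSH.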
Comment 2 Mark Wielaard 2022-07-20 18:48:26 UTC
*** This bug has been marked as a duplicate of bug 424248 ***
Comment 3 f.roeser@magmasoft.de 2022-07-21 08:25:51 UTC
Thank you for your answer, is there a possible workaround to get valgrind running?
Is there a time schedule for a new valgrind build which fixes this issue?

Best regards
Frank
Comment 4 Mark Wielaard 2022-07-23 11:01:07 UTC
(In reply to f.roeser@magmasoft.de from comment #3)
> Thank you for your answer, is there a possible workaround to get valgrind
> running?
> Is there a time schedule for a new valgrind build which fixes this issue?

Note that this is a bug in your program or the library you are using. Valgrind clearly indicates it doesn't implement CLFLUSHOPT. So your program/library shouldn't use that instruction. Your program will also crash on a processor that doesn't implement that instruction.

Valgrind does support CLFLUSH. It looks like CLFLUSHOPT is similar. So it might not be too hard to support it.
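
To illustrate the kind of check meant here, a rough sketch in Python of a dispatch on the cpuid feature bits. It assumes the caller has already obtained the EBX value of CPUID leaf 7, sub-leaf 0 (how to do that depends on the language and compiler), and uses the bit positions from the Intel SDM. The names leaf7_features and pick_flush, and the EBX values in the example, are hypothetical. A real AVX-512 check would also need to confirm OS support for the register state, which is left out.

```python
# Feature bit positions in EBX of CPUID leaf 7, sub-leaf 0 (per the Intel SDM).
LEAF7_EBX_BITS = {"avx2": 5, "avx512f": 16, "clflushopt": 23, "clwb": 24}


def leaf7_features(ebx):
    """Return the set of feature names whose bit is set in the given EBX value."""
    return {name for name, bit in LEAF7_EBX_BITS.items() if (ebx >> bit) & 1}


def pick_flush(ebx):
    """Choose a cache-line flush instruction the (possibly emulated) CPU advertises.

    Falls back to CLFLUSH, assuming it is available as a baseline.
    """
    return "clflushopt" if "clflushopt" in leaf7_features(ebx) else "clflush"


# Hypothetical EBX values: one with CLFLUSHOPT advertised, one with it masked off.
native_ebx = (1 << 5) | (1 << 16) | (1 << 23)
masked_ebx = 1 << 5

print(pick_flush(native_ebx))
print(pick_flush(masked_ebx))
```

With the bit masked off, as valgrind does, a library dispatching this way would take the CLFLUSH path and never reach the unsupported instruction.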
Comment 5 f.roeser@magmasoft.de 2022-07-25 12:20:51 UTC
I found a solution : )
https://github.com/pmem/valgrind/tree/pmem-3.19
With this version we can use valgrind with new Intel MPI 2021.6

Best regards
Frank
Comment 6 f.roeser@magmasoft.de 2022-07-27 15:56:32 UTC
Is this also a forbidden instruction?
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7C 0x48 0x10 0x2 0x49 0x81 0xC0 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==1112423== valgrind: Unrecognised instruction at address 0x1e74d9c0.
==1112423==    at 0x1E74D9C0: I_MPI_memcpy_nontemporal_avx512 (in /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/libmpi.so.12)

I took https://github.com/pmem/valgrind/tree/pmem-3.19 and it worked better than original 3.19 but also fails on some runs : (
Comment 7 Tom Hughes 2022-07-27 16:09:05 UTC
No instructions are "forbidden" but some are not supported yet.
Comment 8 Mark Wielaard 2022-08-01 11:48:40 UTC
(In reply to f.roeser@magmasoft.de from comment #6)
> Is this also a forbidden instruction?
> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7C 0x48 0x10 0x2
> 0x49 0x81 0xC0 0x0
> vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
> ==1112423== valgrind: Unrecognised instruction at address 0x1e74d9c0.
> ==1112423==    at 0x1E74D9C0: I_MPI_memcpy_nontemporal_avx512 (in
> /net/aws1de027/data/repo_cache/mature/intelmpi_rt/2021.6.0/LINUX64_217/lib/
> libmpi.so.12)

That looks like an AVX-512 variant of the MOVUPS instruction.
Valgrind doesn't support that instruction yet; see https://bugs.kde.org/show_bug.cgi?id=383010
Your program should first check the CPU supports such instructions before use.
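
For reference, a small Python sketch of how the reported bytes can be read as an EVEX-encoded instruction. It assumes the EVEX prefix layout from the Intel SDM (0x62 followed by three payload bytes, then the opcode) and decodes only the fields needed here. The function name decode_evex and the returned dictionary keys are made up for illustration.

```python
# Bytes copied from the "unhandled instruction bytes" line in comment 6.
RAW = [0x62, 0xF1, 0x7C, 0x48, 0x10, 0x02, 0x49, 0x81, 0xC0, 0x00]


def decode_evex(raw):
    """Decode a few EVEX prefix fields; return None if this is not EVEX."""
    if len(raw) < 5 or raw[0] != 0x62:
        return None
    p0, p1, p2, opcode = raw[1], raw[2], raw[3], raw[4]
    if not (p1 >> 2) & 1:
        # Bit 2 of the second payload byte is fixed to 1 in an EVEX prefix.
        return None
    return {
        "map": p0 & 0x3,        # 1 selects the 0F opcode map
        "pp": p1 & 0x3,         # 0 means no 66/F3/F2 prefix
        "w": p1 >> 7,
        "vector_bits": {0: 128, 1: 256, 2: 512}.get((p2 >> 5) & 0x3),
        "mask_reg": p2 & 0x7,   # 0 means no write mask
        "opcode": opcode,
    }


fields = decode_evex(RAW)
# Opcode 0x10 (load) or 0x11 (store) in the 0F map with no prefix is MOVUPS.
is_movups = (
    fields is not None
    and fields["map"] == 1
    and fields["pp"] == 0
    and fields["opcode"] in (0x10, 0x11)
)
print(fields)
print(is_movups, fields["vector_bits"])
```

The vector length field comes out as 512 bits, which matches the function name I_MPI_memcpy_nontemporal_avx512 in the backtrace.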

> I took https://github.com/pmem/valgrind/tree/pmem-3.19 and it worked better
> than original 3.19 but also fails on some runs : (

Have you contacted the pmem valgrind developers to see if they want to contribute their improvements upstream?
Comment 9 f.roeser@magmasoft.de 2022-08-01 11:54:04 UTC
Hi,

Thank you for your answer. The problem occurs in the Intel 2021.6 MPI library. As far as I understand it runs this code path when cpuid returns positive if avx512 is available. Could valgrind emulate cpuid and give a fake response that avx512 is not available?

Best regards
Frank
Comment 10 Mark Wielaard 2022-08-01 12:13:43 UTC
(In reply to f.roeser@magmasoft.de from comment #9)
> Thank you for your answer. The problem occurs in the Intel 2021.6 MPI
> library. As far as I understand it runs this code path when cpuid returns
> positive if avx512 is available. Could valgrind emulate cpuid and give a
> fake response that avx512 is not available?

valgrind does emulate cpuid to say that avx512 isn't available. So it must be a bug in the Intel 2021.6 MPI library.
Comment 11 Tom Hughes 2022-08-01 16:25:37 UTC
That is exactly what we do and why Mark said your program should be checking the CPU capabilities.
Comment 12 f.roeser@magmasoft.de 2022-08-03 09:47:58 UTC
Hi,

Just as an FYI:
Luckily, one can work around this behavior of the MPI library by setting the environment variable I_MPI_SHM=bdw_sse or I_MPI_SHM=bdw_avx2, which selects a non-AVX-512 memcpy.
While searching Google, I found that someone else had a similar problem with valgrind and avx512 instructions in the Intel MPI library. In that case the CPU was AVX-512 capable, but the feature was disabled in the BIOS, and cpuid somehow reported the wrong(?) result, so the Intel MPI library took the avx512 path.

Best regards
Frank
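
In case it helps others scripting their test runs, a minimal Python sketch of applying the workaround above. It only uses the two I_MPI_SHM values named in this comment; the helper name valgrind_env and the binary ./my_mpi_app are hypothetical, and how the program is actually launched (for example under mpirun) depends on the local setup.

```python
import os

# The two values reported above as avoiding the avx512 memcpy path.
SHM_CHOICES = ("bdw_sse", "bdw_avx2")


def valgrind_env(shm="bdw_avx2", base=None):
    """Return a copy of the environment with I_MPI_SHM set to the given value."""
    if shm not in SHM_CHOICES:
        raise ValueError("expected one of %r, got %r" % (SHM_CHOICES, shm))
    env = dict(os.environ if base is None else base)
    env["I_MPI_SHM"] = shm
    return env


env = valgrind_env("bdw_avx2", base={"PATH": "/usr/bin"})
print(env["I_MPI_SHM"])

# Hypothetical usage; the binary name is a placeholder:
#   import subprocess
#   subprocess.run(["valgrind", "./my_mpi_app"], env=valgrind_env())
```

Copying the environment instead of modifying os.environ keeps the setting limited to the valgrind run.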