Bug 460203

Summary: Valgrind crashes when skylake-avx512 is set (using libmkl)
Product: [Developer tools] valgrind Reporter: Paola <paola_arc>
Component: general   Assignee: Julian Seward <jseward>
Status: RESOLVED DUPLICATE    
Severity: crash CC: gabravier, mark, pjfloyd
Priority: NOR    
Version: 3.19 GIT   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Paola 2022-10-10 15:51:46 UTC
SUMMARY
Valgrind crashes when skylake-avx512 is set. 

Error log:
vex amd64->IR: unhandled instruction bytes: 0x62 0xE1 0xF7 0x8 0x5E 0xC0 0x62 0xE1 0xFD 0x8
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==127428== valgrind: Unrecognised instruction at address 0x4a2dab.
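The leading 0x62 byte is an EVEX prefix, which is the AVX-512 encoding. That is consistent with the "VEX=0" line: by my understanding, Valgrind 3.19's amd64 front end does not decode EVEX at all. The sketch below unpacks the three EVEX payload bytes, the opcode and the ModRM byte from the first six bytes in the log. It assumes the classic AVX-512 EVEX layout from the Intel SDM and ignores memory operands, rounding control and broadcast. The names `decode_evex` and `Evex` are hypothetical helpers, not Valgrind code.

```python
from typing import NamedTuple, Optional

# EVEX.pp selects an implied legacy prefix; EVEX.mm selects the opcode map.
PP_PREFIX = {0b00: "none", 0b01: "66", 0b10: "F3", 0b11: "F2"}
MM_MAP = {0b01: "0F", 0b10: "0F38", 0b11: "0F3A"}


class Evex(NamedTuple):
    opcode_map: str
    implied_prefix: str
    w: int
    opcode: int
    dest: int            # ModRM.reg extended by EVEX.R (bit 3) and EVEX.R' (bit 4)
    src1: int            # EVEX.vvvv extended by EVEX.V' (bit 4)
    src2: Optional[int]  # ModRM.rm extended by EVEX.B and EVEX.X (register form only)
    mask_reg: int        # EVEX.aaa: opmask register k0..k7
    zeroing: int         # EVEX.z


def bit(byte: int, n: int) -> int:
    return (byte >> n) & 1


def decode_evex(code: bytes) -> Evex:
    """Decode the register fields of a 64-bit-mode EVEX instruction (sketch)."""
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("expected 0x62, three EVEX payload bytes, opcode, ModRM")
    p0, p1, p2, opcode, modrm = code[1:6]
    # P0 = R X B R' 0 0 m m; R, X, B and R' are stored inverted.
    r, x, b, r_hi = (bit(p0, n) ^ 1 for n in (7, 6, 5, 4))
    # P1 = W v3 v2 v1 v0 1 p p; vvvv is stored inverted.
    w = bit(p1, 7)
    vvvv = ((p1 >> 3) & 0xF) ^ 0xF
    # P2 = z L' L b V' a a a; V' is stored inverted.
    v_hi = bit(p2, 3) ^ 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    return Evex(
        opcode_map=MM_MAP.get(p0 & 0b11, "reserved"),
        implied_prefix=PP_PREFIX[p1 & 0b11],
        w=w,
        opcode=opcode,
        dest=(r_hi << 4) | (r << 3) | reg,
        src1=(v_hi << 4) | vvvv,
        src2=((x << 4) | (b << 3) | rm) if mod == 0b11 else None,
        mask_reg=p2 & 0b111,
        zeroing=bit(p2, 7),
    )


if __name__ == "__main__":
    # First six bytes from the "unhandled instruction bytes" line above.
    print(decode_evex(bytes([0x62, 0xE1, 0xF7, 0x08, 0x5E, 0xC0])))
```

If the decoding is right, this gives map 0F, implied prefix F2, W=1, opcode 0x5E, destination register 16, vvvv register 1 and r/m register 0. In the SDM that is `vdivsd xmm16, xmm1, xmm0`: an ordinary scalar double divide, but it names xmm16, which only the EVEX encoding can reach.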

Following the instructions in https://bugs.kde.org/show_bug.cgi?id=383010, I got past the "unrecognised instruction" error, but Valgrind still crashes.

Valgrind version:  Valgrind-3.19.0.GIT
GCC: 8.3.0

CPU:
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
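The CPU fields above come from Linux's /proc/cpuinfo. Family 6, model 85 (0x55) is Intel's Skylake-SP/Cascade Lake server line, so AVX-512 is available and MKL may pick its AVX-512 code path. The sketch below collects the same fields and also checks the `avx512f` flag, which is not shown in the report. It assumes the usual `key : value` layout of /proc/cpuinfo. `parse_cpuinfo` and `has_avx512f` are hypothetical helper names.

```python
def parse_cpuinfo(text: str) -> dict[str, str]:
    """Return the key/value fields of the first processor block."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_avx512f(fields: dict[str, str]) -> bool:
    # AVX-512 Foundation support is advertised as the "avx512f" flag.
    return "avx512f" in fields.get("flags", "").split()


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            info = parse_cpuinfo(f.read())
    except OSError:
        info = {}  # not Linux, or /proc is unavailable
    for key in ("vendor_id", "cpu family", "model", "model name"):
        print(f"{key:<16}: {info.get(key, '?')}")
    print(f"{'avx512f':<16}: {has_avx512f(info)}")
```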


Log: 

==349954== Memcheck, a memory error detector
==349954== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==349954== Using Valgrind-3.19.0.GIT and LibVEX; rerun with -h for copyright info

==349954== WARNING: valgrind ignores shmget(shmflg) SHM_HUGETLB

Memcheck: mc_translate.c:7243 (gen_store_b): the 'impossible' happened.

host stacktrace:
==349954==    at 0x58048642: show_sched_status_wrk (m_libcassert.c:406)
==349954==    by 0x58048747: report_and_quit (m_libcassert.c:477)
==349954==    by 0x580488D0: vgPlain_assert_fail (m_libcassert.c:543)
==349954==    by 0x580279C9: gen_store_b (mc_translate.c:7243)
==349954==    by 0x580399F8: do_origins_Store_plain (mc_translate.c:7640)
==349954==    by 0x580399F8: schemeS (mc_translate.c:7724)
==349954==    by 0x580399F8: vgMemCheck_instrument (mc_translate.c:8831)
==349954==    by 0x5805ECAF: tool_instrument_then_gdbserver_if_needed (m_translate.c:241)
==349954==    by 0x5813D2A7: LibVEX_FrontEnd (main_main.c:679)
==349954==    by 0x5813D772: LibVEX_Translate (main_main.c:1239)
==349954==    by 0x58061695: vgPlain_translate (m_translate.c:1836)
==349954==    by 0x580A2C5B: handle_chain_me (scheduler.c:1177)
==349954==    by 0x580A5437: vgPlain_scheduler (scheduler.c:1522)
==349954==    by 0x580EFFE6: thread_wrapper (syswrap-linux.c:101)
==349954==    by 0x580EFFE6: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 349954)
==349954==    at 0x300BB831: mkl_dft_avx512_ownsInitDftPrimeFact_32f_32f (in libmkl_avx512.so)
==349954==    by 0x30080A52: mkl_dft_avx512_ippsDFTInit_C_32fc (in libmkl_avx512.so)
==349954==    by 0x30082E24: mkl_dft_avx512_ownsInitDftConv_32f (in libmkl_avx512.so)
==349954==    by 0x3007DE27: mkl_dft_avx512_ippsDFTInit_R_32f (in libmkl_avx512.so)
[...]

Do you have any hints about this problem?
Thanks for your time. 

Paola
Comment 1 Paul Floyd 2022-10-11 15:13:15 UTC
Probably a duplicate of https://bugs.kde.org/show_bug.cgi?id=383010

You can try the patchset from that bugzilla item.
Comment 2 Paola 2022-10-21 14:53:10 UTC
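As I reported, I followed the instructions in https://bugs.kde.org/show_bug.cgi?id=383010. The "unrecognised instruction" error is gone, but Valgrind still crashes with the same Memcheck assertion failure (mc_translate.c:7243, gen_store_b) shown in the description.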

Comment 3 Paul Floyd 2022-10-26 09:10:04 UTC
Sorry, but work on AVX-512 is currently on hiatus, as the Intel developer working on it is based in St Petersburg.
Comment 4 Mark Wielaard 2023-04-20 11:43:46 UTC

*** This bug has been marked as a duplicate of bug 383010 ***