On NVIDIA's Grace CPU, valgrind fails to run a binary (that otherwise runs fine) because it cannot handle an instruction:

```
==45347== Memcheck, a memory error detector
==45347== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==45347== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==45347== Command: ./stockfish bench
==45347==
Stockfish dev-20240329-ec598b38 by the Stockfish developers (see AUTHORS file)
Position: 1/48 (rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1)
info string NNUE evaluation using nn-1ceb1ade0001.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
disInstr(arm64): unhandled instruction 0x4E9096B7
disInstr(arm64): 0100'1110 1001'0000 1001'0110 1011'0111
==45347== valgrind: Unrecognised instruction at address 0x40f684.
==45347==    at 0x40F684: Stockfish::Eval::NNUE::Network<Stockfish::Eval::NNUE::NetworkArchitecture<2560u, 15, 32>, Stockfish::Eval::NNUE::FeatureTransformer<2560u, &Stockfish::StateInfo::accumulatorBig> >::evaluate(Stockfish::Position const&, bool, int*, bool) const [clone .constprop.0] (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x40E667: Stockfish::Eval::evaluate(Stockfish::Eval::NNUE::Networks const&, Stockfish::Position const&, int) (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x42A3F7: Stockfish::Search::Worker::iterative_deepening() (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x4280E7: Stockfish::Search::Worker::start_searching() (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x4210EB: Stockfish::Thread::idle_loop() (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x42103F: Stockfish::NativeThread::NativeThread<void (Stockfish::Thread::*)(), Stockfish::Thread*>(void (Stockfish::Thread::*&&)(), Stockfish::Thread*&&)::{lambda(void*)#1}::_FUN(void*) (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x507875B: start_thread (in /lib64/libpthread-2.31.so)
==45347==    by 0x54BFEEB: thread_start (in /lib64/libc-2.31.so)
```

Linux OS.
The program is compiled using `gcc version 12.3.0`, with target: `-march=armv8.2-a+dotprod`. /proc/cpuinfo gives:

```
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd4f
CPU revision : 0
```
Also reproduces on a Raspberry Pi 5:

$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 1
OK, this seems to be related to the +dotprod part of the ISA (which we detect based on the asimddp flag). The offending instruction probably comes from `acc = vdotq_s32(acc, a0, b0);` or `output[i] = vaddvq_s32(sum);`.
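One way to check the +dotprod hypothesis without a disassembler is to decode the instruction word that valgrind printed. The sketch below tests a 32-bit word against the A64 encoding of the vector form of SDOT/UDOT (`0 Q U 01110 size 0 Rm 1 0010 1 Rn Rd`, with `size` = `10`). The function name `decode_dot_vector` is made up for this illustration; it does not handle the by-element form and is not a general disassembler.

```python
def decode_dot_vector(word):
    """Decode a 32-bit A64 word as SDOT/UDOT (vector form), else return None.

    Hypothetical helper for illustration only; covers just this one encoding.
    """
    # Fixed bits of the encoding: bit 31 = 0, bits 28-24 = 01110,
    # bits 23-22 (size) = 10, bit 21 = 0, bits 15-10 = 100101.
    fixed_mask = 0x9FE0FC00
    fixed_bits = 0x0E809400
    if (word & fixed_mask) != fixed_bits:
        return None

    q = (word >> 30) & 1       # 0: 64-bit vectors, 1: 128-bit vectors
    u = (word >> 29) & 1       # 0: signed (SDOT), 1: unsigned (UDOT)
    rm = (word >> 16) & 0x1F   # second source register
    rn = (word >> 5) & 0x1F    # first source register
    rd = word & 0x1F           # destination (accumulator) register

    mnemonic = "udot" if u else "sdot"
    ta, tb = ("4s", "16b") if q else ("2s", "8b")
    return f"{mnemonic} v{rd}.{ta}, v{rn}.{tb}, v{rm}.{tb}"


if __name__ == "__main__":
    # The two instruction words reported by valgrind in this report.
    for word in (0x4E9096B7, 0x4E9196DF):
        print(f"{word:#010x}: {decode_dot_vector(word)}")
```

Both words decode as `sdot`, which points at `vdotq_s32` rather than `vaddvq_s32`. The second word, from the Raspberry Pi 5 run, decodes to `sdot v31.4s, v22.16b, v17.16b`, the same instruction the disassembly in that run shows.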
I tried on FreeBSD but couldn't get Stockfish to build. I'll give it a go on Ubuntu next time I boot it.
OK, on Ubuntu on the Raspberry Pi 5 I get

info string NNUE evaluation using nn-1ceb1ade0001.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
disInstr(arm64): unhandled instruction 0x4E9196DF
disInstr(arm64): 0100'1110 1001'0001 1001'0110 1101'1111
==51681== valgrind: Unrecognised instruction at address 0x118f6c.
==51681==    at 0x118F6C: Stockfish::Eval::NNUE::Network<Stockfish::Eval::NNUE::NetworkArchitecture<2560u, 15, 32>, Stockfish::Eval::NNUE::FeatureTransformer<2560u, &Stockfish::StateInfo::accumulatorBig> >::evaluate(Stockfish::Position const&, bool, int*, bool) const [clone .constprop.0] (in /home/paulf/scratch/Stockfish/src/stockfish)

with Ubuntu's Valgrind 3.21. The disassembly at that address is

NS1_19NetworkArchitectureILj2560ELi15ELi32EEENS1_18FeatureTransformerILj2560EXadL_ZNS_9StateInfo14accumulatorBigEEEEEE8evaluateERKNS_8PositionEbPib.constprop.0+3692> sdot v31.4s, v22.16b, v17.16b

But there is no problem with Valgrind built from source.

Looks like this was fixed with

Author: William Ashley <wash@amazon.com> 2023-11-10 17:51:12
Committer: Mark Wielaard <mark@klomp.org> 2023-11-10 17:55:22
Parent: aa3432229dff78dbbe95aeb0604215d3d588c4a4 (regtest: bug401284.c should never cast the return of malloc in C)
Child: 242d8881e10328ff98c37ceb7fd31955a29cad82 (Bug 476787 - Build of Valgrind 3.21.0 fails when SOLARIS_PT_SUNDWTRACE_THRP is defined)
Branches: master, remotes/origin/master, remotes/origin/users/paulf/try-bug484480, remotes/origin/users/paulf/try-carry, remotes/origin/users/paulf/try-sem_clockwait_np
Follows: VALGRIND_3_22_0
Precedes:

Bug 460616 - Add support for aarch64 dotprod instructions

This change adds support for the FEAT_DotProd instructions
SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]
SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>
UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]
UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Please build your own Valgrind from source, or wait for Valgrind 3.23, which is due out sometime this month (April 2024).

*** This bug has been marked as a duplicate of bug 460616 ***
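For anyone scripting around this, the advice above comes down to a version check: the fix landed after the VALGRIND_3_22_0 tag, so releases up to 3.22 lack it and 3.23 is the first expected to include it. A minimal sketch, assuming the version string has the form `valgrind-3.22.0` (as printed by `valgrind --version`) and that the installed package carries no backported patches. The function name `valgrind_has_dotprod` is hypothetical.

```python
import re


def valgrind_has_dotprod(version_line):
    """Return True if the reported Valgrind version is 3.23 or newer.

    Assumption: `version_line` looks like "valgrind-3.22.0". Any suffix
    after the numbers is ignored. A distro package with a backported fix
    would be misreported as lacking support.
    """
    match = re.search(r"(\d+)\.(\d+)", version_line)
    if match is None:
        raise ValueError(f"no version number in {version_line!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 23)


if __name__ == "__main__":
    # The two versions that failed in this report, plus the announced release.
    for line in ("valgrind-3.21.0", "valgrind-3.22.0", "valgrind-3.23.0"):
        print(line, valgrind_has_dotprod(line))
```

The string could be obtained with `subprocess.run(["valgrind", "--version"], capture_output=True, text=True).stdout`. That call is left out of the sketch so it runs on machines without Valgrind installed.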