On aarch64 (64-bit ARM) memcheck complains "unhandled instruction 0xD50B7425" which is "dc zva, x5". This is a data cache control instruction; it [allocates and] clears the cache line whose address corresponds to the contents of register x5. Optimized code uses "dc zva," when it will be writing the entire cache line. Therefore the hardware need not read the line from memory (which otherwise would be required upon the first write to the line), which saves time. Other related instructions which are used soon afterwards: "dc cvau," [Clean data cache line by Virtual Address to point of Unification] to force the cache line to be written to memory; and "ic ivau," [Invalidate icache line by Virtual Address to point of Unification] to force the icache to forget the old contents of a newly-overwritten instruction stream.
(In reply to John Reiser from comment #0)
> instruction; it [allocates and] clears the cache line whose address
> corresponds to the contents of register x5.

Yes, ppc has something similar iirc (dcbz). The main difficulty is to know how big the cache line is. Any ideas? And on a big.LITTLE configuration?
You can read the DCZID_EL0 register, which will tell you both (a) whether it's permitted to use DC ZVA and (b) the size of the block that DC ZVA zeroes. A big.LITTLE system will always report the same block size for this purpose regardless of core -- it would be impossible to use DC ZVA reliably otherwise. (There is at least one SoC with a system integration bug where it reports different values on different cores, but the fix for this is at the Linux kernel level, where the kernel can check for such errata and set the register up to trap so the kernel can fix up the reported values.)
(In reply to Julian Seward from comment #1)
> The main difficulty is to know how big the cache line is.

The value (and validity) is reported by the instruction "mrs reg, dczid_el0". So valgrind could add that to its configuration data for the virtual CPU. As a default, please use "valid, 64 bytes" (represented as 4, which is log2(size) - 2), which matches the Raspberry Pi 3. Other choices:
1. The value reported by the host, if host and target have the same architecture.
2. A command-line option.

Until there is more experience, the use of "dc zva" probably deserves a once-per-run information message which states the value that valgrind chooses (and the pc at which valgrind is forced to choose).
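To make the encoding above concrete, here is a minimal sketch in C of how a DCZID_EL0 value can be decoded. The field layout (BS in bits [3:0], DZP in bit 4) follows the Arm register description; the helper names are hypothetical and are not part of Valgrind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DCZID_EL0 fields, per the Arm register description:
   bits [3:0] = BS,  log2 of the block size in 4-byte words
   bit  [4]   = DZP, 1 means DC ZVA is prohibited            */
#define DCZID_BS_MASK 0xFu
#define DCZID_DZP_BIT (1u << 4)

/* Hypothetical helper: true if the value allows DC ZVA. */
static bool dczid_zva_permitted(uint64_t dczid)
{
    return (dczid & DCZID_DZP_BIT) == 0;
}

/* Hypothetical helper: block size in bytes, i.e. 4 << BS. */
static uint64_t dczid_block_bytes(uint64_t dczid)
{
    return (uint64_t)4 << (dczid & DCZID_BS_MASK);
}

/* Usage: the suggested default "valid, 64 bytes" is the value 4. */
void dczid_decode_example(void)
{
    assert(dczid_zva_permitted(0x4));
    assert(dczid_block_bytes(0x4) == 64);
}
```

With this layout, a value of 0x10 (DZP set, BS zero) tells the guest not to use the instruction at all.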
Good point -- there's no inherent reason why valgrind would need to use the same cache line size that the host is using for DC ZVA. You can just pick an arbitrary value as long as you're consistent with the value of DCZID_EL0 you report to the guest.
Hi,

I too am getting the complaint about the missing DC ZVA instruction on some ARM64 Android target:

==13172== Memcheck, a memory error detector
==13172== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13172== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==13172== Command: ...
==13172==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD50B7428
disInstr(arm64): 1101'0101 0000'1011 0111'0100 0010'1000
==13172== valgrind: Unrecognised instruction at address 0x4077500.
==13172==    at 0x4077500: __dl_memset (in /system/bin/linker64)
...

And objdump says:

...
   77500: d50b7428 dc zva, x8
   77504: d50b7429 dc zva, x9
...

Any chance this can be added?

Thanks,
Antonio
BTW in my case the unhandled instruction was in an optimized version of memset. I was able to revert to a non-optimized version and progress further in my debugging session. I am mentioning this because looking at an optimized version of memset could give a hint about how to make a reproducible test case.
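As a starting point for such a test case, here is a minimal sketch in C that issues a single "dc zva" on a suitably aligned buffer, much as an optimized memset would. The function name try_dc_zva is hypothetical, and the inline assembly assumes a GCC- or Clang-compatible compiler. On non-AArch64 builds the function does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper: returns the number of bytes zeroed by one
   DC ZVA, or 0 if the instruction is unavailable (non-AArch64 build,
   DZP set in DCZID_EL0, or allocation failure). */
size_t try_dc_zva(void)
{
#if defined(__aarch64__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (dczid & (1u << 4))
        return 0;                          /* DZP set: DC ZVA prohibited */
    size_t block = (size_t)4 << (dczid & 0xF);

    /* Over-allocate and align by hand so that buf starts on a block
       boundary and two full blocks fit after it. */
    unsigned char *raw = malloc(3 * block);
    if (raw == NULL)
        return 0;
    unsigned char *buf = (unsigned char *)
        (((uintptr_t)raw + block - 1) & ~(uintptr_t)(block - 1));
    memset(buf, 0xAA, 2 * block);

    __asm__ volatile("dc zva, %0" : : "r"(buf) : "memory");

    for (size_t i = 0; i < block; i++)
        assert(buf[i] == 0);               /* first block was zeroed */
    assert(buf[block] == 0xAA);            /* next block is untouched */
    free(raw);
    return block;
#else
    return 0;
#endif
}
```

Note that this sketch honours the DZP flag, so it skips the instruction whenever the (real or virtual) CPU reports it as prohibited. To trigger the original "unhandled instruction" complaint, a test would have to issue "dc zva" unconditionally, which is only safe on hardware known to permit it.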
Warning: set address range perms: large range
valgrind ./test_nng
==66046== Memcheck, a memory error detector
==66046== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==66046== Using Valgrind-3.18.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==66046== Command: ./test_nng
==66046==
==66046== Warning: set address range perms: large range [0x7fff20187000, 0x80001ff87000) (defined)
==66046== Warning: set address range perms: large range [0x7fff2045b000, 0x7fff7fe4f000) (defined)
==66046== Warning: set address range perms: large range [0x7fff8e3db000, 0x7fffc0187000) (noaccess)
==66046== Warning: set address range perms: large range [0x7fffc0187000, 0x7fffe2fa7000) (defined)
==66046== Warning: set address range perms: large range [0x7fffe2fa7000, 0x7fffffe00000) (noaccess)
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--66046-- WARNING: unhandled amd64-darwin syscall: unix:228
--66046-- You may be able to write your own handler.
--66046-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--66046-- Nevertheless we consider this a bug. Please report
--66046-- it at http://valgrind.org/support/bug_reports.html.
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
past compress chan tests before decode read bytes
vex amd64->IR: unhandled instruction bytes: 0x62 0xD3 0x45 0x8 0x1E 0xC8 0x6 0x62 0xD3 0x6D
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==66046== valgrind: Unrecognised instruction at address 0x10001ca6b.
==66046==    at 0x10001CA6B: HuffmanTree_makeFromLengths2 (lodepng.c:1034)
==66046==    by 0x100005805: lodepng_inflatev (lodepng.c:1216)
==66046==    by 0x1000095A3: lodepng_zlib_decompressv (lodepng.c:1927)
==66046==    by 0x100012C94: lodepng_decode (lodepng.c:2878)
==66046==    by 0x100017C4E: decode_wrapper (lodepng.c:7565)
==66046==    by 0x100033381: test_nng_helper (test_nng_helper.c:37)
==66046==    by 0x10000252A: main (test.c:14)
==66046== Your program just tried to execute an instruction that Valgrind
==66046== did not recognise. There are two possible reasons for this.
==66046== 1. Your program has a bug and erroneously jumped to a non-code
==66046==    location. If you are running Memcheck and you just saw a
==66046==    warning about a bad jump, it's probably your program's fault.
==66046== 2. The instruction is legitimate but Valgrind doesn't handle it,
==66046==    i.e. it's Valgrind's fault. If you think this is the case or
==66046==    you are not sure, please let us know and we'll try to fix it.
==66046== Either way, Valgrind will now raise a SIGILL signal which will
==66046== probably kill your program.
==66046==
==66046== Process terminating with default action of signal 4 (SIGILL)
==66046==  Illegal opcode at address 0x10001CA6B
==66046==    at 0x10001CA6B: HuffmanTree_makeFromLengths2 (lodepng.c:1034)
==66046==    by 0x100005805: lodepng_inflatev (lodepng.c:1216)
==66046==    by 0x1000095A3: lodepng_zlib_decompressv (lodepng.c:1927)
==66046==    by 0x100012C94: lodepng_decode (lodepng.c:2878)
==66046==    by 0x100017C4E: decode_wrapper (lodepng.c:7565)
==66046==    by 0x100033381: test_nng_helper (test_nng_helper.c:37)
==66046==    by 0x10000252A: main (test.c:14)
==66046==
==66046== HEAP SUMMARY:
==66046==     in use at exit: 0 bytes in 0 blocks
==66046==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==66046==
==66046== All heap blocks were freed -- no leaks are possible
==66046==
==66046== For lists of detected and suppressed errors, rerun with: -s
==66046== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1] 66046 illegal hardware instruction valgrind ./test_nng
From https://github.com/LouisBrunner/valgrind-macos/issues/56 :

A quick hack to implement the instruction. It does not address the other points raised, such as whether reading DCZID_EL0 should report the instruction as supported and indicate the block size. I felt it safer to discourage software from using this instruction by keeping the current implementation, which reports it as not supported.

Patch for reference, for anyone who wants to work on something clean enough to merge:

diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c
index 51c949def..9d04c4303 100644
--- a/VEX/priv/guest_arm64_toIR.c
+++ b/VEX/priv/guest_arm64_toIR.c
@@ -7948,6 +7948,30 @@ Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn,
       return True;
    }
+   /* ------------------ DC_ZVA ------------------ */
+   /* D5 0B 74 001 Rt  dc zva, rT
+   */
+   if ((INSN(31,0) & 0xFFFFFFE0) == 0xD50B7420) {
+      /* Exactly the same scheme as for IC IVAU, except we observe the
+         dMinLine size. */
+      /* We will always be provided with a valid dMinLine value. */
+      vassert(archinfo->arm64_dMinLine_lg2_szB >= 2
+              && archinfo->arm64_dMinLine_lg2_szB <= 17);
+      /* Round the requested address, in rT, down to the start of the
+         containing block. */
+      UInt tt = INSN(4,0);
+      ULong lineszB = 1ULL << archinfo->arm64_dMinLine_lg2_szB;
+      IRTemp addr = newTemp(Ity_I64);
+      assign( addr, binop( Iop_And64,
+                           getIReg64orZR(tt),
+                           mkU64(~(lineszB - 1))) );
+      for (ULong o = 0; o < lineszB; o += 8)
+      {
+         storeLE(binop(Iop_Add64,mkexpr(addr),mkU64(o)), mkU64(0));
+      }
+      DIP("dc zva, %s\n", nameIReg64orZR(tt));
+      return True;
+   }
    /* ------------------ DC_CVAU ------------------ */
    /* D5 0B 7B 001 Rt  dc cvau, rT
       D5 0B 7E 001 Rt  dc civac, rT
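For readers less familiar with VEX IR, the effect of the patch can be modelled in plain C: round the address down to the start of the containing block, then store zeroes eight bytes at a time. This is only an illustrative sketch. The function name and the flat-array model of guest memory are assumptions, not Valgrind code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the patch. "mem" stands in for guest memory as
   a flat byte array whose index 0 is guest address 0. "lg2_line" plays
   the role of archinfo->arm64_dMinLine_lg2_szB. */
void model_dc_zva(uint8_t *mem, uint64_t addr, unsigned lg2_line)
{
    /* The 8-byte stores below need a block of at least 8 bytes, so this
       model requires lg2_line >= 3 (the patch asserts >= 2). */
    assert(lg2_line >= 3 && lg2_line <= 17);

    uint64_t line = 1ULL << lg2_line;
    uint64_t base = addr & ~(line - 1);    /* round down to block start */

    for (uint64_t o = 0; o < line; o += 8) {
        uint64_t zero = 0;
        memcpy(mem + base + o, &zero, sizeof zero);  /* one 64-bit store */
    }
}
```

For example, with a 64-byte block (lg2_line of 6), a request at address 70 zeroes bytes 64 through 127 and leaves everything else untouched.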
In addition to the opcode, this will also require that MRS DCZID_EL0 reports the DZP flag correctly. Currently it is hard coded to 1 (which prohibits dc zva). https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/DCZID-EL0--Data-Cache-Zero-ID-register?lang=en
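As an illustration of what a consistent value would look like, here is a minimal sketch in C that builds a DCZID_EL0 value from a "permitted" flag and a block size in bytes. The function name dczid_encode is hypothetical. The layout (BS in bits [3:0] as log2 of the size in 4-byte words, DZP in bit 4) is taken from the register description linked above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build the value a guest should read from
   DCZID_EL0. block_bytes must be a power of two and at least 4. */
uint64_t dczid_encode(bool permitted, uint64_t block_bytes)
{
    assert(block_bytes >= 4 && (block_bytes & (block_bytes - 1)) == 0);

    /* BS = log2(block_bytes) - 2, found by counting shifts of 4. */
    unsigned bs = 0;
    while (((uint64_t)4 << bs) < block_bytes)
        bs++;
    assert(bs <= 0xF);                     /* must fit the 4-bit field */

    uint64_t dzp = permitted ? 0 : (1u << 4);  /* DZP = 1 prohibits */
    return dzp | bs;
}
```

Under these assumptions, "permitted, 64 bytes" encodes as 0x4, while the same size with DZP set encodes as 0x14. Whatever block size is reported this way has to match the number of bytes the emulated instruction actually zeroes.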
commit 8b6c36ecf14530d7f50f65697c2a039e3d410092 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sat May 11 18:10:03 2024 +0200

    Bug 377966 - arm64 unhandled instruction dc zva

    With contributions from Louis Brunner

    https://github.com/LouisBrunner/valgrind-macos