Bug 377966

Summary: arm64 unhandled instruction dc zva
Product: [Developer tools] valgrind Reporter: John Reiser <jreiser>
Component: memcheck Assignee: Paul Floyd <pjfloyd>
Status: RESOLVED FIXED    
Severity: normal CC: ao2, jgardi, kde, mark, peter.maydell, pjfloyd
Priority: NOR    
Version: 3.15 SVN   
Target Milestone: ---   
Platform: Android   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description John Reiser 2017-03-23 00:37:59 UTC
On aarch64 (64-bit ARM) memcheck complains "unhandled instruction 0xD50B7425" which is "dc zva, x5".  This is a data cache control instruction; it [allocates and] clears the cache line whose address corresponds to the contents of register x5.  Optimized code uses "dc zva," when it will be writing the entire cache line.  Therefore the hardware need not read the line from memory (which otherwise would be required upon the first write to the line), which saves time.

Other related instructions which are used soon afterwards: "dc cvau," [Clean data cache line by Virtual Address to point of Unification] to force the cache line to be written to memory; and "ic ivau," [Invalidate icache line by Virtual Address to point of Unification] to force the icache to forget the old contents of a newly-overwritten instruction stream.
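
For illustration, the encoding quoted above can be checked with a small C sketch. The constants follow from the reported word 0xD50B7425, with the low five bits taken as the register number. The name decode_dc_zva is hypothetical and is not part of Valgrind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed bits of "dc zva, Xt"; the low five bits carry the register number. */
#define DC_ZVA_OPCODE 0xD50B7420u
#define DC_ZVA_MASK   0xFFFFFFE0u

/* Hypothetical helper: returns true if insn encodes "dc zva, Xt" and
   stores the register number t in *rt; leaves *rt untouched otherwise. */
static bool decode_dc_zva(uint32_t insn, unsigned *rt)
{
    assert(rt != NULL);
    if ((insn & DC_ZVA_MASK) != DC_ZVA_OPCODE)
        return false;
    *rt = insn & 0x1Fu;
    return true;
}
```

With this sketch, 0xD50B7425 decodes to register number 5, matching "dc zva, x5".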
Comment 1 Julian Seward 2017-03-23 07:07:04 UTC
(In reply to John Reiser from comment #0)
> instruction; it [allocates and] clears the cache line whose address
> corresponds to the contents of register x5.

Yes, ppc has something similar iirc (dcbz).  The main difficulty is to
know how big the cache line is.  Any ideas?  And on a big.LITTLE
configuration?
Comment 2 Peter Maydell 2017-03-23 10:03:56 UTC
You can read the DCZID_EL0 register, which will tell you both (a) whether it's permitted to use DC ZVA and (b) the size of the cache line affected. A big.LITTLE system will always report the same cache line size for this purpose regardless of core -- it would be impossible to use DC ZVA reliably otherwise. (There is at least one SoC with a system integration bug where it reports different values on different cores, but the fix for this is at the Linux kernel level where the kernel can check for such errata and set the register up to trap so the kernel can fix up the reported values.)
Comment 3 John Reiser 2017-03-23 13:02:40 UTC
(In reply to Julian Seward from comment #1)
> The main difficulty is to know how big the cache line is.

The value (and validity) is reported by the instruction "mrs reg, dczid_el0".  So valgrind could add that to its configuration data for the virtual CPU.  As default please use "valid, 64-bytes" (represented as 4, which is (-2 + log2(size)) ) which matches RaspberryPi3.  Other choices:
1. The value reported by the host, if host and target have the same architecture.
2. A command-line option.

Until there is more experience with it, the use of "dc zva," probably deserves a once-per-run informational message which states the value that valgrind chooses (and the pc at which valgrind is forced to choose).
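
The encoding described above can be sketched in C. This assumes the DCZID_EL0 layout from the Arm register documentation: bits [3:0] are BS, the log2 of the block size in 4-byte words, and bit 4 is DZP, where 1 means DC ZVA is prohibited. The function names are hypothetical and are not Valgrind APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DZP is bit 4; DC ZVA may be used only when it is clear. */
static bool dczid_zva_permitted(uint64_t dczid)
{
    return ((dczid >> 4) & 1u) == 0;
}

/* BS is log2 of the block size in 4-byte words, so bytes = 4 << BS. */
static uint64_t dczid_block_bytes(uint64_t dczid)
{
    return 4ull << (dczid & 0xFu);
}

/* Build a DCZID_EL0 value for a virtual CPU from a block size in bytes.
   The size must be a power of two and at least 4. */
static uint64_t dczid_make(uint64_t block_bytes, bool permitted)
{
    assert(block_bytes >= 4 && (block_bytes & (block_bytes - 1)) == 0);
    unsigned bs = 0;
    while ((4ull << bs) < block_bytes)
        bs++;                      /* bs ends up as log2(block_bytes) - 2 */
    assert(bs <= 0xFu);            /* must fit the 4-bit BS field */
    return (permitted ? 0u : 0x10u) | bs;
}
```

Under these assumptions the suggested default of "valid, 64-bytes" is the register value 4, and the same block size with DZP set is 0x14.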
Comment 4 Peter Maydell 2017-03-23 13:12:26 UTC
Good point -- there's no inherent reason why valgrind would need to use the same cache line size that the host is using for DC ZVA. You can just pick an arbitrary value as long as you're consistent with the value of DCZID_EL0 you report to the guest.
Comment 5 Antonio Ospite 2020-09-18 11:02:49 UTC
Hi,

I too am getting the complaint about the missing DC ZVA instruction on
an ARM64 Android target:

==13172== Memcheck, a memory error detector
==13172== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13172== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==13172== Command: ...
==13172== 
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD50B7428
disInstr(arm64): 1101'0101 0000'1011 0111'0100 0010'1000
==13172== valgrind: Unrecognised instruction at address 0x4077500.
==13172==    at 0x4077500: __dl_memset (in /system/bin/linker64)
...


And objdump says:

...
   77500:	d50b7428 	dc	zva, x8
   77504:	d50b7429 	dc	zva, x9
...


Any chance this can be added?

Thanks,
Antonio
Comment 6 Antonio Ospite 2021-03-11 14:33:56 UTC
BTW in my case the unhandled instruction was in an optimized version of memset. I was able to revert to a non-optimized version and progress further in my debugging session.

I am mentioning this because looking up an optimized version of memset could give a hint about how to make a reproducible test case.
Comment 7 Joseph 2021-07-26 03:21:16 UTC
Warning: set address range perms: large range
Comment 8 Joseph 2021-07-26 03:21:55 UTC
valgrind ./test_nng
==66046== Memcheck, a memory error detector
==66046== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==66046== Using Valgrind-3.18.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==66046== Command: ./test_nng
==66046==
==66046== Warning: set address range perms: large range [0x7fff20187000, 0x80001ff87000) (defined)
==66046== Warning: set address range perms: large range [0x7fff2045b000, 0x7fff7fe4f000) (defined)
==66046== Warning: set address range perms: large range [0x7fff8e3db000, 0x7fffc0187000) (noaccess)
==66046== Warning: set address range perms: large range [0x7fffc0187000, 0x7fffe2fa7000) (defined)
==66046== Warning: set address range perms: large range [0x7fffe2fa7000, 0x7fffffe00000) (noaccess)
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--66046-- WARNING: unhandled amd64-darwin syscall: unix:228
--66046-- You may be able to write your own handler.
--66046-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--66046-- Nevertheless we consider this a bug.  Please report
--66046-- it at http://valgrind.org/support/bug_reports.html.
--66046-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
past compress chan tests
before decode
read bytes
vex amd64->IR: unhandled instruction bytes: 0x62 0xD3 0x45 0x8 0x1E 0xC8 0x6 0x62 0xD3 0x6D
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==66046== valgrind: Unrecognised instruction at address 0x10001ca6b.
==66046==    at 0x10001CA6B: HuffmanTree_makeFromLengths2 (lodepng.c:1034)
==66046==    by 0x100005805: lodepng_inflatev (lodepng.c:1216)
==66046==    by 0x1000095A3: lodepng_zlib_decompressv (lodepng.c:1927)
==66046==    by 0x100012C94: lodepng_decode (lodepng.c:2878)
==66046==    by 0x100017C4E: decode_wrapper (lodepng.c:7565)
==66046==    by 0x100033381: test_nng_helper (test_nng_helper.c:37)
==66046==    by 0x10000252A: main (test.c:14)
==66046== Your program just tried to execute an instruction that Valgrind
==66046== did not recognise.  There are two possible reasons for this.
==66046== 1. Your program has a bug and erroneously jumped to a non-code
==66046==    location.  If you are running Memcheck and you just saw a
==66046==    warning about a bad jump, it's probably your program's fault.
==66046== 2. The instruction is legitimate but Valgrind doesn't handle it,
==66046==    i.e. it's Valgrind's fault.  If you think this is the case or
==66046==    you are not sure, please let us know and we'll try to fix it.
==66046== Either way, Valgrind will now raise a SIGILL signal which will
==66046== probably kill your program.
==66046==
==66046== Process terminating with default action of signal 4 (SIGILL)
==66046==  Illegal opcode at address 0x10001CA6B
==66046==    at 0x10001CA6B: HuffmanTree_makeFromLengths2 (lodepng.c:1034)
==66046==    by 0x100005805: lodepng_inflatev (lodepng.c:1216)
==66046==    by 0x1000095A3: lodepng_zlib_decompressv (lodepng.c:1927)
==66046==    by 0x100012C94: lodepng_decode (lodepng.c:2878)
==66046==    by 0x100017C4E: decode_wrapper (lodepng.c:7565)
==66046==    by 0x100033381: test_nng_helper (test_nng_helper.c:37)
==66046==    by 0x10000252A: main (test.c:14)
==66046==
==66046== HEAP SUMMARY:
==66046==     in use at exit: 0 bytes in 0 blocks
==66046==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==66046==
==66046== All heap blocks were freed -- no leaks are possible
==66046==
==66046== For lists of detected and suppressed errors, rerun with: -s
==66046== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]    66046 illegal hardware instruction  valgrind ./test_nng
Comment 9 Reimar Döffinger 2024-04-24 20:03:12 UTC
From https://github.com/LouisBrunner/valgrind-macos/issues/56 :
A quick hack to implement the instruction.
It does not address the other points raised, such as whether reading DCZID_EL0 should flag the instruction as supported and indicate the size.
I felt it was safer to discourage software from using this instruction by keeping the current implementation, which reports it as not supported.
Patch for reference for anyone who wants to work on something clean enough to merge:

diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c
index 51c949def..9d04c4303 100644
--- a/VEX/priv/guest_arm64_toIR.c
+++ b/VEX/priv/guest_arm64_toIR.c
@@ -7948,6 +7948,30 @@ Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn,
       return True;
    }
 
+   /* ------------------ DC_ZVA ------------------ */
+   /* D5 0B 74 001 Rt  dc zva, rT
+   */
+   if ((INSN(31,0) & 0xFFFFFFE0) == 0xD50B7420) {
+      /* Exactly the same scheme as for IC IVAU, except we observe the
+         dMinLine size. */
+      /* We will always be provided with a valid dMinLine value. */
+      vassert(archinfo->arm64_dMinLine_lg2_szB >= 2
+              && archinfo->arm64_dMinLine_lg2_szB <= 17);
+      /* Round the requested address, in rT, down to the start of the
+         containing block. */
+      UInt   tt      = INSN(4,0);
+      ULong  lineszB = 1ULL << archinfo->arm64_dMinLine_lg2_szB;
+      IRTemp addr    = newTemp(Ity_I64);
+      assign( addr, binop( Iop_And64,
+                           getIReg64orZR(tt),
+                           mkU64(~(lineszB - 1))) );
+      for (ULong o = 0; o < lineszB; o += 8)
+      {
+          storeLE(binop(Iop_Add64,mkexpr(addr),mkU64(o)), mkU64(0));
+      }
+      DIP("dc zva, %s\n", nameIReg64orZR(tt));
+      return True;
+   }
    /* ------------------ DC_CVAU ------------------ */
    /* D5 0B 7B 001 Rt  dc cvau, rT
       D5 0B 7E 001 Rt  dc civac, rT
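
As a standalone illustration of the run-time effect the patch above aims for (round the address down to the start of the containing block, then zero the whole block), here is a minimal C sketch. It models guest memory as a plain byte array. The name emulate_dc_zva and its parameters are hypothetical, and the code is not taken from Valgrind.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "dc zva": zero the naturally aligned block of
   block_bytes bytes that contains offset addr within mem.
   block_bytes must be a power of two. */
static void emulate_dc_zva(uint8_t *mem, size_t mem_size,
                           uint64_t addr, uint64_t block_bytes)
{
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);

    /* Round down to the start of the containing block. */
    uint64_t base = addr & ~(block_bytes - 1);
    assert(base + block_bytes <= mem_size);

    memset(mem + base, 0, (size_t)block_bytes);
}
```

For example, with a 64-byte block size, an address of 70 zeroes offsets 64 to 127 and leaves neighbouring bytes untouched.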
Comment 10 Paul Floyd 2024-05-11 13:36:58 UTC
In addition to the opcode, this will also require that MRS DCZID_EL0 reads the DZP flag correctly. Currently it is hard-coded to 1 (which prohibits dc zva).

https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/DCZID-EL0--Data-Cache-Zero-ID-register?lang=en
Comment 11 Paul Floyd 2024-05-11 16:24:01 UTC
commit 8b6c36ecf14530d7f50f65697c2a039e3d410092 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sat May 11 18:10:03 2024 +0200

    Bug 377966 - arm64 unhandled instruction dc zva
    
    With contributions from
            Louis Brunner https://github.com/LouisBrunner/valgrind-macos