I ran Valgrind 3.13.0 on my AVX program and it died with this message:

    vex amd64->IR: unhandled instruction bytes: 0xC5 0x79 0xD6 0xC9 0xC4 0xE3 0x7D 0x18 0xC1 0x1
    vex amd64->IR: REX=0 REX.W=0 REX.R=1 REX.X=0 REX.B=0
    vex amd64->IR: VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
    vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0
    ==17131== valgrind: Unrecognised instruction at address 0xe888c95.

The offending instruction turned out to be a vmovq between two xmm registers:

    => 0x000000000e888c95 <+213>: vmovq %xmm9,%xmm1

I can reproduce the problem with this code:

    int main() {
        asm("vmovq %xmm9, %xmm1");
        return 0;
    }

I also found that "%xmm8, %xmm0" and "%xmm15, %xmm7" kill Valgrind with a similar message, but "%xmm15, %xmm8" does not. I tried "%xmm0, %xmm8" and "%xmm8, %xmm9" as well, and Valgrind handled both.

It seems Valgrind cannot handle vmovq from xmm8-15 to xmm0-7.
The same issue has existed since at least version 3.11.0 (on Ubuntu 16.04.4 LTS).
Indeed, in Valgrind 3.13 a vmovq xmm[8-15], xmm[0-7] triggers the error reported above.
Created attachment 114656
Patch for the issue

While tracking down the issue I noticed that the hex disassembly of vmovq xmm[8-15], xmm[0-7] matches the following pattern:

    0xC5 0x79 0xD6 0xC[0-7]

The last value states which register is the source, namely xmm[value+8].

After analyzing the problem, I concluded that it arose when the VEX standard was extended with new XMM registers (xmm8 to xmm15): because the Intel standard defined only 3 bits each to indicate the source and destination registers, a new VEX opcode had to be added in order to operate on the new registers. When you perform a vmovq xmm[0-7], xmm[0-7] the opcode is 0x7E, but when you perform a vmovq xmm[8-15], xmm[0-7] the opcode is 0xD6.

The thing is, the D6 VEX opcode was apparently already in use, but not with an XMM register as the source, so that case was left unhandled in the code because at the time there was no test case.

So the solution is as simple as implementing this specific case, adding 8 to the source register. I attach a patch for the issue. It would be great if you could tell me whether it is consistent. So far the patch is working for me.
Comment on attachment 114656
Patch for the issue

>diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c
>index 9073e1d..9229e53 100644
>--- a/VEX/priv/guest_amd64_toIR.c
>+++ b/VEX/priv/guest_amd64_toIR.c
>@@ -26876,15 +26876,19 @@ Long dis_ESC_0F__VEX (
>             UChar modrm = getUChar(delta);
>             UInt rG = gregOfRexRM(pfx,modrm);
>             if (epartIsReg(modrm)) {
>-               /* fall through, awaiting test case */
>-               /* dst: lo half copied, hi half zeroed */
>+               // In this case is VEX.128.66.0F.WIG D6 /r = VMOVQ xmm8-15/m64, xmm0-7
>+               UInt rE = eregOfRexRM(pfx,modrm) + 8;
>+               DIP("vmovq %s,%s\n", nameXMMReg(rG), nameIReg64(rE));
>+               putIReg64(rE, getXMMRegLane64(rG, 0));
>+               delta += 1;
>             } else {
>                addr = disAMode ( &alen, vbi, pfx, delta, dis_buf, 0 );
>                storeLE( mkexpr(addr), getXMMRegLane64( rG, 0 ));
>                DIP("vmovq %s,%s\n", nameXMMReg(rG), dis_buf );
>                delta += alen;
>-               goto decode_success;
>             }
>+
>+            goto decode_success;
>          }
>          break;
>
commit 10a22445d747817932692b1c1ee3faa726121cb4
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Jun 30 20:17:32 2024 +0200

    Implement VMOVQ xmm1, xmm2/m64

    We implemented the memory variant already, but not the reg variant.
    Add a separate avx-vmovq testcase, because avx-1 is already really big.

    https://bugs.kde.org/show_bug.cgi?id=391148
    https://bugs.kde.org/show_bug.cgi?id=417572
    https://bugs.kde.org/show_bug.cgi?id=489088