I have written and tested a patch that adds these instructions to VEX guest amd64. Reproducible: Always Steps to Reproduce: Run a program under valgrind that executes an ADCX or ADOX instruction. Actual Results: $ valgrind VEX/test/test-amd64 >/dev/null ==30659== Memcheck, a memory error detector ==30659== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==30659== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==30659== Command: VEX/test/test-amd64 ==30659== vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x38 0xF6 0xD1 0x9C 0x58 vex amd64->IR: REX=1 REX.W=1 REX.R=0 REX.X=0 REX.B=0 vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38 vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0 ==30659== valgrind: Unrecognised instruction at address 0x404713. ==30659== at 0x404713: exec_adcxq (test-amd64.h:69) ==30659== by 0x4048E6: exec_adcx (test-amd64.h:123) ==30659== by 0x404981: test_adcx (test-amd64.h:166) ==30659== by 0x410EA5: main (test-amd64.c:1699) ==30659== Your program just tried to execute an instruction that Valgrind ==30659== did not recognise. There are two possible reasons for this. ==30659== 1. Your program has a bug and erroneously jumped to a non-code ==30659== location. If you are running Memcheck and you just saw a ==30659== warning about a bad jump, it's probably your program's fault. ==30659== 2. The instruction is legitimate but Valgrind doesn't handle it, ==30659== i.e. it's Valgrind's fault. If you think this is the case or ==30659== you are not sure, please let us know and we'll try to fix it. ==30659== Either way, Valgrind will now raise a SIGILL signal which will ==30659== probably kill your program. ==30659== ==30659== Process terminating with default action of signal 4 (SIGILL) ==30659== Illegal opcode at address 0x404713 ==30659== at 0x404713: exec_adcxq (test-amd64.h:69) ==30659== by 0x4048E6: exec_adcx (test-amd64.h:123) ==30659== by 0x404981: test_adcx (test-amd64.h:166) ==30659== by 0x410EA5: main (test-amd64.c:1699) ==30659== ==30659== HEAP SUMMARY: ==30659== in use at exit: 0 bytes in 0 blocks ==30659== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==30659== ==30659== All heap blocks were freed -- no leaks are possible ==30659== ==30659== For counts of detected and suppressed errors, rerun with: -v ==30659== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Illegal instruction Expected Results: $ diff -s <(VEX/test/test-amd64) <(./vg-in-place VEX/test/test-amd64) ==9408== Memcheck, a memory error detector ==9408== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==9408== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info ==9408== Command: VEX/test/test-amd64 ==9408== ==9408== ==9408== HEAP SUMMARY: ==9408== in use at exit: 0 bytes in 0 blocks ==9408== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==9408== ==9408== All heap blocks were freed -- no leaks are possible ==9408== ==9408== For counts of detected and suppressed errors, rerun with: -v ==9408== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Files /dev/fd/63 and /dev/fd/62 are identical MULX is already implemented since it is part of BMI2.
Created attachment 97845 [details] Adds ADCX and ADOX to VEX.
jacobly.alt, what context (library, use case, etc) do these instructions appear in? I am surprised they don't get complained-about more.
Julian Seward, I think these instructions are only on Intel processors, so the only way to get them is to compile with -march=native or similar. Since I compiled my glibc with -march=native, it contains these instructions in some initialization sequence, so I was unable to run any program under valgrind.
I'm working from the SVN sources, and the issue still exists. It appears something is amiss with the patch. The program was built with Clang on a Skylake machine. Its not in the big integer part of the library. Clang is using it for regular math. ********** Here's what GDB shows: Breakpoint 2, ECP::EncodedPointSize (this=0x7fffffffb3f8, compressed=0x0) at ./ecp.h:75 75 {return 1 + (compressed?1:2)*GetField().MaxElementByteLength();} (gdb) disass Dump of assembler code for function ECP::EncodedPointSize(bool) const: => 0x0000000000594750 <+0>: push %rbx 0x0000000000594751 <+1>: xor %eax,%eax 0x0000000000594753 <+3>: cmp $0x1,%sil 0x0000000000594757 <+7>: mov $0x1,%ebx 0x000000000059475c <+12>: adcx %eax,%ebx 0x0000000000594761 <+17>: callq 0x475eb0 <ECP::GetField() const> 0x0000000000594766 <+22>: mov %rax,%rdi 0x0000000000594769 <+25>: callq 0x521020 <ModularArithmetic::MaxElementByteLength() const> 0x000000000059476e <+30>: imul %ebx,%eax 0x0000000000594771 <+33>: add $0x1,%eax 0x0000000000594774 <+36>: pop %rbx 0x0000000000594775 <+37>: retq End of assembler dump. ********** Here's what Valgrind shows: valgrind ./cryptest.exe v ... HMQV validation suite running... vex amd64->IR: unhandled instruction bytes: 0x66 0xF 0x38 0xF6 0xD8 0xE8 0x4A 0x17 0xEE 0xFF vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0 vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38 vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0 ==12079== valgrind: Unrecognised instruction at address 0x59475c. ==12079== at 0x59475C: ECP::EncodedPointSize(bool) const (ecp.h:75) ==12079== by 0x591A3B: ECP::DecodePoint(ECPPoint&, BufferedTransformation&, unsigned long) const (ecp.cpp:107) ==12079== by 0x5917DD: ECP::DecodePoint(ECPPoint&, unsigned char const*, unsigned long) const (ecp.cpp:69) ==12079== by 0x591D66: ECP::BERDecodePoint(BufferedTransformation&) const (ecp.cpp:151) ==12079== by 0x54E3EA: DL_GroupParameters_EC<ECP>::BERDecode(BufferedTransformation&) (eccrypto.cpp:534) ==12079== by 0x4AAF77: Test::ValidateHMQV() (validat2.cpp:392) ==12079== by 0x48201A: Test::ValidateAll(bool) (validat1.cpp:162) ==12079== by 0x4369B6: Validate(int, bool, char const*) (test.cpp:916) ==12079== by 0x432FE4: main (test.cpp:406) ==12079== Your program just tried to execute an instruction that Valgrind ==12079== did not recognise. There are two possible reasons for this. ==12079== 1. Your program has a bug and erroneously jumped to a non-code ==12079== location. If you are running Memcheck and you just saw a ==12079== warning about a bad jump, it's probably your program's fault. ==12079== 2. The instruction is legitimate but Valgrind doesn't handle it, ==12079== i.e. it's Valgrind's fault. If you think this is the case or ==12079== you are not sure, please let us know and we'll try to fix it. ==12079== Either way, Valgrind will now raise a SIGILL signal which will ==12079== probably kill your program. ==12079== ==12079== Process terminating with default action of signal 4 (SIGILL) ==12079== Illegal opcode at address 0x59475C ==12079== at 0x59475C: ECP::EncodedPointSize(bool) const (ecp.h:75) ==12079== by 0x591A3B: ECP::DecodePoint(ECPPoint&, BufferedTransformation&, unsigned long) const (ecp.cpp:107) ==12079== by 0x5917DD: ECP::DecodePoint(ECPPoint&, unsigned char const*, unsigned long) const (ecp.cpp:69) ==12079== by 0x591D66: ECP::BERDecodePoint(BufferedTransformation&) const (ecp.cpp:151) ==12079== by 0x54E3EA: DL_GroupParameters_EC<ECP>::BERDecode(BufferedTransformation&) (eccrypto.cpp:534) ==12079== by 0x4AAF77: Test::ValidateHMQV() (validat2.cpp:392) ==12079== by 0x48201A: Test::ValidateAll(bool) (validat1.cpp:162) ==12079== by 0x4369B6: Validate(int, bool, char const*) (test.cpp:916) ==12079== by 0x432FE4: main (test.cpp:406) ==12079== ==12079== HEAP SUMMARY: ==12079== in use at exit: 69,547 bytes in 538 blocks ==12079== total heap usage: 210,211 allocs, 209,673 frees, 67,196,780 bytes allocated ==12079== ==12079== LEAK SUMMARY: ==12079== definitely lost: 0 bytes in 0 blocks ==12079== indirectly lost: 0 bytes in 0 blocks ==12079== possibly lost: 0 bytes in 0 blocks ==12079== still reachable: 69,547 bytes in 538 blocks ==12079== suppressed: 0 bytes in 0 blocks ==12079== Rerun with --leak-check=full to see details of leaked memory ==12079== ==12079== For counts of detected and suppressed errors, rerun with: -v ==12079== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Illegal instruction (core dumped)
Created attachment 105193 [details] Fixed 32-bit instructions in test-amd64. I disassembled the test program and noticed that 32-bit instructions weren't being assembled correctly. The new patch fixes the test, but the tests still pass and I can't reproduce this issue on 64-bit linux, even with a test program that contains: 400490: 66 0f 38 f6 d8 adcx %eax,%ebx I would expect that error for a 32-bit program, where it is unimplemented, but your valgrind output says it's using the 64-bit backend. On a side note, what compiler/version did you compile valgrind with?
(In reply to jacobly.alt from comment #5) > Created attachment 105193 [details] > Fixed 32-bit instructions in test-amd64. > > I disassembled the test program and noticed that 32-bit instructions weren't > being assembled correctly. The new patch fixes the test, but the tests still > pass and I can't reproduce this issue on 64-bit linux, even with a test > program that contains: > > 400490: 66 0f 38 f6 d8 adcx %eax,%ebx > > I would expect that error for a 32-bit program, where it is unimplemented, > but your valgrind output says it's using the 64-bit backend. On a side note, > what compiler/version did you compile valgrind with? Thanks Jacob. It looks like there are two (maybe more) instructions there. First is the ADCX: 0x66 0xF 0x38 0xF6 0xD8 And then there's a JUMP: 0xE8 0x4A I'm not sure what this is: 0x17 0xEE 0xFF > ... On a side note, > what compiler/version did you compile valgrind with? I should have provided this earlier. Sorry about that. The system details are as follows. Everything is fully patched. $ lsb_release -a LSB Version: :core-4.1-amd64:core-4.1-noarch Distributor ID: Fedora Description: Fedora release 25 (Twenty Five) Release: 25 Codename: TwentyFive $ clang++ --version clang version 3.9.1 (tags/RELEASE_391/final) Target: x86_64-unknown-linux-gnu Thread model: posix $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 94 model name : Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz stepping : 3 microcode : 0x9e cpu MHz : 900.109 cache size : 6144 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5424.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: ... Here's the reproducer (for me on this F25 on Skylark machine): git clone https://github.com/weidai11/cryptopp cd cryptopp CXX=clang++ CXXFLAGS="-DNDEBUG -g3 -O1 -std=c++03" make -j 9
(In reply to Jeffrey Walton from comment #6) > (In reply to jacobly.alt from comment #5) > > Created attachment 105193 [details] > > Fixed 32-bit instructions in test-amd64. > > > > ... > > > ... On a side note, > > what compiler/version did you compile valgrind with? > > ... Crap... I still did not answer that question. Valgrind was compiled with: $ gcc --version gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1) Copyright (C) 2016 Free Software Foundation, Inc. This may be relevant... The code in question is at https://github.com/weidai11/cryptopp/blob/master/ecp.h#L73. The issue only arises under Clang with -std=c++03 or -std=c++11.
(In reply to jacobly.alt from comment #5) > Created attachment 105193 [details] > Fixed 32-bit instructions in test-amd64. > > I disassembled the test program and noticed that 32-bit instructions weren't > being assembled correctly. The new patch fixes the test, but the tests still > pass and I can't reproduce this issue on 64-bit linux, even with a test > program that contains: > > 400490: 66 0f 38 f6 d8 adcx %eax,%ebx > > I would expect that error for a 32-bit program, where it is unimplemented, > but your valgrind output says it's using the 64-bit backend. On a side note, > what compiler/version did you compile valgrind with? Confirmed. The patch cleared the issue. Thank you very much.
Considering the new patch didn't change any valgrind code, I'm going to assume the patch was just too old to apply properly to trunk.
(In reply to jacobly.alt from comment #9) > Considering the new patch didn't change any valgrind code, I'm going to > assume the patch was just too old to apply properly to trunk. Oh, my bad... I did not apply the patch from the 97845 attachment. I assumed it was already applied to SVN master since a year had passed. I did apply the patch from the 105193 attachment.
Here's the reduced test case. $ cat test.c #include <stdint.h> int main(int argc, char* argv[]) { uint32_t a = argc*4, b = argc*8; asm ("adcxl %1, %0" : "+r"(a) : "rm"(b)); return a & 0xf; } A bug was filed with Fedora at https://bugzilla.redhat.com/show_bug.cgi?id=1445829.
This question came up on the Fedora bug report... Why was the patch not accepted?
*** Bug 372828 has been marked as a duplicate of this bug. ***
(In reply to jacobly.alt from comment #5) > Created attachment 105193 [details] > Fixed 32-bit instructions in test-amd64. Committed, with modifications, vex r3365. Thanks for the patch. Note that the instructions have been gated on running on an AVX2 platform, since in fact any processor that implements them will also implement AVX2 (afaics).
Test cases in valgrind r16372.
Running the latest Valgrind on regression test fb_test_amd64 built with gcc 5.4.0 gives the following crash: vex amd64->IR: unhandled instruction bytes: 0x66 0x4D 0xF 0x38 0xF6 0xC5 0x9C 0x58 0x48 0x83 vex amd64->IR: REX=1 REX.W=1 REX.R=1 REX.X=0 REX.B=1 vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38 vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0 ==1119== valgrind: Unrecognised instruction at address 0x409768. ==1119== at 0x409768: exec_adcx (fb_test_amd64.h:69) ==1119== by 0x409806: test_adcx (fb_test_amd64.h:166) ==1119== by 0x411652: main (fb_test_amd64.c:1201) Disassembly indeed shows it is adcx: 0x0000000000409767 <+405>: popfq => 0x0000000000409768 <+406>: adcx %r13,%r8 0x000000000040976e <+412>: pushfq (gdb) x/16xb 0x409768 0x409768 <exec_adcx+406>: 0x66 0x4d 0x0f 0x38 0xf6 0xc5 0x9c 0x58 0x409770 <exec_adcx+414>: 0x48 0x83 0xec 0x08 0x25 0xd5 0x08 0x00 ./vg-in-place --version -v valgrind-3.13.0.SVN-16374-vex-3369 Is it possible that some paths in adcx decoding are not handled properly?
Ah yes, I remembered about this and then forgot it again. The test case uses ADOX etc, and runs unconditionally, but the fix only decodes them on AVX2 capable machines and above. So the test case needs to be conditionalised to only run on AVX2. I'll fix it.
Follow up fix in SVN r16393.
And follow up VEX commit in SVN r3376.
The implementation may be buggy. Details in bug 384930.