Bug 360415 - amd64 instructions ADCX and ADOX are not implemented in VEX
Summary: amd64 instructions ADCX and ADOX are not implemented in VEX
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.12 SVN
Platform: unspecified Unspecified
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 372828 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-03-11 15:12 UTC by jacobly.alt
Modified: 2018-02-14 18:02 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Adds ADCX and ADOX to VEX. (26.01 KB, patch)
2016-03-11 15:15 UTC, jacobly.alt
Details
Fixed 32-bit instructions in test-amd64. (27.68 KB, patch)
2017-04-25 19:23 UTC, jacobly.alt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jacobly.alt 2016-03-11 15:12:54 UTC
I have written and tested a patch that adds these instructions to VEX guest amd64.

Reproducible: Always

Steps to Reproduce:
Run a program under valgrind that executes an ADCX or ADOX instruction.

Actual Results:  
 $ valgrind VEX/test/test-amd64 >/dev/null
==30659== Memcheck, a memory error detector
==30659== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30659== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==30659== Command: VEX/test/test-amd64
==30659== 
vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x38 0xF6 0xD1 0x9C 0x58
vex amd64->IR:   REX=1 REX.W=1 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==30659== valgrind: Unrecognised instruction at address 0x404713.
==30659==    at 0x404713: exec_adcxq (test-amd64.h:69)
==30659==    by 0x4048E6: exec_adcx (test-amd64.h:123)
==30659==    by 0x404981: test_adcx (test-amd64.h:166)
==30659==    by 0x410EA5: main (test-amd64.c:1699)
==30659== Your program just tried to execute an instruction that Valgrind
==30659== did not recognise.  There are two possible reasons for this.
==30659== 1. Your program has a bug and erroneously jumped to a non-code
==30659==    location.  If you are running Memcheck and you just saw a
==30659==    warning about a bad jump, it's probably your program's fault.
==30659== 2. The instruction is legitimate but Valgrind doesn't handle it,
==30659==    i.e. it's Valgrind's fault.  If you think this is the case or
==30659==    you are not sure, please let us know and we'll try to fix it.
==30659== Either way, Valgrind will now raise a SIGILL signal which will
==30659== probably kill your program.
==30659== 
==30659== Process terminating with default action of signal 4 (SIGILL)
==30659==  Illegal opcode at address 0x404713
==30659==    at 0x404713: exec_adcxq (test-amd64.h:69)
==30659==    by 0x4048E6: exec_adcx (test-amd64.h:123)
==30659==    by 0x404981: test_adcx (test-amd64.h:166)
==30659==    by 0x410EA5: main (test-amd64.c:1699)
==30659== 
==30659== HEAP SUMMARY:
==30659==     in use at exit: 0 bytes in 0 blocks
==30659==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==30659== 
==30659== All heap blocks were freed -- no leaks are possible
==30659== 
==30659== For counts of detected and suppressed errors, rerun with: -v
==30659== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction


Expected Results:  
 $ diff -s <(VEX/test/test-amd64) <(./vg-in-place VEX/test/test-amd64)
==9408== Memcheck, a memory error detector
==9408== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9408== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==9408== Command: VEX/test/test-amd64
==9408== 
==9408== 
==9408== HEAP SUMMARY:
==9408==     in use at exit: 0 bytes in 0 blocks
==9408==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9408== 
==9408== All heap blocks were freed -- no leaks are possible
==9408== 
==9408== For counts of detected and suppressed errors, rerun with: -v
==9408== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Files /dev/fd/63 and /dev/fd/62 are identical


MULX is already implemented since it is part of BMI2.
Comment 1 jacobly.alt 2016-03-11 15:15:33 UTC
Created attachment 97845 [details]
Adds ADCX and ADOX to VEX.
Comment 2 Julian Seward 2016-07-20 17:33:01 UTC
jacobly.alt, what context (library, use case, etc) do these instructions appear in?
I am surprised they don't get complained-about more.
Comment 3 jacobly.alt 2016-11-11 09:28:26 UTC
Julian Seward, I think these instructions are only on Intel processors, so the only way to get them is to compile with -march=native or similar.  Since I compiled my glibc with -march=native, it contains these instructions in some initialization sequence, so I was unable to run any program under valgrind.
Comment 4 Jeffrey Walton 2017-04-25 12:53:29 UTC
I'm working from the SVN sources, and the issue still exists. It appears something is amiss with the patch.

The program was built with Clang on a Skylake machine. Its not in the big integer part of the library. Clang is using it for regular math.

**********

Here's what GDB shows:

Breakpoint 2, ECP::EncodedPointSize (this=0x7fffffffb3f8,
    compressed=0x0) at ./ecp.h:75
75                      {return 1 + (compressed?1:2)*GetField().MaxElementByteLength();}
(gdb) disass
Dump of assembler code for function ECP::EncodedPointSize(bool) const:
=> 0x0000000000594750 <+0>:     push   %rbx
   0x0000000000594751 <+1>:     xor    %eax,%eax
   0x0000000000594753 <+3>:     cmp    $0x1,%sil
   0x0000000000594757 <+7>:     mov    $0x1,%ebx
   0x000000000059475c <+12>:    adcx   %eax,%ebx
   0x0000000000594761 <+17>:    callq  0x475eb0 <ECP::GetField() const>
   0x0000000000594766 <+22>:    mov    %rax,%rdi
   0x0000000000594769 <+25>:    callq  0x521020 <ModularArithmetic::MaxElementByteLength() const>
   0x000000000059476e <+30>:    imul   %ebx,%eax
   0x0000000000594771 <+33>:    add    $0x1,%eax
   0x0000000000594774 <+36>:    pop    %rbx
   0x0000000000594775 <+37>:    retq
End of assembler dump.


**********

Here's what Valgrind shows:

valgrind ./cryptest.exe v
...

HMQV validation suite running...

vex amd64->IR: unhandled instruction bytes: 0x66 0xF 0x38 0xF6 0xD8 0xE8 0x4A 0x17 0xEE 0xFF
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==12079== valgrind: Unrecognised instruction at address 0x59475c.
==12079==    at 0x59475C: ECP::EncodedPointSize(bool) const (ecp.h:75)
==12079==    by 0x591A3B: ECP::DecodePoint(ECPPoint&, BufferedTransformation&, unsigned long) const (ecp.cpp:107)
==12079==    by 0x5917DD: ECP::DecodePoint(ECPPoint&, unsigned char const*, unsigned long) const (ecp.cpp:69)
==12079==    by 0x591D66: ECP::BERDecodePoint(BufferedTransformation&) const (ecp.cpp:151)
==12079==    by 0x54E3EA: DL_GroupParameters_EC<ECP>::BERDecode(BufferedTransformation&) (eccrypto.cpp:534)
==12079==    by 0x4AAF77: Test::ValidateHMQV() (validat2.cpp:392)
==12079==    by 0x48201A: Test::ValidateAll(bool) (validat1.cpp:162)
==12079==    by 0x4369B6: Validate(int, bool, char const*) (test.cpp:916)
==12079==    by 0x432FE4: main (test.cpp:406)
==12079== Your program just tried to execute an instruction that Valgrind
==12079== did not recognise.  There are two possible reasons for this.
==12079== 1. Your program has a bug and erroneously jumped to a non-code
==12079==    location.  If you are running Memcheck and you just saw a
==12079==    warning about a bad jump, it's probably your program's fault.
==12079== 2. The instruction is legitimate but Valgrind doesn't handle it,
==12079==    i.e. it's Valgrind's fault.  If you think this is the case or
==12079==    you are not sure, please let us know and we'll try to fix it.
==12079== Either way, Valgrind will now raise a SIGILL signal which will
==12079== probably kill your program.
==12079==
==12079== Process terminating with default action of signal 4 (SIGILL)
==12079==  Illegal opcode at address 0x59475C
==12079==    at 0x59475C: ECP::EncodedPointSize(bool) const (ecp.h:75)
==12079==    by 0x591A3B: ECP::DecodePoint(ECPPoint&, BufferedTransformation&, unsigned long) const (ecp.cpp:107)
==12079==    by 0x5917DD: ECP::DecodePoint(ECPPoint&, unsigned char const*, unsigned long) const (ecp.cpp:69)
==12079==    by 0x591D66: ECP::BERDecodePoint(BufferedTransformation&) const (ecp.cpp:151)
==12079==    by 0x54E3EA: DL_GroupParameters_EC<ECP>::BERDecode(BufferedTransformation&) (eccrypto.cpp:534)
==12079==    by 0x4AAF77: Test::ValidateHMQV() (validat2.cpp:392)
==12079==    by 0x48201A: Test::ValidateAll(bool) (validat1.cpp:162)
==12079==    by 0x4369B6: Validate(int, bool, char const*) (test.cpp:916)
==12079==    by 0x432FE4: main (test.cpp:406)
==12079==
==12079== HEAP SUMMARY:
==12079==     in use at exit: 69,547 bytes in 538 blocks
==12079==   total heap usage: 210,211 allocs, 209,673 frees, 67,196,780 bytes allocated
==12079==
==12079== LEAK SUMMARY:
==12079==    definitely lost: 0 bytes in 0 blocks
==12079==    indirectly lost: 0 bytes in 0 blocks
==12079==      possibly lost: 0 bytes in 0 blocks
==12079==    still reachable: 69,547 bytes in 538 blocks
==12079==         suppressed: 0 bytes in 0 blocks
==12079== Rerun with --leak-check=full to see details of leaked memory
==12079==
==12079== For counts of detected and suppressed errors, rerun with: -v
==12079== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)
Comment 5 jacobly.alt 2017-04-25 19:23:44 UTC
Created attachment 105193 [details]
Fixed 32-bit instructions in test-amd64.

I disassembled the test program and noticed that 32-bit instructions weren't being assembled correctly. The new patch fixes the test, but the tests still pass and I can't reproduce this issue on 64-bit linux, even with a test program that contains:

  400490:	66 0f 38 f6 d8       	adcx   %eax,%ebx

I would expect that error for a 32-bit program, where it is unimplemented, but your valgrind output says it's using the 64-bit backend. On a side note, what compiler/version did you compile valgrind with?
Comment 6 Jeffrey Walton 2017-04-25 23:46:47 UTC
(In reply to jacobly.alt from comment #5)
> Created attachment 105193 [details]
> Fixed 32-bit instructions in test-amd64.
> 
> I disassembled the test program and noticed that 32-bit instructions weren't
> being assembled correctly. The new patch fixes the test, but the tests still
> pass and I can't reproduce this issue on 64-bit linux, even with a test
> program that contains:
> 
>   400490:	66 0f 38 f6 d8       	adcx   %eax,%ebx
> 
> I would expect that error for a 32-bit program, where it is unimplemented,
> but your valgrind output says it's using the 64-bit backend. On a side note,
> what compiler/version did you compile valgrind with?

Thanks Jacob.

It looks like there are two (maybe more) instructions there. First is the ADCX:

0x66 0xF 0x38 0xF6 0xD8

And then there's a JUMP:

0xE8 0x4A

I'm not sure what this is:

0x17 0xEE 0xFF

> ... On a side note,
> what compiler/version did you compile valgrind with?

I should have provided this earlier. Sorry about that. The system details are as follows. Everything is fully patched.

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description:    Fedora release 25 (Twenty Five)
Release:        25
Codename:       TwentyFive

$ clang++ --version
clang version 3.9.1 (tags/RELEASE_391/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 900.109
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            :
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
...

Here's the reproducer (for me on this F25 on Skylark machine):

  git clone https://github.com/weidai11/cryptopp
  cd cryptopp
  CXX=clang++ CXXFLAGS="-DNDEBUG -g3 -O1 -std=c++03" make -j 9
Comment 7 Jeffrey Walton 2017-04-26 00:09:53 UTC
(In reply to Jeffrey Walton from comment #6)
> (In reply to jacobly.alt from comment #5)
> > Created attachment 105193 [details]
> > Fixed 32-bit instructions in test-amd64.
> > 
> > ...
> 
> > ... On a side note,
> > what compiler/version did you compile valgrind with?
> 
> ...

Crap... I still did not answer that question. Valgrind was compiled with:

$ gcc --version
gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
Copyright (C) 2016 Free Software Foundation, Inc.

This may be relevant... The code in question is at https://github.com/weidai11/cryptopp/blob/master/ecp.h#L73. The issue only arises under Clang with -std=c++03 or -std=c++11.
Comment 8 Jeffrey Walton 2017-04-26 00:25:34 UTC
(In reply to jacobly.alt from comment #5)
> Created attachment 105193 [details]
> Fixed 32-bit instructions in test-amd64.
> 
> I disassembled the test program and noticed that 32-bit instructions weren't
> being assembled correctly. The new patch fixes the test, but the tests still
> pass and I can't reproduce this issue on 64-bit linux, even with a test
> program that contains:
> 
>   400490:	66 0f 38 f6 d8       	adcx   %eax,%ebx
> 
> I would expect that error for a 32-bit program, where it is unimplemented,
> but your valgrind output says it's using the 64-bit backend. On a side note,
> what compiler/version did you compile valgrind with?

Confirmed. The patch cleared the issue. Thank you very much.
Comment 9 jacobly.alt 2017-04-26 03:35:56 UTC
Considering the new patch didn't change any valgrind code, I'm going to assume the patch was just too old to apply properly to trunk.
Comment 10 Jeffrey Walton 2017-04-26 03:56:57 UTC
(In reply to jacobly.alt from comment #9)
> Considering the new patch didn't change any valgrind code, I'm going to
> assume the patch was just too old to apply properly to trunk.

Oh, my bad...

I did not apply the patch from the 97845 attachment. I assumed it was already applied to SVN master since a year had passed.

I did apply the patch from the 105193 attachment.
Comment 11 Jeffrey Walton 2017-04-26 15:58:19 UTC
Here's the reduced test case.

$ cat test.c
#include <stdint.h>

int main(int argc, char* argv[])
{
    uint32_t a = argc*4, b = argc*8;
    asm ("adcxl %1, %0" : "+r"(a) : "rm"(b));
    return a & 0xf;
}

A bug was filed with Fedora at https://bugzilla.redhat.com/show_bug.cgi?id=1445829.
Comment 12 Jeffrey Walton 2017-04-26 18:13:35 UTC
This question came up on the Fedora bug report... Why was the patch not accepted?
Comment 13 Ivo Raisr 2017-05-09 08:56:54 UTC
*** Bug 372828 has been marked as a duplicate of this bug. ***
Comment 14 Julian Seward 2017-05-12 13:34:02 UTC
(In reply to jacobly.alt from comment #5)
> Created attachment 105193 [details]
> Fixed 32-bit instructions in test-amd64.

Committed, with modifications, vex r3365.  Thanks for the patch.
Note that the instructions have been gated on running on an AVX2
platform, since in fact any processor that implements them will
also implement AVX2 (afaics).
Comment 15 Julian Seward 2017-05-14 07:57:23 UTC
Test cases in valgrind r16372.
Comment 16 Ivo Raisr 2017-05-15 09:24:56 UTC
Running the latest Valgrind on regression test fb_test_amd64 built with gcc 5.4.0 gives the following crash:

vex amd64->IR: unhandled instruction bytes: 0x66 0x4D 0xF 0x38 0xF6 0xC5 0x9C 0x58 0x48 0x83
vex amd64->IR:   REX=1 REX.W=1 REX.R=1 REX.X=0 REX.B=1
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==1119== valgrind: Unrecognised instruction at address 0x409768.
==1119==    at 0x409768: exec_adcx (fb_test_amd64.h:69)
==1119==    by 0x409806: test_adcx (fb_test_amd64.h:166)
==1119==    by 0x411652: main (fb_test_amd64.c:1201)

Disassembly indeed shows it is adcx:

   0x0000000000409767 <+405>:   popfq
=> 0x0000000000409768 <+406>:   adcx   %r13,%r8
   0x000000000040976e <+412>:   pushfq

(gdb) x/16xb 0x409768
0x409768 <exec_adcx+406>:       0x66    0x4d    0x0f    0x38    0xf6    0xc5    0x9c    0x58
0x409770 <exec_adcx+414>:       0x48    0x83    0xec    0x08    0x25    0xd5    0x08    0x00

./vg-in-place --version -v
valgrind-3.13.0.SVN-16374-vex-3369

Is it possible that some paths in adcx decoding are not handled properly?
Comment 17 Julian Seward 2017-05-15 09:34:40 UTC
Ah yes, I remembered about this and then forgot it again.

The test case uses ADOX etc, and runs unconditionally, but the
fix only decodes them on AVX2 capable machines and above.  So the
test case needs to be conditionalised to only run on AVX2.
I'll fix it.
Comment 18 Ivo Raisr 2017-05-17 11:07:13 UTC
Follow up fix in SVN r16393.
Comment 19 Ivo Raisr 2017-05-18 09:48:34 UTC
And follow up VEX commit in SVN r3376.
Comment 20 Julian Seward 2018-02-14 18:02:14 UTC
The implementation may be buggy.  Details in bug 384930.