Bug 379525 - Support more x86 nop opcodes
Summary: Support more x86 nop opcodes
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.13 SVN
Platform: unspecified Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-04 21:28 UTC by H.J. Lu
Modified: 2017-05-31 12:38 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Proposed patch (3.22 KB, patch)
2017-05-16 16:11 UTC, Tanya
Details
C test with different opcodes (112.33 KB, text/x-csrc)
2017-05-22 16:31 UTC, Tanya
Details
Files for Valgrind test case (8.43 KB, application/x-zip-compressed)
2017-05-22 16:33 UTC, Tanya
Details
Proposed patch for 32-bit architecture (2.19 KB, patch)
2017-05-29 13:40 UTC, Tanya
Details
test without FS or GS prefixes (129.72 KB, text/x-csrc)
2017-05-30 11:49 UTC, Tanya
Details
test without FS or GS prefixes (129.72 KB, text/x-csrc)
2017-05-30 11:53 UTC, Tanya
Details
test with GS prefixes (26.68 KB, text/x-csrc)
2017-05-30 11:54 UTC, Tanya
Details
test with FS prefixes (26.68 KB, text/x-csrc)
2017-05-30 11:55 UTC, Tanya
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2017-05-04 21:28:45 UTC
For x86, the follow opcodes are NOP with a register or memory operand,
which can take regular prefixes:

0f 19
0f 1c
0f 1d
0f 1e
0f 1f

[hjl@gnu-6 tmp]$ cat x.c
#include <stdio.h>

int main ()
{
  asm (".byte 0x67, 0xf2, 0xf3, 0x66, 0x0f, 0x19, 0x44, 0x0,  0x0");
  asm (".byte 0x67, 0xf2, 0xf3, 0x66, 0x0f, 0x1c, 0x44, 0x0,  0x0");
  asm (".byte 0x67, 0xf2, 0xf3, 0x66, 0x0f, 0x1d, 0x44, 0x0,  0x0");
  asm (".byte 0x67, 0xf2, 0xf3, 0x66, 0x0f, 0x1e, 0x44, 0x0,  0x0");
  asm (".byte 0x67, 0xf2, 0xf3, 0x66, 0x0f, 0x1f, 0x44, 0x0,  0x0");
  printf ("hello\n");
  return 0;
}
[hjl@gnu-6 tmp]$ gcc x.c
[hjl@gnu-6 tmp]$ 
(gdb) r
Starting program: /tmp/a.out 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.24-5.3.fc25.x86_64
hello
[Inferior 1 (process 26135) exited normally]
(gdb) disass/r main
Dump of assembler code for function main:
   0x00000000004004f6 <+0>:	55	push   %rbp
   0x00000000004004f7 <+1>:	48 89 e5	mov    %rsp,%rbp
   0x00000000004004fa <+4>:	67 f2 f3 66 0f 19 44 00 00	repnz repz nopw 0x0(%eax,%eax,1)
   0x0000000000400503 <+13>:	67 f2 f3 66 0f 1c 44 00 00	repnz repz nopw 0x0(%eax,%eax,1)
   0x000000000040050c <+22>:	67 f2 f3 66 0f 1d 44 00 00	repnz repz nopw 0x0(%eax,%eax,1)
   0x0000000000400515 <+31>:	67 f2 f3 66 0f 1e 44 00 00	repnz repz nopw 0x0(%eax,%eax,1)
   0x000000000040051e <+40>:	67 f2 f3 66 0f 1f 44 00 00	repnz repz nopw 0x0(%eax,%eax,1)
   0x0000000000400527 <+49>:	bf d0 05 40 00	mov    $0x4005d0,%edi
   0x000000000040052c <+54>:	e8 bf fe ff ff	callq  0x4003f0 <puts@plt>
   0x0000000000400531 <+59>:	b8 00 00 00 00	mov    $0x0,%eax
   0x0000000000400536 <+64>:	5d	pop    %rbp
   0x0000000000400537 <+65>:	c3	retq   
End of assembler dump.
(gdb) 

[hjl@gnu-6 tmp]$ valgrind ./a.out
==26187== Memcheck, a memory error detector
==26187== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==26187== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==26187== Command: ./a.out
==26187== 
vex amd64->IR: unhandled instruction bytes: 0x67 0xF2 0xF3 0x66 0xF 0x19 0x44 0x0 0x0 0x67
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=1 PFX.F2=1 PFX.F3=1
==26187== valgrind: Unrecognised instruction at address 0x4004fa.
==26187==    at 0x4004FA: main (in /tmp/a.out)
==26187== Your program just tried to execute an instruction that Valgrind
==26187== did not recognise.  There are two possible reasons for this.
==26187== 1. Your program has a bug and erroneously jumped to a non-code
==26187==    location.  If you are running Memcheck and you just saw a
==26187==    warning about a bad jump, it's probably your program's fault.
==26187== 2. The instruction is legitimate but Valgrind doesn't handle it,
==26187==    i.e. it's Valgrind's fault.  If you think this is the case or
==26187==    you are not sure, please let us know and we'll try to fix it.
==26187== Either way, Valgrind will now raise a SIGILL signal which will
==26187== probably kill your program.
==26187== 
==26187== Process terminating with default action of signal 4 (SIGILL)
==26187==  Illegal opcode at address 0x4004FA
==26187==    at 0x4004FA: main (in /tmp/a.out)
==26187== 
==26187== HEAP SUMMARY:
==26187==     in use at exit: 0 bytes in 0 blocks
==26187==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==26187== 
==26187== All heap blocks were freed -- no leaks are possible
==26187== 
==26187== For counts of detected and suppressed errors, rerun with: -v
==26187== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction
[hjl@gnu-6 tmp]$
Comment 1 Ivo Raisr 2017-05-08 07:35:07 UTC
Thank you for the report.
Could you provide a patch which implements the missing functionality?
Comment 2 Julian Seward 2017-05-10 08:53:50 UTC
HJ, what's the use case here?  Where are these coming from?
Not from gcc-7.x AFAICT.  So .. where?
Comment 3 H.J. Lu 2017-05-10 14:52:26 UTC
(In reply to Julian Seward from comment #2)
> HJ, what's the use case here?  Where are these coming from?
> Not from gcc-7.x AFAICT.  So .. where?

NOP opcodes are turned into real instructions on new processors
from time to time. MPX is one example.  CET:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

is a new one.  It is safe for Valgrind to treat those NOPs as NOP.
Comment 4 Tanya 2017-05-16 16:11:35 UTC
Created attachment 105595 [details]
Proposed patch

Proposed a patch that implements the missing opcodes and allows them to have any prefixes or operands.
Comment 5 Julian Seward 2017-05-22 08:01:46 UTC
(In reply to Tanya from comment #4)
> Proposed a patch that implements the missing opcodes and allows them to have
> any prefixes or operands.

Tanya, thanks for the patch.  Given that we are very close to freezing
for 3.13 and that this is really our last chance to get decoder fixes in
for the next several months, I'd like to commit this soon -- this week,
if the patch is OK.

Do you have a test case for this?  I think that is important.
Comment 6 Tanya 2017-05-22 16:31:36 UTC
Created attachment 105676 [details]
C test with different opcodes

C application with NOPs with various opcode, prefix and ModRM combinations
Comment 7 Tanya 2017-05-22 16:33:17 UTC
Created attachment 105677 [details]
Files for Valgrind test case

attempt to make Valgrind test case
Comment 8 Tanya 2017-05-22 17:00:19 UTC
(In reply to Julian Seward from comment #5)
> (In reply to Tanya from comment #4)
> > Proposed a patch that implements the missing opcodes and allows them to have
> > any prefixes or operands.
> 
> Tanya, thanks for the patch.  Given that we are very close to freezing
> for 3.13 and that this is really our last chance to get decoder fixes in
> for the next several months, I'd like to commit this soon -- this week,
> if the patch is OK.
> 
> Do you have a test case for this?  I think that is important.

Julian,

Added a C application with different opcode/prefix/ModRM combinations (provided by H.J. Lu). 

I'm not sure how to properly write Valgrind test cases. Attached a test with all opcode/prefix/ModRM combinations; please let me know if I should only leave a few.
Comment 9 Julian Seward 2017-05-24 14:30:32 UTC
Committed, r3383, r16415.  Thanks for the patches.
Comment 10 Ivo Raisr 2017-05-25 12:27:03 UTC
I get the following error output from Valgrind run:

$ ./vg-in-place --tool=none none/tests/amd64/cet_nops
==13161== Nulgrind, the minimal Valgrind tool
==13161== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==13161== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==13161== Command: none/tests/amd64/cet_nops
==13161== 
start doing absolutely nothing ..
vex amd64->IR: unhandled instruction bytes: 0x65 0xF 0x19 0xFF 0x66 0x65 0xF 0x19 0xFF 0xF2
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==13161== valgrind: Unrecognised instruction at address 0x401144.
==13161==    at 0x401144: main (cet_nops.c:42)
==13161== Your program just tried to execute an instruction that Valgrind
==13161== did not recognise.  There are two possible reasons for this.
==13161== 1. Your program has a bug and erroneously jumped to a non-code
==13161==    location.  If you are running Memcheck and you just saw a
==13161==    warning about a bad jump, it's probably your program's fault.
==13161== 2. The instruction is legitimate but Valgrind doesn't handle it,
==13161==    i.e. it's Valgrind's fault.  If you think this is the case or
==13161==    you are not sure, please let us know and we'll try to fix it.
==13161== Either way, Valgrind will now raise a SIGILL signal which will
==13161== probably kill your program.
==13161== 
==13161== Process terminating with default action of signal 4 (SIGILL): dumping core
==13161==  Illegal opcode at address 0x401144
==13161==    at 0x401144: main (cet_nops.c:42)
==13161== 
./vg-in-place: line 29: 13161: Illegal instruction(coredump)
Illegal Instruction (core dumped)


./vg-in-place --version -v
valgrind-3.13.0.SVN-16420-vex-3384


$ gcc --version
gcc (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
Comment 11 Tanya 2017-05-25 14:03:15 UTC
(In reply to Ivo Raisr from comment #10)

Ivo,
I cannot reproduce the issue on the latest Git sources with the 379525 patch (amd64 linux, gcc 6.2.0). I cannot get the latest SVN sources (firewall blocks the access, and the "socat" redirection did not work for me.)
Would it be possible to update the Git repository with VALGRIND_3_13_BRANCH? 

Thank you,
Tanya
Comment 12 Julian Seward 2017-05-25 15:32:19 UTC
Ivo, I am a bit surprised by this.  The new insn decoder bits are not
dependent on the host's hw caps -- this is handled by dis_ESC_0F, which is
available at any hwcaps level (for example, it handles RDTSC, SYSCALL, and
conditional branches).  Are you sure guest_amd64_toIR.c got recompiled here?
Comment 13 Ivo Raisr 2017-05-26 11:41:52 UTC
I've updated GIT repo at sourceware.org.

So the test program under Valgrind crashes at cet_nop.c:42 which is this line:

42      __asm__ __volatile__ (".byte 0xf3, 0x66, 0x64, 0x0f, 0x19, 0xff" :::"cc","memory");

GDB disassembles it as "gs nop %edi". It is actually the first "nop" with 'gs' prefix. Perhaps this is the smoking gun?
Comment 14 Ivo Raisr 2017-05-26 12:05:36 UTC
And this is a minimalistic reproducer with the latest Valgrind sources:

--------------------------
#include <stdio.h>

int main ()
{
__asm__ __volatile__ (".byte 0x65, 0x0f, 0x19, 0xff" :::"cc","memory");
  return 0;
}
Comment 15 Ivo Raisr 2017-05-26 12:24:54 UTC
So the real "culprit" is that on:

amd64/Linux:
guest_amd64_assume_fs_is_const = True
guest_amd64_assume_gs_is_const = True

amd64/Darwin:
guest_amd64_assume_fs_is_const = False
guest_amd64_assume_gs_is_const = True

amd64/Solaris:
guest_amd64_assume_fs_is_const = True
guest_amd64_assume_gs_is_const = True

And amd64 decoder in disInstr_AMD64_WRK(), around lines 32238-32243 fails with:
------------
   /* We have a %fs prefix.  Reject it if there's no evidence in 'vbi'
      that we should accept it. */
   if ((pfx & PFX_FS) && !vbi->guest_amd64_assume_fs_is_const)
      goto decode_failure;

   /* Ditto for %gs prefixes. */
   if ((pfx & PFX_GS) && !vbi->guest_amd64_assume_gs_is_const)
      goto decode_failure;
-----------


This means that the test cases in cet_nops are probably not valid for all OSes.
Only Linux can have both "fs" and "gs" prefixes, OS X only "gs" and Solaris only "fs".

I think this could be easily fixed by simple #ifdefery in cet_ops.c.
Comment 16 H.J. Lu 2017-05-26 14:47:31 UTC
(In reply to Ivo Raisr from comment #14)
> And this is a minimalistic reproducer with the latest Valgrind sources:
> 
> --------------------------
> #include <stdio.h>
> 
> int main ()
> {
> __asm__ __volatile__ (".byte 0x65, 0x0f, 0x19, 0xff" :::"cc","memory");
>   return 0;
> }

Does it run on your machine outside of valgrind?
Comment 17 Tanya 2017-05-26 15:27:06 UTC
(In reply to Julian Seward from comment #9)
> Committed, r3383, r16415.  Thanks for the patches.

Julian,
Would it be possible to patch the 32-bit instruction parser as well?
Comment 18 Ivo Raisr 2017-05-29 10:22:18 UTC
(In reply to H.J. Lu from comment #16)
> (In reply to Ivo Raisr from comment #14)
> > And this is a minimalistic reproducer with the latest Valgrind sources:
> > 
> > --------------------------
> > #include <stdio.h>
> > 
> > int main ()
> > {
> > __asm__ __volatile__ (".byte 0x65, 0x0f, 0x19, 0xff" :::"cc","memory");
> >   return 0;
> > }
> 
> Does it run on your machine outside of valgrind?

Yes, it does.

The root cause is described in comment #15.
I cannot fix this myself because cet_nops.c uses byte sequences and does not use assembler mnemonics. My toolchain is not able to properly decode all byte sequences, even though I (and my CPU) believe they are valid ones.

It would help immensely if you could divide the file (or better split it) in three sections:
- with gs prefix
- with fs prefix
- with no prefix
Comment 19 Ivo Raisr 2017-05-29 10:23:24 UTC
(In reply to Tanya from comment #17)
> (In reply to Julian Seward from comment #9)
> > Committed, r3383, r16415.  Thanks for the patches.
> 
> Julian,
> Would it be possible to patch the 32-bit instruction parser as well?

Sure, no problem. Please provide a patch and test cases.
Comment 20 Julian Seward 2017-05-29 11:58:20 UTC
(In reply to Tanya from comment #17)
> Would it be possible to patch the 32-bit instruction parser as well?

I saw that part in your patch, but didn't land it, because the 32-bit
insn parser only goes up to and including SSSE3.  So I thought that it
didn't need to be patched.

But now I'm not so sure.  Is there any realistic scenario where we try
to run 32 bit code on a CPU that advertises (via CPUID) that it can do
only up until SSSE3, but also contains these NOP instructions?  I think
that's the key question.
Comment 21 H.J. Lu 2017-05-29 13:14:22 UTC
(In reply to Julian Seward from comment #20)
> (In reply to Tanya from comment #17)
> > Would it be possible to patch the 32-bit instruction parser as well?
> 
> I saw that part in your patch, but didn't land it, because the 32-bit
> insn parser only goes up to and including SSSE3.  So I thought that it
> didn't need to be patched.
> 
> But now I'm not so sure.  Is there any realistic scenario where we try
> to run 32 bit code on a CPU that advertises (via CPUID) that it can do
> only up until SSSE3, but also contains these NOP instructions?  I think
> that's the key question.

All CPUs which support SSSE3 also support these NOPs.
Comment 22 Tanya 2017-05-29 13:40:37 UTC
Created attachment 105753 [details]
Proposed patch for 32-bit architecture
Comment 23 Tanya 2017-05-29 13:44:16 UTC
(In reply to Ivo Raisr from comment #19)
> (In reply to Tanya from comment #17)
> > (In reply to Julian Seward from comment #9)
> > > Committed, r3383, r16415.  Thanks for the patches.
> > 
> > Julian,
> > Would it be possible to patch the 32-bit instruction parser as well?
> 
> Sure, no problem. Please provide a patch and test cases.

Ivo, Julian,
Attached an updated 32-bit version of the patch. 32-bit part of the previous patch sets "temp_sorb" variable incorrectly.
As far as I understand, the test case should be the same for 32-bit architecture.
Comment 24 Julian Seward 2017-05-30 11:29:03 UTC
(In reply to Ivo Raisr from comment #18)
> It would help immensely if you could divide the file (or better split it) in
> three sections:
> - with gs prefix
> - with fs prefix
> - with no prefix

Ivo, would it be enough to move the test case from none/tests/amd64 to
none/tests/amd64-linux?  Will that lose us any test-coverage effectiveness?
I think it would be OK.

We could split it into three tests (with gs, with fs, with none) but
that's some work.  Using #ifdefs inside a single file is tricky because
we'd then need to have 3 different .stdout.exp files, which seems a bit
inelegant.
Comment 25 Tanya 2017-05-30 11:49:27 UTC
Created attachment 105776 [details]
test without FS or GS prefixes
Comment 26 Tanya 2017-05-30 11:53:12 UTC
Created attachment 105777 [details]
test without FS or GS prefixes
Comment 27 Julian Seward 2017-05-30 11:54:50 UTC
(In reply to Tanya from comment #26)
> Created attachment 105777 [details]
> test without FS or GS prefixes

Tanya, is that a tar file?  Can you just attach a plain text file?
Comment 28 Tanya 2017-05-30 11:54:51 UTC
Created attachment 105778 [details]
test with GS prefixes
Comment 29 Tanya 2017-05-30 11:55:10 UTC
Created attachment 105779 [details]
test with FS prefixes
Comment 30 Tanya 2017-05-30 11:56:49 UTC
(In reply to Julian Seward from comment #27)
> (In reply to Tanya from comment #26)
> > Created attachment 105777 [details]
> > test without FS or GS prefixes
> 
> Tanya, is that a tar file?  Can you just attach a plain text file?

It should be a plain text file, it opens correctly for me. It does nor open for you?
Comment 31 Julian Seward 2017-05-30 12:00:03 UTC
(In reply to Tanya from comment #30)
> It should be a plain text file, it opens correctly for me. It does nor open
> for you?

I think the spaces in the file name have confused my emacs :-/  I will
try again.
Comment 32 Julian Seward 2017-05-30 13:18:30 UTC
(In reply to Tanya from comment #23)
> Attached an updated 32-bit version of the patch. 32-bit part of the previous
> patch sets "temp_sorb" variable incorrectly.
> As far as I understand, the test case should be the same for 32-bit
> architecture.

Committed, with some tidying, r3385.  Thanks for the patch.

Also for the split tests.  Ivo very kindly offered to connect them to
the build system.
Comment 33 Tanya 2017-05-30 13:26:57 UTC
(In reply to Julian Seward from comment #32)
> (In reply to Tanya from comment #23)
> > Attached an updated 32-bit version of the patch. 32-bit part of the previous
> > patch sets "temp_sorb" variable incorrectly.
> > As far as I understand, the test case should be the same for 32-bit
> > architecture.
> 
> Committed, with some tidying, r3385.  Thanks for the patch.
> 
> Also for the split tests.  Ivo very kindly offered to connect them to
> the build system.

Ivo, Julian,
Thank you very much!
Comment 34 Ivo Raisr 2017-05-31 06:21:38 UTC
Follow up SVN commit r16421 which splits the tests into three.
Another commit will follow, enabling cet_nops_fs on Solaris and cet_nops_gs on OS X.
Comment 35 Ivo Raisr 2017-05-31 11:29:40 UTC
For Solaris and OS X, tests connected under SVN r16422.
Comment 36 Ivo Raisr 2017-05-31 12:38:03 UTC
32-bit tests connected under SVN r16423.