Bug 110478 - Opteron: vex amd64->IR: unhandled instruction bytes (prefetch)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.0.0
Platform: Gentoo Packages Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
Reported: 2005-08-09 19:06 UTC by Nicholas Jones
Modified: 2005-08-10 17:16 UTC

Description Nicholas Jones 2005-08-09 19:06:28 UTC
Information duplicated and amended from Gentoo's Bugzilla:
http://bugs.gentoo.org/show_bug.cgi?id=101811

Valgrind bug #110201 and bug #110464 are vaguely related in that
they are all VEX related bugs. The unhandled bytes differ in each
report. Regardless of the application that fails, I consistently
see '0xF 0xD' as the first two unhandled bytes. The given traceback
from valgrind indicates several different places, such as grep,
librecode, and within valgrind's shared objects.

Seen with the 3.0.0 release; I also verified that the problem exists
in an SVN checkout made at 16:30 on August 9th:
svn co svn://svn.valgrind.org/valgrind/trunk valgrind

The problem occurs with and without PIE, and varying CFLAGS
does not improve anything. gdb stepping logs with 'where full'
output are in the links provided below.

valgrind /usr/bin/true -- Works
valgrind /usr/bin/date -- Works
valgrind /usr/bin/fortune -- SIGILL
valgrind grep localhost /etc/hosts -- SIGILL

Linux bruce 2.6.11-gentoo-r11 #1 SMP Wed Jun 29 08:47:32 EDT 2005
x86_64 AMD Opteron(tm) Processor 246 AuthenticAMD GNU/Linux

gcc (GCC) 3.4.3 20041125 (Gentoo 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)

[ebuild   R   ] sys-devel/binutils-2.15.92.0.2-r10  -multislot (-multitarget)
-nls -test 0 kB

[ebuild   R   ] sys-libs/glibc-2.3.5-r1  -build -erandom -glibc-compat20
-glibc-omitfp -hardened -linuxthreads-tls (-multilib) -nls -nptl -nptlonly -pic
-profile (-selinux) -userlocales 21 kB

From gentoo-dev (IRC) debugging, it seems this only happens on
Opterons. AMD64 machines do not seem affected. There isn't a
clear set of CPU differences that seem to matter.

The AMD64 boxes and the Opterons only seemed to differ in
the 'pni' flag (which my Opterons have) and the 'lahf_lm' flag,
which one of the AMD64 boxes had. All of my Opterons have the
same cpuinfo as below, varying only in speed.
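
The flag comparison described above can be sketched as a set difference. This is only an illustration: the Opteron flags are the ones posted in this report, but the Athlon64 flag line was never posted, so ATHLON64_FLAGS_HYPOTHETICAL is a made-up stand-in built to match the difference described (no 'pni', plus 'lahf_lm'). The helper name flag_diff is also an assumption.

```python
# Flags copied from the Opteron 246 cpuinfo posted in this report.
OPTERON_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow"
)

# HYPOTHETICAL: the Athlon64 flags were not posted. This line is invented
# to mirror the description in the report (no 'pni', has 'lahf_lm').
ATHLON64_FLAGS_HYPOTHETICAL = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow lahf_lm"
)


def flag_diff(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of flag names."""
    set_a, set_b = set(a.split()), set(b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_opteron, only_athlon = flag_diff(OPTERON_FLAGS, ATHLON64_FLAGS_HYPOTHETICAL)
print("only on Opteron:", only_opteron)
print("only on Athlon64 (hypothetical):", only_athlon)
```

On a real machine the two inputs would come from the "flags" line of /proc/cpuinfo on each box.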

You can click the URL link above the summary or go to these for
lots of debug output that I scrounged up.

Compressed and uncompressed versions are here. Links that follow
after this link are directly to the uncompressed, plain text versions.
http://www.twobit.net/~carpaski/vg3/

Simple demonstration where 'fortune' cannot be run through valgrind.
It contains the plain execution and then an strace of the output.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-strace.log

A gdb run of continuous step operations breaking in main and single
stepping until Valgrind exits. Might want a quick sed on that.
sed -i '/^(gdb) $/d' vg3-illegal.log
http://www.twobit.net/~carpaski/vg3/vg3-illegal.log

A gdb run with 'where full' interlaced after every step. This is a
lot of output. It's the best info I can provide at the moment.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-full.log

Portage 2.0.51.22-r2 (default-linux/amd64/2005.0, gcc-3.4.3, glibc-2.3.5-r1,
2.6.11-gentoo-r11 x86_64)

gcc (GCC) 3.4.3 20041125 (Gentoo 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 246
stepping        : 10
cpu MHz         : 2004.595
cache size      : 1024 KB
physical id     : 255
siblings        : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3940.35
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 246
stepping        : 10
cpu MHz         : 2004.595
cache size      : 1024 KB
physical id     : 255
siblings        : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 4005.88
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Comment 1 Nicholas Jones 2005-08-09 19:38:45 UTC
Quick note to clarify:

Athlon64 (939) seems to work fine, but Opterons (940) do not.


Additional bits of insight:

On SMP Opterons, I see the unhandled instructions.

On single-proc, non-SMP kernels, at least two of us see
infinite looping that is very resistant to kill -9.
Comment 2 Tom Hughes 2005-08-09 20:32:50 UTC
Neither of the two bugs you mention seems to be in any way related - both are missing x87 instructions, while the instruction that I think you are talking about is a prefetch instruction.

If you do encounter an unsupported instruction then all you really need to report is the last few lines of the valgrind output that report what the unrecognised instruction was - there really is no need to write a minor essay and attach tons of gdb output and things.

Can you confirm exactly what valgrind says when it fails? If I'm reading what you say correctly, the unrecognised instruction bytes start with "0xF 0xD" - is that correct? If so then that is a prefetch instruction.
Comment 3 Nicholas Jones 2005-08-09 20:54:09 UTC
I find it easier and quicker to be overcomplete than delay the process when I'm not conversing interactively. Sorry if it was a bit much.

It's in the 3 files. The top of this one is easiest:
http://www.twobit.net/~carpaski/vg3/vg3-illegal-strace.log

vex amd64->IR: unhandled instruction bytes: 0xF 0xD 0x8 0xF
==20651== 
==20651== Process terminating with default of signal 4 (SIGILL)
==20651==  Illegal opcode at address 0x140EEDED1433
==20651==    at 0x140EEDED1433: recode_new_outer (in /usr/lib64/librecode.so.0.0.0)
==20651==    by 0x4047FE: (within /usr/bin/fortune)
==20651==    by 0x140EEE10ACFF: __libc_start_main (in /lib64/libc-2.3.5.so)
==20651==    by 0x4018B9: (within /usr/bin/fortune)
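
For anyone collecting this from a long log, a minimal sketch of pulling out just the "unhandled instruction bytes" line follows. The regular expression and the function name unhandled_bytes are assumptions made for illustration; they are based only on the message format shown above and are not part of valgrind.

```python
import re

# Matches the byte list after the message text; [ \t]+ keeps the match
# on a single line so following log lines are not swallowed.
UNHANDLED_RE = re.compile(
    r"unhandled instruction bytes:((?:[ \t]+0x[0-9A-Fa-f]+)+)"
)


def unhandled_bytes(log_text):
    """Return one list of integer byte values per matching log line."""
    found = []
    for match in UNHANDLED_RE.finditer(log_text):
        found.append([int(tok, 16) for tok in match.group(1).split()])
    return found


sample = """vex amd64->IR: unhandled instruction bytes: 0xF 0xD 0x8 0xF
==20651== 
==20651== Process terminating with default of signal 4 (SIGILL)
"""
print(unhandled_bytes(sample))
```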
Comment 4 Tom Hughes 2005-08-09 20:56:42 UTC
Yep. That's a prefetch instruction - presumably there was some difference in the compiler and/or the optimisation flags used that caused it to insert prefetch instructions on some machines.
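
As a rough sketch of why these bytes read as a prefetch: in AMD's documented 3DNow! encoding, opcode 0F 0D takes a ModRM byte whose reg field selects the variant (/0 is PREFETCH, /1 is PREFETCHW). The thread does not say which variant this is, so the result below is a reading of the posted bytes and not a statement from the developers. The names split_modrm and classify are made up for this example, and this is not how VEX itself decodes instructions.

```python
# Mnemonics for the ModRM reg field after opcode 0F 0D, using AMD's
# documented /0 and /1 encodings. Other reg values are left unnamed here.
PREFETCH_3DNOW = {0: "prefetch", 1: "prefetchw"}


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def classify(insn_bytes):
    """Name the instruction if it starts with 0F 0D, else return None."""
    if len(insn_bytes) < 3 or insn_bytes[0] != 0x0F or insn_bytes[1] != 0x0D:
        return None
    mod, reg, rm = split_modrm(insn_bytes[2])
    name = PREFETCH_3DNOW.get(reg, "0F 0D /%d (not named in this sketch)" % reg)
    return name, mod, reg, rm


# Bytes from the valgrind message in Comment 3.
print(classify([0x0F, 0x0D, 0x08, 0x0F]))
```

With mod = 0 and rm = 0 the operand is a plain register-indirect address with no displacement, so the instruction is three bytes long and the trailing 0xF would belong to the next instruction.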
Comment 5 Nicholas Nethercote 2005-08-09 21:06:55 UTC
> I find it easier and quicker to be overcomplete than delay the process 
> when I'm not conversing interactively. Sorry if it was a bit much.


It's a good idea in general; in this case it was just unfortunate that the 
one piece of crucial information was buried in another web page :)
Comment 6 Julian Seward 2005-08-10 14:28:40 UTC
Fixed in vex r1324.  It would be good if you could verify that the fix works.
Comment 7 Nicholas Jones 2005-08-10 17:16:15 UTC
Works as of: SVN Checkout from Aug 10, 15:14:00 UTC

Thanks. :)