Information duplicated and amended from Gentoo's Bugzilla: http://bugs.gentoo.org/show_bug.cgi?id=101811

Valgrind bug #110201 and bug #110464 are vaguely related in that they are all VEX-related bugs. The unhandled bytes differ in each report.

Regardless of the application that fails, I consistently see '0xF 0xD' as the first two unhandled bytes. The traceback from valgrind points to several different places, such as grep, librecode, and valgrind's own shared objects.

I am using the 3.0.0 release and have also verified that the problem exists in an SVN checkout made at 16:30 on August 9th:

svn co svn://svn.valgrind.org/valgrind/trunk valgrind

The problem occurs with and without PIE, and varying CFLAGS does not improve anything. gdb stepping with 'where full' output is in the links provided below.

valgrind /usr/bin/true -- Works
valgrind /usr/bin/date -- Works
valgrind /usr/bin/fortune -- SIGILL
valgrind grep localhost /etc/hosts -- SIGILL

Linux bruce 2.6.11-gentoo-r11 #1 SMP Wed Jun 29 08:47:32 EDT 2005 x86_64 AMD Opteron(tm) Processor 246 AuthenticAMD GNU/Linux

gcc (GCC) 3.4.3 20041125 (Gentoo 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)

[ebuild R ] sys-devel/binutils-2.15.92.0.2-r10 -multislot (-multitarget) -nls -test 0 kB
[ebuild R ] sys-libs/glibc-2.3.5-r1 -build -erandom -glibc-compat20 -glibc-omitfp -hardened -linuxthreads-tls (-multilib) -nls -nptl -nptlonly -pic -profile (-selinux) -userlocales 21 kB

From debugging on gentoo-dev (IRC), it seems this only happens on Opterons. AMD64 machines do not seem affected. There isn't a clear set of CPU differences that seem to matter. The AMD64 boxes and the Opterons only seemed to differ in the 'pni' flag (which my Opterons have) and the 'lahf_lm' flag, which one of the AMD64 boxes had. All of my Opterons have the same cpuinfo as below, varying only in speed.

You can click the URL link above the summary or go to these for lots of debug output that I scrounged up. Compressed and uncompressed versions are here.
Links that follow after this link are directly to the uncompressed, plain text versions.
http://www.twobit.net/~carpaski/vg3/

Simple demonstration where 'fortune' cannot be run through valgrind. It contains the plain execution and then an strace of the output.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-strace.log

A gdb run of continuous step operations breaking in main and single stepping until Valgrind exits. Might want a quick sed on that.
sed -i '/^(gdb) $/d' vg3-illegal.log
http://www.twobit.net/~carpaski/vg3/vg3-illegal.log

A gdb run with 'where full' interlaced after every step. This is a lot of output. It's the best info I can provide at the moment.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-full.log

Portage 2.0.51.22-r2 (default-linux/amd64/2005.0, gcc-3.4.3, glibc-2.3.5-r1, 2.6.11-gentoo-r11 x86_64)
gcc (GCC) 3.4.3 20041125 (Gentoo 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246
stepping : 10
cpu MHz : 2004.595
cache size : 1024 KB
physical id : 255
siblings : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips : 3940.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246
stepping : 10
cpu MHz : 2004.595
cache size : 1024 KB
physical id : 255
siblings : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips : 4005.88
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
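
The flag comparison described in this report (Opterons versus the unaffected boxes) can be done mechanically rather than by eye. The following is a minimal sketch, not part of the original report. The helper names `parse_flags` and `diff_flags` are made up for illustration, and the second machine's flags line is HYPOTHETICAL: the report never shows it, so it is constructed here from the description ('pni' missing, 'lahf_lm' present).

```python
def parse_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def diff_flags(flags_a, flags_b):
    """Return (flags only in a, flags only in b) as sorted lists."""
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


# Flags line copied from the Opteron 246 cpuinfo in this report.
OPTERON = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow"
)

# HYPOTHETICAL flags line for one of the unaffected boxes. It is NOT taken
# from the report; it only mirrors the described difference.
OTHER = OPTERON.replace(" pni", "") + " lahf_lm"

only_opteron, only_other = diff_flags(parse_flags(OPTERON), parse_flags(OTHER))
print("only on Opteron:", only_opteron)
print("only on other box:", only_other)
```

On real machines, the text passed to `parse_flags` would come from reading /proc/cpuinfo on each box.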
Quick note to clarify: Athlon64 (939) seems to work fine, but Opterons (940) do not.

Additional bits of insight: On SMP Opterons, I see the unhandled instructions. On single-proc, non-SMP kernels, at least two of us see infinite looping that is very resistant to kill -9.
Neither of the two bugs you mention seems to be in any way related - both are missing x87 instructions, while the instruction that I think you are talking about is a prefetch instruction.

If you do encounter an unsupported instruction then all you really need to report is the last few lines of the valgrind output that report what the unrecognised instruction was - there really is no need to write a minor essay and attach tons of gdb output and things.

Can you confirm exactly what valgrind says when it fails? If I'm reading what you say correctly, the unrecognised instruction bytes start with "0xF 0xD" - is that correct? If so then that is a prefetch instruction.
I find it easier and quicker to be overcomplete than delay the process when I'm not conversing interactively. Sorry if it was a bit much.

It's in the 3 files. The top of this one is easiest: http://www.twobit.net/~carpaski/vg3/vg3-illegal-strace.log

vex amd64->IR: unhandled instruction bytes: 0xF 0xD 0x8 0xF
==20651==
==20651== Process terminating with default of signal 4 (SIGILL)
==20651== Illegal opcode at address 0x140EEDED1433
==20651==    at 0x140EEDED1433: recode_new_outer (in /usr/lib64/librecode.so.0.0.0)
==20651==    by 0x4047FE: (within /usr/bin/fortune)
==20651==    by 0x140EEE10ACFF: __libc_start_main (in /lib64/libc-2.3.5.so)
==20651==    by 0x4018B9: (within /usr/bin/fortune)
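
Since the one line that matters is the "unhandled instruction bytes" message, it can be pulled out of a saved log automatically. This is a rough sketch, assuming the message format shown above. The function name `unhandled_bytes` is an illustrative choice, not anything provided by valgrind.

```python
import re

# Matches the message text seen in this report, followed by one or more
# whitespace-separated hex byte tokens such as "0xF 0xD 0x8 0xF".
PATTERN = re.compile(r"unhandled instruction bytes:((?:\s+0x[0-9A-Fa-f]+)+)")


def unhandled_bytes(log_text):
    """Return one list of byte values per 'unhandled instruction bytes' message."""
    found = []
    for match in PATTERN.finditer(log_text):
        found.append([int(token, 16) for token in match.group(1).split()])
    return found


# Excerpt copied from the valgrind output quoted in this report.
LOG = """vex amd64->IR: unhandled instruction bytes: 0xF 0xD 0x8 0xF
==20651==
==20651== Process terminating with default of signal 4 (SIGILL)
==20651== Illegal opcode at address 0x140EEDED1433
"""

for insn in unhandled_bytes(LOG):
    print(" ".join("0x%02X" % b for b in insn))
```

Reporting just that printed line, plus the few lines around it in the log, would have been enough for this bug.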
Yep. That's a prefetch instruction - presumably there was some difference in the compiler and/or the optimisation flags used that caused it to insert prefetch instructions on some machines.
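
For readers who want to see why these bytes are a prefetch, the sketch below splits the third byte (the ModRM byte) into its fields. It assumes the AMD encoding in which opcode 0x0F 0x0D with a ModRM reg field of 0 is PREFETCH and a reg field of 1 is PREFETCHW; other reg values are reported without a name because this sketch makes no claim about them. The function `decode_0f0d` is a hypothetical helper, not valgrind or VEX code, and it ignores any prefix bytes.

```python
def decode_0f0d(insn):
    """Decode the opcode and ModRM fields of a 0x0F 0x0D instruction.

    Returns (mnemonic, mod, reg, rm). Assumes insn starts at the 0x0F byte,
    as in the bytes valgrind reported, with no prefixes in front.
    """
    if len(insn) < 3 or insn[0] != 0x0F or insn[1] != 0x0D:
        raise ValueError("not a 0x0F 0x0D instruction")
    modrm = insn[2]
    mod = modrm >> 6          # addressing mode, top two bits
    reg = (modrm >> 3) & 0x7  # opcode extension, middle three bits
    rm = modrm & 0x7          # base register selector, low three bits
    names = {0: "prefetch", 1: "prefetchw"}
    mnemonic = names.get(reg, "0F 0D /%d (not named in this sketch)" % reg)
    return mnemonic, mod, reg, rm


# Bytes from the failing run quoted earlier: 0xF 0xD 0x8 0xF
mnemonic, mod, reg, rm = decode_0f0d([0x0F, 0x0D, 0x08, 0x0F])
print(mnemonic, "mod=%d reg=%d rm=%d" % (mod, reg, rm))
```

Under those assumptions, the ModRM byte 0x08 has reg = 1, which selects the write-hint form of the prefetch. The trailing 0x0F would then belong to the next instruction.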
> I find it easier and quicker to be overcomplete than delay the process
> when I'm not conversing interactively. Sorry if it was a bit much.

It's a good idea in general; in this case it was just unfortunate that the one piece of crucial information was buried in another web page :)
Fixed in vex r1324. It would be good if you could verify that the fix works.
Works as of: SVN Checkout from Aug 10, 15:14:00 UTC Thanks. :)