249775 – Incorrect scheme for detecting NEON capabilities of host CPU

Summary: Incorrect scheme for detecting NEON capabilities of host CPU

Status:	VERIFIED WAITINGFORINFO

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex
Version:	3.6 SVN
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-09-01 19:38 UTC by Peter Maydell
Modified:	2010-09-28 19:09 UTC
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:

Attachments
patch which enables support for BX PC (803 bytes, patch) 2010-09-01 19:38 UTC, Peter Maydell
valgrind -v output for secondary failure (3.27 KB, text/plain) 2010-09-01 19:39 UTC, Peter Maydell

Description Peter Maydell 2010-09-01 19:38:20 UTC

Created attachment 51187
patch which enables support for BX PC

Hi. I've compiled valgrind from svn on Ubuntu maverick for ARM, in order to test the Thumb support that has recently landed in the svn trunk.

This is a pure svn checkout with svn r11315, VEX svn r2025,
built with "./configure CFLAGS='-marm -fno-stack-protector'
&& make".

I'm building on a pegatron board (freescale MX51 based): uname -a says:
Linux linaro-m-10141 2.6.31-008-ER1-lange51 #1 Fri Apr 9 14:06:09 UTC 2010 armv7l GNU/Linux

In Maverick everything is built with Thumb2 by default. Unfortunately valgrind doesn't decode an instruction in the dynamic linker's startup sequence so valgrinding anything fails:

[linaro-m-dev] ubuntu@linaro-m-10141:~/valgrind-svn/trunk$ ./vg-in-place -v /bin/ls
==17674== Memcheck, a memory error detector
==17674== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==17674== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==17674== Command: /bin/ls
==17674== 
--17674-- Valgrind options:
--17674--    -v
--17674-- Contents of /proc/version:
--17674--   Linux version 2.6.31-008-ER1-lange51 (ubuntu@babbage-davem-1) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu4) ) #1 Fri Apr 9 14:06:09 UTC 2010
--17674-- Arch and hwcaps: ARM, ARMv7-vfp-neon
--17674-- Page sizes: currently 4096, max supported 4096
--17674-- Valgrind library directory: /home/ubuntu/valgrind-svn/trunk/./.in_place
--17674-- Reading syms from /bin/ls (0x8000)
--17674--   Considering /bin/ls ..
--17674--   .. CRC mismatch (computed 017f562f wanted 1a5b4806)
--17674--    object doesn't have a symbol table
--17674-- Reading syms from /lib/ld-2.12.1.so (0x4000000)
--17674--   Considering /lib/ld-2.12.1.so ..
--17674--   .. CRC mismatch (computed 25bac168 wanted b996edb1)
--17674--   Considering /usr/lib/debug/lib/ld-2.12.1.so ..
--17674--   .. CRC is valid
--17674-- Reading syms from /home/ubuntu/valgrind-svn/trunk/memcheck/memcheck-arm-linux (0x38000000)
--17674--    object doesn't have a dynamic symbol table
--17674-- Reading suppressions file: /home/ubuntu/valgrind-svn/trunk/./.in_place/default.supp
--17674-- REDIR: 0x4012180 (memcpy) redirected to 0x38043530 (???)
--17674-- REDIR: 0x4011610 (strlen) redirected to 0x38043504 (???)
disInstr(thumb): unhandled instruction: 0x4778 0x46C0
==17674== valgrind: Unrecognised instruction at address 0x40007a5.
==17674== Your program just tried to execute an instruction that Valgrind
==17674== did not recognise.  There are two possible reasons for this.
==17674== 1. Your program has a bug and erroneously jumped to a non-code
==17674==    location.  If you are running Memcheck and you just saw a
==17674==    warning about a bad jump, it's probably your program's fault.
==17674== 2. The instruction is legitimate but Valgrind doesn't handle it,
==17674==    i.e. it's Valgrind's fault.  If you think this is the case or
==17674==    you are not sure, please let us know and we'll try to fix it.
==17674== Either way, Valgrind will now raise a SIGILL signal which will
==17674== probably kill your program.
==17674== 
==17674== Process terminating with default action of signal 4 (SIGILL)
==17674==  Illegal opcode at address 0x40007A5
==17674==    at 0x40007A5: ??? (in /lib/ld-2.12.1.so)
==17674== 
==17674== HEAP SUMMARY:
==17674==     in use at exit: 0 bytes in 0 blocks
==17674==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17674== 
==17674== All heap blocks were freed -- no leaks are possible
==17674== 
==17674== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==17674== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction


With the attached patch (which just enables the stubbed out /*atc*/ handling of this case) valgrind proceeds past this point and works at least some of the time. However it then intermittently fails (perhaps one run in three) with

==18028== Process terminating with default action of signal 4 (SIGILL)
==18028==  Illegal opcode at address 0x6285008C
==18028==    at 0x40112F4: __sigsetjmp (setjmp.S:59)
==18028==    by 0x400B040: _dl_catch_error (dl-error.c:175)
==18028==    by 0x4009E66: _dl_map_object_deps (dl-deps.c:249)
==18028== Jump to the invalid address stated on the next line
==18028==    at 0x3E4: ???
==18028==  Address 0x3e4 is not stack'd, malloc'd or (recently) free'd

(I'll attach the full -v output). I haven't yet investigated this secondary failure.

Comment 1 Peter Maydell 2010-09-01 19:39:09 UTC

Created attachment 51188
valgrind -v output for secondary failure

Comment 2 Julian Seward 2010-09-01 23:36:34 UTC

> With the attached patch (which just enables the stubbed out /*atc*/
> handling of this case)

"atc" stands for "awaiting test case".  It means the handler got
written but no instance of the instruction has so far been seen.

> valgrind proceeds past this point and works at least some of the
> time. However it then intermittently fails (perhaps one run in three) with
> 
> ==18028== Process terminating with default action of signal 4 (SIGILL)
> ==18028==  Illegal opcode at address 0x6285008C
> ==18028==    at 0x40112F4: __sigsetjmp (setjmp.S:59)
> ==18028==    by 0x400B040: _dl_catch_error (dl-error.c:175)
> ==18028==    by 0x4009E66: _dl_map_object_deps (dl-deps.c:249)
> ==18028== Jump to the invalid address stated on the next line
> ==18028==    at 0x3E4: ???
> ==18028==  Address 0x3e4 is not stack'd, malloc'd or (recently) free'd

That's (confusingly) a fault in the JIT generated code, not in the
front end.  An easy way to debug is to rerun with --wait-for-gdb=yes,
which puts V in a minute-ish long spin loop.  In that time, attach gdb
to it from another shell and let it continue.  You might also have to
continue past a few (expected) segfaults.  Eventually you should
get to the SIGILL.

I wonder if this is fallout from mis-handling BX PC, but I can't see
how.  It would be useful to see how the simulator got to this place.
Re-run with --trace-flags=10000000 --trace-notbelow=99999.
Once you see what SB number it's failing on, change the 99999 to that
number (or one or two just below) so you can see what insns the front
end is decoding.

Comment 3 Peter Maydell 2010-09-02 14:57:39 UTC

The secondary problem turns out to be unrelated to BX PC. The problematic
instruction is an fstmiad in __sigsetjmp:

        (arm) 0x40112F4:  fstmiad r12!, {d8-d15}

              ------ IMark(0x40112F4, 4) ------
              t0 = GET:I32(48)
              t1 = Add32(t0,0x40:I32)
              t2 = t0
              STle(Add32(t2,0x0:I32)) = GET:F64(184)
              STle(Add32(t2,0x8:I32)) = GET:F64(192)
              STle(Add32(t2,0x10:I32)) = GET:F64(200)
              STle(Add32(t2,0x18:I32)) = GET:F64(208)
              STle(Add32(t2,0x20:I32)) = GET:F64(216)
              STle(Add32(t2,0x28:I32)) = GET:F64(224)
              STle(Add32(t2,0x30:I32)) = GET:F64(232)
              STle(Add32(t2,0x38:I32)) = GET:F64(240)
              PUT(48) = t1

That gets translated into a code sequence which includes
  vld1.32 {d8} [r9]
  8F 87 29 F4 

VLD1 is a Neon instruction, and this system doesn't have Neon, only VFP, so we SIGILL when we try to execute it.

Valgrind is incorrectly deciding that we do have neon; on startup it says "Arch and hwcaps: ARM, ARMv7-vfp-neon".

I'm not sure why machine_get_hwcaps() is diagnosing the system as having Neon: if I single step through it in gdb then we get a SIGILL on the 'vorr q2,q2,q2' diagnostic insn it is using, and the NEON bit is not set in hwcaps. However if I let gdb run through the function rather than stepping then we do not get a SIGILL, and the NEON bit is set...

(The runs where valgrind works also diagnose the machine as having Neon, so the diagnosis itself is not varying: it is consistently wrong when not being single-stepped. The wrong diagnosis just doesn't always cause a crash later.)
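
For reference, the general shape of an "execute it and see whether it faults" probe can be sketched roughly as below. This is not the code in machine_get_hwcaps(); every sk_* name is hypothetical, and raise(SIGILL) stands in for a real test instruction such as 'vorr q2,q2,q2' so that the sketch builds on any POSIX host.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used to escape from the SIGILL handler. */
static sigjmp_buf sk_probe_env;

/* Handler: abandon the faulting probe and return to sk_probe_executes(). */
void sk_on_sigill(int signo)
{
    (void)signo;
    siglongjmp(sk_probe_env, 1);
}

/* Returns 1 if probe() ran to completion, 0 if it raised SIGILL,
   -1 if the handler could not be installed. */
int sk_probe_executes(void (*probe)(void))
{
    struct sigaction sa, old;
    int ok;

    sa.sa_handler = sk_on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(sk_probe_env, 1) == 0) {
        probe();
        ok = 1;
    } else {
        ok = 0;
    }

    sigaction(SIGILL, &old, NULL);   /* put the previous handler back */
    return ok;
}

/* Hypothetical stand-ins for a real probe instruction. */
void sk_probe_ok(void)     { }
void sk_probe_faults(void) { raise(SIGILL); }
```

Here sk_probe_executes(sk_probe_faults) reports a fault and sk_probe_executes(sk_probe_ok) reports success. A probe of this kind only answers "did this instruction trap just now?", which is the question that gives the wrong answer on this board.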

Comment 4 Peter Maydell 2010-09-02 17:06:18 UTC

> I'm not sure why machine_get_hwcaps() is diagnosing the system as having Neon

"Does this neon instruction execute?" is apparently not a valid way to make this check -- the kernel may have had support compiled out or disabled because of hardware issues. I'm told you need to check for HWCAP_NEON in /proc/self/auxv.

I hacked machine_get_hwcaps() to force it to say 'no neon', and the intermittent failures have gone away.
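
A minimal sketch of the auxv-based check, under some assumptions: the file is read as native-word (type, value) pairs, AT_HWCAP is 16, and the VFP and NEON bits are bits 6 and 12 as in the Linux ARM headers. The constants are redefined locally with an SK_ prefix and the sk_* function names are hypothetical; this is not what any Valgrind revision actually does.

```c
#include <stddef.h>
#include <stdio.h>

/* Values as found in the Linux headers for ARM; redefined here so the
   sketch is self-contained (SK_ prefix marks them as local copies). */
#define SK_AT_NULL     0UL
#define SK_AT_HWCAP    16UL
#define SK_HWCAP_VFP   (1UL << 6)
#define SK_HWCAP_NEON  (1UL << 12)

/* Scan npairs (type, value) pairs for the given type.
   Returns 1 and stores the value on success, 0 if not found. */
int sk_auxv_lookup(const unsigned long *words, size_t npairs,
                   unsigned long type, unsigned long *value)
{
    for (size_t i = 0; i < npairs; i++) {
        unsigned long t = words[2 * i];
        if (t == SK_AT_NULL)
            break;                      /* end-of-vector marker */
        if (t == type) {
            *value = words[2 * i + 1];
            return 1;
        }
    }
    return 0;
}

/* Read this process's auxv from /proc and extract AT_HWCAP.
   Returns 1 on success, 0 if the file or the entry is unavailable. */
int sk_read_hwcap(unsigned long *hwcap)
{
    unsigned long buf[128];             /* room for 64 pairs */
    FILE *f = fopen("/proc/self/auxv", "rb");
    if (f == NULL)
        return 0;
    size_t nwords = fread(buf, sizeof buf[0], 128, f);
    fclose(f);
    return sk_auxv_lookup(buf, nwords / 2, SK_AT_HWCAP, hwcap);
}
```

With this, "has Neon" would become (hwcap & SK_HWCAP_NEON) != 0 after a successful sk_read_hwcap(&hwcap), so the answer comes from what the kernel advertises rather than from whether a test instruction happens to trap.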

Comment 5 Julian Seward 2010-09-02 17:29:14 UTC

(In reply to comment #4)
> "Does this neon instruction execute?" is apparently not a valid way to make
> this check

Yes, I'd wondered exactly that.  Thanks for chasing it.  Does that go
only for detecting NEON support, or will we have to check all the
features using /proc/self/auxv ?

> I hacked machine_get_hwcaps() to force it to say 'no neon', and the
> intermittent failures have gone away.

Good.  So at least the backend instruction selection logic is working
correctly w.r.t. hw capabilities.

Comment 6 Peter Maydell 2010-09-02 18:12:25 UTC

(In reply to comment #5)
> Does that go
> only for detecting NEON support, or will we have to check all the
> features using /proc/self/auxv ?

You need to do that for at least Neon and VFP support. I don't think it's as critical for "are we v5/v6/v7?" but I imagine that if you're reading /proc/self/auxv for AT_HWCAP it's as simple to look at AT_PLATFORM to determine v5/v6/v7 as it is to do it by testing for faulting instructions.
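
As a rough illustration of the AT_PLATFORM route: the auxv value for AT_PLATFORM is a pointer to a string, and the sketch below assumes that on ARM Linux the string looks like "v5l", "v6l" or "v7l" (inferred from the "armv7l" in the uname output above, so treat the format as an assumption). The function name is hypothetical.

```c
#include <stddef.h>

/* Map an AT_PLATFORM-style string such as "v7l" to an architecture
   version number.  Returns -1 if the string is NULL or does not start
   with 'v' followed by a decimal digit. */
int sk_arm_arch_from_platform(const char *platform)
{
    if (platform == NULL || platform[0] != 'v')
        return -1;
    if (platform[1] < '0' || platform[1] > '9')
        return -1;
    return platform[1] - '0';
}
```

For example, "v7l" maps to 7 and "v5l" to 5, while anything unrecognised yields -1 so the caller can fall back to another method.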

Comment 7 Julian Seward 2010-09-02 23:14:40 UTC

> patch which enables support for BX PC

Committed as r2027.  Thanks.

Comment 8 Julian Seward 2010-09-03 12:36:27 UTC

Changed the title to reflect the more serious bug.

Comment 9 Julian Seward 2010-09-28 17:48:27 UTC

Fixed, r11347/r2032.

Comment 10 Peter Maydell 2010-09-28 18:02:06 UTC

r11347 only checks the auxv for Neon; I think you also need to do this for VFP.

Comment 11 Julian Seward 2010-09-28 18:37:02 UTC

Reopening.  (What's the right way to reopen a bug?)  But would prefer to
see a real failure as a result of not correctly detecting VFP before
fixing.

Comment 12 Peter Maydell 2010-09-28 19:08:12 UTC

I think for a test case you could try building a kernel with no VFP support and running it on an A8 or similar. But I suspect it's basically that at the moment you're relying on something that happens to work but isn't guaranteed to do so. (We just don't currently happen to have any hardware with broken VFP the way we do with Neon.)

Comment 13 Peter Maydell 2010-09-28 19:09:19 UTC

Sorry, I didn't mean to change the status fields there.