Bug 254556 - ARM: valgrinding anything fails with SIGSEGV for 0xFFFF0FA0 (kernel helper page)
Summary: ARM: valgrinding anything fails with SIGSEGV for 0xFFFF0FA0 (kernel helper page)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.6 SVN
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-18 17:07 UTC by Peter Maydell
Modified: 2010-10-20 18:05 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Maydell 2010-10-18 17:07:58 UTC
This is on a Beagle xM board running Linaro. Kernel:
Linux localhost.localdomain 2.6.35-1006-linaro-omap #12-Ubuntu Tue Sep 21 20:09:17 UTC 2010 armv7l GNU/Linux

root@localhost:~/valgrind# ./vg-in-place -v --trace-signals=yes /bin/true
==11039== Memcheck, a memory error detector
==11039== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11039== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
==11039== Command: /bin/true
==11039== 
--11039-- Valgrind options:
--11039--    -v
--11039--    --trace-signals=yes
--11039-- Contents of /proc/version:
--11039--   Linux version 2.6.35-1006-linaro-omap (buildd@gourd) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu4) ) #12-Ubuntu Tue Sep 21 20:09:17 UTC 2010
--11039-- Arch and hwcaps: ARM, ARMv7-vfp-neon
--11039-- Page sizes: currently 4096, max supported 4096
--11039-- Valgrind library directory: /root/valgrind/./.in_place
--11039-- Reading syms from /bin/true (0x8000)
--11039--   Considering /bin/true ..
--11039--   .. CRC mismatch (computed e5637a2c wanted e7a66a8c)
--11039--    object doesn't have a symbol table
--11039-- Reading syms from /lib/ld-2.12.1.so (0x4000000)
--11039--   Considering /lib/ld-2.12.1.so ..
--11039--   .. CRC mismatch (computed d4ae0631 wanted 05dd9b97)
--11039--   Considering /usr/lib/debug/lib/ld-2.12.1.so ..
--11039--   .. CRC is valid
--11039-- Reading syms from /root/valgrind/memcheck/memcheck-arm-linux (0x38000000)
--11039--    object doesn't have a dynamic symbol table
--11039-- Max kernel-supported signal is 64
--11039-- Reading suppressions file: /root/valgrind/./.in_place/default.supp
--11039-- REDIR: 0x4012180 (memcpy) redirected to 0x380440c4 (???)
--11039-- REDIR: 0x4011610 (strlen) redirected to 0x38044098 (???)
--11039-- sync signal handler: signal=11, si_code=1, EIP=0x4005c4b, eip=0x62931238, from kernel
--11039-- SIGSEGV: si_code=1 faultaddr=0xbdf1de38 tid=1 ESP=0xbdf1de10 seg=0xbd71f000-0xbdf1dfff
--11039--        -> extended stack base to 0xbdf1d000
--11039-- Reading syms from /root/valgrind/coregrind/vgpreload_core-arm-linux.so (0x4821000)
--11039-- Reading syms from /root/valgrind/memcheck/vgpreload_memcheck-arm-linux.so (0x482b000)
--11039-- Reading syms from /lib/libc-2.12.1.so (0x483c000)
--11039--   Considering /lib/libc-2.12.1.so ..
--11039--   .. CRC mismatch (computed b78d2c25 wanted ead28fef)
--11039--   Considering /usr/lib/debug/lib/libc-2.12.1.so ..
--11039--   .. CRC is valid
==11039== Jump to the invalid address stated on the next line
==11039==    at 0xFFFF0FA0: ???
==11039==  Address 0xffff0fa0 is not stack'd, malloc'd or (recently) free'd
==11039== 
--11039-- translations not allowed here (0xffff0fa0) - throwing SEGV
--11039-- delivering signal 11 (SIGSEGV):2 to thread 1
--11039-- delivering 11 (code 2) to default handler; action: terminate+core
==11039== 
==11039== Process terminating with default action of signal 11 (SIGSEGV)
==11039==  Bad permissions for mapped region at address 0xFFFF0FA0
==11039==    at 0xFFFF0FA0: ???
--11039-- sys_sigaction: sigNo 11, new 0x62902204, old 0x0, new flags 0x0
--11039-- sys_sigaction: sigNo 7, new 0x62902204, old 0x0, new flags 0x0
--11039-- sys_sigaction: sigNo 4, new 0x62902204, old 0x0, new flags 0x0
--11039-- sys_sigaction: sigNo 8, new 0x62902204, old 0x0, new flags 0x0
==11039== Jump to the invalid address stated on the next line
==11039==    at 0x388: ???
==11039==  Address 0x388 is not stack'd, malloc'd or (recently) free'd
==11039== 
--11039-- translations not allowed here (0x388) - throwing SEGV
--11039-- delivering signal 11 (SIGSEGV):2 to thread 1
--11039-- delivering 11 (code 2) to default handler; action: terminate+core
==11039== 
==11039== Process terminating with default action of signal 11 (SIGSEGV)
==11039==  Bad permissions for mapped region at address 0x388
==11039==    at 0x388: ???
==11039== 
==11039== HEAP SUMMARY:
==11039==     in use at exit: 0 bytes in 0 blocks
==11039==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==11039== 
==11039== All heap blocks were freed -- no leaks are possible
==11039== 
==11039== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==11039== 
==11039== 1 errors in context 1 of 2:
==11039== Jump to the invalid address stated on the next line
==11039==    at 0x388: ???
==11039==  Address 0x388 is not stack'd, malloc'd or (recently) free'd
==11039== 
==11039== 
==11039== 1 errors in context 2 of 2:
==11039== Jump to the invalid address stated on the next line
==11039==    at 0xFFFF0FA0: ???
==11039==  Address 0xffff0fa0 is not stack'd, malloc'd or (recently) free'd
==11039== 
==11039== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault


What seems to be happening here is that we're failing to give the client process access to the memory at 0xFFFF0000 (which is a page of userspace helper functions mapped into every process by the kernel). I think that on this board the mapping appears in valgrind's own address space. In gdb I see that VG_(am_find_segment) finds the segment but it is SkAnonV:
(gdb) print /x nsegments[i]
$5 = {kind = 0x2, start = 0xffff0000, end = 0xffff0fff, smode = 0x1, dev = 0x0, 
  ino = 0x0, offset = 0x0, mode = 0x0, fnIdx = 0xffffffff, hasR = 0x1, hasW = 0x0, 
  hasX = 0x1, hasT = 0x0, isCH = 0x0, mark = 0x0}

and so translations_allowable_from_seg() disallows translation because the segment isn't SkAnonC/SkFileC/SkShmC.

If you look at /proc/self/maps the relevant segment appears like this:
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

I have another machine (a pegatron) which has an older kernel. On this system there's no entry in /proc/self/maps for [vectors], and valgrind runs things OK. (Interestingly if you trace what valgrind is translating on that system the client process seems to successfully run code from this block:
 ==== SB 794 [tid 1] (0xffff0fa0) SBs exec'd 8230 ====
)

Presumably valgrind needs to special-case this fixed-address helper page somehow.
Comment 1 Julian Seward 2010-10-18 17:45:10 UTC
This is the same as 250631, right?

I'd assumed that the page-of-kernel-made-up-code-in-userspace
thing was some nasty kludge that had been done anway with in more
recent kernels, which is why I got rid of support for this in
r10973 (see https://bugs.kde.org/show_bug.cgi?id=250631#c5),
and that seemed to be ok for Ubuntu 10.04 and 10.10.

If it's still alive in a Linaro kernel, are you saying effectively
that it's officially blessed by Linaro, and needs to be supported
indefinitely?
Comment 2 Peter Maydell 2010-10-18 18:05:17 UTC
Actually, having now read coregrind/m_aspacemgr/aspacemgr-linux.c we already do special-case this, and the problem is just that the kernel now does expose this magic page in /proc/*/maps and this confuses the workaround valgrind had for that kernel bug.

I think the fix is simply to change the value of ARM_LINUX_FAKE_COMMPAGE_END1 to match what the kernel is actually providing:

Index: coregrind/m_aspacemgr/aspacemgr-linux.c
===================================================================
--- coregrind/m_aspacemgr/aspacemgr-linux.c     (revision 11449)
+++ coregrind/m_aspacemgr/aspacemgr-linux.c     (working copy)
@@ -344,7 +344,7 @@
    we have to fake it up, in parse_procselfmaps. */
 #if defined(VGP_arm_linux)
 #  define ARM_LINUX_FAKE_COMMPAGE_START 0xFFFF0000
-#  define ARM_LINUX_FAKE_COMMPAGE_END1  0xFFFFF000
+#  define ARM_LINUX_FAKE_COMMPAGE_END1  0xFFFF1000
 #endif
 
 
With this fix I can valgrind /bin/ls on both the machine where [vectors] does not appear in maps and the machine where it does.

You might also want to update the comments complaining about this kernel bug to note that it is fixed in newer kernels...
Comment 3 Peter Maydell 2010-10-18 18:10:44 UTC
(That comment was a mid-air collision so doesn't take account of your #1.)

> I got rid of support for this in r10973 

I don't think this is the same as that bug. The problem here is that the attempt in aspacemgr-linux.c to special case this mapping (which is still present in trunk) is not firing because the guess it makes about the length of the mapping is wrong.
Comment 4 Peter Maydell 2010-10-18 18:26:30 UTC
> I'd assumed that the page-of-kernel-made-up-code-in-userspace
> thing was some nasty kludge that had been done away with in more
> recent kernels

I'm told it's a standard part of the ARM kernel ABI for userspace. This isn't Linaro specific at all.

Binaries compiled for eg armv7 may well not need to use as many of the helpers as armv4-compatible binaries, but as you see even they will still use some of them.

Do any of them do anything that prevents us from translating and executing like any other bit of userspace code?
Comment 5 Peter Maydell 2010-10-18 19:11:05 UTC
> Do any of them do anything that prevents us from translating and executing
> like any other bit of userspace code?

Having looked at the kernel implementations for 2.6.35 I think the answer is "no" for __kernel_dmb(), "no if armv6 or better" for __kernel_cmpxchg() and "maybe" for __kernel_get_tls(). [The tls code is patched at kernel startup to just do a coprocessor register access if the cpu supports it and you haven't set some odd config options; so I think the usual armv7 case will be 'no'.]

(Technically the kernel could change to break us without advance warning because the ABI is only the entry points and their results, and the actual code executed could change...)
Comment 6 Julian Seward 2010-10-20 10:21:06 UTC
(In reply to comment #2)
> With this fix I can valgrind /bin/ls on both the machine where [vectors] does
> not appear in maps and the machine where it does.

There are two different but related issues here:

(1) whether the commpage is properly mapped/permission'd in m_aspacemgr

(2) whether guest attempts to set/get its TLS pointer work correctly.

Your patch addresses (1).  As it stands, Valgrind assumes that the
guest keeps its TLS in TPIDRURO (guest_TPIDRURO in the ARM guest
state).  Guest syscalls to ARM_set_tls are intercepted, the specified
value is parked in guest_TPIDRURO, and Valgrind doesn't let the call
go through to the kernel, because that would result in the kernel
setting the host TPIDRURO, which isn't what we want.

I am concerned that allowing the commpage to be executable could
result in mishandling of TLS values.  That said, if all the commpage
does in response to a call __kernel_get_tls is to read TPIDRURO, then
that's OK because it will merely read the value parked there by
sys_set_tls in syswrap-arm-linux.c.  If the TLS is stored anywhere
else then we're in deep trouble.

Re the patch, I would be more convinced (given the above) if you are
successfully able to run, with --tool=memcheck, non-trivial
threaded apps on both kernel configurations.  (viz, Firefox and
OpenOffice).  A useful small test case might be
helgrind/tests/tc17_sembar: 
gcc -g -o tc17 helgrind/tests/tc17_sembar.c -DVGO_linux
Comment 7 Peter Maydell 2010-10-20 15:00:38 UTC
I can confirm that firefox, openoffice and the tc17 test case all work correctly with this patch. (This is as expected, since the patch simply makes kernels which report [vectors] in maps behave the same way as kernels which don't report [vectors], and we already know from the pegatron that "just execute the commpage" works for armv7 kernels not configured for software TLS.)

Things will go wrong on kernels not configured for s/w TLS, but they do already: that's bug 250631. (You will note that the reported failure mode there is not "tried to execute in the commpage but could not", but "executed in the commpage and got the wrong answer and failed later on". We can deduce that the submitter's kernel config doesn't report [vectors] in maps and has s/w TLS.)

I agree that it would be cleaner for valgrind to intercept all three of the commpage entry points and emulate/wrap them the same way it does system calls. If you do that then this patch would become moot; if you're not going to do that for 3.6.0 then I think we should definitely put this patch in for the moment.
Comment 8 Peter Maydell 2010-10-20 15:52:26 UTC
> Things will go wrong on kernels not configured for s/w TLS

Gah. I meant "on kernels configured for s/w TLS". Sorry.
Comment 9 Julian Seward 2010-10-20 16:21:23 UTC
Thanks for the extra testing.  I'll commit the patch shortly.

(The following is clarification)  I thought more about the TLS issue.
The difficulty lies in the asymmetry between how it is set and how it
is fetched.

To get it, the guest either reads TPIDRURO directly, or calls the
commpage, which does whatever it does to produce the value.

To set it, the guest does a ARM_set_tls syscall.  Problem is that
Valgrind doesn't know what that syscall does, and so it doesn't know
what to do with the supplied value.  In this case we assume that the
kernel merely parks it in TPIDRURO, so we copy the value into the
simulated TPIDRURO.

But the kernel _could_ do absolutely anything with the value.  It
could use it to alter the number of pink elephants in orbit around the
moon, provided only that the get_tls routine in the commpage somehow
manages to come up with the right value when called.

If the TLS were both read and written via syscalls, fine -- then the
required value would go back and forth across the kernel boundary,
directly into guest registers or memory (don't care which).

If the TLS was handled entirely in userspace, by setting some
register, also fine .. we'd just simulate the register like all the
rest, and reads and writes of it would be handed naturally.

In the asymmetric case, we have to make some assumption about how
ARM_set_tls alters the guest state, and if we get that wrong then it
fails a la bug 250631.
Comment 10 Tom Hughes 2010-10-20 16:53:01 UTC
Well so long as we can assume that a value set by set_tls is only ever fetched by calling the commpage routine then we can stash the value wherever we like when set_tls is called so long as we also replace the commpage routine with one which fetches it from the same place.

We already replace some of the commpage routines on x86 so we have the technology to do that.
Comment 11 Julian Seward 2010-10-20 17:42:42 UTC
Committed w/ further comments, r11460.  Thanks for the analysis
and testing.
Comment 12 Peter Maydell 2010-10-20 18:05:42 UTC
> Well so long as we can assume that a value set by set_tls is only ever fetched
> by calling the commpage routine

I believe that the ABI allows a program to also read the TLS by executing the MRC instruction to read the register directly, so valgrind would need to support both commpage call and MRC to read the TLS value. (The idea being that if you're compiling for armv6k or better you go ahead and use the MRC, if you're compiling for armv4/v5 for back-compatibility then you emit a call to the commpage, and either way will work on a kernel that's compiled for v6k or better.) Doesn't invalidate the approach, just a wrinkle to remember.