Bug 384442 - ARM: bad pc in complaint if instruction changes pc
Summary: ARM: bad pc in complaint if instruction changes pc
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.13 SVN
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-06 18:17 UTC by John Reiser
Modified: 2018-07-25 07:26 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Reiser 2017-09-06 18:17:29 UTC
On ARM(32-bit) the instruction "ldmdb r11, {0xAFF0}" writes r15(==pc), and if the instruction generates a complaint (such as by fetching from below the stack pointer) then memcheck reports the new value of pc in its "at 0x..." complaint.  Instead, the complaint should specify the pc of the ldmdb.  This can be fixed by adding a new thread-state variable "pc_original" which changes only at the beginning of each instruction.

"ldmdb r11, {0xAFF0}" also writes r13(==sp), and that is the source of another problem.  The instruction is atomic, and in effect all memory fetches occur before sp is written, yet memcheck uses the new value of sp for checking fetches from the stack for subsequent registers.  This leads to spurious "Invalid read of size 4" at address (sp - 20).

This problem is seen on OpenWrt Chaos Calmer 15.05 using gcc-5.3.0 and/or gcc-4.8-Linaro.  The compiler could avoid both problems by generating better code, but until then memcheck should point to the actual offender.
Comment 1 John Reiser 2017-09-07 15:20:19 UTC
Both problems can be solved by a special case for 'ldmdb' (LoaD Multiple registers, Decrementing the address Before each fetch).  If either r15(==pc) or r13(==sp) is to be loaded, then assign the new value to a temporary, and perform the final assign-from-temporary-to-pc-or-sp only after all other registers have been fetched and assigned.  This analysis can be done at translation time.  The same analysis works for any 'ldm', although ldmdb has the most exposure because r15 and r13 appear before most other registers in the serial processing.
Comment 2 John Reiser 2017-09-07 15:34:45 UTC
Context: [valgrind-users] by Kacper Kowalski on 2017-08-31 and replies through the following week.

*****

processor    : 0
model name    : ARMv7 Processor rev 10 (v7l)
BogoMIPS    : 132.00
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0xc09
CPU revision    : 10

Hardware    : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision    : 0000
Serial        : 0000000000000000

Operating system info:
cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='15.05'
DISTRIB_REVISION='r48153'
DISTRIB_CODENAME='chaos_calmer'
DISTRIB_TARGET='imx6/generic'
DISTRIB_DESCRIPTION='OpenWrt Chaos Calmer 15.05'
DISTRIB_TAINTS='no-all busybox'

cat /proc/version
Linux version 3.18.23 (gcc version 5.3.0 (OpenWrt GCC 5.3.0 r48153) ) #6 SMP Tue Jul 11 16:35:20 CEST 2017

Source code could be downloaded from:
https://uclibc.org/downloads/uClibc-0.9.33.2.tar.bz2

Extracted chunks from trace output can be downloaded from:
https://github.com/KKoovalsky/Valgrind-problems

*****

(gdb) target remote | /usr/lib/valgrind/../../bin/vgdb --pid=8269
Remote debugging using | /usr/lib/valgrind/../../bin/vgdb --pid=8269
relaying data between gdb and process 8269
Reading symbols from /lib/ld-uClibc.so.0...done.
Loaded symbols for /lib/ld-uClibc.so.0
0x04000e68 in _start () from /lib/ld-uClibc.so.0
(gdb) c
Continuing.
Reading in symbols for ldso/ldso/ldso.c...done.
Reading symbols from /usr/lib/valgrind/vgpreload_core-arm-linux.so...done.
Loaded symbols for /usr/lib/valgrind/vgpreload_core-arm-linux.so
Reading symbols from /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so...done.
Loaded symbols for /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so
Reading symbols from /usr/lib/libboost_thread.so.1.64.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_thread.so.1.64.0
Reading symbols from /usr/lib/libboost_system.so.1.64.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_system.so.1.64.0
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.0...done.
Loaded symbols for /lib/libm.so.0
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.0...done.
Loaded symbols for /lib/libc.so.0
Reading symbols from /usr/lib/libboost_atomic.so.1.64.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_atomic.so.1.64.0
Reading symbols from /lib/librt.so.0...done.
Loaded symbols for /lib/librt.so.0
Reading symbols from /lib/libdl.so.0...done.
Loaded symbols for /lib/libdl.so.0

*****

==8269== Invalid read of size 4
==8269==    at 0x40058F8: _dl_get_ready_to_run (ldso.c:1396)
==8269==  Address 0x7d80c624 is on thread 1's stack
==8269==  20 bytes below stack pointer
==8269== 
Program received signal SIGTRAP, Trace/breakpoint trap.
_dl_get_ready_to_run (Reading in symbols for libc/sysdeps/linux/arm/crti.S...done.
tpnt=0x4007ac0, load_addr=<optimized out>, auxvt=0x7d80ca48, envp=<optimized out>, argv=0x0) at ldso/ldso/ldso.c:1396
1396    ldso/ldso/ldso.c: No such file or directory.
Current language:  auto
The current source language is "auto; currently c".

(gdb) bt
#0  _dl_get_ready_to_run (tpnt=0x4007ac0, load_addr=<optimized out>, auxvt=0x7d80ca48, envp=<optimized out>, argv=0x0) at ldso/ldso/ldso.c:1396
#1  0x049be918 in _init () at libc/sysdeps/linux/arm/crti.S:22
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

*****

==8269== Invalid read of size 4
==8269==    at 0x4A16E70: __uClibc_fini (__uClibc_main.c:304)
==8269==    by 0x4A134BB: exit (_atexit.c:336)
==8269==    by 0x4A1718F: __uClibc_main (__uClibc_main.c:511)
==8269==    by 0xFFFFFFFF: ???
==8269==  Address 0x7d80ca0c is on thread 1's stack
==8269==  20 bytes below stack pointer
==8269==
Program received signal SIGTRAP, Trace/breakpoint trap.
__GI___uClibc_fini () at libc/misc/internals/__uClibc_main.c:304
304    in libc/misc/internals/__uClibc_main.c

(gdb) bt
Reading in symbols for libc/stdlib/exit.c...done.
#0  __GI___uClibc_fini () at libc/misc/internals/__uClibc_main.c:304
#1  0x04a134bc in __GI_exit (rv=0) at libc/stdlib/_atexit.c:336
#2  0x04a17190 in __uClibc_main (main=0x7d80cc04, argc=77770752, argv=0x4a17190 <__uClibc_main+764>, app_init=0x0, app_fini=0x10680 <_fini>,
    rtld_fini=0x4000df0 <_dl_fini>, stack_end=0x7d80cc04) at libc/misc/internals/__uClibc_main.c:511
#3  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

*****
Comment 3 John Reiser 2017-09-07 23:37:08 UTC
Here is the output from "--trace-flags=10000000", condensed and commented:
=====
        (arm) 0x49BD90C:  mov r12, r13
        (arm) 0x49BD910:  stmdb r13!, {0xDFF0}
        (arm) 0x49BD914:  sub r11, r12, #0x4
        (arm) 0x49BD918:  ldmdb r11, {0xAFF0}

              ------ IMark(0x49BD918, 4, 0) ------
              t8 = GET:I32(52 /*r11*/)
              t9 = t8
              PUT(68 /*r15,pc*/) = LDle:I32(Sub32(t9,0x4:I32))  // updated pc no longer points to the 'ldmdb'
              PUT(60 /*r13,sp*/) = LDle:I32(Sub32(t9,0x8:I32))  // updated sp
              PUT(48 /*r10*/) = LDle:I32(Sub32(t9,0x10:I32))  // fetch from wrong side of *updated* sp
              PUT(44 /*r9*/) = LDle:I32(Sub32(t9,0x14:I32))
              PUT(40 /*r8*/) = LDle:I32(Sub32(t9,0x18:I32))
              PUT(36 /*r7*/) = LDle:I32(Sub32(t9,0x1C:I32))
              PUT(32 /*r6*/) = LDle:I32(Sub32(t9,0x20:I32))
              PUT(28 /*r5*/) = LDle:I32(Sub32(t9,0x24:I32))
              PUT(24 /*r4*/) = LDle:I32(Sub32(t9,0x28:I32))
              PUT(52 /*r11*/) = LDle:I32(Sub32(t9,0xC:I32))
              PUT(68 /*r15,pc*/) = GET:I32(68)
              PUT(68 /*r15,pc*/) = GET:I32(68); exit-Boring

GuestBytes 49BD90C 16  0D C0 A0 E1 F0 DF 2D E9 04 B0 4C E2 F0 AF 1B E9  0028F343

VexExpansionRatio 16 952   595 :10

==26904== Invalid read of size 4
==26904==    at 0x40054EC: ??? (in /lib/ld-uClibc-0.9.33.2.so)  ## pc should be 0x49BD918
==26904==  Address 0x7dad0654 is on thread 1's stack
==26904==  20 bytes below stack pointer  ## below updated sp, but above original sp; should be no complaint
==26904==

=====
(Of course the original [user] compiler code generator could do much better.)

The VEX code in mk_ldm_stm() claims
===== priv/guest_arm_toIR.c  near line 14138
   if (bW == 0 && (regList & (1<<rN)) != 0) {
      /* Non-writeback, and basereg is to be transferred.  Do its
         transfer last for a load and first for a store.  Requires
         reordering xOff/xReg. */
=====
but I don't understand why.  The earlier
              t8 = GET:I32(52 /*r11*/)
              t9 = t8
has de-coupled the original value of the base register from any interaction with subsequent code that deals with the register list.

The real problem is that the new values of r15(==pc) and r13(==sp) have been written while the original values still are needed to detect fetching from the wrong side of sp (via LDle:I32() for each remaining register), and for reporting the pc of any complaint.  It is the writing of pc and sp that should be re-ordered; and if both sp and pc are written then the generated code will require another temporary variable to hold the new value of pc until after there is no possibility of a complaint about this instruction.
Comment 4 Julian Seward 2018-07-25 07:26:14 UTC
There's a bunch of related hacks in the arm(32) and arm64 front
ends.  Grep for "earlyWriteback" or some such.  They are probably
related.

The fundamental problem though is that these insns simply aren't
representable properly in the IR -- it splits them into pieces and
it really needs to keep them atomic.

You're right about the original_pc thing, too.  But it would be
a perf hit.