On ARM(32-bit) the instruction "ldmdb r11, {0xAFF0}" writes r15(==pc), and if the instruction generates a complaint (such as by fetching from below the stack pointer) then memcheck reports the new value of pc in its "at 0x..." complaint. Instead, the complaint should specify the pc of the ldmdb. This can be fixed by adding a new thread-state variable "pc_original" which changes only at the beginning of each instruction. "ldmdb r11, {0xAFF0}" also writes r13(==sp), and that is the source of another problem. The instruction is atomic, and in effect all memory fetches occur before sp is written, yet memcheck uses the new value of sp for checking fetches from the stack for subsequent registers. This leads to spurious "Invalid read of size 4" at address (sp - 20). This problem is seen on OpenWrt Chaos Calmer 15.05 using gcc-5.3.0 and/or gcc-4.8-Linaro. The compiler could avoid both problems by generating better code, but until then memcheck should point to the actual offender.
Both problems can be solved by a special case for 'ldmdb' (LoaD Multiple registers, Decrementing the address Before each fetch). If either r15(==pc) or r13(==sp) is to be loaded, then assign the new value to a temporary, and perform the final assign-from-temporary-to-pc-or-sp only after all other registers have been fetched and assigned. This analysis can be done at translation time. The same analysis works for any 'ldm', although ldmdb has the most exposure because r15 and r13 appear before most other registers in the serial processing.
Context: [valgrind-users] by Kacper Kowalski on 2017-08-31 and replies through the following week. ***** processor : 0 model name : ARMv7 Processor rev 10 (v7l) BogoMIPS : 132.00 Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x2 CPU part : 0xc09 CPU revision : 10 Hardware : Freescale i.MX6 Quad/DualLite (Device Tree) Revision : 0000 Serial : 0000000000000000 Operating system info: cat /etc/openwrt_release DISTRIB_ID='OpenWrt' DISTRIB_RELEASE='15.05' DISTRIB_REVISION='r48153' DISTRIB_CODENAME='chaos_calmer' DISTRIB_TARGET='imx6/generic' DISTRIB_DESCRIPTION='OpenWrt Chaos Calmer 15.05' DISTRIB_TAINTS='no-all busybox' cat /proc/version Linux version 3.18.23 (gcc version 5.3.0 (OpenWrt GCC 5.3.0 r48153) ) #6 SMP Tue Jul 11 16:35:20 CEST 2017 Source code could be downloaded from: https://uclibc.org/downloads/uClibc-0.9.33.2.tar.bz2 Extracted chunks from trace output can be downloaded from: https://github.com/KKoovalsky/Valgrind-problems ***** (gdb) target remote | /usr/lib/valgrind/../../bin/vgdb --pid=8269 Remote debugging using | /usr/lib/valgrind/../../bin/vgdb --pid=8269 relaying data between gdb and process 8269 Reading symbols from /lib/ld-uClibc.so.0...done. Loaded symbols for /lib/ld-uClibc.so.0 0x04000e68 in _start () from /lib/ld-uClibc.so.0 (gdb) c Continuing. Reading in symbols for ldso/ldso/ldso.c...done. Reading symbols from /usr/lib/valgrind/vgpreload_core-arm-linux.so...done. Loaded symbols for /usr/lib/valgrind/vgpreload_core-arm-linux.so Reading symbols from /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so...done. Loaded symbols for /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so Reading symbols from /usr/lib/libboost_thread.so.1.64.0...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libboost_thread.so.1.64.0 Reading symbols from /usr/lib/libboost_system.so.1.64.0...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libboost_system.so.1.64.0 Reading symbols from /lib/libpthread.so.0...done. Loaded symbols for /lib/libpthread.so.0 Reading symbols from /usr/lib/libstdc++.so.6...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /lib/libm.so.0...done. Loaded symbols for /lib/libm.so.0 Reading symbols from /lib/libgcc_s.so.1...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /lib/libc.so.0...done. Loaded symbols for /lib/libc.so.0 Reading symbols from /usr/lib/libboost_atomic.so.1.64.0...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libboost_atomic.so.1.64.0 Reading symbols from /lib/librt.so.0...done. Loaded symbols for /lib/librt.so.0 Reading symbols from /lib/libdl.so.0...done. Loaded symbols for /lib/libdl.so.0 ***** ==8269== Invalid read of size 4 ==8269== at 0x40058F8: _dl_get_ready_to_run (ldso.c:1396) ==8269== Address 0x7d80c624 is on thread 1's stack ==8269== 20 bytes below stack pointer ==8269== Program received signal SIGTRAP, Trace/breakpoint trap. _dl_get_ready_to_run (Reading in symbols for libc/sysdeps/linux/arm/crti.S...done. tpnt=0x4007ac0, load_addr=<optimized out>, auxvt=0x7d80ca48, envp=<optimized out>, argv=0x0) at ldso/ldso/ldso.c:1396 1396 ldso/ldso/ldso.c: No such file or directory. Current language: auto The current source language is "auto; currently c". (gdb) bt #0 _dl_get_ready_to_run (tpnt=0x4007ac0, load_addr=<optimized out>, auxvt=0x7d80ca48, envp=<optimized out>, argv=0x0) at ldso/ldso/ldso.c:1396 #1 0x049be918 in _init () at libc/sysdeps/linux/arm/crti.S:22 Backtrace stopped: previous frame inner to this frame (corrupt stack?) ***** ==8269== Invalid read of size 4 ==8269== at 0x4A16E70: __uClibc_fini (__uClibc_main.c:304) ==8269== by 0x4A134BB: exit (_atexit.c:336) ==8269== by 0x4A1718F: __uClibc_main (__uClibc_main.c:511) ==8269== by 0xFFFFFFFF: ??? ==8269== Address 0x7d80ca0c is on thread 1's stack ==8269== 20 bytes below stack pointer ==8269== Program received signal SIGTRAP, Trace/breakpoint trap. __GI___uClibc_fini () at libc/misc/internals/__uClibc_main.c:304 304 in libc/misc/internals/__uClibc_main.c (gdb) bt Reading in symbols for libc/stdlib/exit.c...done. #0 __GI___uClibc_fini () at libc/misc/internals/__uClibc_main.c:304 #1 0x04a134bc in __GI_exit (rv=0) at libc/stdlib/_atexit.c:336 #2 0x04a17190 in __uClibc_main (main=0x7d80cc04, argc=77770752, argv=0x4a17190 <__uClibc_main+764>, app_init=0x0, app_fini=0x10680 <_fini>, rtld_fini=0x4000df0 <_dl_fini>, stack_end=0x7d80cc04) at libc/misc/internals/__uClibc_main.c:511 #3 0x00000000 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) *****
Here is the output from "--trace-flags=10000000", condensed and commented: ===== (arm) 0x49BD90C: mov r12, r13 (arm) 0x49BD910: stmdb r13!, {0xDFF0} (arm) 0x49BD914: sub r11, r12, #0x4 (arm) 0x49BD918: ldmdb r11, {0xAFF0} ------ IMark(0x49BD918, 4, 0) ------ t8 = GET:I32(52 /*r11*/) t9 = t8 PUT(68 /*r15,pc*/) = LDle:I32(Sub32(t9,0x4:I32)) // updated pc no longer points to the 'ldmdb' PUT(60 /*r13,sp*/) = LDle:I32(Sub32(t9,0x8:I32)) // updated sp PUT(48 /*r10*/) = LDle:I32(Sub32(t9,0x10:I32)) // fetch from wrong side of *updated* sp PUT(44 /*r9*/) = LDle:I32(Sub32(t9,0x14:I32)) PUT(40 /*r8*/) = LDle:I32(Sub32(t9,0x18:I32)) PUT(36 /*r7*/) = LDle:I32(Sub32(t9,0x1C:I32)) PUT(32 /*r6*/) = LDle:I32(Sub32(t9,0x20:I32)) PUT(28 /*r5*/) = LDle:I32(Sub32(t9,0x24:I32)) PUT(24 /*r4*/) = LDle:I32(Sub32(t9,0x28:I32)) PUT(52 /*r11*/) = LDle:I32(Sub32(t9,0xC:I32)) PUT(68 /*r15,pc*/) = GET:I32(68) PUT(68 /*r15,pc*/) = GET:I32(68); exit-Boring GuestBytes 49BD90C 16 0D C0 A0 E1 F0 DF 2D E9 04 B0 4C E2 F0 AF 1B E9 0028F343 VexExpansionRatio 16 952 595 :10 ==26904== Invalid read of size 4 ==26904== at 0x40054EC: ??? (in /lib/ld-uClibc-0.9.33.2.so) ## pc should be 0x49BD918 ==26904== Address 0x7dad0654 is on thread 1's stack ==26904== 20 bytes below stack pointer ## below updated sp, but above original sp; should be no complaint ==26904== ===== (Of course the original [user] compiler code generator could do much better.) The VEX code in mk_ldm_stm() claims ===== priv/guest_arm_toIR.c near line 14138 if (bW == 0 && (regList & (1<<rN)) != 0) { /* Non-writeback, and basereg is to be transferred. Do its transfer last for a load and first for a store. Requires reordering xOff/xReg. */ ===== but I don't understand why. The earlier t8 = GET:I32(52 /*r11*/) t9 = t8 has de-coupled the original value of the base register from any interaction with subsequent code that deals with the register list. The real problem is that the new values of r15(==pc) and r13(==sp) have been written while the original values still are needed to detect fetching from the wrong side of sp (via LDle:I32() for each remaining register), and for reporting the pc of any complaint. It is the writing of pc and sp that should be re-ordered; and if both sp and pc are written then the generated code will require another temporary variable to hold the new value of pc until after there is no possibility of a complaint about this instruction.
There's a bunch of related hacks in the arm(32) and arm64 front ends. Grep for "earlyWriteback" or some such. They are probably related. The fundamental problem though is that these insns simply aren't representable properly in the IR -- it splits them into pieces and it really needs to keep them atomic. You're right about the original_pc thing, too. But it would be a perf hit.