Bug 360571 - Error about the Android Runtime reading below the stack pointer on ARM
Summary: Error about the Android Runtime reading below the stack pointer on ARM
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.11.0
Platform: Android Unspecified
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-15 16:52 UTC by Anton Kirilov
Modified: 2016-10-19 10:13 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Proposed fix (lacks documentation, but seems to work) (8.85 KB, patch)
2016-10-07 17:13 UTC, Julian Seward
Details
A simple test program. (765 bytes, patch)
2016-10-07 17:15 UTC, Julian Seward
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Anton Kirilov 2016-03-15 16:52:43 UTC
The Android Runtime (ART) uses reads below the stack pointer (combined with a guard memory region) to detect stack overflow, and as a result when an application is run in the virtual machine, Valgrind outputs a lot of warnings such as:

==29768== Invalid read of size 4
==29768==    at 0x739AEBE0: ??? (in /data/dalvik-cache/arm/system@framework@boot.oat)
==29768==  Address 0xfe96fdb0 is on thread 1's stack
==29768==  8192 bytes below stack pointer
==29768==

The assembly language code that is emitted by ART at the beginning of each method, and that causes the warning, looks like this:

sub r12, sp, #8192
ldr.w r12, [r12, #0]

The --workaround-gcc296-bugs command-line parameter is supposed to help, but it has some problems - it forces Valgrind to ignore memory accesses that are up to 1 KB below SP, but on ARM the distance is 8 KB.

I have seen the issue with a build of the master branch of the Android Open Source Project, but is reproducible on older versions as well (e.g. the 6.0 release).

Reproducible: Always

Steps to Reproduce:
1. Install Valgrind on the Android device as documented in the README.android file; assume that the installation directory is /data/local/tmp/valgrind.
2. Get an APK or a DEX file; let its name be Test.dex, and let the name of the class containing the main() method be Test as well.
3. adb push Test.dex /data/local/tmp
4. adb shell /data/local/tmp/valgrind/bin/valgrind --workaround-gcc296-bugs dalvikvm -cp /data/local/tmp/Test.dex Test

Actual Results:  
Warnings similar to the one above are displayed.

Expected Results:  
No warnings of that type are shown.

Currently, the workaround that I have been using is manually patching the Valgrind source code (in particular, the VG_GCC296_BUG_STACK_SLOP constant in memcheck/mc_errors.c), but a better solution is to introduce a way to configure that value at runtime. Furthermore, I'd like to suggest an addition to the README.android file - mentioning that people who want to run apps in the virtual machine with Valgrind should use the --workaround-gcc296-bugs parameter. Finally, that argument should perhaps be renamed to something more accurate.
Comment 1 Julian Seward 2016-08-03 12:31:50 UTC
(In reply to Anton Kirilov from comment #0)

> The assembly language code that is emitted by ART at the beginning of each
> method, and that causes the warning, looks like this:
> 
> sub r12, sp, #8192
> ldr.w r12, [r12, #0]

My understanding is as follows.  The loaded value is not actually used.  The load is
only performed so as to see if it is actually possible, or whether it takes a segfault
on a no-access guard page placed at the lowest address in the stack area.  Is that
correct?
Comment 2 Anton Kirilov 2016-08-03 18:05:29 UTC
Yes, that is correct. ART also sets up a signal handler for SIGSEGV.
Comment 3 Julian Seward 2016-08-09 10:37:51 UTC
(In reply to Anton Kirilov from comment #0)
> Currently, the workaround that I have been using is manually patching the
> Valgrind source code (in particular, the VG_GCC296_BUG_STACK_SLOP constant
> in memcheck/mc_errors.c), but a better solution is to introduce a way to
> configure that value at runtime. [..]

That would work, but I think we can do something a bit better here.
Using a renamed and parameterised version of --workaround-gcc296-bugs
will have the effect that all accesses below SP, down to the relevant
offset -- in this case, 8192 -- will be ignored, and so any other
invalid accesses below SP due to real bugs will get ignored.  An
alternative is to introduce a new flag, something like this (eg)

  --ignore-accesses-below-sp-range=<firstoffset>-<lastoffset>

so that just the specified range is ignored.  This would work since
(AFAICS) your accesses are always at a fixed offset, -8192, from SP.
Hence you could use:

  --ignore-accesses-below-sp-range=8189,8192

Then --workaround-gcc296-bugs=yes can be deprecated and declared to be
a synonym for 

  --ignore-accesses-below-sp-range=0,1023

Mark: would that also work for your use case(s)?
Comment 4 Anton Kirilov 2016-08-09 14:59:57 UTC
(In reply to Julian Seward from comment #3)
> An alternative is to introduce a new flag, something like this (eg)
> 
>   --ignore-accesses-below-sp-range=<firstoffset>-<lastoffset>
> 
> so that just the specified range is ignored.
Yes, that will work as well.

Just a quick idea - do you think it is worth it supporting more than one range (e.g. for running code generated by a compiler that misbehaves in the same way as GCC 2.96 in an environment similar to Android)?
Comment 5 Julian Seward 2016-08-09 15:40:18 UTC
(In reply to Anton Kirilov from comment #4)
> Just a quick idea - do you think it is worth it supporting more than one
> range (e.g. for running code generated by a compiler that misbehaves in the
> same way as GCC 2.96 in an environment similar to Android)?

Well, not really.  The gcc-2.96 style read/write below SP errors are real bugs
and I'd prefer to make it hard to hide those :-).  In the worst case you could
simply use
  --ignore-accesses-below-sp-range=0,8191

and then they would all disappear.

Do you have an actual use case like this?  Or is it more of a theoretical question?
Comment 6 Anton Kirilov 2016-08-09 15:52:26 UTC
Just a theoretical question.
Comment 7 Julian Seward 2016-10-07 17:13:29 UTC
Created attachment 101476 [details]
Proposed fix (lacks documentation, but seems to work)

For example, to keep the test program (next attachment) happy:

  ./vg-in-place --ignore-range-below-sp=8192-8189 ./access_below_sp 

Note that the first specified offset must be larger than the second, so as to
imply a non-wraparound address range.  This is because they are really
negative offsets relative to the stack pointer.
Comment 8 Julian Seward 2016-10-07 17:15:08 UTC
Created attachment 101477 [details]
A simple test program.
Comment 9 Julian Seward 2016-10-07 17:17:10 UTC
Anton, can you perhaps try this on aarch64 ?  Would this work for you?
(Apologies .. there's one line in the test program you'll have to change.)
Comment 10 Anton Kirilov 2016-10-12 17:27:17 UTC
I tried the fix both on Aarch32 and Aarch64, and it worked. I just modified lines 35-36 of the test program to either:

 __asm__ __volatile__("sub r0, sp, #8192\n\tldr %0, [r0, #0]"
                                     : "=r"(res) : : "r0", "memory","cc");

for Aarch32, or:

 __asm__ __volatile__("sub x0, sp, #8192\n\tldr %w0, [x0, #0]"
                                     : "=r"(res) : : "r0", "memory","cc");

for Aarch64.

I also ran a simple test application with ART, and this particular issue disappeared (although Valgrind complained about other things).
Comment 11 Julian Seward 2016-10-18 17:16:35 UTC
Committed on trunk, r16073.