Bug 338977 - port vgdb 'ptrace' invoker on android to allow vgdb to connect to a process blocked in a syscall
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.9.0
Platform: Android Android 4.x
Importance: NOR wishlist
Target Milestone: ---
Assignee: Julian Seward
 
Reported: 2014-09-10 13:26 UTC by Ayberk Özgür
Modified: 2014-09-11 19:59 UTC

Description Ayberk Özgür 2014-09-10 13:26:35 UTC
On Android, running valgrind with `--vgdb=yes` creates a FIFO on which valgrind should listen for vgdb commands (along with a second FIFO for the reverse direction and a piece of shared memory). However, a command such as `vgdb instrumentation on` just hangs forever.

This was tested on an armv7 emulator with Android 4.0.3 (a configuration that is reported to work) and on a Galaxy Note II with Android 4.3.1-based CyanogenMod. Both devices were rooted and were running insecure adbd builds. Two valgrind builds were tested, one built with ndk-r6 and one with ndk-r9d. The result is the same on all configurations: both devices are able to run valgrind, but in neither case does valgrind listen to vgdb.

For reference, the valgrind command and its output are as follows: 

```
# ./valgrind -v -v -v --vgdb=yes sleep 1000
==3640== Memcheck, a memory error detector
==3640== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==3640== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==3640== Command: sleep 1000
==3640== 
--3640-- Valgrind options:
--3640--    -v
--3640--    -v
--3640--    -v
--3640--    --vgdb=yes
--3640-- Contents of /proc/version:
--3640--   Linux version 2.6.29-g46b05b2 (vchtchetkine@vc-irv.irv.corp.google.com) (gcc version 4.4.3 (GCC) ) #28 Thu Nov 17 06:39:36 PST 2011
--3640-- Arch and hwcaps: ARM, ARMv7-vfp-neon
--3640-- Page sizes: currently 4096, max supported 4096
--3640-- Valgrind library directory: /data/local/Inst/lib/valgrind
--3640-- TT/TC: VG_(init_tt_tc) (startup of code management)
--3640-- TT/TC: cache: 6 sectors of 27597024 bytes each = 165582144 total
--3640-- TT/TC: table: 6 tables  of 11531696 bytes each = 69190176 total
--3640-- TT/TC: table: 65521 entries each = 393126 total entries max occupancy 255528 (65%)
--3640-- Reading syms from /system/xbin/busybox
--3640--    svma 0x0000008120, avma 0x0000008120
--3640--    object doesn't have a symbol table
--3640--    object doesn't have a dynamic symbol table
--3640-- Reading syms from /data/local/Inst/lib/valgrind/memcheck-arm-linux
--3640--    svma 0x0038000000, avma 0x0038000000
--3640--    object doesn't have a dynamic symbol table
--3640-- Scheduler: using generic scheduler lock implementation.
--3640-- Reading suppressions file: /data/local/Inst/lib/valgrind/default.supp
==3640== embedded gdbserver: reading from /data/local/Inst/vgdb-pipe-from-vgdb-to-3640-by-???-on-???
==3640== embedded gdbserver: writing to   /data/local/Inst/vgdb-pipe-to-vgdb-from-3640-by-???-on-???
==3640== embedded gdbserver: shared mem   /data/local/Inst/vgdb-pipe-shared-mem-vgdb-3640-by-???-on-???
==3640== 
==3640== TO CONTROL THIS PROCESS USING vgdb (which you probably
==3640== don't want to do, unless you know exactly what you're doing,
==3640== or are doing some strange experiment):
==3640==   /data/local/Inst/lib/valgrind/../../bin/vgdb --pid=3640 ...command...
==3640== 
==3640== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==3640==   /path/to/gdb sleep
==3640== and then give GDB the following command
==3640==   target remote | /data/local/Inst/lib/valgrind/../../bin/vgdb --pid=3640
==3640== --pid is optional if only one valgrind process is running
==3640== 
--3640-- TT/TC: initialise sector 0
```

The vgdb command is (there is no output until it's killed):

```
# ./vgdb instrumentation on
^Csyscall failed: Interrupted system call
error opening /data/local/Inst/vgdb-pipe-to-vgdb-from-3640-by-???-on-??? read cmd result from pid
```

Reproducible: Always

Steps to Reproduce:
1. Build valgrind with `export HWKIND=generic` and `--with-tmpdir=/data/local/Inst`
2. Run `valgrind --vgdb=yes --tool=callgrind sleep 1000`
3. In another terminal, run `vgdb instrumentation on`

Actual Results:  
vgdb hangs forever until killed.

Expected Results:  
vgdb should have reported:
```
sending command instrumentation on to pid 3640
```

Then, valgrind should have reported:
```
...
==3640== Remote side has terminated connection.  GDBserver will reopen the connection.
==3640== embedded gdbserver: reading from /data/local/Inst/vgdb-pipe-from-vgdb-to-3640-by-???-on-???
==3640== embedded gdbserver: writing to   /data/local/Inst/vgdb-pipe-to-vgdb-from-3640-by-???-on-???
==3640== embedded gdbserver: shared mem   /data/local/Inst/vgdb-pipe-shared-mem-vgdb-3640-by-???-on-???
==3640== 
==3640== TO CONTROL THIS PROCESS USING vgdb (which you probably
==3640== don't want to do, unless you know exactly what you're doing,
==3640== or are doing some strange experiment):
==3640==   /usr/lib/valgrind/../../bin/vgdb --pid=3640 ...command...
==3640== 
==3640== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==3640==   /path/to/gdb sleep
==3640== and then give GDB the following command
==3640==   target remote | /usr/lib/valgrind/../../bin/vgdb --pid=3640
==3640== --pid is optional if only one valgrind process is running
==3640== 
```

- On the real device, the HOSTNAME and USER are not ??? but are actually `t0lte` and `root`, respectively. It probably has nothing to do with this issue.

- I'm sure that FIFO piping works in the aforementioned `/data/local/Inst` directory because the following works:
```
# mkfifo examplepipe
# echo message > examplepipe
... command exits after the pipe is read in another shell ...
# 
```

In another shell:
```
# cat examplepipe
message
```

- I'm also sure that vgdb is finding the valgrind instance because it doesn't report a `FIFO not found` error.

- I'm guessing at this point that the reason vgdb hangs is that valgrind does not listen to the pipe for some reason.
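
The silent hang is consistent with ordinary FIFO semantics: opening a FIFO blocks until a peer opens the other end. The following is a minimal sketch of that behaviour using a throwaway pipe in a temporary directory (not a real vgdb pipe). It assumes a `timeout` utility such as the one in GNU coreutils is available, and the variable names are purely illustrative.

```shell
#!/bin/sh
# Demonstrates FIFO open semantics with a throwaway pipe (not a vgdb pipe).
dir=$(mktemp -d)
mkfifo "$dir/examplepipe"

# Case 1: no reader. The writer blocks while opening the FIFO, so
# timeout has to kill it after 1 second and returns a non-zero status
# (124 with GNU coreutils).
status_without_reader=0
timeout 1 sh -c 'echo message > "$1"' sh "$dir/examplepipe" \
    || status_without_reader=$?

# Case 2: a reader is started first, so the same write completes.
cat "$dir/examplepipe" > "$dir/received" &
echo message > "$dir/examplepipe"
wait
received=$(cat "$dir/received")

echo "writer without reader: exit status $status_without_reader"
echo "writer with reader delivered: $received"
rm -r "$dir"
```

If valgrind never services its end of the pipes, vgdb would be expected to sit in the state shown in case 1 until it is interrupted.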
Comment 1 Philippe Waroquiers 2014-09-10 21:57:02 UTC
Please refer to user manual section
   http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver-limitations
bullet 'Connecting to or interrupting a Valgrind process blocked in a system call.'
last paragraph, which is:
"Unblocking processes blocked in system calls is not currently implemented on Mac OS X and Android. So you cannot connect to or interrupt a process blocked in a system call on Mac OS X or Android. "

See also coregrind/Makefile.am, which contains:
   if VGCONF_OS_IS_LINUX
   if VGCONF_PLATVARIANT_IS_ANDROID
   vgdb_SOURCES += vgdb-invoker-none.c
   else
   vgdb_SOURCES += vgdb-invoker-ptrace.c
   endif
   endif

vgdb should be able to connect to a program that 'executes' something from time to time
(as the vgdb FIFO will be polled from time to time).
The command option
   --vgdb-poll=<number>      gdbserver poll max every <number> basic blocks [5000] 
can be used to change the polling frequency.
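
The polling idea can be illustrated with a toy model. This is only a sketch and not Valgrind's actual implementation: a plain temporary file stands in for the vgdb FIFO, a loop counter stands in for executed basic blocks, and the names `poll_every`, `mailbox`, `blocks` and `noticed_at` are hypothetical.

```shell
#!/bin/sh
# Toy model of periodic polling; not Valgrind's actual implementation.
poll_every=5000                          # mirrors the documented --vgdb-poll default
mailbox=$(mktemp)                        # a plain file stands in for the vgdb FIFO
echo "instrumentation on" > "$mailbox"   # a command is already waiting

blocks=0
noticed_at=never
while [ "$blocks" -lt 20000 ]; do
    blocks=$((blocks + 1))               # one unit of "executed code"
    # Only look at the mailbox once every $poll_every units.
    if [ $((blocks % poll_every)) -eq 0 ] && [ -s "$mailbox" ]; then
        noticed_at=$blocks
        break
    fi
done
echo "command noticed after $noticed_at blocks"
rm -f "$mailbox"
```

In this model a pending command is only seen when the counter advances. A process that sits in a single long system call never advances the counter, so the check is never reached.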

As far as I can see, the test has been done with 'sleep 1000', which means the process is
blocked in a syscall.
So, this is a known limitation on android.
It would be nice to update Valgrind on android so that it uses vgdb-invoker-ptrace.c instead
of vgdb-invoker-none.c.

Can you confirm that vgdb can properly connect to a process that executes some basic blocks from
time to time?
Assuming this is working OK, I am keeping this bug as a 'wishbug' to port vgdb-invoker-ptrace.c to Android.
Comment 2 Ayberk Özgür 2014-09-11 14:10:27 UTC
Indeed, I was able to `vgdb instrumentation on` / `vgdb instrumentation off` the following: `valgrind --vgdb=yes --tool=callgrind ping google.com`.

However, vgdb still freezes when trying to send a message to an application with native components launched using the wrapping method described here: http://stackoverflow.com/questions/13531496/cant-run-a-java-android-program-with-valgrind/19235439#19235439. I even tried bypassing `logwrapper` and wrapping the app directly with `start_valgrind.sh`. Doesn't help. What's more, the app itself freezes too. I tried leaving it alone for about an hour thinking that it might be a very slow process, still no luck. 

My `start_valgrind.sh` looks like:

```
#!/system/bin/sh

PACKAGE="some.package.name"

# Callgrind tool
VGPARAMS='-v --instr-atstart=no --error-limit=no --trace-children=yes --log-file=/sdcard/valgrind.log.%p --tool=callgrind --callgrind-out-file=/sdcard/callgrind.out.%p --vgdb=yes'

export TMPDIR=/data/data/$PACKAGE
export USER=root
export HOSTNAME=t0lte

exec valgrind $VGPARAMS $* 
```

My `start_valgrind_profiling.sh` script looks like the following and I run it to launch the app itself:

```
#!/usr/bin/env bash

PACKAGE="some.package.name"

adb root

adb push start_valgrind.sh /data/local/
adb shell chmod 777 /data/local/start_valgrind.sh

adb shell setprop wrap.$PACKAGE "/data/local/start_valgrind.sh"

echo "wrap.$PACKAGE: $(adb shell getprop wrap.$PACKAGE)"

adb shell am force-stop $PACKAGE
adb shell am start -a android.intent.action.MAIN -n $PACKAGE/.MainActivityName

adb logcat -c
adb logcat

exit 0
```

I tried this with three apps, one is https://code.google.com/p/android-native-egl-example/ (with nothing but the WRITE_EXTERNAL_STORAGE permission added). 

Second is the same app but with a busy loop inserted into its native onStart function:

```
while(true){
	for(int i=0;i<9999999;i++){
		i += 4;
		LOG_INFO("%d",i);
	}
}
```

Third is a Qt app with a lot of Qt and OpenCV libraries that regularly gets and processes the camera image, so it does not wait for I/O most of the time. 

All three apps respond in exactly the same way to `vgdb instrumentation on` at every point in their lifetime.

One strange thing is that the valgrind process belongs to some user named `u0_a77`. Does the valgrind process not belonging to `root` introduce a problem here?
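
To compare the names embedded in a vgdb FIFO with the actual owner of the process, the file name can be split with plain shell parameter expansion. This is a sketch that assumes the `vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST` pattern visible in the valgrind output above. The sample name is assembled for illustration from values mentioned in this report, and nothing in this thread establishes whether a mismatch matters to vgdb.

```shell
#!/bin/sh
# Split a vgdb FIFO name of the form seen in the valgrind output:
#   vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST
# The sample name is assembled for illustration only.
fifo="vgdb-pipe-from-vgdb-to-3640-by-root-on-t0lte"

rest=${fifo#vgdb-pipe-from-vgdb-to-}   # strip the fixed prefix
pid=${rest%%-by-*}                     # text before the first "-by-"
rest=${rest#*-by-}                     # text after the first "-by-"
user=${rest%-on-*}                     # text before the last "-on-"
host=${rest##*-on-}                    # text after the last "-on-"

echo "pid=$pid user=$user host=$host"
```

The extracted user can then be checked by hand against the owner that `ps` reports for the valgrind process (here `u0_a77`).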
Comment 3 Philippe Waroquiers 2014-09-11 19:59:20 UTC
It looks like there are two different problems:
1. a known limitation of vgdb
2. applications freezing (with and without using vgdb, IIUC).

I do not know much about Android, so I cannot help much with that
(and before being able to help, I would need to re-install the Android SDK and emulator).
It is probably better to file another bug for the second problem, unless it is really related
to vgdb.