| Summary: | Valgrind is not looking up $ORIGIN rpath of shebang programs | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Jack Zhao <jack.zhao.fdel> |
| Component: | general | Assignee: | Paul Floyd <pjfloyd> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | geofft, pjfloyd, tom |
| Priority: | NOR | ||
| Version First Reported In: | 3.10.0 | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
repro case
Patch for FreeBSD Patch for Linux |
||
|
Description
Jack Zhao
2018-07-11 17:53:02 UTC
I don't think we have any code which does searching for dynamic libraries - we just load the interpreter (i.e. ld.so) and then let that do the actual dynamic linking. So this must be some environmental difference - either literally in the environment variables, or in the auxiliary vector or something.

Hi, thanks for the response. For what it's worth, I was able to reproduce this with a barebones centos7 container. Please see log here: https://pastebin.com/raw/VbvL0khy

Steps I took:
1. Run a container of the official centos:7 image available from docker.io
sudo docker run -i -t centos:7 /bin/bash
2. Once in the container, install gcc, gcc-c++, and valgrind
yum install -y gcc gcc-c++ valgrind
3. Download the attached repro case from some available server (or use any method to get the repro case into the container)
curl http://<some_server>/test_repro.tar.gz
4. Unpack the repro case
tar xvzf test_repro.tar.gz
5. cd into the test_repro folder
cd test_repro
6. Run repro.sh
./repro.sh

The test_repro.tar.gz will be attached right after. So if it's an environment-induced problem, it seems to happen in default/typical environments. Any guidance on what kind of environment settings might be causing it? Thanks!

Created attachment 113911 [details]
repro case
I ran into this today myself and took a look at the code. The root cause is simple - valgrind virtualizes lookups of "/proc/self/exe" and repoints them at "/proc/self/fd/%d" for the open file handle of the program being instrumented. But, for a script, this file handle points to the script itself, not to the binary interpreting the script. Because the glibc dynamic linker uses readlink("/proc/self/exe") (or actually readlinkat) to figure out where to calculate $ORIGIN, this gets you the wrong result if your script and interpreter are not in the same directory.
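To make the failure mode concrete, here is a minimal sketch of how a directory like $ORIGIN can be derived from a `/proc/self/exe`-style symlink. It is not glibc's or Valgrind's code: `origin_from_link` is a hypothetical helper, and it assumes a POSIX system with `readlink` and `dirname`. If the link points at the script instead of the interpreter, the directory it returns is the script's.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <libgen.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of glibc or Valgrind): read the target of
 * `link_path` (for example "/proc/self/exe") and copy the directory part of
 * that target into `out`. Returns 0 on success, -1 on failure. */
int origin_from_link(const char *link_path, char *out, size_t out_len)
{
    char target[PATH_MAX];
    ssize_t n;
    char *dir;

    assert(link_path != NULL && out != NULL && out_len > 0);

    n = readlink(link_path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';            /* readlink does not NUL-terminate */

    dir = dirname(target);       /* POSIX dirname may modify its argument */
    if (strlen(dir) >= out_len)
        return -1;
    strcpy(out, dir);
    return 0;
}
```

Calling `origin_from_link("/proc/self/exe", buf, sizeof buf)` in a normally started process yields the directory of the running binary. Under the virtualization described above, the same lookup is answered from the script's file descriptor.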
I think the fix is also conceptually simple. We already know the path to the interpreter. It's the interp_name field in the ExeInfo, because valgrind had to go look up the binary already to start running the script. We just need to open that file and keep a file descriptor around. (I suspect it's safe to change the existing cl_exec_fd to point to the interpreter instead of the script - the only uses I see of it are things analogous to reading /proc/self/exe.) This is probably easier for someone familiar with the code, but I'm happy to try my hand at a patch if it's helpful.
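As an illustration of where the interpreter path comes from, the sketch below extracts it from a script's first line. This is not Valgrind's script loader: `shebang_interpreter` is a hypothetical name, and the parsing is simplified (any interpreter argument is ignored).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the interpreter path from a "#!" line into
 * `out`. Returns 0 on success, -1 if `line` is not a usable shebang line. */
int shebang_interpreter(const char *line, char *out, size_t out_len)
{
    const char *p;
    size_t n;

    assert(line != NULL && out != NULL);

    if (line[0] != '#' || line[1] != '!')
        return -1;

    p = line + 2;
    p += strspn(p, " \t");          /* optional blanks after "#!" */
    n = strcspn(p, " \t\r\n");      /* path ends at blank or end of line */
    if (n == 0 || n >= out_len)
        return -1;

    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

For the repro script in this report, the first line `#!/root/link/main` gives `/root/link/main`, which is the file a descriptor would have to refer to for $ORIGIN to resolve correctly.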
I'll see if this is a problem on FreeBSD. /proc is optional there, so the lookup won't be going through /proc/self/exe. There may be an equivalent sysctl that is used.

Yeah, looks like the same problem happens on FreeBSD. Here's a repro (I used the 15.0-CURRENT AMI on AWS):
root@freebsd:~ # pkg install valgrind
root@freebsd:~ # mkdir orig
root@freebsd:~ # cd orig
root@freebsd:~/orig # cat main.c
extern void stuff(void);
int main(void) {
stuff();
}
root@freebsd:~/orig # cat stuff.c
#include <stdio.h>
void stuff(void) {
printf("Hello world!\n");
}
root@freebsd:~/orig # cc -shared -o libstuff.so stuff.c
root@freebsd:~/orig # cc -L. -o main main.c -lstuff -Wl,-rpath,'$ORIGIN'
root@freebsd:~/orig # cd ..
root@freebsd:~ # mkdir link
root@freebsd:~ # cd link
root@freebsd:~/link # ln -s ../orig/main .
root@freebsd:~/link # echo '#!/root/link/main' > script
root@freebsd:~/link # chmod +x script
root@freebsd:~/link # ./script
Hello world!
root@freebsd:~/link # valgrind --tool=none -q ./script
ld-elf.so.1: Shared object "libstuff.so" not found, required by "script"
root@freebsd:~/link # valgrind --tool=none -q ./main
Hello world!
I agree that this is probably one of the sysctls instead of /proc/self/exe. Valgrind does some similar virtualization there.
Just had a quick look on FreeBSD. It looks like it's just using __realpathat on /path/link/main which, being a link, resolves to /path/orig/main, and then getdirentries. Under Valgrind it's calling __realpathat on /path/link/script, which isn't a link, so the directory remains /path/link and not /path/orig.

Looking some more, on FreeBSD it looks like we're setting AT_EXECPATH to the script name as follows
const HChar *exe_name = VG_(find_executable)(VG_(args_the_exename));
HChar resolved_name[VKI_PATH_MAX];
VG_(realpath)(exe_name, resolved_name);
and that's causing ld.so to get $ORIGIN wrong.

Created attachment 177987 [details]
Patch for FreeBSD
Initial patch for FreeBSD. I also have a test (based on the existing auxv test), not included in this patch.
And fixing AT_EXECFN doesn't help on Linux (though we should probably do that one day). I think that we need to resolve the linked path before opening VG_(cl_exec_fd).

After a bit of trying to battle with load_client(), where we were VG_(open)'ing "exe_name" (which could be a script with/without a shebang, and the shebang could refer to another script [which could repeat]), I've come to the conclusion that load_client() is simply the wrong place to set VG_(cl_exec_fd). By that point we've already done all the hard work of calling VG_(do_exec) and the potentially recursive calls to VG_(load_script). Eventually this must have handled the real binary file in VG_(load_ELF) [or VG_(load_macho)]. So I think that a better solution is to just call VG_(dup) on the already opened binary file fd at the end of VG_(load_ELF) or VG_(load_macho). In the light of that, I also want to rethink the handling of auxv AT_EXECPATH.

Created attachment 178008 [details]
Patch for Linux
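The idea of duplicating the descriptor at the point where the real binary is finally loaded can be sketched as follows. This is a simplified stand-in, not the attached patch: `load_image` and `keep_fd` are hypothetical names (`keep_fd` plays the role of VG_(cl_exec_fd)), plain POSIX `open`/`dup` replace VG_(open)/VG_(dup), only the ELF magic is checked, and the recursion limit of 4 is an arbitrary assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for VG_(cl_exec_fd): descriptor of the real binary. */
int keep_fd = -1;

/* Hypothetical loader: follow "#!" lines until an ELF file is reached,
 * then keep a duplicate of that file's descriptor. Returns 0 on success. */
int load_image(const char *path, int depth)
{
    char buf[256];
    ssize_t n;
    int fd, rc = -1;

    assert(path != NULL);
    if (depth > 4)                      /* arbitrary limit on nested scripts */
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = read(fd, buf, sizeof(buf) - 1);
    if (n < 0) {
        close(fd);
        return -1;
    }
    buf[n] = '\0';

    if (n >= 4 && memcmp(buf, "\177ELF", 4) == 0) {
        /* Real binary reached: duplicate before the loader closes fd. */
        keep_fd = dup(fd);
        rc = (keep_fd >= 0) ? 0 : -1;
    } else if (n >= 2 && buf[0] == '#' && buf[1] == '!') {
        /* Script: recurse into the interpreter named on the first line. */
        char *p = buf + 2;
        p += strspn(p, " \t");
        p[strcspn(p, " \t\r\n")] = '\0';
        rc = load_image(p, depth + 1);
    }

    close(fd);
    return rc;
}
```

Because the duplicate is taken in the ELF branch, it refers to the interpreter binary regardless of how many scripts were chained in front of it.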
In the end, no need to rework FreeBSD.

commit 1fefba79021779d840bbf8cebc43e40c74b40f31 (HEAD -> master, origin/users/paulf/try-bug396415, origin/master, origin/HEAD, bug396415)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date: Thu Feb 6 20:23:42 2025 +0100
Bug 396415 - Valgrind is not looking up $ORIGIN rpath of shebang programs

Compiled from Git and it appears to work the way I expect on Linux. Thanks!
# ../valgrind/vg-in-place --tool=none -q ./script
Hello world!

(In reply to Geoffrey Thomas from comment #14)
> Compiled from Git and it appears to work the way I expect on Linux. Thanks!
>
> # ../valgrind/vg-in-place --tool=none -q ./script
> Hello world!
Just noticed your domain name. Nice!

In addition to the error with $ORIGIN there was also a problem correctly attaching vgdb with scripts. Before this fix I got (with the none/tests/scripts/relative1 script)
warning: "/home/paulf/scratch/valgrind/none/tests/scripts/relative1": not in executable format: file format not recognized
warning: `/home/paulf/scratch/valgrind/none/tests/scripts/relative1': can't read symbols: file format not recognized.
gdb is trying to open the script file, not the shell. Now I get
4 mov x19, x0 /* Put ps_strings in a callee-saved register */
(gdb) |