> uname -a
Linux zanarkand 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
> valgrind --version
valgrind-3.10.0

-----

Steps to reproduce:

1. Write a bare-minimum library with a function that another program can call. Mine looked like this:

> cat foo.h
extern void doSomething(void);

> cat foo.cpp
#include <iostream>

void doSomething(void)
{
    std::cout << "did something" << std::endl;
}

2. Write a bare-minimum program that accepts a file as an argument and does something with the file. Make sure to include the library we just made and call its function. In my case the program just prints the file contents to stdout after calling doSomething(). See:

> cat main.cpp
#include <iostream>
#include <fstream>
#include <string>
#include "foo.h"

int main(int argc, char* argv[])
{
    doSomething();
    if (argc != 2) {
        std::cerr << "one parameter, should be file path" << std::endl;
        return 1;
    }
    std::string line;
    std::ifstream myfile(argv[1]);
    if (myfile.is_open()) {
        while (std::getline(myfile, line)) {
            std::cout << line << std::endl;
        }
        myfile.close();
    }
}

3. Build the library as a shared object (i.e. libfoo.so).

4. Build the main program with the rpath $ORIGIN/../lib, see:

> readelf -d main

Dynamic section at offset 0x1de8 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]
 0x000000000000000c (INIT)               0x400c90
 0x000000000000000d (FINI)               0x401104
 0x0000000000000019 (INIT_ARRAY)         0x601dc8
 0x000000000000001b (INIT_ARRAYSZ)       16 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x601dd8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x4005e8
 0x0000000000000006 (SYMTAB)             0x4002e8
 0x000000000000000a (STRSZ)              976 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x602000
 0x0000000000000002 (PLTRELSZ)           480 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400ab0
 0x0000000000000007 (RELA)               0x400a68
 0x0000000000000008 (RELASZ)             72 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x4009f8
 0x000000006fffffff (VERNEEDNUM)         3
 0x000000006ffffff0 (VERSYM)             0x4009b8
 0x0000000000000000 (NULL)               0x0

5. Put the library and the main program in the following locations:

test_repro/lib/libfoo.so
test_repro/bin/main

6. Create the file test_repro/script/subdir/myscript (the `subdir` is important here) with a shebang line that points to main:

> cat myscript
#!/path/to/my/test_repro/bin/main
lorem ipsum

7. Make 'myscript' executable and run it. It should work: main is invoked as the program the file is fed into, and main is able to load libfoo.so.

8. Run myscript through valgrind (i.e. `valgrind ./myscript`) and observe that it fails to load libfoo.so.

-----

Note that:
- the issue does not happen when running `valgrind /path/to/my/test_repro/bin/main myscript` (i.e. when execution does not depend on the shebang)
- the issue does not happen when the rpath contains an absolute path to /path/to/my/test_repro/lib, only when the rpath relies on $ORIGIN
- the issue does not happen if I put `myscript` into `test_repro/script` instead of `test_repro/script/subdir`.

This suggests that $ORIGIN is processed, but resolved relative to `myscript` instead of relative to the shebang program `main`.

Anyway, thanks!
I don't think we have any code that searches for dynamic libraries: we just load the interpreter (i.e. ld.so) and then let it do the actual dynamic linking. So this must be some environmental difference, either literally in the environment variables or in the auxiliary vector or something similar.
Hi, thanks for the response.

For what it's worth, I was able to reproduce this with a bare-bones CentOS 7 container. Please see the log here: https://pastebin.com/raw/VbvL0khy

Steps I took:

1. Run a container of the official centos:7 image available from docker.io:
   sudo docker run -i -t centos:7 /bin/bash
2. Once in the container, install gcc, gcc-c++, and valgrind:
   yum install -y gcc gcc-c++ valgrind
3. Download the attached repro case from some available server (or use any other method to get the repro case into the container):
   curl -O http://<some_server>/test_repro.tar.gz
4. Unpack the repro case:
   tar xvzf test_repro.tar.gz
5. cd into the test_repro folder:
   cd test_repro
6. Run repro.sh:
   ./repro.sh

The test_repro.tar.gz will be attached right after.

So if it's an environment-induced problem, it seems to happen in default/typical environments. Any guidance on what kind of environment settings might be causing it? Thanks!
Created attachment 113911: repro case
I ran into this today myself and took a look at the code. The root cause is simple - valgrind virtualizes lookups of "/proc/self/exe" and repoints them at "/proc/self/fd/%d" for the open file handle of the program being instrumented. But, for a script, this file handle points to the script itself, not to the binary interpreting the script. Because the glibc dynamic linker uses readlink("/proc/self/exe") (or actually readlinkat) to figure out where to calculate $ORIGIN, this gets you the wrong result if your script and interpreter are not in the same directory. I think the fix is also conceptually simple. We already know the path to the interpreter. It's the interp_name field in the ExeInfo, because valgrind had to go look up the binary already to start running the script. We just need to open that file and keep a file descriptor around. (I suspect it's safe to change the existing cl_exec_fd to point to the interpreter instead of the script - the only uses I see of it are things analogous to reading /proc/self/exe.) This is probably easier for someone familiar with the code, but I'm happy to try my hand at a patch if it's helpful.
I'll see if this is a problem on FreeBSD. /proc is optional there, so reading /proc/self/exe itself won't be the cause, but there may be an equivalent sysctl that is used instead.
Yeah, looks like the same problem happens on FreeBSD. Here's a repro (I used the 15.0-CURRENT AMI on AWS):

root@freebsd:~ # pkg install valgrind
root@freebsd:~ # mkdir orig
root@freebsd:~ # cd orig
root@freebsd:~/orig # cat main.c
extern void stuff(void);
int main(void) {
    stuff();
}
root@freebsd:~/orig # cat stuff.c
#include <stdio.h>
void stuff(void) {
    printf("Hello world!\n");
}
root@freebsd:~/orig # cc -shared -o libstuff.so stuff.c
root@freebsd:~/orig # cc -L. -o main main.c -lstuff -Wl,-rpath,'$ORIGIN'
root@freebsd:~/orig # cd ..
root@freebsd:~ # mkdir link
root@freebsd:~/link # ln -s ../orig/main .
root@freebsd:~/link # echo '#!/root/link/main' > script
root@freebsd:~/link # chmod +x script
root@freebsd:~/link # ./script
Hello world!
root@freebsd:~/link # valgrind --tool=none -q ./script
ld-elf.so.1: Shared object "libstuff.so" not found, required by "script"
root@freebsd:~/link # valgrind --tool=none -q ./main
Hello world!

I agree that this is probably one of the sysctls instead of /proc/self/exe. Valgrind does some similar virtualization there.
Just had a quick look on FreeBSD. Natively, the loader calls __realpathat on /path/link/main, which, being a symlink, resolves to /path/orig/main, and then getdirentries. Under Valgrind it calls __realpathat on /path/link/script, which isn't a symlink, so the directory stays /path/link instead of /path/orig.
Looking some more, on FreeBSD it looks like we're setting AT_EXECPATH to the script name as follows:

    const HChar *exe_name = VG_(find_executable)(VG_(args_the_exename));
    HChar resolved_name[VKI_PATH_MAX];
    VG_(realpath)(exe_name, resolved_name);

and that's causing ld.so to get $ORIGIN wrong.
Created attachment 177987: Patch for FreeBSD

Initial patch for FreeBSD. I also have a test (based on the existing auxv test), not included in this patch.
And fixing AT_EXECFN doesn't help on Linux (though we should probably do that one day). I think that we need to resolve the linked path before opening VG_(cl_exec_fd).
After battling with load_client() for a while (where we were VG_(open)'ing "exe_name", which could be a script with or without a shebang, and the shebang could refer to another script, which could repeat), I've come to the conclusion that load_client() is simply the wrong place to set VG_(cl_exec_fd). By that point we've already done all the hard work of calling VG_(do_exec) and the potentially recursive calls to VG_(load_script). Eventually this must have handled the real binary file in VG_(load_ELF) (or VG_(load_macho)). So I think a better solution is to just call VG_(dup) on the already opened binary file fd at the end of VG_(load_ELF) or VG_(load_macho).

In light of that, I also want to rethink the handling of auxv AT_EXECPATH.
Created attachment 178008: Patch for Linux
In the end, no need to rework FreeBSD.

commit 1fefba79021779d840bbf8cebc43e40c74b40f31 (HEAD -> master, origin/users/paulf/try-bug396415, origin/master, origin/HEAD, bug396415)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Thu Feb 6 20:23:42 2025 +0100

    Bug 396415 - Valgrind is not looking up $ORIGIN rpath of shebang programs
Compiled from Git and it appears to work the way I expect on Linux. Thanks!

# ../valgrind/vg-in-place --tool=none -q ./script
Hello world!
(In reply to Geoffrey Thomas from comment #14)
> Compiled from Git and it appears to work the way I expect on Linux. Thanks!
>
> # ../valgrind/vg-in-place --tool=none -q ./script
> Hello world!

Just noticed your domain name. Nice!

In addition to the error with $ORIGIN, there was also a problem attaching vgdb correctly to scripts. Before this fix I got (with the none/tests/scripts/relative1 script):

warning: "/home/paulf/scratch/valgrind/none/tests/scripts/relative1": not in executable format: file format not recognized
warning: `/home/paulf/scratch/valgrind/none/tests/scripts/relative1': can't read symbols: file format not recognized.

gdb is trying to open the script file, not the shell. Now I get:

4               mov     x19, x0         /* Put ps_strings in a callee-saved register */
(gdb)