Bug 396415 - Valgrind is not looking up $ORIGIN rpath of shebang programs
Summary: Valgrind is not looking up $ORIGIN rpath of shebang programs
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.10.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-11 17:53 UTC by Jack Zhao
Modified: 2025-02-07 06:34 UTC (History)
CC List: 3 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
repro case (993 bytes, application/gzip)
2018-07-13 00:33 UTC, Jack Zhao
Patch for FreeBSD (1.72 KB, patch)
2025-02-05 11:21 UTC, Paul Floyd
Patch for Linux (2.31 KB, patch)
2025-02-06 07:33 UTC, Paul Floyd

Description Jack Zhao 2018-07-11 17:53:02 UTC
> uname -a
Linux zanarkand 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

> valgrind --version
valgrind-3.10.0

-----

Steps to reproduce:


1. Write a bare-minimum library with something that can be included. Mine looked like this:

> cat foo.h
extern void doSomething(void);

> cat foo.cpp 
#include <iostream>

void doSomething(void) {
    std::cout << "did something" << std::endl;
}


2. Write a bare-minimum program that accepts a file as an argument and does something to the file. Make sure to include the library we just made and invoke its function. In my case I made it just print the file contents to stdout after invoking the doSomething function. See:

> cat main.cpp 
#include <iostream>
#include <fstream>
#include <string>
#include "foo.h"

int main(int argc, char* argv[]) {
    doSomething();
    if (argc != 2) {
        std::cerr << "one parameter, should be file path" << std::endl;
        return 1;  // stop here instead of reading argv[1] with the wrong argument count
    }
    std::string line;
    std::ifstream myfile(argv[1]);
    if (myfile.is_open()) {
        while(getline(myfile, line)) {
            std::cout << line << std::endl;
        }
        myfile.close();
    }
}


3. Build the library as a shared object (i.e. libfoo.so)

4. Build the main program with the following rpath: $ORIGIN/../lib, see:

> readelf -d main

Dynamic section at offset 0x1de8 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]
 0x000000000000000c (INIT)               0x400c90
 0x000000000000000d (FINI)               0x401104
 0x0000000000000019 (INIT_ARRAY)         0x601dc8
 0x000000000000001b (INIT_ARRAYSZ)       16 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x601dd8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x4005e8
 0x0000000000000006 (SYMTAB)             0x4002e8
 0x000000000000000a (STRSZ)              976 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x602000
 0x0000000000000002 (PLTRELSZ)           480 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400ab0
 0x0000000000000007 (RELA)               0x400a68
 0x0000000000000008 (RELASZ)             72 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x4009f8
 0x000000006fffffff (VERNEEDNUM)         3
 0x000000006ffffff0 (VERSYM)             0x4009b8
 0x0000000000000000 (NULL)               0x0

5. Put the library and main program in the following directories:

test_repro/lib/libfoo.so
test_repro/bin/main

6. Create the following file at the following location with a shebang line that points to main:

test_repro/script/subdir/myscript (the `subdir` is important here)

> cat myscript
#!/path/to/my/test_repro/bin/main
lorem ipsum

7. Make the 'myscript' file executable, and run it. It should work - main gets invoked as the program to feed the file into, and main is able to load in libfoo.so.

8. Run myscript through valgrind (i.e. `valgrind ./myscript`) and observe that it fails to load libfoo.so

-----

Note that:

 - issue does not happen when running `valgrind /path/to/my/test_repro/bin/main myscript` (i.e. when execution does not depend on shebang)

 - issue does not happen when rpath contains an absolute path to /path/to/my/test_repro/lib, only when rpath relies on $ORIGIN

 - issue does not happen if I put `myscript` into `test_repro/script` instead of `test_repro/script/subdir`. This suggests that the $ORIGIN path is processed but treated as a path relative to `myscript` instead of a path relative to the shebang program `main`.
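
The third observation can be illustrated with a small sketch. It only mimics the $ORIGIN substitution and is not the dynamic linker's code; the function name `expand_origin` is made up for this example, and it assumes the origin is simply the directory part of whichever path is treated as the executable.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: expand a leading "$ORIGIN" in an rpath entry, taking
 * the directory of exe_path as the origin. This mimics the substitution
 * to show the resulting search path; it is not the dynamic linker's code.
 * Returns 0 on success, -1 if exe_path has no '/' or the result is too long.
 */
int expand_origin(const char *exe_path, const char *rpath,
                  char *out, size_t out_size)
{
    static const char token[] = "$ORIGIN";
    const size_t token_len = sizeof token - 1;
    const char *slash = strrchr(exe_path, '/');
    int written;

    if (slash == NULL)
        return -1;

    if (strncmp(rpath, token, token_len) == 0) {
        /* Directory part of exe_path, followed by the rest of the rpath. */
        written = snprintf(out, out_size, "%.*s%s",
                           (int)(slash - exe_path), exe_path,
                           rpath + token_len);
    } else {
        /* No $ORIGIN prefix: the entry is used as-is. */
        written = snprintf(out, out_size, "%s", rpath);
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With the rpath `$ORIGIN/../lib`, an origin taken from `/path/to/my/test_repro/bin/main` gives `/path/to/my/test_repro/bin/../lib`, which is the real lib directory. An origin taken from `/path/to/my/test_repro/script/subdir/myscript` gives `/path/to/my/test_repro/script/subdir/../lib`, which does not exist, while `/path/to/my/test_repro/script/myscript` gives `/path/to/my/test_repro/script/../lib`, which happens to be the real lib directory again.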


Anyway, thanks!
Comment 1 Tom Hughes 2018-07-11 18:05:38 UTC
I don't think we have any code which does searching for dynamic libraries - we just load the interpreter (ie ld.so) and then let that do the actual dynamic linking.

So this must be some environmental difference - either literally in the environment variables, or in the auxiliary vector or something.
Comment 2 Jack Zhao 2018-07-13 00:33:12 UTC
Hi, thanks for the response. For what it's worth, I was able to reproduce this with a barebones centos7 container. Please see log here: https://pastebin.com/raw/VbvL0khy

Steps I took:

1. run a container of the centos:7 official image available from docker.io

    sudo docker run -i -t centos:7 /bin/bash

2. once in container, install gcc, gcc-c++, and valgrind

    yum install -y gcc gcc-c++ valgrind

3. download the attached repro case from some available server (or use any method to get the repro case into the container)

    curl -O http://<some_server>/test_repro.tar.gz

4. unpack the repro case

    tar xvzf test_repro.tar.gz

5. cd into the test_repro folder

    cd test_repro

6. run repro.sh

    ./repro.sh

The test_repro.tar.gz will be attached right after.

So if it's an environment-induced problem, it seems to happen in default/typical environments. Any guidance on what kind of environment settings might be causing it? Thanks!
Comment 3 Jack Zhao 2018-07-13 00:33:40 UTC
Created attachment 113911
repro case
Comment 4 Geoffrey Thomas 2025-01-30 00:00:36 UTC
I ran into this today myself and took a look at the code. The root cause is simple - valgrind virtualizes lookups of "/proc/self/exe" and repoints them at "/proc/self/fd/%d" for the open file handle of the program being instrumented. But, for a script, this file handle points to the script itself, not to the binary interpreting the script. Because the glibc dynamic linker uses readlink("/proc/self/exe") (or actually readlinkat) to figure out where to calculate $ORIGIN, this gets you the wrong result if your script and interpreter are not in the same directory.
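
As an illustration only (not taken from glibc or Valgrind), the lookup described above can be sketched as reading a symlink and keeping its directory part. The helper name `origin_from_link` is hypothetical, and the sketch assumes a Linux-style /proc.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative only: read a symlink such as "/proc/self/exe" and keep the
 * directory part, which is what $ORIGIN would then be based on.
 * Returns 0 on success, -1 on error. A target longer than out_size - 1
 * bytes is silently truncated by readlink(), so pass a generous buffer.
 */
int origin_from_link(const char *link_path, char *out, size_t out_size)
{
    ssize_t len;
    char *slash;

    if (out_size == 0)
        return -1;

    len = readlink(link_path, out, out_size - 1);
    if (len < 0)
        return -1;
    out[len] = '\0';            /* readlink() does not add a terminator */

    slash = strrchr(out, '/');
    if (slash == NULL)
        return -1;
    if (slash == out)
        slash[1] = '\0';        /* target lives directly under "/" */
    else
        *slash = '\0';          /* strip the file name */
    return 0;
}
```

Calling `origin_from_link("/proc/self/exe", buf, sizeof buf)` natively yields the directory of the running binary. If the link is redirected to a descriptor opened on the script, the same call yields the script's directory instead, which matches the behaviour reported here.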

I think the fix is also conceptually simple. We already know the path to the interpreter. It's the interp_name field in the ExeInfo, because valgrind had to go look up the binary already to start running the script. We just need to open that file and keep a file descriptor around. (I suspect it's safe to change the existing cl_exec_fd to point to the interpreter instead of the script - the only uses I see of it are things analogous to reading /proc/self/exe.) This is probably easier for someone familiar with the code, but I'm happy to try my hand at a patch if it's helpful.
Comment 5 Paul Floyd 2025-01-30 14:16:29 UTC
I'll see if this is a problem on FreeBSD. /proc is optional there, so accessing /proc/self/exe won't be the problem. There may be an equivalent sysctl that is used instead.
Comment 6 Geoffrey Thomas 2025-01-31 19:01:15 UTC
Yeah, looks like the same problem happens on FreeBSD. Here's a repro (I used the 15.0-CURRENT AMI on AWS):

root@freebsd:~ # pkg install valgrind
root@freebsd:~ # mkdir orig
root@freebsd:~ # cd orig
root@freebsd:~/orig # cat main.c
extern void stuff(void);

int main(void) {
	stuff();
}
root@freebsd:~/orig # cat stuff.c
#include <stdio.h>

void stuff(void) {
	printf("Hello world!\n");
}
root@freebsd:~/orig # cc -shared -o libstuff.so stuff.c
root@freebsd:~/orig # cc -L. -o main main.c -lstuff -Wl,-rpath,'$ORIGIN'
root@freebsd:~/orig # cd ..
root@freebsd:~ # mkdir link
root@freebsd:~ # cd link
root@freebsd:~/link # ln -s ../orig/main .
root@freebsd:~/link # echo '#!/root/link/main' > script
root@freebsd:~/link # chmod +x script
root@freebsd:~/link # ./script
Hello world!
root@freebsd:~/link # valgrind --tool=none -q ./script
ld-elf.so.1: Shared object "libstuff.so" not found, required by "script"
root@freebsd:~/link # valgrind --tool=none -q ./main
Hello world!

I agree that this is probably one of the sysctls instead of /proc/self/exe. Valgrind does some similar virtualization there.
Comment 7 Paul Floyd 2025-02-04 07:35:09 UTC
Just had a quick look on FreeBSD. It looks like it's just using __realpathat on /path/link/main which, being a link, resolves to /path/orig/main, and then getdirentries. Under Valgrind it's calling __realpathat on /path/link/script, which isn't a link, so the directory remains /path/link and not /path/orig.
Comment 8 Paul Floyd 2025-02-04 21:17:34 UTC
Looking some more, on FreeBSD it looks like we're setting AT_EXECPATH to the script name as follows

   const HChar *exe_name = VG_(find_executable)(VG_(args_the_exename));
   HChar resolved_name[VKI_PATH_MAX];
   VG_(realpath)(exe_name, resolved_name);

and that's causing ld.so to get $ORIGIN wrong.
Comment 9 Paul Floyd 2025-02-05 11:21:54 UTC
Created attachment 177987
Patch for FreeBSD

Initial patch for FreeBSD. I also have a test (based on the existing auxv test), not included in this patch.
Comment 10 Paul Floyd 2025-02-05 12:20:17 UTC
And fixing AT_EXECFN doesn't help on Linux (though we should probably do that one day).
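
For anyone wanting to see what a client observes in AT_EXECFN, a small sketch follows. It assumes Linux with glibc 2.16 or later, where getauxval() is available; the wrapper name `exec_fn_from_auxv` is hypothetical and this is not Valgrind code.

```c
#include <stddef.h>
#include <sys/auxv.h>

/*
 * Illustrative only (Linux/glibc): return the AT_EXECFN string from the
 * auxiliary vector, or NULL if the entry is absent.
 */
const char *exec_fn_from_auxv(void)
{
    /* getauxval() returns 0 when the requested entry is not present. */
    unsigned long value = getauxval(AT_EXECFN);

    return value != 0 ? (const char *)value : NULL;
}
```

Printing the returned string from a test program, run once natively and once under Valgrind via a shebang script, is one way to compare the two environments.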

I think that we need to resolve the linked path before opening VG_(cl_exec_fd).
Comment 11 Paul Floyd 2025-02-06 07:13:57 UTC
After a bit of trying to battle with load_client() where we were VG_(open)'ing "exe_name" (which could be a script with/without a shebang, and the shebang could refer to another script [which could repeat]) I've come to the conclusion that load_client() is simply the wrong place to set VG_(cl_exec_fd).

By that point we've already done all the hard work of calling VG_(do_exec) and the potentially recursive calls to VG_(load_script). Eventually this must have handled the real binary file in VG_(load_ELF) [or VG_(load_macho)]. So I think that a better solution is to just call VG_(dup) on the already opened binary file fd at the end of VG_(load_ELF) or VG_(load_macho).
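
The idea can be sketched outside Valgrind with plain POSIX calls. Everything below is a stand-in: `client_exec_fd` plays the role of VG_(cl_exec_fd), `load_binary_and_keep_fd` is a hypothetical loader, and none of it is the code from the actual patch.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for VG_(cl_exec_fd). */
static int client_exec_fd = -1;

/*
 * Illustrative only: a loader that opens the real binary, does its work,
 * and keeps a duplicate of the descriptor before closing its own copy.
 * Returns 0 on success, -1 on error (the saved descriptor is left as-is).
 */
int load_binary_and_keep_fd(const char *path)
{
    int saved;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* ... a real loader would map the binary's segments here ... */

    saved = dup(fd);    /* duplicate stays valid after fd is closed */
    close(fd);
    if (saved < 0)
        return -1;

    if (client_exec_fd >= 0)
        close(client_exec_fd);
    client_exec_fd = saved;
    return 0;
}

/* Returns nonzero if the saved descriptor still refers to an open file. */
int client_exec_fd_is_valid(void)
{
    struct stat st;

    return client_exec_fd >= 0 && fstat(client_exec_fd, &st) == 0;
}
```

Because the descriptor is saved at the point where the real binary is loaded, it does not matter how many scripts were traversed to get there: the saved descriptor always refers to the final binary rather than to the first script.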

In the light of that, I also want to rethink the handling of auxv AT_EXECPATH.
Comment 12 Paul Floyd 2025-02-06 07:33:50 UTC
Created attachment 178008
Patch for Linux
Comment 13 Paul Floyd 2025-02-06 20:05:32 UTC
In the end, no need to rework FreeBSD.

commit 1fefba79021779d840bbf8cebc43e40c74b40f31 (HEAD -> master, origin/users/paulf/try-bug396415, origin/master, origin/HEAD, bug396415)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Thu Feb 6 20:23:42 2025 +0100

    Bug 396415 - Valgrind is not looking up $ORIGIN rpath of shebang programs
Comment 14 Geoffrey Thomas 2025-02-06 21:02:54 UTC
Compiled from Git and it appears to work the way I expect on Linux. Thanks!

# ../valgrind/vg-in-place --tool=none -q ./script
Hello world!
Comment 15 Paul Floyd 2025-02-07 06:34:58 UTC
(In reply to Geoffrey Thomas from comment #14)
> Compiled from Git and it appears to work the way I expect on Linux. Thanks!
> 
> # ../valgrind/vg-in-place --tool=none -q ./script
> Hello world!

Just noticed your domain name. Nice!

In addition to the error with $ORIGIN there was also a problem correctly attaching vgdb with scripts. Before this fix I got (with the none/tests/scripts/relative1 script)

warning: "/home/paulf/scratch/valgrind/none/tests/scripts/relative1": not in executable format: file format not recognized
warning: `/home/paulf/scratch/valgrind/none/tests/scripts/relative1': can't read symbols: file format not recognized.

gdb is trying to open the script file, not the shell. Now I get

4              mov     x19, x0         /* Put ps_strings in a callee-saved register */
(gdb)