SUMMARY

I was running Valgrind tests for PostgreSQL and encountered a weird behavior, which I managed to reduce to the following.

When invoking `pg_ctl` with:

```
valgrind --trace-children=yes pg_ctl start -D ...
```

if we also supply `--suppressions`, then certain syscalls will or will not fail depending on whether the supplied path is relative or absolute. The syscalls that fail seem to be related to `exec(2)`. In my case both `execl` and `popen` fail with `exit code 1`.

Note: when starting postgres via `pg_ctl` we do need `--trace-children`: `pg_ctl` uses `exec` to start the `postgres` binary, which then uses `fork` to spawn multiple processes, including the backends that serve individual client connections. So the chain that breaks is `exec -> fork -> exec`.

STEPS TO REPRODUCE

Here I build postgres from sources. The list of dependencies is relatively short. See https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code

```bash
#!/bin/env bash

# Build and initialize DB
mkdir -p /tmp/vg_repro
cd /tmp/vg_repro
git clone -b REL_18_STABLE --depth 1 --single-branch https://git.postgresql.org/git/postgresql.git
cd postgresql/
./configure --enable-cassert --enable-debug CFLAGS='-ggdb -Og -g3 -fno-omit-frame-pointer -std=c99' --prefix=/tmp/vg_repro/pgbin
make -s -j8 && make -s install
export PATH=/tmp/vg_repro/pgbin/bin:$PATH
initdb /tmp/vg_repro/pgdata --encoding=UTF8 --locale=C --no-sync
cd /tmp/vg_repro
pg_ctl -D /tmp/vg_repro/pgdata -l logfile start

# Create a table we'll use later
psql -p 5432 postgres $USER -AXqtc "create table x(a text);"
pg_ctl -D /tmp/vg_repro/pgdata stop

# The working variant with absolute path:
valgrind --leak-check=no --suppressions=$(pwd)/postgresql/src/tools/valgrind.supp --time-stamp=yes --trace-children=yes pg_ctl start -D /tmp/vg_repro/pgdata

# Invoke COPY FROM PROGRAM (runs popen(2))
psql -p 5432 postgres $USER -AXqtc "copy x from program '/bin/true'"
# OK

# Stop the server before trying the other variant
pg_ctl -D /tmp/vg_repro/pgdata stop

# Same thing with relative path:
valgrind --leak-check=no --suppressions=postgresql/src/tools/valgrind.supp --time-stamp=yes --trace-children=yes pg_ctl start -D /tmp/vg_repro/pgdata
psql -p 5432 postgres $USER -AXqtc "copy x from program '/bin/true'"
# ERROR: program "/bin/true" failed
# DETAIL: child process exited with exit code 1

# Don't forget to stop pg
pg_ctl -D /tmp/vg_repro/pgdata stop
```

OBSERVED RESULT

The `exec`'d/`popen`'d target immediately exits with exit code 1.

EXPECTED RESULT

The target is executed normally, regardless of whether absolute or relative paths are used in the arguments.

ADDITIONAL INFORMATION

This behavior was first spotted on CI running Ubuntu; I believe it should be present on most systems.
Running valgrind with `-v` shows:

```
==00:00:00:00.027 33639== FATAL: can't open suppressions file "valgrind.supp"
```
Reproduced the error. For me, I see plenty of kevent syscall param errors on FreeBSD. Need to check that the wrapper is doing its job.

For the error itself, the problem is that cwd has changed to the pgdata directory.

    --61399:0: main Getting the working directory at startup
    --61399:0: main ... /tmp/vg_repro/pgdata
    ==61399== Memcheck, a memory error detector
    ==61399== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
    ==61399== Using Valgrind-3.27.0.GIT and LibVEX; rerun with -h for copyright info
    ==61399== Command: /bin/sh -c /usr/bin/true
    ==61399==
    ==61399== FATAL: can't open suppressions file "postgresql/src/tools/valgrind.supp"

Looking at the PostgreSQL code I see several calls to ChangeToDataDir(). If it is doing a fork(), ChangeToDataDir(), exec(), then that would explain what we are seeing: the Valgrind instance started for the exec'd child looks up the relative suppressions path from the new cwd and cannot find it.

I'm not sure that we can do much. Possibly change the suppressions path to absolute before any exec?
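The suspected mechanism can be shown without Valgrind or PostgreSQL. This is only a minimal sketch, assuming a shell with `mktemp -d` available: `demo.supp` and `pgdata` are hypothetical stand-ins for the suppressions file and the data directory, and the subshell plays the role of a forked child that changes directory before it goes on to open the file.

```bash
# Sketch only: demo.supp and pgdata are hypothetical stand-ins,
# not real Valgrind or PostgreSQL files.
workdir=$(mktemp -d)
mkdir "$workdir/pgdata"
: > "$workdir/demo.supp"          # empty stand-in for the suppressions file
cd "$workdir" || exit 1

rel=demo.supp                     # relative path, as passed on the command line

# Parent: the relative path resolves against the directory we started in.
if [ -r "$rel" ]; then parent_result=found; else parent_result=missing; fi

# "Child": a subshell that changes directory first, like fork + chdir,
# and only then tries to open the same relative path.
child_result=$(
    cd pgdata || exit 1
    if [ -r "$rel" ]; then echo found; else echo missing; fi
)

echo "parent: $parent_result, child: $child_result"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

This prints `parent: found, child: missing`: the same relative string names a different location once the working directory has changed, which matches the FATAL message above.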
I don't insist that my interpretation is correct, but I find it dubious that this behavior is (a) not clearly documented and (b) dependent on the code executed in the child process. When I invoke a CLI application, I expect it to treat any path I supply as relative to the `cwd` it was invoked from. I imagine this behavior is a consequence of the way Valgrind is implemented, but I think it shouldn't be too hard to convert any relative path into an absolute one before continuing execution.

That said, I can imagine the current behavior being useful when each executable has a `.supp` file in its own `cwd` and we want to pick those up dynamically as we move through the chain of calls. But that seems like a very special case, and given that this behavior isn't documented, I doubt anyone ever realized it was an option, let alone used it in any real context.

In any case, I'm not opposed to implementing the fix I've outlined above, or to documenting the current behavior, if anyone is willing to review the fix/doc patch. Let me know.
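Until something like that exists, the same conversion can be done on the caller's side. The following is a rough sketch of the idea rather than a proposal for Valgrind itself (a real fix there would be in C): `abs_path` is a hypothetical helper name, and it only anchors the path to the current directory without resolving `..` or symlinks.

```bash
# Sketch only: abs_path is a hypothetical helper, not part of Valgrind.
# It anchors a relative path to the current working directory and
# leaves absolute paths untouched.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already absolute
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # make it absolute
    esac
}

# Resolve the suppressions path once, before any child can change directory.
supp=$(abs_path postgresql/src/tools/valgrind.supp)
echo "$supp"
```

The result can then be passed as `--suppressions="$supp"` in the relative-path invocation from the report, which makes it equivalent to the working `$(pwd)/...` variant.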