Bug 513403 - Using relative paths for --suppressions and/or --log-file breaks --trace-children
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.25.1
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-12-15 15:31 UTC by nickb
Modified: 2025-12-16 16:00 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Description nickb 2025-12-15 15:31:40 UTC
SUMMARY
I was running Valgrind tests for PostgreSQL and encountered a weird behavior which I managed to reduce to the following:
When invoking `pg_ctl` with:
```
valgrind --trace-children=yes pg_ctl start -D ...
```
if we also supply `--suppressions`, then certain calls succeed or fail depending on whether the supplied path is absolute or relative. The calls that fail seem to be related to `exec(2)`: in my case the targets of both `execl` and `popen` exit with code 1.

note:
When starting postgres via `pg_ctl` we do need `--trace-children`: `pg_ctl` uses `exec` to start the `postgres` binary, which then uses `fork` to spawn multiple processes, including the backends that serve individual client connections. So the chain that breaks is `exec -> fork -> exec`.

STEPS TO REPRODUCE
Here I build postgres from sources. The list of dependencies is relatively short. See https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code

```bash
#!/usr/bin/env bash
# Build and initialize DB
mkdir -p /tmp/vg_repro
cd /tmp/vg_repro
git clone -b REL_18_STABLE --depth 1 --single-branch https://git.postgresql.org/git/postgresql.git
cd postgresql/
./configure --enable-cassert --enable-debug CFLAGS='-ggdb -Og -g3 -fno-omit-frame-pointer -std=c99' --prefix=/tmp/vg_repro/pgbin
make -s -j8 && make -s install
export PATH=/tmp/vg_repro/pgbin/bin:$PATH
initdb /tmp/vg_repro/pgdata --encoding=UTF8 --locale=C --no-sync
cd /tmp/vg_repro
pg_ctl -D /tmp/vg_repro/pgdata -l logfile start

# Create a table we'll use later
psql -p 5432 postgres $USER -AXqtc "create table x(a text);"

pg_ctl -D /tmp/vg_repro/pgdata stop

# The working variant with absolute path:
valgrind --leak-check=no  --suppressions=$(pwd)/postgresql/src/tools/valgrind.supp --time-stamp=yes  --trace-children=yes pg_ctl start -D /tmp/vg_repro/pgdata

# Invoke COPY FROM PROGRAM (runs popen(2))
psql -p 5432 postgres $USER -AXqtc "copy x from program '/bin/true'"
#OK

# Same thing with relative path:
valgrind --leak-check=no  --suppressions=postgresql/src/tools/valgrind.supp --time-stamp=yes  --trace-children=yes pg_ctl start -D /tmp/vg_repro/pgdata

psql -p 5432 postgres $USER -AXqtc "copy x from program '/bin/true'"
# ERROR:  program "/bin/true" failed
# DETAIL:  child process exited with exit code 1

# Don't forget to stop pg
pg_ctl -D /tmp/vg_repro/pgdata stop
```

OBSERVED RESULT
`exec`'d/`popen`'d target immediately exits with exit code 1.

EXPECTED RESULT
The target is executed normally regardless of absolute or relative paths being used in the arguments.

ADDITIONAL INFORMATION
This behavior was first spotted on CI running Ubuntu. I believe it should be present on most systems.
Comment 1 nickb 2025-12-15 19:27:19 UTC
Running valgrind with `-v` shows:
```
==00:00:00:00.027 33639== FATAL: can't open suppressions file "valgrind.supp"
```
Comment 2 Paul Floyd 2025-12-15 20:32:58 UTC
Reproduced the error.

For me, I see plenty of kevent syscall param errors on FreeBSD. Need to check that the wrapper is doing its job.

For the error itself, the problem is that cwd has changed to the pgdata directory.

--61399:0:    main Getting the working directory at startup
--61399:0:    main ... /tmp/vg_repro/pgdata
==61399== Memcheck, a memory error detector
==61399== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==61399== Using Valgrind-3.27.0.GIT and LibVEX; rerun with -h for copyright info
==61399== Command: /bin/sh -c /usr/bin/true
==61399== 
==61399== FATAL: can't open suppressions file "postgresql/src/tools/valgrind.supp"

Looking at the PostgreSQL code I see several calls to ChangeToDataDir().

If it is doing a fork(), ChangeToDataDir(), exec() then that would explain what we are seeing.
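The suspected sequence can be imitated without valgrind or postgres. The following is only a sketch of the mechanism: the subshell stands in for the forked child, `cd pgdata` stands in for ChangeToDataDir(), and the `check` helper and the temporary directory layout are made up for illustration.

```shell
#!/usr/bin/env bash
# Stand-in for fork -> chdir -> exec; nothing here calls valgrind or postgres.
work=$(mktemp -d)
mkdir -p "$work/supp" "$work/pgdata"
: > "$work/supp/valgrind.supp"      # empty placeholder suppressions file
cd "$work"

rel=supp/valgrind.supp              # path as typed on the command line
abs=$work/supp/valgrind.supp        # same file, anchored to the original cwd

# Hypothetical helper: report whether a path can be opened from the current directory.
check() {
    if [ -r "$1" ]; then echo "readable"; else echo "missing"; fi
}

before=$(check "$rel")
# Each subshell plays the child: it changes directory, then resolves the same string.
after_rel=$(cd pgdata && check "$rel")
after_abs=$(cd pgdata && check "$abs")

echo "relative, before chdir: $before"
echo "relative, after chdir:  $after_rel"
echo "absolute, after chdir:  $after_abs"

cd / && rm -rf "$work"
```

The relative path is readable before the directory change and missing afterwards, while the absolute path stays readable, which matches the difference between the two valgrind invocations in the report.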

I'm not sure that we can do much. Possibly change the suppressions path to absolute before any exec?
Comment 3 nickb 2025-12-16 16:00:43 UTC
I don't insist on my interpretation being correct, but I find it dubious that this behavior is (a) not clearly documented and (b) depends on the code executed in the child process.

When I invoke a CLI application I expect it to treat any path I supply as relative to the `cwd` of the environment the application was called in. I imagine this behavior is a consequence of the way valgrind is implemented, but I think it shouldn't be too hard to convert any relative path into an absolute one before continuing execution.

That being said, I can imagine this kind of behavior being useful when each executable has a `.supp` file in its `cwd` and we want to pick those up dynamically as we chain through the calls. But that seems like a very special case, and given that the behavior isn't documented, I doubt anyone ever realized this was an option, let alone used it in any real context.

In any case, I'm not opposed to implementing the fix I've outlined above, or to documenting the current behavior, if anyone is willing to review the fix/doc patch. Let me know.
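Until something like that exists inside valgrind, the same conversion can be done on the command line, which is what the `$(pwd)/...` variant in the reproduction script does by hand. Below is a minimal sketch; `abs_path` is a hypothetical helper, not a valgrind feature, and the log file name `vg.%p.log` is only an example. It prefixes the current directory without touching the filesystem, so it also works for a `--log-file` target that does not exist yet.

```shell
#!/usr/bin/env bash
# abs_path: hypothetical helper that anchors a relative path to the caller's cwd.
# It does not resolve symlinks or ".." components; it only adds the prefix.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                # already absolute, keep as-is
        *)  printf '%s/%s\n' "${PWD%/}" "$1" ;;  # prefix the current directory
    esac
}

supp=$(abs_path postgresql/src/tools/valgrind.supp)
log=$(abs_path vg.%p.log)

# Print the command instead of running it, so the sketch works without valgrind installed.
echo valgrind --trace-children=yes "--suppressions=$supp" "--log-file=$log" \
    pg_ctl start -D /tmp/vg_repro/pgdata
```

Dropping the `echo` runs the command for real. Because both options are absolute by the time valgrind sees them, a later directory change in a traced child no longer affects how they resolve.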