119404 – executing ssh from inside valgrind fails

Bug 119404 - executing ssh from inside valgrind fails

Summary: executing ssh from inside valgrind fails

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (show other bugs)
Version:	3.1.0
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-01-02 17:29 UTC by Patrick Ohly
Modified:	2009-07-01 09:02 UTC (History)
CC List:	2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Patrick Ohly 2006-01-02 17:29:48 UTC

I have a program which fork/exec's ssh. With valgrind 3.0.0 I could
run that program under valgrind control and ssh would be executed just
fine. With 3.1.0 I get a permission denied error for ssh.

I know that valgrind does not support suid programs, but I do not
want to debug the suid ssh itself, only a normal program invoking
it. I tried with --trace-children on and off, did not make a difference.

Here's a test case and system information:

$ uname -a
Linux knscsl010 2.4.9-e.57smp #1 SMP Thu Dec 2 20:51:12 EST 2004 i686 unknown
$ cat /etc/redhat-release 
Red Hat Linux Advanced Server release 2.1AS (Pensacola)
$ cat do_ssh 
#!/bin/sh

ssh "$@"

$ valgrind --version
valgrind-3.1.0
$ valgrind ./do_ssh 
==27744== Memcheck, a memory error detector.
==27744== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==27744== Using LibVEX rev 1471, a library for dynamic binary translation.
==27744== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==27744== Using valgrind-3.1.0, a dynamic binary instrumentation framework.
==27744== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==27744== For more details, rerun with: -v
==27744== 
./do_ssh: /usr/bin/ssh: Permission denied
==27746== 
==27746== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 2)
==27746== malloc/free: in use at exit: 0 bytes in 0 blocks.
==27746== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==27746== For counts of detected errors, rerun with: -v
==27746== No malloc'd blocks -- no leaks are possible.
==27744== 
==27744== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 2)
==27744== malloc/free: in use at exit: 0 bytes in 0 blocks.
==27744== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==27744== For counts of detected errors, rerun with: -v
==27744== No malloc'd blocks -- no leaks are possible.

When I look at the strace output, I see:
...
27862 stat64("/bin/ssh", 0xbefebb80)    = -1 ENOENT (No such file or directory)
27862 stat64("/usr/bin/ssh", {st_mode=S_IFREG|S_ISUID|0755, st_size=220580,
...}) = 0
27862 getgroups32(32, [33500, 33531, 33528]) = 3
27862 stat64("/usr/bin/ssh", {st_mode=S_IFREG|S_ISUID|0755, st_size=220580,
...}) = 0
27862 mmap2(0x80da000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x80da000
27862 mmap2(0x80db000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x80db000
27862 rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
27862 rt_sigtimedwait(~[INT CHLD], 0x62662dd4) = -1 EAGAIN (Resource temporarily
unavailable)
27862 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
27862 rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
27862 fork()                            = 27864
27864 --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
27862 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], 
<unfinished ...>
27864 gettid( <unfinished ...>
...
27864 write(2, "./do_ssh: /usr/bin/ssh: Permissi"..., 42) = 42

I would post the whole strace, but it is kind of large and might
contain private information. Please let me know if you need further
information.

Comment 1 Julian Seward 2006-01-02 17:37:12 UTC

There was recently a bug in VG_(getgroups) which caused process
starts to fail.  Can you check out the trunk and try that?
(easy; http://www.valgrind.org/downloads/repository.html has the details)

Comment 2 Patrick Ohly 2006-01-02 18:59:28 UTC

I could reproduce this problem on Debian 3.0, but only by manually
setting the suid bit on /usr/bin/ssh (normally it doesn't have that
set).

I also found a workaround: when executing a wrapper script instead
of ssh directly in the program under valgrind control,
then everything works unless I add --trace-children=yes.

Comment 3 Patrick Ohly 2006-01-02 19:10:05 UTC

I tried the trunk and it still fails.

Comment 4 Julian Seward 2006-02-14 16:42:04 UTC

As you have a workaround I'd like to close this.  Ok ?

Comment 5 Patrick Ohly 2006-02-20 14:47:37 UTC

I'd prefer to get this fixed properly because the workaround would
have to be installed for all future releases of valgrind, and other
people might also run into it.

After looking at the source it seems that check_executable in 
coregrind/m_libcfile.c is overly pessimistic about its ability to run
the executable:
it checks for SUID/SGID and flatly refuses to run such binaries even
if they are not meant to be run under valgrind control, which in turn
causes pre_exec_check and thus sys_execve to fail.

I inserted a "return 0" at the top of check_executable and verified
that this hacked valgrind executes my ssh with suid bit without
any problems.

How valid is the concern that SUID/SGID programs cannot be executed?
With the hack above I can run ssh under valgrind control (I get some
uninitialized memory reads; the ssh login into the remote machine works).

Can the check be changed so that it allows them at least when they
are not to be traced? The information whether the child is to be
traced is available in sys_execve, but I am not sure whether this
is the only relevant place and whether adding another parameter to
pre_exec_check and sys_execve would be the right thing to do.

Comment 6 Tom Hughes 2006-02-20 15:56:55 UTC

I'm not entirely sure why the exec wrapper is checking this - if the SUID bit is set and valgrind is controlling the child then the new valgrind instance that gets execed will refuse to load the child program anyway.

Comment 7 Patrick Ohly 2006-02-27 09:01:28 UTC

I'm not really qualified to answer Tom's question, but for the sake
of moving this discussion along I'd like to point out that a comment
in execve says that the check is there because valgrind would get
into trouble recovering from the failure if loading the child failed.
Whether that still is a valid concern I don't know.

Some more thoughts: does valgrind still use LD_PRELOAD techniques?
That definitely would fail with suid binaries, but as I said, with
current valgrind I could analyze a suid binary just fine.

On the other hand, further analysis shows that although a suid binary
can be analyzed, that takes place without the suid bit having any effect
(copy /usr/bin/id, set suid -> euid root without valgrind, no root
with valgrind).

So here's how I suggest to handle this:
- let exec run suid binaries
- if that means that they do not run under the intended user and/or group,
  print a warning refering the user to documentation which helps him
  sort out the problem
The solution were running a suid binary simply fails with EACCESS is
a lot harder to understand (okay, perhaps printing a warning isn't possible
there - I don't know).

Comment 8 Jacob Burckhardt 2006-05-12 02:53:41 UTC

I encountered the same problem of having exec return that strange
error.  So I would also like to request a warning about this.  

An older version of valgrind may have printed such a warning:

http://valgrind.org/docs/manual/dist.news.html

says that valgrind 2.0.0 had this change:

> - Don't fail silently if the executable is statically linked, or is
>   setuid/setgid. Print an error message instead.

Was that error message accidentally removed from valgrind?

Also, I suggest mentioning the setuid restriction in the manual where
it describes --trace-children.

Comment 9 R. Greayer 2006-10-13 23:16:17 UTC

Note that the suid/sgid checks don't take into account whether the current effective user id/group id match the file to be execed. I.e. the check, as it is, is unduly restrictive: valgrind refuses to run an executable that is SUID, even if the SUID would have no effect on privileges. For example, the Oracle OCI libraries exec the $ORACLE_HOME/bin/oracle application (which is SUID) in order to communicate to a local database. Even if the client program is run as the same user (e.g. oracle) as the owner of the executable, valgrind will refuse to run it.

Note also that valgrind will also refuse to run executables that are not readable (i.e. executables that are 'execute-only'). Obviously, if it must trace the to-be-execed executable, it needs read access to the executable. But if it does not need to trace the executable, then the only reason that it needs read access is so it can better guess as to whether execve will succeed (it checks the type of the file -- if it's neither ELF nor script, it knows execve will fail).

There are several possible solutions to the problem. The most flexible (it seems to me) would be to not do any pre-exec checking if trace-children is not in effect, and to properly restore the process state if execve fails. Comments in the code suggest that this might be difficult. The pre-exec processing changes process state in three ways prior to invoking the system call:

* it kills all threads
* it resets some signal handling
* it restores some limits (via setrlimit)

The only non-recoverable thing in the above list is the killing of threads. I'm not 100% sure why valgrind does this in user code anyway -- the system call itself, as I understand it, will properly terminate all the threads anyway.

The second solution would be to remove improve the pre-exec checking beforehand (i.e. don't worry about SUID if trace-children is not in effect. If it is, allow the exec if the effective user id matches the executables effective user id, and only check the format of the executable *if* it is readable).

The third solution would be to allow the user to specify a 'whitelist' of executables that will be execed without checking, so that users can workaround the sometimes-overly-restrictive pre-exec checking.

Additionally, in all cases, if an executable is 'execable' but not traceable (e.g. because it is suid, or is not readable), it should be an option to exec that particular child normally (without valgrind tracing). It may be the case that it would be useful to the user to trace those children which are traceable, and simply allow non-traceable children to execute normally.

Comment 10 Jeremy Fitzhardinge 2006-10-14 00:35:18 UTC

> There are several possible solutions to the problem.  The most flexible (it seems to me) would be to not do any pre-exec checking if trace-children is not in effect, and to properly restore the process state if execve fails.  Comments in the code suggest that this might be difficult.  The pre-exec processing changes process state in three ways prior to invoking the system call:
>
> * it kills all threads
> * it resets some signal handling
> * it restores some limits (via setrlimit)
>
> The only non-recoverable thing in the above list is the killing of threads.  I'm not 100% sure why valgrind does this in user code anyway -- the system call itself, as I understand it, will properly terminate all the threads anyway.
>   


Valgrind needs to report all the program termination information at exec 
time.  If it does all the exit-time work to generate that report and the 
exec fails, it would probably be hard to recover all the state needed to 
continue (I don't know if its reasonable to require all tools to have 
non-destructive or reversable exit-time reporting).  And if it simply 
does the exec without reporting, then lots of useful information may be 
lost.

There are also some subtle signal races around exec which are hard to 
deal with.  I seem to remember that when passing control from Valgrind's 
signal handling to the kernel's there are some tricky issues in making
sure that pending signals aren't improperly delivered or dropped on the 
ground.  Oh, yeah, the transition from having a signal handler to 
SIG_DFL is tricky; if there's a pending signal at exec time, then it
needs to remain pending over the exec and then be delivered to the new 
process with the default handler.  I don't think Valgrind gets this 
quite right now, but it ends up being OK if it can guarantee that exec 
will succeed once it does all the pulldown.

> The second solution would be to remove improve the pre-exec checking beforehand (i.e. don't worry about SUID if trace-children is not in effect. If it is, allow  the exec if the effective user id matches the executables effective user id, and  only check the format of the executable *if* it is readable).
>   


Allowing exec on SUID for no-trace would be OK, I think.  But I'd be a 
bit worried about simply matching uid/gid[s] for exec - even if they 
match there are still changes to the process credentials (something
subtle related to the handling of effective uid and saved effective 
uid).  And these days, systems using selinux/credentials/etc need much 
more complex permissions computations.

An alternative would be to get the kernel to do the test by running the 
execve in a controlled environment (fork off a process, run it under 
ptrace, kill it before it starts running anything), but that's pretty
complex.  And even then, Valgrind wouldn't know what the security 
environment of the new process should be.

> The third solution would be to allow the user to specify a 'whitelist' of executables that will be execed without checking, so that users can workaround the sometimes-overly-restrictive pre-exec checking.
>   


Possible, but not very pretty.  How would you identify the whitelist 
executables?  An equivalent but more generally useful thing would be the 
ability to list which children you want to trace.

> Additionally, in all cases, if an executable is 'execable' but not traceable (e.g. because it is suid, or is not readable), it should be an option to exec that particular child normally (without valgrind tracing).  It may be the case that it would be useful to the user to trace those children which are traceable, and simply allow non-traceable children to execute normally.
>   


Yes, doable.


    J

Comment 11 R. Greayer 2006-10-16 18:42:17 UTC

>Possible, but not very pretty.  How would you identify the whitelist
>executables?  An equivalent but more generally useful thing would be the
>ability to list which children you want to trace. 

I would imagine that you could specify the whitelist in the same
way as you could specify the list of children you want to trace.
But the approaches aren't *quite* equivalent.  The whitelist approach 
assumes that the precheck is desired even if we don't want to trace the
child.  This may be true, in some circumstances (e.g. we don't know
beforehand what files the application will try to exec, such as when the 
application is an interactive shell.).  The approach where the children 
to be traced are specified means, I assume, that other children (i.e. 
other unspecified files that get execed) will not be prechecked (except
to see if they have execute permission, which I can't see a reason to 
not do in all circumstances).

In either case (allowing children to be traced to be specified, or allowing
certain executables to be 'whitelisted' for execution), the mechanism
with which to specify the child could be similar, e.g.

valgrind --trace-child=/usr/local/bin/foo ...

or

valgrind --allow-exec=/usr/local/bin/bar ...

or allow the trace-list or whitelist to be read from a file:

valgrind --trace-these-children=<file>


---------------

With respect to SUID/SGID, is it not the case that if the current real-uid,
effective-uid, and saved-uid are the same, and the file to-be-execed is owned
by the same uid, then executing the file will have no effect on process 
authorizations?  That's my current understanding.  With respect to selinux, I don't *think* that SUID/SGID apply, do they?

Comment 12 Jeremy Fitzhardinge 2006-10-16 20:23:57 UTC

> valgrind --trace-child=/usr/local/bin/foo ...
>
> or
>
> valgrind --allow-exec=/usr/local/bin/bar ...
>
> or allow the trace-list or whitelist to be read from a file:
>
> valgrind --trace-these-children=<file>
>   


The trouble with pathnames is specifying what namespace context the name 
is evaluated in (cwd for relative paths; chroot and local namespaces for 
absolute).  On the other hand, Valgrind doesn't deal with chroot and 
local namespaces terribly well anyway.

Also, how should this work with fexecve?

> With respect to SUID/SGID, is it not the case that if the current real-uid,
> effective-uid, and saved-uid are the same, and the file to-be-execed is owned
> by the same uid, then executing the file will have no effect on process 
> authorizations?  That's my current understanding.


Linux has uid, euid, saved uid, and fsuid.  If they all match the suid 
executable uid, then I think nothing will happen.  Simiarly, there's the 
gid, egid sand saved gid; there's no fsgid, but there is the aux 
groups.  I'm not sure how they fit into the picture.  So its all pretty 
subtle.

Alternatively, you could just throw your hands up and add a 
--sim-hints=ignore-suid-exec to ignore the whole problem and go ahead 
anyway.


>   With respect to selinux, I don't *think* that SUID/SGID apply, do they?
>   


I'm not sure, but I think the basic rule that there are no privilege 
changes (except perhaps narrowing of privilege) without sguid being set 
is still preserved.  Otherwise very unexpected things could happen.

Comment 13 R. Greayer 2006-10-16 21:45:24 UTC

With respect to path issues, you could be as restrictive (the path provided on the command line has to EXACTLY match the path provided in the execve call) to as flexible as possible (e.g. use 'stat' on the original path provided, and 'stat' on the path passed into execve, and thereby determine if the file referenced in both cases is the same).  Either way, the facility would probably flexible enough for most purposes.  If you did it the 2nd way, I think fexecve would fall out naturally.  But the simple option of having an 'ignore-suid-exec' or 'ignore-suid-or-execute-only-exec' option would probably answer for an overwhelming majority of cases.

Comment 14 Julian Seward 2007-11-17 22:13:53 UTC

Situation is improved somewhat by r7175, whose commit message
is as follows.

Make handling of setuid executables marginally more sensible, as
suggested in #119404.

Prior to this commit, if the current traced process attempted to
execve a setuid executable, an error was always returned.  The revised
behaviour is:

If the current (traced) process attempts to execve a setuid
executable:

* If --trace-children=yes is not in effect, the execve is allowed.

* If --trace-children=yes is in effect, the execve is disallowed
  (as at present), but an error message is printed (unless in XML mode),
  so at least the execve does not fail silently any more.

As per discussion on #119404 we could probably do a lot better, but
these changes are at least simple, useful and uncontroversial.

Comment 15 Nicholas Nethercote 2009-07-01 09:02:59 UTC

Greater control of what is traced (whitelists/blacklists) is covered in bug 148932.  Given that Julian improved things as comment 14 describes, I'm declaring victory on this one.