| Summary: | Syscall param execve(envp) contains uninitialised or unaddressable byte(s) | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Peter Seiderer <ps.report> |
| Component: | memcheck | Assignee: | Tom Hughes <tom> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | rdykiel |
| Priority: | NOR | | |
| Version: | 2.1.2 | | |
| Target Milestone: | --- | | |
| Platform: | Compiled Sources | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | |
| Attachments: | Patch to avoid banning thread stacks on fork | | |
Description
Peter Seiderer
2004-07-21 15:38:16 UTC

Both versions of the test program produce no error report when valgrind-2.0.0 is used.

---

In message <20040721141954.12511.qmail@ktown.kde.org> Nicholas Nethercote <njn25@cam.ac.uk> wrote:

> It may well be a Valgrind bug; I think Valgrind might crash when
> execve(foo, NULL, NULL) is executed; strictly speaking I think that's not
> allowed but the normal execution works ok.

I think I fixed that, didn't I? I think this is something else.

Tom

---

Seemingly not; this program segfaults for me:

```c
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    execve("/bin/true", NULL, NULL);
    return 0;
}
```

I was thinking of bug 83573, which apparently only dealt with envp being a NULL pointer. Is it even legal for argv to be NULL, though? Don't you have to at least provide an argv[0] with the program name?

---

I've fixed that execve() problem now, but the problem with system() when called in a second thread is still occurring.

---

I think the program name goes in the first argument to execve(), and the argv[] array holds the rest.

---

You normally give the program name to execve() twice. The first argument is the name of the program that the kernel will start, but if you want argv[0] to be correct in the started program then you need to provide argv[0] in the call to execve().

---

*** Bug 92763 has been marked as a duplicate of this bug. ***

---

Right, I think I understand what is happening here. The environment is held on the process's initial stack, which is the stack of the main thread. When a process forks, Valgrind kills all the threads other than the one which does the fork, just as POSIX says it should. Unfortunately, when it kills those threads it also marks their stacks as inaccessible.
So if you fork in a thread other than the main thread, then the main thread's stack (and hence the environment) becomes inaccessible in the child process. I suspect that the correct fix is not to mark the stacks as inaccessible, as they do still exist. That's actually an interesting feature of forking when there are multiple threads: you sort of leak memory, because all the other threads are killed but their stack space is still allocated and accessible.

Created attachment 8206 [details]
Patch to avoid banning thread stacks on fork
This patch changes VG_(nuke_all_threads) to disassociate the stacks of the
threads being killed from the threads, rather than marking them as inaccessible.
This should fix the problem with the environment (and other data from the
stacks of other threads) causing warnings after a fork. I believe that
VG_(nuke_all_threads) is only called in places where this is the behaviour that
we want, or where it doesn't matter because we're about to exit anyway.
I've now committed my patch to the CVS head, but I'd be grateful if you could confirm whether or not it fixes the problem for you. Thanks.

---

Thanks for the patch; it works for my little test program and for the real-world application where the error first occurred (checked with CVS head).

---

Works for me as well; thanks for the fix.

---

Reopening so I can close with the correct resolution.

---

*** Bug has been marked as fixed ***