Bug 212267 - signal cause memcheck to reports false positives
Summary: signal cause memcheck to reports false positives
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.6 SVN
Platform: Unlisted Binaries Linux
: NOR major
Target Milestone: blocking3.5.1
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-29 10:15 UTC by Konstantin Serebryany
Modified: 2009-10-29 16:19 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
C++ test (3.77 KB, text/x-c++src)
2009-10-29 10:15 UTC, Konstantin Serebryany
Details
version 2 of the test case (3.82 KB, text/x-c++src)
2009-10-29 11:07 UTC, Konstantin Serebryany
Details
Patch to fix interrupted syscall bug (543 bytes, patch)
2009-10-29 11:53 UTC, Tom Hughes
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Konstantin Serebryany 2009-10-29 10:15:56 UTC
Created attachment 37940 [details]
C++ test

[ Initially discussed here: http://www.nabble.com/Re:-memcheck's-false-positive-due-to-epoll_wait--to26005359.html ]

Suppose a program makes system call that initializes its arguments (e.g. read() or epoll_wait()). 
Valgrind knows what part of argument is initialized by epoll_wait() only after epoll_wait() has returned:
POST(sys_epoll_wait)
{
   vg_assert(SUCCESS);
   if (RES > 0)
      POST_MEM_WRITE( ARG2, sizeof(struct vki_epoll_event)*RES ) ;
}

But, if the program is interrupted by a signal, the POST(sys_epoll_wait) part is not executed and the memory (ARG2) is left uninitialized (from memcheck's point of view). This leads to false positives, which appear to be quite numerous and annoying in our environment. 

Attached please fine a simple C++ program that demonstrates the problem. 
It shows how read() is interrupted by a signal. 
To get the report you will likely need to run the program many (50-100) times. 

% g++ epoll_wait_test2.cc -g -lpthread
% for((i=0; i< 1000; i++)); do ~/valgrind/trunk/inst/bin/valgrind -q  ./a.out ; done 2>&1 | tee log
....
==4038== Conditional jump or move depends on uninitialised value(s)
==4038==    at 0x4C1DA6A: bcmp (mc_replace_strmem.c:526)
==4038==    by 0x400D88: RunServer(void*) (epoll_wait_test2.cc:144)
==4038==    by 0x401066: main (epoll_wait_test2.cc:166)
==4038==
==4038== Conditional jump or move depends on uninitialised value(s)
==4038==    at 0x4C1DA82: bcmp (mc_replace_strmem.c:526)
==4038==    by 0x400D88: RunServer(void*) (epoll_wait_test2.cc:144)
==4038==    by 0x401066: main (epoll_wait_test2.cc:166)
==4038==
==4038== Conditional jump or move depends on uninitialised value(s)
==4038==    at 0x400D8B: RunServer(void*) (epoll_wait_test2.cc:144)
==4038==    by 0x401066: main (epoll_wait_test2.cc:166)
==4038==

In each particular case there is a simple workaround: memset() the array passed to a system call. 
But if you do so you may miss real bugs. Also, it may require lots of time to workaround all cases. 

Is that a known problem? 
Any other suggestion?
Comment 1 Tom Hughes 2009-10-29 10:35:39 UTC
Can you please provide the full version of the partial log you posted on the mailing list - you only included the epoll call and the signal return in that but I want to see what comes after that as the epoll should have been restarted and then returned something.

Add a signal trace to the log along with the syscall trace would probably also help.
Comment 2 Tom Hughes 2009-10-29 10:37:11 UTC
Instructions for running the test case that (a) produce valgrind logs (b) don't throw in lots of meaningless numbers from the test progam and (c) don't rely on bashisms would also be helpful ;-)
Comment 3 Tom Hughes 2009-10-29 10:49:00 UTC
Oh, one other thing - how do I actually know when a particular run has failed?
Comment 4 Tom Hughes 2009-10-29 10:59:01 UTC
Grrrr.... I've just noticed that the epoll_wait() call isn't even enabled in your test program! Can we have a test which actually works please?
Comment 5 Konstantin Serebryany 2009-10-29 11:02:10 UTC
The test *does* work. It shows the problem with read() system call. 
You need to run it many times to see the problem. 
I an trying to make a more reproducible test, but that's not trivial.
Comment 6 Konstantin Serebryany 2009-10-29 11:07:17 UTC
Created attachment 37942 [details]
version 2 of the test case

A better test added. 
Unfortunately, is still requires noticeable time to see the false report. 

% g++ -g -lpthread signal_test.cc
% ~/valgrind/trunk/inst/bin/valgrind -q ./a.out
<after quite some time>
==18671== Conditional jump or move depends on uninitialised value(s)
==18671==    at 0x4C1DA6A: bcmp (mc_replace_strmem.c:526)
==18671==    by 0x400D88: RunServer(void*) (signal_test.cc:146)
==18671==    by 0x401027: main (signal_test.cc:168)
==18671==
==18671== Conditional jump or move depends on uninitialised value(s)
==18671==    at 0x4C1DA82: bcmp (mc_replace_strmem.c:526)
==18671==    by 0x400D88: RunServer(void*) (signal_test.cc:146)
==18671==    by 0x401027: main (signal_test.cc:168)
==18671==
==18671== Conditional jump or move depends on uninitialised value(s)
==18671==    at 0x400D8B: RunServer(void*) (signal_test.cc:146)
==18671==    by 0x401027: main (signal_test.cc:168)
==18671==
Comment 7 Tom Hughes 2009-10-29 11:19:53 UTC
It's OK, I've reproduced it now. The critical bit of the log is this:

--10051-- interrupted_syscall: tid=1, ip=0x600a9313, restart=True, sres.isErr=False, sres.val=8
  completed and committed: nothing to do

Which shows that the signal hit after the system call had returned but before valgrind had finished processing it.
Comment 8 Tom Hughes 2009-10-29 11:53:20 UTC
Created attachment 37943 [details]
Patch to fix interrupted syscall bug

Try this patch and see if it helps...

Basically there is a window of about 11 machine instructions (on amd64) at the end of a system call and if a signal arrives during that window then the POST handler for the system call that was interrupted will not run.
Comment 9 Konstantin Serebryany 2009-10-29 12:43:42 UTC
With this patch the bug does not manifest itself (after quite some time of running the attached test). 
So, it is either fixed (hopefully) or well hidden now. 
I'll run it on more tests later today. 

Thanks!!!
Comment 10 Konstantin Serebryany 2009-10-29 14:06:04 UTC
Hm, after a very long run I can see this (might be a completely different problem though)


==32518== Invalid write of size 8
==32518==    at 0x580D677: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000598 is on thread 1's stack
==32518==
==32518== Invalid write of size 8
==32518==    at 0x580D67F: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff0005a0 is on thread 1's stack
==32518==
==32518== Invalid write of size 8
==32518==    at 0x580D693: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff0005a8 is on thread 1's stack
==32518==
==32518== Invalid write of size 4
==32518==    at 0x580D6D0: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000560 is on thread 1's stack
==32518==
==32518== Invalid write of size 4
==32518==    at 0x580D6D7: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000564 is on thread 1's stack
==32518==
==32518== Invalid write of size 8
==32518==    at 0x580D6DF: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000568 is on thread 1's stack
==32518==
==32518== Invalid write of size 8
==32518==    at 0x580D6E9: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000570 is on thread 1's stack
==32518==
==32518== Invalid write of size 8
==32518==    at 0x580D6EE: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x7FF0006EF: ???
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000558 is on thread 1's stack
==32518==
==32518== Invalid read of size 8
==32518==    at 0x580435E: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000560 is on thread 1's stack
==32518==
==32518== Invalid read of size 8
==32518==    at 0x5804368: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000568 is on thread 1's stack
==32518==
==32518== Invalid read of size 8
==32518==    at 0x5804373: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000570 is on thread 1's stack
==32518==
==32518== Invalid read of size 4
==32518==    at 0x5806F1C: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000560 is on thread 1's stack
==32518==
==32518== Invalid read of size 8
==32518==    at 0x5806F2F: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000570 is on thread 1's stack
==32518==
==32518== Invalid write of size 4
==32518==    at 0x5806F33: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000560 is on thread 1's stack
==32518==
==32518== Invalid read of size 4
==32518==    at 0x5806F35: vfprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x5827B89: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==32518==    by 0x401027: main (signal_test.cc:168)
==32518==  Address 0x7ff000598 is on thread 1's stack
==32518==
==32518== Invalid read of size 8
==32518==    at 0x5827BD3: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==32518==  Address 0x7ff000558 is on thread 1's stack
==32518==
Comment 11 Tom Hughes 2009-10-29 14:36:31 UTC
You need to get the output from trace-syscalls and trace-signals when it fails.
Comment 12 Konstantin Serebryany 2009-10-29 14:39:52 UTC
I can't reproduce this any more... I'll let you know if I see it again. 

And the original problem (on large piece of software) is gone with your patch (checked with ~1000 runs)
Comment 13 Tom Hughes 2009-10-29 15:03:15 UTC
Fix committed as r10921.
Comment 14 Konstantin Serebryany 2009-10-29 16:05:31 UTC
On Thu, Oct 29, 2009 at 4:36 PM, Tom Hughes <tom@compton.nu> wrote:

> https://bugs.kde.org/show_bug.cgi?id=212267
>
>
>
>
>
> --- Comment #11 from Tom Hughes <tom compton nu>  2009-10-29 14:36:31 ---
> You need to get the output from trace-syscalls and trace-signals when it
> fails.
>


Does this tell you something?

SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fc90, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fce0, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fd30, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
--1451-- poll_signals: got signal 27 for thread 1
--1451-- Polling found signal 27 for tid 1
--1451-- delivering signal 27 (SIGPROF):128 to thread 1
--1451-- push_signal_frame (thread 1): signal 27
==1451==    at 0x580D6C5: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==1451==    by 0x401027: main (signal_test.cc:168)
SYSCALL[1451,1]( 35) sys_nanosleep ( 0x7fefff998, 0x0 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,1]( 15) sys_rt_sigreturn ( )--1451-- VG_(signal_return) (thread
1): isRT=1 valid magic; RIP=0x580d6c5
 --> [pre-success] NoWriteResult
==1451== Invalid write of size 8
==1451==    at 0x5827B48: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==1451==    by 0x401027: main (signal_test.cc:168)
==1451==  Address 0x7ff000488 is on thread 1's stack







>
> --
> Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You reported the bug.
> You are watching the assignee of the bug.
>
Comment 15 Konstantin Serebryany 2009-10-29 16:06:28 UTC
On Thu, Oct 29, 2009 at 4:36 PM, Tom Hughes <tom@compton.nu> wrote:

> https://bugs.kde.org/show_bug.cgi?id=212267
>
>
>
>
>
> --- Comment #11 from Tom Hughes <tom compton nu>  2009-10-29 14:36:31 ---
> You need to get the output from trace-syscalls and trace-signals when it
> fails.
>

Does this give you something?

Does this tell you something?

SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fc90, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fce0, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,1](  0) sys_read ( 3, 0x643fd30, 8 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1](  0) ... [async] --> Success(0x0:0x8)
--1451-- poll_signals: got signal 27 for thread 1
--1451-- Polling found signal 27 for tid 1
--1451-- delivering signal 27 (SIGPROF):128 to thread 1
--1451-- push_signal_frame (thread 1): signal 27
==1451==    at 0x580D6C5: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==1451==    by 0x401027: main (signal_test.cc:168)
SYSCALL[1451,1]( 35) sys_nanosleep ( 0x7fefff998, 0x0 ) --> [async] ...
SYSCALL[1451,2]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,2](  1) sys_write ( 4, 0x630c130, 8 ) --> [async] ...
SYSCALL[1451,2](  1) ... [async] --> Success(0x0:0x8)
SYSCALL[1451,2]( 35) sys_nanosleep ( 0x630c100, 0x0 ) --> [async] ...
SYSCALL[1451,1]( 35) ... [async] --> Success(0x0:0x0)
SYSCALL[1451,1]( 15) sys_rt_sigreturn ( )--1451-- VG_(signal_return) (thread
1): isRT=1 valid magic; RIP=0x580d6c5
 --> [pre-success] NoWriteResult
==1451== Invalid write of size 8
==1451==    at 0x5827B48: vsnprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x580D6F2: snprintf (in /usr/grte/v1/lib64/libc-2.3.6.so)
==1451==    by 0x400D72: RunServer(void*) (signal_test.cc:145)
==1451==    by 0x401027: main (signal_test.cc:168)
==1451==  Address 0x7ff000488 is on thread 1's stack


>
> --
> Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You reported the bug.
> You are watching the assignee of the bug.
>
Comment 16 Tom Hughes 2009-10-29 16:19:57 UTC
I just noticed that this is an invalid write, and not on the line that I thought it was (I was looking at the original test program) so it is definitely something different and we shouldn't pursue it on this bug.