Version: valgrind-2.1.0 CVS (using KDE Devel) Installed from: Compiled sources Compiler: gcc (GCC) 3.2 OS: Linux The following short test programm failed with valgrind-2.1.0 and CVS HEAD checkout: #include <pthread.h> #include <limits.h> int main(int argc, char *argv[]) { struct timespec timeout; pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; pthread_cond_t cond = PTHREAD_COND_INITIALIZER; pthread_mutex_lock(&mut); /* wait forever */ timeout.tv_sec = INT_MAX; timeout.tv_nsec = 0; pthread_cond_timedwait(&cond, &mut, &timeout); pthread_mutex_unlock(&mut); return 0; } gcc -Wall test1.c -pthread valgrind --tool=corecheck ./a.out ==7390== Coregrind, a rudimentary error detector for x86-linux. ==7390== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote. ==7390== Using valgrind-2.1.0, a program supervision framework for x86-linux. ==7390== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward. ==7390== For more details, rerun with: -v ==7390== valgrind: vg_scheduler.c:786 (idle): Assertion `delta >= 0' failed. ==7390== at 0xB802F21C: vgPlain_skin_assert_fail (vg_mylibc.c:1211) ==7390== by 0xB802F21B: assert_fail (vg_mylibc.c:1207) ==7390== by 0xB802F289: vgPlain_core_assert_fail (vg_mylibc.c:1218) ==7390== by 0xB800E0C6: idle (vg_scheduler.c:793) sched status: Thread 1: status = WaitCV, associated_mx = 0xABFFE324, associated_cv = 0xABFFE31 4 ==7390== at 0x810227FA: pthread_cond_timedwait (vg_libpthread.c:1361) ==7390== by 0x80485B0: main (in /home/seiderer/valgrind/test/a.out) The following patch removes the signed/unsigned clash in case of huge pthread_cond_timedwait timeout values: Index: coregrind/vg_scheduler.c =================================================================== RCS file: /home/kde/valgrind/coregrind/vg_scheduler.c,v retrieving revision 1.142 diff -u -3 -p -r1.142 vg_scheduler.c --- coregrind/vg_scheduler.c 28 Feb 2004 15:40:36 -0000 1.142 +++ coregrind/vg_scheduler.c 5 Mar 2004 19:17:33 -0000 @@ -759,7 +759,7 @@ static void idle ( void ) { struct vki_pollfd pollfd[1]; - Int delta = -1; + UInt delta = (UInt)-1; Int fd = VG_(proxy_resfd)(); pollfd[0].fd = fd; @@ -782,19 +782,19 @@ void idle ( void ) } if (tp != NULL) { + vg_assert(tp->time >= now); delta = tp->time - now; - vg_assert(delta >= 0); } if (wicked) delta = 0; } /* gotta wake up for something! */ - vg_assert(fd != -1 || delta != -1); + vg_assert(fd != -1 || delta != (UInt)-1); /* If we need to do signal routing, then poll for pending signals every VG_(clo_signal_polltime) mS */ - if (VG_(do_signal_routing) && (delta > VG_(clo_signal_polltime) || delta == -1)) + if (VG_(do_signal_routing) && (delta > VG_(clo_signal_polltime) || delta == (UInt)-1)) delta = VG_(clo_signal_polltime); if (VG_(clo_trace_sched)) {
This trouble with this is that poll() takes a signed integer timeout, and considers anything <0 to be infinite. Is the problem that delta becomes negative, or that we should map all negative values to -1?
With 'tp->time' and 'now' defined as UInt and delta as Int the above calculation 'delta = tp->time - now;' will produce false results if 'tp->time - now' exceeds INT_MAX. The suggested patch tries to avoid this by changing the definition of delta from Int to UInt (and move the assertion a little bit upwards). If delta must be kept as Int the result of 'tp->time - now' must be checked and/or limited not to exceed INT_MAX. But the real (or a further) problem seems to be the handling of huge timeouts given to pthread_cond_timedwait, which wrap around due to the limited range of the internal time representation in 1/1000 seconds: pthread_cond_timedwait with a timeout of now + 5 seconds calls add_timeout with time = 5000 pthread_cond_timedwait with timeout of now + 4294968 seconds (49 days) calls add_timeout with time = 704 See attached test2.c program: gcc -Wall test2.c time valgrind --tool=corecheck ./a.out 5 real 0m5.196s time valgrind --tool=corecheck ./a.out 4294968 real 0m0.898s
Created attachment 5070 [details] Test for timeout wraparound
Created attachment 5071 [details] Suggested patch against timeout wraparound Suggested patch to avoid timeout wraparound from pthread_cond_timedwait (plus fix of wrong convertion from tv_usec to ms).
Created attachment 5086 [details] New suggested patch for coregrind/vg_scheduler.c Sorry, did not see the need for a signed delta because of the poll call while doing the first suggested patch. Here now a new, less invasive patch,limiting delta to INT_MAX and moving the 'wake up' assert some lines down to give a overwrite by clo_signal_polltime a chance.
Looks good. I've give it a closer look and probably apply it.
Created attachment 5620 [details] Update: Suggested patch against timeout wraparound Fix bug in previous patch, the line - ull_ms_now = 1000ULL * ((unsigned long long int)(ms_now)); should read + ull_ms_now = ((unsigned long long int)(ms_now)); Peter
Created attachment 7695 [details] Updated patch against valgrind-2.2.0 Updated patch against valgrind-2.2.0 (both previous patches in one). - Fix calculation of delta in vg_scheduler.c (the assertion). - Fix timestamp calculation of for pthread_cond_timedwait in vg_libpthread.c (short 'real' timeouts for long given timeouts).
This patch has now been committed, minus the piece that moved the assertion as I don't believe that to be correct or necessary. We should have something to wake up for regardless of whether we are having to route signals or not.
Hi Tom Hughes, I was looking at your patch for the head branch for this bug. Committed by Tom Hughes on 2004/10/17, 15:18:22 in branch HEAD http://cvs-digest.org/index.php?diff&file=valgrind/coregrind/vg_scheduler.c,v&version=1.191 And there is a problem with valgrind/coregrind/vg_libpthread.c fix at "Line 1375 int __pthread_mutex_timedlock(pthread_mu". You need to set ull_ms_now and ull_ms_end before the if statement. Like you did at Line 1548. So you just need to add the following. ull_ms_now = ((unsigned long long int)(ms_now)); ull_ms_end = ull_ms_now + (ull_ms_end_after_1970 - ull_ms_now_after_1970); Thanks for the fix. It works great for me. David
Those lines were put in the next day when the overnight builds failed.