I've been attempting to track down the cause of some dead locks in our locking code when run under valgrind on multi CPU boxes. Attached is a test case that implements a spin-lock using the xchgl instruction, with 2 child processes using the lock (they lock, sleep for a bit, then unlock, then possibly sleep for a bit longer). In the spin lock, if the child loops more than 50 times it prints a line out for each loop. This is fine, and expected if only one child is printing lines at the same time, however under valgrind I see (happens after a variable amount of time, generaly less than 30 seconds): ... Failed to get lock 74 times (pid 6639) Failed to get lock 74 times (pid 6638) Failed to get lock 75 times (pid 6639) Failed to get lock 75 times (pid 6638) ... You can see here that both children (pids 6639 and 6638) are trying to get the lock, but neither can, hance we have a deadlock. Using gdb to inspect the actual memory that stores the lock flag (it is stored in an anonymous mmap'd region) I see that although neither child process has got the lock, the flag is set to 1. Other observations: it appears that both children attempted to get the lock at exactly the same time, is there a race in the valgrind handling of this particular instruction ?
Created attachment 19786 [details] Test case This is the test case I am using. I have been compiling it with '-fomit-frame-pointer -O3 -g -march=pentium'. To show the bug, just run the testcase. After a period of time both child processes will spin (in the spinlock) attempting to get the lock
Argh, I forgot my system information! $ uname -a Linux psyche 2.6.15-1-686-smp #2 SMP Thu Feb 2 18:39:24 UTC 2006 i686 GNU/Linux $ /usr/local/bin/valgrind --version valgrind-3.2.3 $ /usr/local/bin/valgrind -v ./xchgl_test (parent process output only) ==572== Memcheck, a memory error detector. ==572== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al. ==572== Using LibVEX rev 1732, a library for dynamic binary translation. ==572== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP. ==572== Using valgrind-3.2.3, a dynamic binary instrumentation framework. ==572== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al. ==572== --572-- Command line --572-- ./xchgl_test --572-- Startup, with flags: --572-- --tool=memcheck --572-- --num-callers=40 --572-- --memcheck:leak-resolution=high --572-- --addrcheck:leak-resolution=high --572-- --memcheck:alignment=8 --572-- --addrcheck:alignment=8 --572-- -v --572-- Contents of /proc/version: --572-- Linux version 2.6.15-1-686-smp (Debian 2.6.15-4) (maks@sternwelten.at) (gcc versi on 4.0.3 20060128 (prerelease) (Debian 4.0.2-8)) #2 SMP Thu Feb 2 18:39:24 UTC 2006 --572-- Arch and hwcaps: X86, x86-sse1-sse2 --572-- Page sizes: currently 4096, max supported 4096 --572-- Valgrind library directory: /usr/local/lib/valgrind --572-- Reading syms from /lib/ld-2.3.6.so (0x4000000) --572-- Reading debug info from /lib/ld-2.3.6.so... --572-- ... CRC mismatch (computed 790AF4E1 wanted AA51E2A5) --572-- object doesn't have a symbol table --572-- Reading syms from /home/crispin/src/xchgl_bug/xchgl_test (0x8048000) --572-- Reading syms from /usr/local/lib/valgrind/x86-linux/memcheck (0x38000000) --572-- object doesn't have a dynamic symbol table --572-- Reading suppressions file: /usr/local/lib/valgrind/default.supp --572-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_core.so (0x4019000) --572-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x401B00 0) --572-- Reading syms from /lib/tls/i686/cmov/libc-2.3.6.so (0x4025000) --572-- Reading debug info from /lib/tls/i686/cmov/libc-2.3.6.so... --572-- ... CRC mismatch (computed 50C33F55 wanted 9CAC8233) --572-- object doesn't have a symbol table --572-- REDIR: 0x4090640 (rindex) redirected to 0x401E160 (rindex) ==572== ==572== Process terminating with default action of signal 2 (SIGINT) ==572== at 0x4000792: (within /lib/ld-2.3.6.so) ==572== by 0x80486C6: main (xchgl_test.c:95) --572-- REDIR: 0x408A500 (free) redirected to 0x401D234 (free) --572-- REDIR: 0x40914D0 (memset) redirected to 0x401E6F0 (memset) --572== 573-- memcheck: SMs: n_issued = 6 (96k, 0M) ==572== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1) --572-- --572-- supp: 11 Ubuntu-stripped-ld.so ==572== malloc/free: in use at exit: 0 bytes in 0 blocks. ==572== malloc/free: 0 allocs, 0 frees, 0 bytes allocated. ==572== ==572== All heap blocks were freed -- no leaks are possible. --572-- memcheck: sanity checks: 0 cheap, 1 expensive --572-- memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use --572-- memcheck: auxmaps: 0 searches, 0 comparisons --572-- memcheck: SMs: n_issued = 6 (96k, 0M) --572-- memcheck: SMs: n_deissued = 0 (0k, 0M) --572-- memcheck: SMs: max_noaccess = 65535 (1048560k, 1023M) --572-- memcheck: SMs: max_undefined = 0 (0k, 0M) --572-- memcheck: SMs: max_defined = 19 (304k, 0M) --572-- memcheck: SMs: max_non_DSM = 6 (96k, 0M) --572-- memcheck: max sec V bit nodes: 1 (0k, 0M) --572-- memcheck: set_sec_vbits8 calls: 1 (new: 1, updates: 0) --572-- memcheck: max shadow mem size: 400k, 0M --572-- translate: fast SP updates identified: 1,380 ( 91.2%) --572-- translate: generic_known SP updates identified: 69 ( 4.5%) --572-- translate: generic_unknown SP updates identified: 64 ( 4.2%) --572-- tt/tc: 2,919 tt lookups requiring 2,936 probes --572-- tt/tc: 2,919 fast-cache updates, 2 flushes --572-- transtab: new 1,458 (29,884 -> 498,828; ratio 166:10) [0 scs] --572-- transtab: dumped 0 (0 -> ??) --572-- transtab: discarded 0 (0 -> ??) --572-- scheduler: 27,629 jumps (bb entries). --572-- scheduler: 0/1,501 major/minor sched events. --572-- sanity: 1 cheap, 1 expensive checks. --572-- exectx: 30,011 lists, 6 contexts (avg 0 per list) --572-- exectx: 11 searches, 5 full compares (454 per 1000) --572-- exectx: 0 cmp2, 26 cmp4, 0 cmpAll $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.20GHz stepping : 1 cpu MHz : 3200.510 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr bogomips : 6403.81 processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.20GHz stepping : 1 cpu MHz : 3200.510 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr bogomips : 6400.35
Yes, it's wildly un-atomic when executed by Valgrind. Although handling this right is definitely on the eventual to-do list, it's not easily fixed in the short term. At the moment the only workaround is to ask your app not to communicate with (other instances of) itself via shared memory; for one thing you get these spins, and for another memcheck will be confused. FWIW this is a "known problem". I have seen it for example with MPI libraries on SMPs getting into similar difficulties. At least most MPI libraries have an option to do all interprocess comms via the network and not use shared memory, which enables the abovementioned workaround.
Aha, thanks. For the moment I will make our automated valgrind tests use a single process which avoids the locking problems.
Julian, will this be fixed by bug 197793? If so, this can be marked as a dup of it.
Crispin, is it OK to add your example program (w/ modifications) to Valgrind's regression test suite (as GPL2+) ? *** This bug has been marked as a duplicate of bug 197793 ***
Julian - absolutely, re-licensing under GPLv2 is absolutely fine.