142103 – xchgl handling doesn't appear atomic

Bug 142103 - xchgl handling doesn't appear atomic

Summary: xchgl handling doesn't appear atomic

Status:	RESOLVED DUPLICATE of bug 197793

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	3.2.3
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-02-23 12:46 UTC by Crispin Flowerday
Modified:	2009-07-01 10:37 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed In:

Attachments
Test case (1.90 KB, text/plain) 2007-02-23 12:48 UTC, Crispin Flowerday	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Crispin Flowerday 2007-02-23 12:46:54 UTC

I've been attempting to track down the cause of some dead locks in our locking
code when run under valgrind on multi CPU boxes.

Attached is a test case that implements a spin-lock using the xchgl instruction,
with 2 child processes using the lock (they lock, sleep for a bit, then unlock,
then possibly sleep for a bit longer).

In the spin lock, if the child loops more than 50 times it prints a line out for
each loop. This is fine, and expected if only one child is printing lines at the
same time, however under valgrind I see (happens after a variable amount of
time, generaly less than 30 seconds):

...
Failed to get lock 74 times (pid 6639)
Failed to get lock 74 times (pid 6638)
Failed to get lock 75 times (pid 6639)
Failed to get lock 75 times (pid 6638)
...

You can see here that both children (pids 6639 and 6638) are trying to get the
lock, but neither can, hance we have a deadlock.

Using gdb to inspect the actual memory that stores the lock flag (it is stored
in an anonymous mmap'd region) I see that although neither child process has got
the lock, the flag is set to 1.

Other observations: it appears that both children attempted to get the lock at
exactly the same time, is there a race in the valgrind handling of this
particular instruction ?

Comment 1 Crispin Flowerday 2007-02-23 12:48:13 UTC

Created attachment 19786 [details]
Test case

This is the test case I am using. I have been compiling it with
'-fomit-frame-pointer -O3 -g -march=pentium'. To show the bug, just run the
testcase. After a period of time both child processes will spin (in the
spinlock) attempting to get the lock

Comment 2 Crispin Flowerday 2007-02-23 12:59:17 UTC

Argh, I forgot my system information!

$ uname -a
Linux psyche 2.6.15-1-686-smp #2 SMP Thu Feb 2 18:39:24 UTC 2006 i686 GNU/Linux

$ /usr/local/bin/valgrind --version
valgrind-3.2.3

$ /usr/local/bin/valgrind -v ./xchgl_test (parent process output only)
==572== Memcheck, a memory error detector.
==572== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==572== Using LibVEX rev 1732, a library for dynamic binary translation.
==572== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==572== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==572== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==572==
--572-- Command line
--572--    ./xchgl_test
--572-- Startup, with flags:
--572--    --tool=memcheck
--572--    --num-callers=40
--572--    --memcheck:leak-resolution=high
--572--    --addrcheck:leak-resolution=high
--572--    --memcheck:alignment=8
--572--    --addrcheck:alignment=8
--572--    -v
--572-- Contents of /proc/version:
--572--   Linux version 2.6.15-1-686-smp (Debian 2.6.15-4) (maks@sternwelten.at) (gcc versi                 on 4.0.3 20060128 (prerelease) (Debian 4.0.2-8)) #2 SMP Thu Feb 2 18:39:24 UTC 2006
--572-- Arch and hwcaps: X86, x86-sse1-sse2
--572-- Page sizes: currently 4096, max supported 4096
--572-- Valgrind library directory: /usr/local/lib/valgrind
--572-- Reading syms from /lib/ld-2.3.6.so (0x4000000)
--572-- Reading debug info from /lib/ld-2.3.6.so...
--572-- ... CRC mismatch (computed 790AF4E1 wanted AA51E2A5)
--572--    object doesn't have a symbol table
--572-- Reading syms from /home/crispin/src/xchgl_bug/xchgl_test (0x8048000)
--572-- Reading syms from /usr/local/lib/valgrind/x86-linux/memcheck (0x38000000)
--572--    object doesn't have a dynamic symbol table
--572-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--572-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_core.so (0x4019000)
--572-- Reading syms from /usr/local/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x401B00                 0)
--572-- Reading syms from /lib/tls/i686/cmov/libc-2.3.6.so (0x4025000)
--572-- Reading debug info from /lib/tls/i686/cmov/libc-2.3.6.so...
--572-- ... CRC mismatch (computed 50C33F55 wanted 9CAC8233)
--572--    object doesn't have a symbol table
--572-- REDIR: 0x4090640 (rindex) redirected to 0x401E160 (rindex)
==572==
==572== Process terminating with default action of signal 2 (SIGINT)
==572==    at 0x4000792: (within /lib/ld-2.3.6.so)
==572==    by 0x80486C6: main (xchgl_test.c:95)
--572-- REDIR: 0x408A500 (free) redirected to 0x401D234 (free)
--572-- REDIR: 0x40914D0 (memset) redirected to 0x401E6F0 (memset)
--572== 573--  memcheck: SMs: n_issued      = 6 (96k, 0M)
==572== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1)
--572--
--572-- supp:   11 Ubuntu-stripped-ld.so
==572== malloc/free: in use at exit: 0 bytes in 0 blocks.
==572== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==572==
==572== All heap blocks were freed -- no leaks are possible.
--572--  memcheck: sanity checks: 0 cheap, 1 expensive
--572--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--572--  memcheck: auxmaps: 0 searches, 0 comparisons
--572--  memcheck: SMs: n_issued      = 6 (96k, 0M)
--572--  memcheck: SMs: n_deissued    = 0 (0k, 0M)
--572--  memcheck: SMs: max_noaccess  = 65535 (1048560k, 1023M)
--572--  memcheck: SMs: max_undefined = 0 (0k, 0M)
--572--  memcheck: SMs: max_defined   = 19 (304k, 0M)
--572--  memcheck: SMs: max_non_DSM   = 6 (96k, 0M)
--572--  memcheck: max sec V bit nodes:    1 (0k, 0M)
--572--  memcheck: set_sec_vbits8 calls: 1 (new: 1, updates: 0)
--572--  memcheck: max shadow mem size:   400k, 0M
--572-- translate:            fast SP updates identified: 1,380 ( 91.2%)
--572-- translate:   generic_known SP updates identified: 69 (  4.5%)
--572-- translate: generic_unknown SP updates identified: 64 (  4.2%)
--572--     tt/tc: 2,919 tt lookups requiring 2,936 probes
--572--     tt/tc: 2,919 fast-cache updates, 2 flushes
--572--  transtab: new        1,458 (29,884 -> 498,828; ratio 166:10) [0 scs]
--572--  transtab: dumped     0 (0 -> ??)
--572--  transtab: discarded  0 (0 -> ??)
--572-- scheduler: 27,629 jumps (bb entries).
--572-- scheduler: 0/1,501 major/minor sched events.
--572--    sanity: 1 cheap, 1 expensive checks.
--572--    exectx: 30,011 lists, 6 contexts (avg 0 per list)
--572--    exectx: 11 searches, 5 full compares (454 per 1000)
--572--    exectx: 0 cmp2, 26 cmp4, 0 cmpAll

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.20GHz
stepping        : 1
cpu MHz         : 3200.510
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips        : 6403.81

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.20GHz
stepping        : 1
cpu MHz         : 3200.510
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips        : 6400.35

Comment 3 Julian Seward 2007-02-23 13:08:03 UTC

Yes, it's wildly un-atomic when executed by Valgrind.  Although handling
this right is definitely on the eventual to-do list, it's not easily fixed
in the short term.  At the moment the only workaround is to ask your app
not to communicate with (other instances of) itself via shared memory;
for one thing you get these spins, and for another memcheck will be confused.

FWIW this is a "known problem".  I have seen it for example with MPI libraries
on SMPs getting into similar difficulties.  At least most MPI libraries have
an option to do all interprocess comms via the network and not use shared
memory, which enables the abovementioned workaround.

Comment 4 Crispin Flowerday 2007-02-23 14:46:28 UTC

Aha, thanks. For the moment I will make our automated valgrind tests use a single process which avoids the locking problems.

Comment 5 Nicholas Nethercote 2009-06-30 04:36:16 UTC

Julian, will this be fixed by bug 197793?  If so, this can be marked as a dup of it.

Comment 6 Julian Seward 2009-07-01 08:17:14 UTC

Crispin, is it OK to add your example program (w/ modifications) to
Valgrind's regression test suite (as GPL2+) ?

*** This bug has been marked as a duplicate of bug 197793 ***

Comment 7 Crispin Flowerday 2009-07-01 10:37:35 UTC

Julian - absolutely, re-licensing under GPLv2 is absolutely fine.