Bug 339537 - two threads hang in pthread_spin_lock and pthread_spin_unlock
Summary: two threads hang in pthread_spin_lock and pthread_spin_unlock
Status: RESOLVED DUPLICATE of bug 336435
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
Reported: 2014-09-30 18:52 UTC by sage@newdream.net
Modified: 2014-10-01 10:24 UTC
Description sage@newdream.net 2014-09-30 18:52:12 UTC
I have two threads running under valgrind, one stuck in pthread_spin_lock, and one stuck in pthread_spin_unlock:

Thread 43 (Thread 21954):
#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:33
#1  0x000000000068c11a in ceph_spin_lock (l=0x41264a8) at ./include/Spinlock.h:45
#2  lock (this=0x41264a8) at ./include/Spinlock.h:94
#3  Locker (s=..., this=<synthetic pointer>) at ./include/Spinlock.h:105
#4  is_active (this=0x4126000) at osd/OSD.h:1074
#5  OSD::dispatch_context (this=this@entry=0x4126000, ctx=..., pg=pg@entry=0x0, curmap=..., handle=handle@entry=0x21b38a70) at osd/OSD.cc:7129
#6  0x0000000000698977 in OSD::process_peering_events (this=0x4126000, pgs=..., handle=...) at osd/OSD.cc:8467
#7  0x00000000006edfb8 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:1595
#8  0x0000000000b70ee6 in ThreadPool::worker (this=0x41264b0, wt=0xc412090) at common/WorkQueue.cc:128
#9  0x0000000000b71f90 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#10 0x0000000005d72182 in start_thread (arg=0x21b39700) at pthread_create.c:312
#11 0x000000000784438d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 27 (Thread 21938):
#0  pthread_spin_unlock () at ../nptl/sysdeps/x86_64/pthread_spin_unlock.S:23
#1  0x00000000006597de in ceph_spin_unlock (l=0x41264a8) at ./include/Spinlock.h:50
#2  unlock (this=0x41264a8) at ./include/Spinlock.h:98
#3  ~Locker (this=<synthetic pointer>, __in_chrg=<optimized out>) at ./include/Spinlock.h:108
#4  is_active (this=0x4126000) at osd/OSD.h:1075
#5  OSD::require_self_aliveness (this=this@entry=0x4126000, op=..., epoch=epoch@entry=2071) at osd/OSD.cc:6742
#6  0x00000000006776fe in OSD::require_same_or_newer_map (this=this@entry=0x4126000, op=..., epoch=2071, is_fast_dispatch=is_fast_dispatch@entry=false) at osd/OSD.cc:6802
#7  0x000000000069fd56 in OSD::handle_pg_notify (this=0x4126000, op=...) at osd/OSD.cc:7310
#8  0x00000000006a2c58 in OSD::dispatch_op (this=this@entry=0x4126000, op=...) at osd/OSD.cc:5690
#9  0x00000000006a81d8 in OSD::_dispatch (this=this@entry=0x4126000, m=m@entry=0x333592a0) at osd/OSD.cc:5843
#10 0x00000000006a88a7 in OSD::ms_dispatch (this=0x4126000, m=0x333592a0) at osd/OSD.cc:5386
#11 0x0000000000c203e9 in ms_deliver_dispatch (m=0x333592a0, this=0x4068700) at msg/Messenger.h:532
#12 DispatchQueue::entry (this=0x40688b8) at msg/DispatchQueue.cc:185
#13 0x0000000000b5cd8d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#14 0x0000000005d72182 in start_thread (arg=0x19b29700) at pthread_create.c:312
#15 0x000000000784438d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This looks somewhat similar to bug #336435, but that fix should be in my version (1:3.10~20140411-0ubuntu1).


Reproducible: Sometimes

Steps to Reproduce:
This occurs very infrequently in our regression tests, but we have seen it several times. It is tracked at http://tracker.ceph.com/issues/8822

Valgrind is run like so:

 valgrind --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f -i 1

GDB shows that both threads are hung on the same pthread_spinlock_t, which looks like this:

(gdb) p _lock
$1 = {lock = -1}
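
For anyone trying to narrow this down outside Ceph, here is a minimal sketch of the access pattern seen in the two backtraces: several threads taking and releasing one pthread_spinlock_t through a scoped guard around a short read. All names in it (Shared, ScopedSpin, is_active, run_stress) are hypothetical stand-ins rather than Ceph code, and it has not been shown to reproduce the hang.

```cpp
// Hypothetical stress sketch; none of these names come from Ceph.
#include <pthread.h>
#include <cassert>
#include <thread>
#include <vector>

// State shared by all worker threads, protected by one spinlock.
struct Shared {
    pthread_spinlock_t lock;
    bool active = true;  // stands in for the flag an is_active() call would read
    long reads = 0;      // counts completed lock/unlock round trips
};

// Scoped guard: locks in the constructor, unlocks in the destructor.
class ScopedSpin {
public:
    explicit ScopedSpin(pthread_spinlock_t *l) : l_(l) { pthread_spin_lock(l_); }
    ~ScopedSpin() { pthread_spin_unlock(l_); }
    ScopedSpin(const ScopedSpin &) = delete;
    ScopedSpin &operator=(const ScopedSpin &) = delete;
private:
    pthread_spinlock_t *l_;
};

// Short critical section, entered and left on every call.
static bool is_active(Shared &s) {
    ScopedSpin guard(&s.lock);
    ++s.reads;
    return s.active;
}

// Runs `threads` workers that each call is_active() `iterations` times
// and returns the total number of guarded reads that completed.
long run_stress(int threads, int iterations) {
    Shared s;
    int rc = pthread_spin_init(&s.lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&s, iterations] {
            for (int i = 0; i < iterations; ++i) {
                is_active(s);
            }
        });
    }
    for (auto &th : pool) {
        th.join();
    }
    pthread_spin_destroy(&s.lock);
    return s.reads;
}
```

Calling run_stress(2, 1000000) from a main() and running the resulting binary under valgrind --tool=memcheck would exercise the same lock and unlock path. Whether that triggers the hang is untested.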

Valgrind version is 1:3.10~20140411-0ubuntu1

OS is:
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.1 LTS
Release:        14.04
Codename:       trusty
Comment 1 Mark Wielaard 2014-09-30 20:02:30 UTC
(In reply to sage@newdream.net from comment #0)
> I have two threads running under valgrind, one stuck in pthread_spin_lock,
> and one stuck in pthread_spin_unlock:
> 
> This looks somewhat similar to #336435, but that fix should be in my version
> (1:3.10~20140411-0ubuntu1)
> [...]
> Valgrind version is 1:3.10~20140411-0ubuntu1
> 
> OS is 
> Distributor ID: Ubuntu
> Description:    Ubuntu 14.04.1 LTS
> Release:        14.04
> Codename:       trusty

I admit to not knowing much about Ubuntu packages, but are you sure that package contains the fix for bug #336435? The fix for that bug was valgrind svn r14386, committed on Fri Aug 29 2014. The package name and Ubuntu release version above suggest that your package is not based on valgrind 3.10.0 final, which was released on 10 September 2014, but on an upstream snapshot from April 2014.
Comment 2 sage@newdream.net 2014-09-30 20:33:43 UTC
Ah, so it is.  I'll check with Canonical.  Thanks!
Comment 3 Mark Wielaard 2014-10-01 10:24:53 UTC
OK, thanks. Let's assume this really is bug #336435 for now.
Please reopen if you have tested against 3.10.0 and the issue is still not resolved.

*** This bug has been marked as a duplicate of bug 336435 ***