Version: 3.5.0 (using KDE 1.2) OS: Linux Lock order violation wrongly (IMO) reported, if one of the lock was established by acquisition via trylock: /* thread 1 */ ... pthread_mutex_lock( lk1 ); ... r = pthread_mutex_trylock( lk2 ); if ( r == EBUSY ) { ... } /* thread 2 */ ... r = pthread_mutex_trylock( lk2 ); if ( r == EBUSY ) { } ... pthread_mutex_lock( lk1 ); The same issue, if only in one thread used pthread_mutex_trylock( lk2 ): data race is possible, but deadlock due to inconsistent lock ordering---impossible. Reproducible: Didn't try
Note: this causes _a lot_ of false positive in applications using Qt 4.7 where this mechanism is used inside QObject::connect tryLock should not enforce ordering if it is the second mutex.
Confirmed. Helgrind is right, that the lock order is actually wrong, but there is valid code for locking in the "wrong order" using tryLock for the second mutex, the one that's supposed to be locked first. See (Qt4-based) testcase at http://www.davidfaure.fr/2012/qmutex_trylock.cpp Helgrind warns about the wrong order, and it's being completely correct, this isn't the expected order. But because it's a tryLock and not a lock, it can't actually deadlock. I'm not sure if this means helgrind should ignore tryLock completely, or only in some cases, though.
An easy fix for the first case should be relatively simple, only add edges to the lock-order graph when trylock() succeeds, but that will still trigger false positives in the Qt case. trylock() however will never block, so no deadlock can be caused on the thread calling it. So, we still need to catch the following: Thread A: mutex1.trylock(); // succeeds mutex2.lock(); // blocked on thread B Thread B: mutex2.lock(); // succeeds mutex1.lock(); // blocked on thread A While not catching: Thread A: mutex1.lock(); mutex2.lock(); mutex2.unlock(); mutex1.unlock(); Thread B: mutex2.lock(); if (mutex1.trylock()) mutex1.unlock(); mutex2.lock() So what I'm thinking is: trylock() adds no edges, but lock() adds edges to locks held by (succeeding) trylocks(), because those could actually block. Does this sound reasonable, am I missing anything? This wouldn't report a false positive on your test case, David. The real question though: Am I opening up for any false negatives here?
(In reply to comment #3) > An easy fix for the first case should be relatively simple, only add edges > to the lock-order graph when trylock() succeeds, but that will still trigger > false positives in the Qt case. > > trylock() however will never block, so no deadlock can be caused on the > thread calling it. So, we still need to catch the following: > > Thread A: > mutex1.trylock(); // succeeds > mutex2.lock(); // blocked on thread B > > Thread B: > mutex2.lock(); // succeeds > mutex1.lock(); // blocked on thread A > > While not catching: > > Thread A: > mutex1.lock(); > mutex2.lock(); > mutex2.unlock(); > mutex1.unlock(); > > Thread B: > mutex2.lock(); > if (mutex1.trylock()) > mutex1.unlock(); > mutex2.lock() I do not understand the above example. Is it really mutex1 which is unlocked in the "if" ? or mutex2 which is 're-)locked at the end ? > > So what I'm thinking is: trylock() adds no edges, but lock() adds edges to > locks held by (succeeding) trylocks(), because those could actually block. To be sure I understand the idea, I rephrase : a Lock on mutex M adds a relationship M->L for all locks L currently held (whatever the way they were acquired: lock or trylock) a Trylock on mutex M does not add a relationship to any locks held. > Does this sound reasonable, am I missing anything? This wouldn't report a > false positive on your test case, David. > > The real question though: Am I opening up for any false negatives here? Difficult question effectively. Did spent some time trying to find an example of either false+ or false-. could not find one. Not a proof of course :).
(In reply to comment #4) > (In reply to comment #3) > > An easy fix for the first case should be relatively simple, only add edges > > to the lock-order graph when trylock() succeeds, but that will still trigger > > false positives in the Qt case. > > > > trylock() however will never block, so no deadlock can be caused on the > > thread calling it. So, we still need to catch the following: > > > > Thread A: > > mutex1.trylock(); // succeeds > > mutex2.lock(); // blocked on thread B > > > > Thread B: > > mutex2.lock(); // succeeds > > mutex1.lock(); // blocked on thread A > > > > While not catching: > > > > Thread A: > > mutex1.lock(); > > mutex2.lock(); > > mutex2.unlock(); > > mutex1.unlock(); > > > > Thread B: > > mutex2.lock(); > > if (mutex1.trylock()) > > mutex1.unlock(); > > mutex2.lock() > I do not understand the above example. > Is it really mutex1 which is unlocked in the "if" ? > or mutex2 which is 're-)locked at the end ? Sorry, the last one here should be mutex2.unlock(). It was supposed to be the reverse of the one above (mutex2->mutex1, instead of mutex1->mutex2), just with a trylock for mutex1 instead of a lock. > > > > So what I'm thinking is: trylock() adds no edges, but lock() adds edges to > > locks held by (succeeding) trylocks(), because those could actually block. > To be sure I understand the idea, I rephrase : > a Lock on mutex M adds a relationship M->L for all locks L currently held > (whatever the way they were acquired: lock or trylock) > a Trylock on mutex M does not add a relationship to any locks held. Yes, that's the idea. I think this should be sufficient. > > Does this sound reasonable, am I missing anything? This wouldn't report a > > false positive on your test case, David. > > > > The real question though: Am I opening up for any false negatives here? > Difficult question effectively. > Did spent some time trying to find an example of either false+ or false-. > could not find one. Not a proof of course :). I don't think a succeeding/failing trylock by itself can cause a deadlock, as it never blocks. A succeeding one can definitely lead to deadlocks if taking further locks.
Created attachment 91503 [details] testcase Hello, Any news about this? This is producing a lot of false positives in our code too. Basically our code always takes m before m2, except in one case where it tries to take m after m2, but gracefully aborts if it doesn't manage to do it. ATM, we are obliged to generate suppressions for all code spots because when we are unlucky the special case happens first, before all normal cases, and thus *all* the normal cases get frowned upon by helgrind... The testcase I'm attaching reflects this and can be integrated in helgrind as a testcase. I agree that what should happen is that helgrind doesn't add a dependency when mutex_trylock gets called, but add dependencies on previous mutex_trylocks when mutex_lock gets called. Samuel
Nothing on my end, I still believe the fix is fairly straightforward as you describe. I stopped using helgrind locally when tsan added deadlock detection which is covered by our buildbots.
Git commit d3e59db950789c4302f8f2ee2865bf1304853392 by Milian Wolff. Committed on 04/04/2020 at 08:42. Pushed by mwolff into branch 'master'. ignore races in QOrderedMutexLocker M +28 -0 kde.supp https://commits.kde.org/kde-dev-scripts/d3e59db950789c4302f8f2ee2865bf1304853392