243232 – Inconsistent Lock Orderings report with trylock

Bug 243232 - Inconsistent Lock Orderings report with trylock

Summary: Inconsistent Lock Orderings report with trylock

Status:	REPORTED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	helgrind (show other bugs)
Version:	3.5.0
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-06-30 11:01 UTC by Petr Ovtchenkov
Modified:	2020-04-04 08:42 UTC (History)
CC List:	6 users (show)

See Also:
Latest Commit:
Version Fixed In:

Attachments
testcase (593 bytes, text/plain) 2015-03-09 10:51 UTC, Samuel Thibault	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Petr Ovtchenkov 2010-06-30 11:01:05 UTC

Version:           3.5.0 (using KDE 1.2) 
OS:                Linux

Lock order violation wrongly (IMO) reported, if one of the lock was established by acquisition via trylock:

/* thread 1 */
...
pthread_mutex_lock( lk1 );
...
r = pthread_mutex_trylock( lk2 );

if ( r == EBUSY ) {
...
}

/* thread 2 */
...
r = pthread_mutex_trylock( lk2 );
if ( r == EBUSY ) {
}
...
pthread_mutex_lock( lk1 );

The same issue, if only in one thread used pthread_mutex_trylock( lk2 ): data race is possible, but deadlock due to inconsistent lock ordering---impossible.

Reproducible: Didn't try

Comment 1 Olivier Goffart 2010-08-12 11:57:45 UTC

Note: this causes _a lot_ of false positive in applications using Qt 4.7 where this mechanism is used inside QObject::connect

tryLock should not enforce ordering if it is the second mutex.

Comment 2 David Faure 2012-10-04 17:49:38 UTC

Confirmed. Helgrind is right, that the lock order is actually wrong, but there is valid code for locking in the "wrong order" using tryLock for the second mutex, the one that's supposed to be locked first.

See (Qt4-based) testcase at http://www.davidfaure.fr/2012/qmutex_trylock.cpp

Helgrind warns about the wrong order, and it's being completely correct, this isn't the expected order. But because it's a tryLock and not a lock, it can't actually deadlock.

I'm not sure if this means helgrind should ignore tryLock completely, or only in some cases, though.

Comment 3 Peter Boström 2013-09-29 13:58:37 UTC

An easy fix for the first case should be relatively simple, only add edges to the lock-order graph when trylock() succeeds, but that will still trigger false positives in the Qt case.

trylock() however will never block, so no deadlock can be caused on the thread calling it. So, we still need to catch the following:

Thread A:
mutex1.trylock(); // succeeds
mutex2.lock(); // blocked on thread B

Thread B:
mutex2.lock(); // succeeds
mutex1.lock(); // blocked on thread A

While not catching: 

Thread A:
mutex1.lock();
mutex2.lock();
mutex2.unlock();
mutex1.unlock();

Thread B:
mutex2.lock();
if (mutex1.trylock())
  mutex1.unlock();
mutex2.lock()

So what I'm thinking is: trylock() adds no edges, but lock() adds edges to locks held by (succeeding) trylocks(), because those could actually block. Does this sound reasonable, am I missing anything? This wouldn't report a false positive on your test case, David.

The real question though: Am I opening up for any false negatives here?

Comment 4 Philippe Waroquiers 2013-09-29 16:37:30 UTC

(In reply to comment #3)
> An easy fix for the first case should be relatively simple, only add edges
> to the lock-order graph when trylock() succeeds, but that will still trigger
> false positives in the Qt case.
> 
> trylock() however will never block, so no deadlock can be caused on the
> thread calling it. So, we still need to catch the following:
> 
> Thread A:
> mutex1.trylock(); // succeeds
> mutex2.lock(); // blocked on thread B
> 
> Thread B:
> mutex2.lock(); // succeeds
> mutex1.lock(); // blocked on thread A
> 
> While not catching: 
> 
> Thread A:
> mutex1.lock();
> mutex2.lock();
> mutex2.unlock();
> mutex1.unlock();
> 
> Thread B:
> mutex2.lock();
> if (mutex1.trylock())
>   mutex1.unlock();
> mutex2.lock()
I do not understand the above example.
Is it really mutex1 which is unlocked in the "if" ?
or mutex2 which is 're-)locked at the end ?

> 
> So what I'm thinking is: trylock() adds no edges, but lock() adds edges to
> locks held by (succeeding) trylocks(), because those could actually block.
To be sure I understand the idea, I rephrase :
a Lock on mutex M adds a relationship M->L for all locks L currently held
    (whatever the way they were acquired: lock or trylock)
a Trylock on mutex M does not add a relationship to any locks held.

> Does this sound reasonable, am I missing anything? This wouldn't report a
> false positive on your test case, David.
> 
> The real question though: Am I opening up for any false negatives here?
Difficult question effectively.
Did spent some time trying to find an example of either false+ or false-.
could not find one. Not a proof of course :).

Comment 5 Peter Boström 2013-09-29 20:01:11 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > An easy fix for the first case should be relatively simple, only add edges
> > to the lock-order graph when trylock() succeeds, but that will still trigger
> > false positives in the Qt case.
> > 
> > trylock() however will never block, so no deadlock can be caused on the
> > thread calling it. So, we still need to catch the following:
> > 
> > Thread A:
> > mutex1.trylock(); // succeeds
> > mutex2.lock(); // blocked on thread B
> > 
> > Thread B:
> > mutex2.lock(); // succeeds
> > mutex1.lock(); // blocked on thread A
> > 
> > While not catching: 
> > 
> > Thread A:
> > mutex1.lock();
> > mutex2.lock();
> > mutex2.unlock();
> > mutex1.unlock();
> > 
> > Thread B:
> > mutex2.lock();
> > if (mutex1.trylock())
> >   mutex1.unlock();
> > mutex2.lock()
> I do not understand the above example.
> Is it really mutex1 which is unlocked in the "if" ?
> or mutex2 which is 're-)locked at the end ?

Sorry, the last one here should be mutex2.unlock(). It was supposed to be the reverse of the one above (mutex2->mutex1, instead of mutex1->mutex2), just with a trylock for mutex1 instead of a lock.

> > 
> > So what I'm thinking is: trylock() adds no edges, but lock() adds edges to
> > locks held by (succeeding) trylocks(), because those could actually block.
> To be sure I understand the idea, I rephrase :
> a Lock on mutex M adds a relationship M->L for all locks L currently held
>     (whatever the way they were acquired: lock or trylock)
> a Trylock on mutex M does not add a relationship to any locks held.

Yes, that's the idea. I think this should be sufficient.

> > Does this sound reasonable, am I missing anything? This wouldn't report a
> > false positive on your test case, David.
> > 
> > The real question though: Am I opening up for any false negatives here?
> Difficult question effectively.
> Did spent some time trying to find an example of either false+ or false-.
> could not find one. Not a proof of course :).

I don't think a succeeding/failing trylock by itself can cause a deadlock, as it never blocks. A succeeding one can definitely lead to deadlocks if taking further locks.

Comment 6 Samuel Thibault 2015-03-09 10:51:28 UTC

Created attachment 91503 [details]
testcase

Hello,

Any news about this? This is producing a lot of false positives in our code too. Basically our code always takes m before m2, except in one case where it tries to take m after m2, but gracefully aborts if it doesn't manage to do it.

ATM, we are obliged to generate suppressions for all code spots because when we are unlucky the special case happens first, before all normal cases, and thus *all* the normal cases get frowned upon by helgrind...

The testcase I'm attaching reflects this and can be integrated in helgrind as a testcase.

I agree that what should happen is that helgrind doesn't add a dependency when mutex_trylock gets called, but add dependencies on previous mutex_trylocks when mutex_lock gets called.

Samuel

Comment 7 Peter Boström 2015-03-09 11:10:58 UTC

Nothing on my end, I still believe the fix is fairly straightforward as you describe. I stopped using helgrind locally when tsan added deadlock detection which is covered by our buildbots.

Comment 8 Milian Wolff 2020-04-04 08:42:31 UTC

Git commit d3e59db950789c4302f8f2ee2865bf1304853392 by Milian Wolff.
Committed on 04/04/2020 at 08:42.
Pushed by mwolff into branch 'master'.

ignore races in QOrderedMutexLocker

M  +28   -0    kde.supp

https://commits.kde.org/kde-dev-scripts/d3e59db950789c4302f8f2ee2865bf1304853392