Bug 409367 - exit_group() after signal arriving to thread waiting in futex() causes hangs
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Philippe Waroquiers
Reported: 2019-07-01 14:11 UTC by Allison Karlitskaya
Modified: 2019-07-11 20:39 UTC

Attachments
small reproducer (662 bytes, text/x-csrc)
2019-07-01 14:11 UTC, Allison Karlitskaya
fix hangs and loops when process sends signal to itself (10.45 KB, patch)
2019-07-02 20:29 UTC, Philippe Waroquiers

Description Allison Karlitskaya 2019-07-01 14:11:35 UTC
Created attachment 121256
small reproducer

If the main thread of a traced program calls exit_group() very soon after a signal is delivered to another thread of the program that is waiting in futex(), valgrind starts consuming 100% CPU and never exits.

Expected: valgrind should exit.

I suspect this is related to how signals interact with valgrind's wrapping of the futex call: if I replace the futex wait with a normal sleep, everything works properly.

Also, if the process waits a while before it exits, the problem does not occur. The program needs to exit quickly to trigger it.
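
The scenario described above can be sketched in C as follows. This is a minimal illustration written from the description, not the attached reproducer: the names futex_waiter, trigger_scenario and scenario_exit_status, the choice of SIGUSR1, the use of pthread_kill() to deliver the signal, and the 10 ms delay are all assumptions. The fork-based helper is only there so that the exit status can be checked when the program runs natively.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static uint32_t futex_word = 0;      /* stays 0, so FUTEX_WAIT keeps blocking */
static atomic_int waiter_started = 0;

/* Empty handler: the signal only needs to interrupt the futex wait. */
static void on_signal(int sig) { (void)sig; }

/* Second thread: waits on the futex forever, retrying after each EINTR. */
static void *futex_waiter(void *arg) {
    (void)arg;
    atomic_store(&waiter_started, 1);
    for (;;)
        syscall(SYS_futex, &futex_word, FUTEX_WAIT, 0, NULL, NULL, 0);
}

/* Main-thread side: signal the waiter, then exit_group() right away.
 * Never returns. */
static void trigger_scenario(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    pthread_t tid;
    rc = pthread_create(&tid, NULL, futex_waiter, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_load(&waiter_started))
        usleep(1000);
    usleep(10 * 1000);              /* assumed delay: let the waiter block */

    pthread_kill(tid, SIGUSR1);     /* signal arrives in the futex waiter */
    syscall(SYS_exit_group, 0);     /* exit immediately, with no sleep */
}

/* Runs the scenario in a forked child and returns its exit status,
 * or -1 if the child could not be started or did not exit normally. */
static int scenario_exit_status(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        trigger_scenario();
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To try this under valgrind, a main() that simply calls trigger_scenario() is enough. Whether this sketch hits the hang on a given valgrind version is not guaranteed, since the problem depends on timing.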
Comment 1 Allison Karlitskaya 2019-07-01 15:13:53 UTC
Forgot to mention: this is against valgrind-3.14.0 as shipped with Ubuntu 19.04.
Comment 2 Philippe Waroquiers 2019-07-01 19:40:24 UTC
This looks very similar to a loop reproduced with bug 409141.
Comment 3 Philippe Waroquiers 2019-07-02 20:29:53 UTC
Created attachment 121295
fix hangs and loops when process sends signal to itself

I have tested with the reproducer attached, and it works.
The test added by the patch is similar to this test.

If there are no remarks on the approach, I will push in a few days.
Comment 4 Philippe Waroquiers 2019-07-10 22:38:01 UTC
Pushed as 63a9f0793
Comment 5 Allison Karlitskaya 2019-07-11 07:41:45 UTC
(In reply to Philippe Waroquiers from comment #4)
> Pushed as 63a9f0793

Thanks very much, Philippe.

A few questions, if you don't mind:

1) is there any workaround to this problem that you can imagine (in terms of command-line flags, etc.) that would avoid the problem, other than updating valgrind to a version that includes this patch?  Our current workaround is to add a sleep on the main thread before exit, and I'd like to remove that ASAP.

2) when is this patch likely to appear in a release?  when is it likely to appear in a stable release?

3) do you think this patch is suitable for backporting/vendor-patching for distro packages?

Thanks again.
Comment 6 Philippe Waroquiers 2019-07-11 20:39:48 UTC
(In reply to Allison Karlitskaya from comment #5)
> (In reply to Philippe Waroquiers from comment #4)
> > Pushed as 63a9f0793
> 
> Thanks very much, Philippe.
> 
> A few questions, if you don't mind:
> 
> 1) is there any workaround to this problem that you can imagine (in terms of
> commandline flags, etc.) that would avoid the problem other than to update
> valgrind to a version that includes this patch?  Our current workaround is
> to add a sleep on the main thread before exit, and I'd like to remove that
> ASAP.
I do not see a workaround at the valgrind command-line level.

> 
> 2) when is this patch likely to appear in a release?  when is it likely to
> appear in a stable release?
We only have stable releases :).
Typically, there is a release every 6 months or so.
The last release was on the 12th of April.


> 
> 3) do you think this patch is suitable for backporting/vendor-patching for
> distro packages?
The patch can certainly be backported, if a distro wants to do it.