Bug 407313 - Deadlock when syscall is handled in other thread
Summary: Deadlock when syscall is handled in other thread
Status: RESOLVED INTENTIONAL
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.13.0
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-07 19:16 UTC by Falk Werner
Modified: 2019-05-08 05:34 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
strace and stacks of test application (191.22 KB, application/x-zip-compressed)
2019-05-07 19:16 UTC, Falk Werner

Description Falk Werner 2019-05-07 19:16:02 UTC
Created attachment 119899
strace and stacks of test application

SUMMARY

I use libfuse to write a simple filesystem. In order to test it, I created a simple application with two threads:
- one is driving the fuse filesystem (e.g. responding to syscalls like read)
- the other uses normal filesystem operations, such as stat, to test the filesystem

When executing this with Valgrind/Memcheck a deadlock occurs as soon as stat is called.

STEPS TO REPRODUCE
1. Minimal example code can be obtained here: https://github.com/falk-werner/libfuse-valgrind-deadlock
2. Use `make run` to create and execute a Docker container
3. Run `valgrind ./libfuse-test`

OBSERVED RESULT
The application will deadlock (see attached files for stack traces).

EXPECTED RESULT
No deadlock should appear.

SOFTWARE/OS VERSIONS
Linux: Ubuntu 18.04

ADDITIONAL INFORMATION
When the deadlock appears, fuse is trying to send the filesystem request, but the other thread is blocked in a read(1028, "A", 1) operation originating from Valgrind.
Comment 1 Tom Hughes 2019-05-07 19:37:25 UTC
What was the actual system call (ie which variant of stat) that caused the deadlock?
Comment 2 Tom Hughes 2019-05-07 19:39:08 UTC
Also, the valgrind output would be more useful than the strace. Make sure you include --trace-syscalls=yes at a minimum.
Comment 3 Tom Hughes 2019-05-07 19:44:41 UTC
The answer appears to be sys_newstat:

SYSCALL[10855,1](4) sys_newstat ( 0x4ea48e0(/tmp/libfuse_test_LKz1UA), 0x1ffefff1a0 )

Which by the looks of it has not been marked as potentially blocking.

I think we've been through this before. Running the fuse server in the same process as its client creates situations where system calls that would not normally be considered blocking do block, and I think we were reluctant to mark lots more system calls as blocking just to support such an esoteric use case.
Comment 4 Tom Hughes 2019-05-07 19:47:31 UTC
I think https://bugs.kde.org/show_bug.cgi?id=278057 is what I was thinking of and it looks like we did do something - you just need to add --sim-hints=fuse-compatible to tell valgrind you're using fuse.
Comment 5 Falk Werner 2019-05-08 05:34:07 UTC
Thanks a lot, --sim-hints=fuse-compatible solves the issue. I was not aware that such a flag exists.

I will spend more effort on investigation next time. Sorry for the inconvenience.
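
For readers who land on this report later, the mechanism discussed in Comments 3 and 4 can be illustrated with a toy model. This is a sketch under stated assumptions, not Valgrind's implementation. It assumes that Valgrind runs only one client thread at a time under a single lock, and that the lock is handed over during a system call only when that call is treated as potentially blocking. The names `simulate_stat`, `big_lock`, and `fuse_server` are hypothetical, and no real FUSE or Valgrind API is involved.

```python
import threading


def simulate_stat(release_lock_around_syscall: bool, timeout: float = 1.0) -> str:
    """Toy model of a stat() on a FUSE mount served by the same process.

    Returns "completed" if the simulated server could answer the request,
    or "deadlock" if the caller timed out while still holding the lock.
    """
    big_lock = threading.Lock()   # assumption: one lock serializes all client threads
    request = threading.Event()   # the kernel forwards the stat() to the FUSE server
    reply = threading.Event()     # the FUSE server has answered the request
    stop = threading.Event()      # lets the server thread exit after a timeout

    def fuse_server() -> None:
        request.wait()
        # The server thread can only make progress while it holds the lock.
        while not stop.is_set():
            if big_lock.acquire(timeout=0.05):
                try:
                    reply.set()
                finally:
                    big_lock.release()
                return

    server = threading.Thread(target=fuse_server)
    server.start()

    big_lock.acquire()            # the calling thread is running, so it holds the lock
    request.set()                 # the simulated syscall starts
    if release_lock_around_syscall:
        big_lock.release()        # syscall treated as potentially blocking
    answered = reply.wait(timeout)
    if release_lock_around_syscall:
        big_lock.acquire()        # take the lock back once the syscall returns
    stop.set()
    big_lock.release()
    server.join()
    return "completed" if answered else "deadlock"
```

In this model, `simulate_stat(False)` corresponds to the reported hang: the caller waits in the kernel while holding the lock that the server thread needs. `simulate_stat(True)` corresponds to the behaviour requested with the flag from Comment 4, i.e. running `valgrind --sim-hints=fuse-compatible ./libfuse-test`. The timeout exists only so that the sketch terminates; the real process hangs indefinitely.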