Bug 459980 - setTimeoutSpecialCommand breaks FileJob
Summary: setTimeoutSpecialCommand breaks FileJob
Status: RESOLVED FIXED
Alias: None
Product: frameworks-kio
Classification: Frameworks and Libraries
Component: general (show other bugs)
Version: git master
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: KIO Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-04 16:47 UTC by Jan
Modified: 2022-12-01 12:49 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
backtrace of kio-fuse master @v5.0.1-16-gfbd09a3 (14.27 KB, text/x-log)
2022-10-04 16:47 UTC, Jan
Details
output of "kio-fuse -d" (@v5.0.1-16-gfbd09a3) (412.73 KB, text/plain)
2022-10-12 13:31 UTC, Jan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan 2022-10-04 16:47:17 UTC
Created attachment 152586 [details]
backtrace of kio-fuse master @v5.0.1-16-gfbd09a3

While giving https://invent.kde.org/system/kio-fuse/-/merge_requests/36 some test usage I've found that sometimes, video playback stops. mpv can still seek into previously cached positions, but does not seem to receive any more data from kio-fuse. Note that new clients can be started and will happily get content (until they too presumably freeze up).

This also happens in master (without the above branch applied). I have saved a coredump of kio-fuse for possible further analysis while it was frozen on playback for mpv and the backtrace is attached to the report.
Comment 1 a.saoutkin 2022-10-11 20:17:26 UTC
This looks to me like KIO FUSE hasn't replied to a FUSE request.
If you run KIO FUSE in debugging mode can you post the output? We should see the request that KIO FUSE isn't replying to.
Comment 2 Jan 2022-10-12 13:31:45 UTC
Created attachment 152744 [details]
output of "kio-fuse -d" (@v5.0.1-16-gfbd09a3)
Comment 3 a.saoutkin 2022-10-27 23:06:59 UTC
There's definitely a bug in SFTP here that needs to be fixed:
> kf.kio.core: finished() called twice! Please fix the kio_sftp KIO slave

We can see there's a stall in the call to read() as there is no reply to request 1902. However, I'm not (yet) convinced that the bug in SFTP noted above is causing this.

@Harald, could you take a look at this on the SFTP side?

@Jan, could you do the following?
1) Provide the version of kio-extras used for this.
2) Debug output for the same, but with my feature branch you were testing.
Comment 4 a.saoutkin 2022-10-27 23:07:37 UTC
Oh and in addition, are you able to replicate this with another protocol, such as file?
Comment 5 Jan 2022-10-28 18:07:52 UTC
This is all on an up-to-date arch:
kio: 5.99.0-1
kio-extras: 22.08.2-1
kio-fuse: 5.0.1-1

There is an easy way to reproduce this locally:

$ dd if=/dev/null of=/tmp/input bs=1 count=0 seek=9G
$ sudo systemctl start sshd
$ ./kio-fuse/utils/mount-url.sh sftp://localhost
$ pv /run/user/1000/kio-fuse-*/sftp/localhost/tmp/input > /dev/null

Your filejob feature branch makes this bug much less likely to happen, so a loop is needed instead:
$ while true; do pv /run/user/1000/kio-fuse-*/file/tmp/input > /dev/null; done

I use pv for the easy progress meter, but dd and cp works too. 

On master this usually triggers in about 30s-2m. For your feature branch this can take a *lot* longer (sometimes a couple minutes, sometimes an hour or longer). Debug logs are obviously pretty large because of this, so I think it's better to see if you can reproduce yourself first.

I cannot reproduce this with file:// or smb:// (on localhost) so far.
Comment 6 Fabian Vogt 2022-10-29 21:43:40 UTC
That's actually much easier to reproduce than I expected, I can trivially repeat it here.

To debug it, I added some code to log all KIO slave commands. That showed that CMD_CLOSE was actually sent twice to the slave, which turned out to be a bug in KIO: https://invent.kde.org/frameworks/kio/-/merge_requests/1013
That's unrelated though and (in this case at least) harmless.

I found the root cause a bit later. The sftp ioslave calls setTimeoutSpecialCommand, which means that SFTPWorker::special is called in regular intervals from the dispatching loop. After the sftp slave was ported from SlaveBase to WorkerBase, it now calls finished(), which resets the slave and returns it to the pool. For FileJob this is fatal, as the state is lost and all the application gets is a successful (!) code from the finished() signal!

FWICT the right way here is to change WorkerSlaveBaseBridge::special from using finalize(base->special(data)); to maybeError(base->special(data)); and revert 4747517737dcfbfa1964ef818908a69067b7936b.
Comment 7 Bug Janitor Service 2022-10-29 22:03:00 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kio/-/merge_requests/1015
Comment 8 Bug Janitor Service 2022-11-05 12:48:31 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kio/-/merge_requests/1026
Comment 9 Fabian Vogt 2022-12-01 12:49:52 UTC
Git commit 99228414ad1050d22e40c9913e7280287b60f22c by Fabian Vogt.
Committed on 01/12/2022 at 12:12.
Pushed by fvogt into branch 'master'.

Ignore finished() from the timeoutSpecialCommand

If a timeoutSpecialCommand is set, SlaveBase triggers a call to the Slave's
special() method after the given timeout expires. This call is triggered as
if it was coming from the application, so it must be taken care of that it
does not actually send any result, which the application might not expect.
In particular in the case of FileJob, calling finished() stopped the Job
immediately, the result() signal is emitted with no error set.

Ignore calls to finished() and error() from special() in that case.

M  +11   -2    src/core/slavebase.cpp

https://invent.kde.org/frameworks/kio/commit/99228414ad1050d22e40c9913e7280287b60f22c