Bug 452729 - SlaveBase::mimeType never returns
Summary: SlaveBase::mimeType never returns
Status: RESOLVED FIXED
Alias: None
Product: frameworks-kio
Classification: Unclassified
Component: general (show other bugs)
Version: git master
Platform: Other Linux
: HI major
Target Milestone: ---
Assignee: KIO Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-18 11:33 UTC by medin
Modified: 2022-05-03 17:34 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In: 5.94


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description medin 2022-04-18 11:33:55 UTC
When I try to copy some files (over 500MB) to Windows 10 machine I always get that "Device or resource busy" while the target machine is totally fine with much available resources. The copying works for small files but for bigger ones it starts and stops at random stage. The copy from Windows to Windows and even from Android to Windows works fine.
Comment 1 Harald Sitter 2022-04-18 11:35:38 UTC
Please fill in

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version:
Comment 2 Harald Sitter 2022-04-18 11:37:42 UTC
Also the data listed here while at it please https://community.kde.org/Guidelines_and_HOWTOs/Debugging/Debugging_IOSlaves/Debugging_kio_smb#Reporting_Bugs
Comment 3 medin 2022-04-18 11:57:31 UTC
(In reply to Harald Sitter from comment #2)
> Also the data listed here while at it please
> https://community.kde.org/Guidelines_and_HOWTOs/Debugging/Debugging_IOSlaves/
> Debugging_kio_smb#Reporting_Bugs

Windows: 10, 21H2

Operating System: Manjaro Linux
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.92.0
Qt Version: 5.15.3
Kernel Version: 5.16.18-1-MANJARO (64-bit)
Graphics Platform: X11


Log:

dolphin --new-window smb://192.168.1.100/Share
[     0.000](5138)/(kf.kio.slaves.trash) unknown: Directory "/mnt/0A36AFE336AFCE4F/.Trash-1003" exists but didn't pass the security checks, can't use it
[     0.002](5138)/(kf.kio.slaves.trash) unknown: Directory "/mnt/FEF0C76EF0C72C29/.Trash-1003" exists but didn't pass the security checks, can't use it
[     0.002](5138)/(kf.kio.slaves.trash) unknown: Directory "/mnt/989A18F59A18D19C/.Trash-1003" exists but didn't pass the security checks, can't use it
[     0.000](5144)/(kf.kio.core) unknown: error() called twice! Please fix the kioslave5 KIO slave
[     0.000](5121)/(default) unknown: ""
[     0.000](5395)/(kf.kio.core) unknown: error() called twice! Please fix the kioslave5 KIO slave
Comment 4 medin 2022-04-18 13:13:15 UTC
And this is the log using libsmbclient method:  
https://pastebin.com/raw/y6mP0SKZ
Comment 5 Harald Sitter 2022-04-19 13:31:27 UTC
This is the weirdest bug I've seen like ever. It's 100% reproducible and also not that hard to hit during regular use. Bumping bug importance consequently. Moving to KIO since I suspect there's a bug in there somewhere. This happens also with samba servers.

The easiest way to test is to simply copy a file from local to remote twice, i.e. have an overwrite on the second copy. `smbstatus -L` on the server should show that the file remains locked after the second copy and further copying will fail on the busy file.

What happens here ultimately is that mimeType() in the SMB worker never returns. Why that happens I don't understand.
Specifically this is the result of a rather unfortunate chain of events: Something inside plasmashell (supposedly the progress notification) is causing a call to ::get() (supposedly for mimetype query) and that SMB::get() call never returns because it gets stuck on emitting mimeType(), which in turn is stuck because the Interface never replies with the correct CMD. Because mimeType() never returns, get() never returns, thus the open file is never closed and that is why the file is busy (i.e. open file = locked file = busy file = cannot be written to). 

I've tried to bisect this in KIO but that lead to 1306904f90b21b663e68a7dec4f67d81ef4fa11c which I don't see how it cause this other than incidentally enabling whatever it is inside plasmashell that causes the ::get() call.

Needs someone who has the slave IPC loaded into brain.

David, Meven, thoughts?

As a workaround one can `killall kioslave5` and that should free the file lock almost instantly.
Comment 6 Méven Car 2022-04-25 09:00:15 UTC
(In reply to Harald Sitter from comment #5)
> This is the weirdest bug I've seen like ever. It's 100% reproducible and
> also not that hard to hit during regular use. Bumping bug importance
> consequently. Moving to KIO since I suspect there's a bug in there
> somewhere. This happens also with samba servers.
> 
> The easiest way to test is to simply copy a file from local to remote twice,
> i.e. have an overwrite on the second copy. `smbstatus -L` on the server
> should show that the file remains locked after the second copy and further
> copying will fail on the busy file.
> 
> What happens here ultimately is that mimeType() in the SMB worker never
> returns. Why that happens I don't understand.
> Specifically this is the result of a rather unfortunate chain of events:
> Something inside plasmashell (supposedly the progress notification) is
> causing a call to ::get() (supposedly for mimetype query) and that
> SMB::get() call never returns because it gets stuck on emitting mimeType(),
> which in turn is stuck because the Interface never replies with the correct
> CMD. Because mimeType() never returns, get() never returns, thus the open
> file is never closed and that is why the file is busy (i.e. open file =
> locked file = busy file = cannot be written to). 
> 
> I've tried to bisect this in KIO but that lead to
> 1306904f90b21b663e68a7dec4f67d81ef4fa11c which I don't see how it cause this
> other than incidentally enabling whatever it is inside plasmashell that
> causes the ::get() call.
> 
I guess the ksmserver/knotifications can now trigger mimetype from the QML like for displaying icons.

> Needs someone who has the slave IPC loaded into brain.
> 
> David, Meven, thoughts?
> 

First check smb ioworker code.

```
[     0.000](5144)/(kf.kio.core) unknown: error() called twice! Please fix the kioslave5 KIO slave
[     0.000](5121)/(default) unknown: ""
[     0.000](5395)/(kf.kio.core) unknown: error() called twice! Please fix the kioslave5 KIO slave
```
We have those traces and it points towards the ioworker.

Something I noticed in get() for instance, if the file reads fails the job emits error and finished even returning processed(st->size);

A naive patch would be:
--- smb/kio_smb_file.cpp
+++ smb/kio_smb_file.cpp
@@ -106,6 +106,8 @@ void SMBSlave::get(const QUrl &kurl)
     }
     if (future.get() != KJob::NoError) { // check if read had an error
         error(future.get(), url.toDisplayString());
+        closeWithoutFinish();
+        return;
     }

The job returning error and finished and processedSized but not mimetype might cause calling code to stall, or retry.
If several mimetype() are called at the same time might cause the same issue.

It might explain your overwrite file case: we might be writing to a file and trying to gets its mimetype, that's bound to fail until 1024 bytes are written at least, or because the size of data returned exceeds the size of the buffer.

I am not very confident about this hypothesis, I haven't checked it.

Something interesting to do, is to compare with nfs ioworker.
Comment 7 Harald Sitter 2022-04-25 11:29:42 UTC
I've already tried my luck on the worker, but couldn't really find it misbehaving. The thing to note is that this is an all-success scenario.

The first copy completes successfully with everything in working order. The second copy does only stats on the dolphin side (noticing it is an overwrite and issuing the prompt) and the plasma side does only stats and gets, getting stuck.

Following your suggestion on trying another worker: the same happens to sftp. Overwrite a file -> plasma runs mimetype checks -> adding some strategic qdebugs I see SFTPInternal::sftpSendMimetype never returns, and for that worker we can be certain that it's not error-finished confusion because it uses the `Result` return type system.

What I've noticed is that plasma (or more specifically notifications' fileinfo.cpp in plasma-workspace) is running two mimetypefinders at the same time on the same url, could that be what's causing the confusion? Although I'd expect to at least one of them to return TBH.
Comment 8 Méven Car 2022-04-26 08:00:31 UTC
(In reply to Harald Sitter from comment #7)
> I've already tried my luck on the worker, but couldn't really find it
> misbehaving. The thing to note is that this is an all-success scenario.
> 
> The first copy completes successfully with everything in working order. The
> second copy does only stats on the dolphin side (noticing it is an overwrite
> and issuing the prompt) and the plasma side does only stats and gets,
> getting stuck.
> 
> Following your suggestion on trying another worker: the same happens to
> sftp. Overwrite a file -> plasma runs mimetype checks -> adding some
> strategic qdebugs I see SFTPInternal::sftpSendMimetype never returns, and
> for that worker we can be certain that it's not error-finished confusion
> because it uses the `Result` return type system.
> 
> What I've noticed is that plasma (or more specifically notifications'
> fileinfo.cpp in plasma-workspace) is running two mimetypefinders at the same
> time on the same url, could that be what's causing the confusion? Although
> I'd expect to at least one of them to return TBH.

So I see three issues, on the kio side we should allow multiple calls but early return failures when the resource is busy, we fail to notice that the file is being written to as we get() it, but since samba does not offer a better solution AFAICS we can't do much better.

And a second possible in plasma-workspace/applets/notifications making redundant MimeTypeFinderJob, the error seems to be on the QML side where two NotificationItem (with their JobItem using FileInfo) can get instantiated somehow in FullRepresentation and NotificationPopup.

A third issue is that MimeTypeFinderJob would like ::stat to fill UDS_MIME_TYPE but SMBSlave::statToUDSEntry does not (even naively using fileName), MimeTypeFinderJob has then a fallback using get().
Hence why we have successive calls to stat() and get() which should misbehave while SMBSlave::put is running.

SMBSlave::open() has a short path to download 1kB of the file for mimetype determination (as well as SFTPInternal::sftpSendMimetype file.cpp does fill mimetype in open() this seems like a strong expectation).
It seems to me MimeTypeFinderJob could use similar open() (or read()) short-path fallback instead of a complete get().
That would be the quickest path to a fix and more added general stability across ioworkers.
Comment 9 Harald Sitter 2022-04-26 09:26:06 UTC
All of this makes sense but I don't think any of these actually hit the heart of this bug.

I've done some more debugging to understand this problem better and stumbled upon https://invent.kde.org/frameworks/kio/-/blob/6b1fd98193e08b22dbf77e9e33606b2b1800ef88/src/core/mimetypefinderjob.cpp#L249 which is actually why this is broken to begin with.

What happens is that the worker sends the mimetype, then a blocking emit chain ends up in the aformentioned suspension and by the time the emit chain unwinds and we reach https://invent.kde.org/frameworks/kio/-/blob/6b1fd98193e08b22dbf77e9e33606b2b1800ef88/src/core/slaveinterface.cpp#L252 the connection is in suspended state and the worker gets no reply, leaving it hanging on https://invent.kde.org/frameworks/kio/-/blob/6b1fd98193e08b22dbf77e9e33606b2b1800ef88/src/core/slavebase.cpp#L707 and retaining the open file (lock). That is: this is actually a variant of bug #434455 but less obvious because we hold a remote busy. This seems flawed on a fundamental level though: we basically block the worker on waiting for something that never happens. We may as well abort it and start from scratch.

With that in mind your suggestion for using open&read in the mimetypefinder is actually the best solution. I suspect this is implemented the way it is because not all workers implement open&read though, so we still need a solution for when that is the case. A trivial approach might be to abort the worker as a whole?

i.e.

- try open()
- on failure: use get
- else: read()

- get()
- receive mimetype
- send ::kill() to the worker

get() is a sledgehammer way of getting the mimetype, so it seems only right that we also use the sledgehammer way of stopping it
Comment 10 Harald Sitter 2022-04-26 09:31:31 UTC
This seems to do the trick

@@ -245,9 +245,7 @@ void KIO::MimeTypeFinderJobPrivate::scanFileWithGet()
             m_suggestedFileName = job->queryMetaData(QStringLiteral("content-disposition-filename"));
         }
 
-        if (!m_url.isLocalFile()) { // #434455
-            job->putOnHold();
-        }
+        job->kill();
         q->emitResult();
     });
 }
Comment 11 Méven Car 2022-04-26 09:48:17 UTC
(In reply to Harald Sitter from comment #10)
> This seems to do the trick
> 
> @@ -245,9 +245,7 @@ void KIO::MimeTypeFinderJobPrivate::scanFileWithGet()
>              m_suggestedFileName =
> job->queryMetaData(QStringLiteral("content-disposition-filename"));
>          }
>  
> -        if (!m_url.isLocalFile()) { // #434455
> -            job->putOnHold();
> -        }
> +        job->kill();
>          q->emitResult();
>      });
>  }
Very interesting.
https://bugs.kde.org/show_bug.cgi?id=434455 already circumvented it for local files but what was the justification for it in the first place ?
Maybe that's just old design but better run tests & have a good look at the code before making too many assumptions.

It seems https://invent.kde.org/frameworks/kio/-/blob/6b1fd98193e08b22dbf77e9e33606b2b1800ef88/src/core/slavebase.cpp#L727 we had a way to get out of the loop but the inner loop does not break, it seems. The bug might be there in fact.
Missing
            if (cmd == CMD_NONE) {.
                break;
            }
?

Maybe a good time to open a MR and cc David.
Comment 12 Harald Sitter 2022-04-26 09:50:53 UTC
(In reply to Méven Car from comment #11)
> The bug might be there in fact.
> Missing
>             if (cmd == CMD_NONE) {.
>                 break;
>             }
> ?

That would resume the get() wouldn't it? What we actually want is for it to not resume.
Comment 13 Harald Sitter 2022-04-26 09:52:18 UTC
As in: we were only interested in the mimetype, so resuming the get() would be pointless, we'd needlessly stream the data over the network but then not do anything with it.
Comment 14 Méven Car 2022-04-26 14:37:08 UTC
(In reply to Harald Sitter from comment #12)
> (In reply to Méven Car from comment #11)
> > The bug might be there in fact.
> > Missing
> >             if (cmd == CMD_NONE) {.
> >                 break;
> >             }
> > ?
> 
> That would resume the get() wouldn't it? What we actually want is for it to
> not resume.

In fact this is already handled by

            if (!isSubCommand(cmd)) {
                break;
            }

That breaks for CMD_NONE

It is worth inspecting more the solution you found.

@@ -245,9 +245,7 @@ void KIO::MimeTypeFinderJobPrivate::scanFileWithGet()
             m_suggestedFileName = job->queryMetaData(QStringLiteral("content-disposition-filename"));
         }
 
-        if (!m_url.isLocalFile()) { // #434455
-            job->putOnHold();
-        }
+        job->kill();
         q->emitResult();
     });
 }

Interestingly `void SchedulerPrivate::putSlaveOnHold(KIO::SimpleJob *job, const QUrl &url)` which is called by `putOnHold` would kill the slave if it is called twice for the same ioworker. So it seems to validate your patch, making this more hastly.
Conceptually this makes prefect sense to stop the worker once we received the mimetype.
Comment 15 Méven Car 2022-04-26 14:52:39 UTC
Doc in slavebase.h states:

```
    /**
     * get, aka read.
     * @param url the full url for this request. Host, port and user of the URL
     *        can be assumed to be the same as in the last setHost() call.
     *
     * The slave should first "emit" the MIME type by calling mimeType(),
     * and then "emit" the data using the data() method.
     *
     * The reason why we need get() to emit the MIME type is:
     * when pasting a URL in krunner, or konqueror's location bar,
     * we have to find out what is the MIME type of that URL.
     * Rather than doing it with a call to mimetype(), then the app or part
     * would have to do a second request to the same server, this is done
     * like this: get() is called, and when it emits the MIME type, the job
     * is put on hold and the right app or part is launched. When that app
     * or part calls get(), the slave is magically reused, and the download
     * can now happen. All with a single call to get() in the slave.
     * This mechanism is also described in KIO::get().
     */
    virtual void get(const QUrl &url);
```

So get() will emit mimetype then data() but if the client does not do anything with it, it will stay opened as you found out debugging.

> This seems flawed on a fundamental level though: we basically block the worker on waiting for something that never happens. We may as well abort it and start from scratch.

I think so.
For the MimeTypeFinderJob job we explicitly want only the mimetype so killing the job should be expected, this is a different usecase than regular get().
Comment 16 David Faure 2022-05-03 09:45:16 UTC
Git commit 621585f0e818cd9f75cdd3bcc8665da7ac708283 by David Faure, on behalf of Harald Sitter.
Committed on 03/05/2022 at 09:26.
Pushed by dfaure into branch 'master'.

Don't put job/ioworker on hold after getting the mimetype

we only call get() so we can get the mimetype, we are in fact not
interested in the data. so, kill the job instead of putting it on hold.
putting it on hold means it will forever sit there waiting for
resumption. what's more it still holds any opened resources during this
time frame, which, among other things, includes file locks when used
with smb://

(kill is implicitly executed when the parent job ends)

M  +0    -3    src/core/mimetypefinderjob.cpp

https://invent.kde.org/frameworks/kio/commit/621585f0e818cd9f75cdd3bcc8665da7ac708283