Bug 291835 - KIO very slow when copying from network through smb
Summary: KIO very slow when copying from network through smb
Status: RESOLVED FIXED
Alias: None
Product: kio-extras
Classification: Frameworks and Libraries
Component: Samba
Version: unspecified
Platform: Arch Linux Linux
Priority: NOR Severity: major
Target Milestone: ---
Assignee: Harald Sitter
URL: https://bugzilla.samba.org/show_bug.c...
Keywords:
Duplicates: 237972 416832 417358 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-01-18 07:42 UTC by Matthew Stobbs
Modified: 2020-12-07 02:04 UTC (History)
CC List: 35 users

See Also:
Latest Commit:
Version Fixed In: 20.08


Attachments
SMB Buffer size increase. (497 bytes, patch)
2013-01-07 11:04 UTC, Matthew Stobbs
Details
Wireshark log of file transfer with different smb buffer sizes and cifs (146.40 KB, image/gif)
2014-03-26 13:59 UTC, Julian Kalinowski
Details

Description Matthew Stobbs 2012-01-18 07:42:30 UTC
Version:           4.7 (using KDE 4.7.4) 
OS:                Linux

When copying a file from my server (Ubuntu 11.04 server, samba 3.5.8), my Windows install copies at around 90MB/s. This is true whether using FileZilla for sftp or ftp, or Explorer for the samba connection.

On Sabayon, openSUSE, Fedora, Kubuntu, or my own build on Gentoo, network copying through KIO never gets higher than 22 MB/s. On the command line, using scp, I get 87 MB/s. Using smbclient, I get a huge 98 MB/s.

This has been happening with every KDE release since 4.3. I used to be able to use sftp instead of fish, but that is missing from Sabayon (or from KDE 4.7.4, I'm not sure which).

Reproducible: Always

Steps to Reproduce:
1. Connect to a remote file server using your chosen protocol (fish, smb, ftp).
2. Copy any file to your local system from the remote system (best seen with large files).
3. Wait for a long time for the files to copy.

Actual Results:  
Transfer speeds ranging from 18 MB/s to 21 MB/s on a gigabit network.

Expected Results:  
Transfer speeds ranging from 70 MB/s to 100 MB/s on a gigabit network.
Comment 1 Dawit Alemayehu 2012-01-28 01:34:07 UTC
That is because neither kio_smb nor kio_fish implements the optimization that allows an ioslave to save the content it retrieves from the network directly to a file, instead of sending it through IPC to the client. If you grep your .protocol files, you can see which ioslaves implement the optimization and hence provide much more comparable download speeds:

$ grep -ir -e "copyToFile" -e "CopyFromFile" /usr/share/kde4/services/*.protocol
/usr/share/kde4/services/fonts.protocol:copyToFile=false
/usr/share/kde4/services/fonts.protocol:copyFromFile=false
/usr/share/kde4/services/ftp.protocol:copyToFile=true
/usr/share/kde4/services/ftp.protocol:copyFromFile=true
/usr/share/kde4/services/sftp.protocol:copyToFile=true
/usr/share/kde4/services/sftp.protocol:copyFromFile=true
/usr/share/kde4/services/trash.protocol:copyFromFile=true
/usr/share/kde4/services/trash.protocol:copyToFile=true
/usr/share/kde4/services/videodvd.protocol:copyToFile=false
/usr/share/kde4/services/videodvd.protocol:copyFromFile=true

As you can see, only sftp and ftp implement the aforementioned functionality. Hence this ticket needs to be opened against both the "smb" and "fish" ioslaves, so that whoever maintains those ioslaves can add support for this optimization. I doubt "smb" has any maintainer right now, so I will reassign this ticket to the fish ioslave for you.
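For illustration, advertising the capability is just two keys in the slave's .protocol file; the slave must of course also implement the corresponding copy() code path in C++. A sketch of what that would look like in smb.protocol:

# smb.protocol (sketch): tell KIO that this slave can read and write local
# files itself, so file data no longer has to be streamed over IPC
copyToFile=true
copyFromFile=true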
Comment 2 Federico Cuello 2012-08-07 17:58:39 UTC
Increasing the receive buffer size in kio_smb.h made a big improvement in my case. I had ~14 MB/s and now get ~50 MB/s over a gigabit link.

I changed MAX_XFER_BUF_SIZE from 16 KB to 256 KB.

--- kioslave/smb/kio_smb.h.orig 2012-08-07 14:48:48.602873311 -0300
+++ kioslave/smb/kio_smb.h      2012-08-07 14:49:03.785125012 -0300
@@ -75,7 +75,7 @@
 //---------------------------
 #include "kio_smb_internal.h"
 
-#define MAX_XFER_BUF_SIZE           16384
+#define MAX_XFER_BUF_SIZE           262144
 #define KIO_SMB                     7106
 
 using namespace KIO;
Comment 3 Federico Cuello 2012-08-08 00:52:20 UTC
OK, more on kio_smb:

I implemented copyToFile in kio_smb, but the transfer speed didn't improve much. What does have a big impact on transfer speed is the buffer size. I got the best results with a buffer size of around 1 MB.

Also, from the command line, using smbget I got almost the same result with a similar buffer.
Comment 4 Maarten De Meyer 2012-11-16 16:15:31 UTC
*** Bug 237972 has been marked as a duplicate of this bug. ***
Comment 5 Matthew Stobbs 2013-01-07 11:04:01 UTC
Created attachment 76270 [details]
SMB Buffer size increase.

enlarge buffer size for kio_smb transfers.
Comment 6 Stijn Tintel 2013-01-15 00:21:34 UTC
For me a 1 MB buffer seems to give the best result as well. With a 256 KB buffer, a transfer from a Samba4 server over gigE maxed out around 86 MB/s. With a 1 MB buffer I get to 95 MB/s. Increasing it further to 2 MB did not result in a higher speed.
Comment 7 Stijn Tintel 2013-01-15 04:07:21 UTC
It seems that 95 MB/s might have been my hard drive's max write speed. When copying to tmpfs, I can get up to 104 MB/s using a 4 MB buffer size. I also tested an 8 MB buffer size, but then kio_smb crashes. Given that many people have gigabit and SSDs these days, I think it makes sense to use the value that gives the best performance. BTW, with the default value of 16K, it doesn't go over 29 MB/s.
Comment 8 walmartshopper 2013-02-20 22:10:58 UTC
Ran into this today on KDE 4.10 after upgrading to a gigabit router.  Getting 109MB/s with smbclient and 16MB/s with kio_smb.
Comment 9 walmartshopper 2013-02-20 22:11:36 UTC
*** This bug has been confirmed by popular vote. ***
Comment 10 Mark 2013-07-13 16:23:26 UTC
I just had a roughly similar issue. I have a 1 Gbit network and I was copying at ~6 MB/s using Dolphin (thus kio_smb) and at an equal speed using smbclient. However, mounting the share using cifs boosted the speed to ~85 MB/s, which I guess is the limit of my poor network card.
Comment 11 Julian Kalinowski 2013-09-08 13:19:56 UTC
Still occurring in 4.10.5. Increasing the buffer size helps.
Comment 12 Michael D 2013-09-15 20:13:46 UTC
I must apologize, but can someone provide instructions for a capable idiot regarding how to increase the buffer size? Does it require that I recompile my kernel after applying the patch?

I notice that I get about 4 times the read speed when mounting the drive using smb4k and transferring the files from the mount in Dolphin. So should one expect to get the same speeds by either increasing the buffer size or by reading from a mounted smb? I also notice that read speed is faster using scp than cp from the mounted smb.
Comment 13 Federico Cuello 2013-09-15 20:25:59 UTC
Hi Michael,

It's really sad that this bug isn't officially fixed yet.

What you need to do is change one line in kioslave/smb/kio_smb.h, from 
#define MAX_XFER_BUF_SIZE           16384
to
#define MAX_XFER_BUF_SIZE           1048576

How to do it depends on which distribution you are using, and you only need to recompile the kioslaves, not the kernel.

(In reply to comment #12)
> I must apologize, but can someone provide instructions for a capable idiot
> regarding how to increase the buffer size? Does it require that I recompile
> my kernel after applying the patch?
> 
> I notice that I get about 4 times the read speed when mounting the drive
> using smb4k and transferring the files from the mount in Dolphin. So should
> one expect to get the same speeds by either increasing the buffer size or by
> reading from a mounted smb? I also notice that read speed is faster using
> scp than cp from the mounted smb.
Comment 14 Mark 2013-09-15 20:43:31 UTC
Federico, we can't "just" increase the size in the smb slave. It has to be thoroughly tested to see if that buffer size can be expected to work like a charm for everyone. Introducing a regression is obviously not something I'd like to do :)

Also let me remind you that the smb slave is from the KDE2!!! times. It's ancient and we should be very happy that it works at all.

Now I am porting this slave to KF5/Qt5, so I will certainly take this increased buffer size patch into consideration, but I will also look at how cifs does it, since that gives the best performance for me.
Comment 15 Federico Cuello 2013-09-15 21:01:49 UTC
(In reply to comment #14)
> Federico, we can't "just" increase the size in the smb slave. It has to be
> thoroughly tested to see if that buffer size can be expected to work like a
> charm for everyone. Introducing a regression is obviously not something I'd
> like to do :)

Yes, you can. Changing a buffer size is not a complete rewrite or any major change. I agree that it should be thoroughly tested, but here you can see that it has been tested by some people, and I guess also by other people who didn't comment on this thread. This bug was reported more than a year and a half ago, a patch was provided, and successful test reports have been given. There were also beta and RC releases where it could have been tested.

> Also let me remind you that the smb slave is from the KDE2!!! times. It's
> ancient and we should be very happy that it works at all.
> 
> Now I am porting this slave to KF5/Qt5, so I will certainly take this
> increased buffer size patch into consideration, but I will also look at how
> cifs does it, since that gives the best performance for me.

I'm looking forward to that.
Comment 16 Michael D 2013-09-16 13:38:06 UTC
Thanks for your help Federico. I'm afraid I've no clue how to recompile the kioslaves on my system (Kubuntu 13.10), so I'll just stick to mounting the share first and then accessing it that way, since that appears to give good performance.

My voice doesn't mean much, but I---and likely the majority of people using relatively modern hardware---would be happy if someone implemented the patch already with a reasonably safe buffer size. We're talking about 400x performance gains!
Comment 17 Michael D 2013-09-16 13:38:50 UTC
I meant "We're talking 400% (not times) performance gains!!!!!".
Comment 18 Krzysztof Marczak 2013-10-22 16:44:26 UTC
Actually I'm using KDE 4.10.5 on Debian and I observe an even worse scenario. With Gigabit Ethernet, the maximum speed I get is 1.3 MB/s via kio_sftp and about 5 MB/s via kio_smb. If I copy the same big files using e.g. Gnome Commander, I get about 60 MB/s via sftp. Copying local files is also about 2 times slower with KDE applications (Krusader, Dolphin) than with others. In many cases this makes KDE unusable for professional work with bigger files over the network.
Comment 19 Clésio Luiz 2013-11-17 12:18:57 UTC
So increase it and let people test it. KDE samba performance is very very low.
Comment 20 Dawit Alemayehu 2013-11-25 05:34:05 UTC
Git commit 6c0de3209da77eb10976af72307188eb68aa0689 by Dawit Alemayehu.
Committed on 23/10/2013 at 13:01.
Pushed by adawit into branch 'KDE/4.12'.

Add support for 'copyToFile' and 'copyFromFile' optimizations.

This implementation should make copying files from windows shares to local file
faster. Additionally, this patch also adds support for other missing features
such as partial download resumption and modified timestamp preservation.
Related: bug 176271
REVIEW: 112982

M  +5    -1    kioslave/smb/kio_smb.h
M  +417  -5    kioslave/smb/kio_smb_dir.cpp
M  +16   -0    kioslave/smb/kio_smb_internal.cpp
M  +7    -0    kioslave/smb/kio_smb_internal.h
M  +2    -0    kioslave/smb/smb.protocol

http://commits.kde.org/kde-runtime/6c0de3209da77eb10976af72307188eb68aa0689
Comment 21 Julian Kalinowski 2013-12-28 00:18:11 UTC
Even with the mentioned commit, I only get 20 MiB/s.
With a buffer size of 262144, I get 54 MiB/s (with or without the new commit).

And when mounted with cifs, I get 90 MiB/s.
So the bug is not fixed yet. At least increase the buffer size to 262144.
Comment 23 Dawit Alemayehu 2014-03-26 12:59:15 UTC
Oh and before someone else proposes to increase the buffer size yet again
https://git.reviewboard.kde.org/r/113915/
Comment 24 Mark 2014-03-26 13:12:02 UTC
I don't get it..

The current buffer size:
16384

Then from the MSDN link you just posted, every modern-day PC is very likely to have:
16644 (on Windows, that is)

The difference there is minor: 16644 - 16384 = 260
That's minor. Yet copying from SMB to Windows is still far faster on my PC than copying from SMB to Linux.

Yes, it improved greatly with the current 16384, and Windows now isn't 10x faster anymore (just ~2x). But there must be something else we're missing somewhere.

Linux is not the problem, since if I mount an SMB share through CIFS I get faster speeds than on Windows. Is there a way for me to see the SMB negotiation headers?
Comment 25 Mark 2014-03-26 13:30:34 UTC
Ohh, this changes things a bit.

Quote: "If CAP_LARGE_READX or CAP_LARGE_WRITEX capability is enabled on the SMB Negotiate Server Response, the maximum buffer size used is 61440 (60K) for large read (SMB_COM_READ_ANDX) and 65535 (64K) for large write (SMB_COM_WRITE_ANDX), regardless of MaxBufferSize. But this is only true if the SMB signing is not turned on (we will discuss this further in the next section)."

On Windows they default to enabling CAP_LARGE_READX and CAP_LARGE_WRITEX. They also default to disabling server signing. Samba defaults "large readwrite" to true and server signing to disabled, as can be read here: http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html

Thus the MAX_XFER_BUF_SIZE in KIO should be set to 65535, if I'm reading the documentation correctly.

However, since this value depends on memory and Samba server settings, there has to be a bit of logic to determine the correct value (just a define won't do). In other words: there is some work to do to get this working the way it should according to the defaults.
Comment 26 Julian Kalinowski 2014-03-26 13:58:07 UTC
As mentioned in the previous post, the MSDN entry states that, if supported, Large Read/Large Write are used with buffer sizes of around 60 KB.

I analyzed the packets with Wireshark:
kio_smb (unpatched): requests 16384 bytes, awaits the answer, requests the next 16 KB.
kio_smb (patched with a max buffer of 1048576): requests 8x 130048 bytes and then 1x 8192 bytes before waiting for responses. Receives the next bunch after waiting for all responses.
cifs mounted: requests 2x 61440 bytes, waits for ONE answer, requests the next 60 KB.

The cifs behaviour makes sense: instead of wasting transmission time while waiting for the next packet, it requests 2 at once and makes sure it has some bytes requested all the time.
Additionally, it uses the buffer size defined for max read as stated in the MSDN thread: 61440 bytes.

The kio_smb default behaviour is stupid. It has to wait for every packet before requesting the next one, and additionally it uses just a 16 KB buffer size for some reason.

kio_smb with a patched buffer size behaves weirdly, but it speeds up transfers because it requests multiple batches at once.

PLEASE: fix this bug already. All the information is there, and the cifs implementation is good.
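For reference, the serial pattern described above boils down to the classic blocking loop below; a minimal sketch using the public libsmbclient API, not the literal kio_smb source. Each iteration costs a full network round trip, so with a 16 KB buffer it is latency, not bandwidth, that caps the throughput:

#include <libsmbclient.h>
#include <unistd.h>

// Sketch: one outstanding request at a time. smbFd comes from smbc_open(),
// localFd from open(); every 16 KB chunk waits for a complete reply.
void downloadSerial(int smbFd, int localFd)
{
    char buf[16384]; // MAX_XFER_BUF_SIZE
    ssize_t n;
    while ((n = smbc_read(smbFd, buf, sizeof(buf))) > 0) {
        // No new network request goes out until this chunk is written
        // and the loop wraps around.
        write(localFd, buf, static_cast<size_t>(n));
    }
}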
Comment 27 Julian Kalinowski 2014-03-26 13:59:10 UTC
Created attachment 85765 [details]
Wireshark log of file transfer with different smb buffer sizes and cifs
Comment 28 Mark 2014-03-26 14:15:12 UTC
(In reply to comment #26)
> PLEASE: fix this bug already. All information is there, cifs implementation
> is good.

Yes, the information is there when you know where to look. I didn't know all this stuff until just a few hours ago. I also didn't know how CIFS was behaving until you pointed it out with a wireshark trace.

I'd really like to thank you for your trace and further investigation after my last reply. Knowing the solution makes it worth diving into kio_smb and fixing it. I don't have a lot of time for that in the coming weeks, so I would welcome anyone willing to pick it up.

Otherwise I will put it on my list of things to do at the frameworks sprint (end of April 2014).
Comment 29 Dawit Alemayehu 2014-03-27 04:24:38 UTC
This is just pointless! Have any of you even bothered to check the commit link in comment#20 to see what the buffer size is set to now??????

Really this is just tiring and this is not even my code or something I maintain. I am done!
Comment 30 Mark 2014-03-27 17:25:27 UTC
(In reply to comment #29)
> This is just pointless! Have any of you even bothered to check the commit
> link in comment#20 to see what the buffer size is set to now??????
> 
> Really this is just tiring and this is not even my code or something I
> maintain. I am done!

I don't know if you meant it as a reply to me or in general. Sorry for not actually checking the code. I took the last reviewboard link you posted, https://git.reviewboard.kde.org/r/113915/, for granted and wrongly assumed that the value of MAX_XFER_BUF_SIZE (which was 16384) was still the current value, since the review was discarded.

Sorry for that.

Either way, I still think it's worth looking into this further, which I will do at the frameworks sprint: discuss the findings so far and progress from there.
Comment 31 Julian Kalinowski 2014-03-28 11:56:19 UTC
Well, I just checked your commit again. It doubles the transfer speed in my case, which is nice.
The increased buffer size wasn't mentioned in the commit message, so I guess nobody saw it.

However, using cifs, the transfer speed is still twice as high as with kio-smb and the 64 KB buffer size, because cifs requests more data before a reply comes in (async read-ahead; I think it's in file.c: cifs_readpages).
And for some reason the kio-slave does that too if the buffer size is increased above 64 KB; that's why a buffer size of 1048576 works better, even if the transferred packets are only allowed to be 64 KB.

So beyond the buffer size increase (which is nice!), read-ahead would be useful to reduce waiting time.
Comment 32 Dawit Alemayehu 2014-03-29 14:45:03 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > This is just pointless! Have any of you even bothered to check the commit
> > link in comment#20 to see what the buffer size is set to now??????
> > 
> > Really this is just tiring and this is not even my code or something I
> > maintain. I am done!
> 
> I don't know if you meant it as a reply to me or in general. Sorry for not
> actually checking the code. I took the reviewboard link you posted last:
> https://git.reviewboard.kde.org/r/113915/ for granted and wrongly assumed
> that the value MAX_XFER_BUF_SIZE (which was 16348) was still the current
> value since the review was discarded.

It was a general statement, not something directed at anyone specific.
Comment 33 Dawit Alemayehu 2014-03-29 16:16:07 UTC
(In reply to comment #31)
> Well, I just checked your commit again. It doubles the transfer speed in my
> case, which is nice.
> The increased buffer size wasn't mentioned in the commit message, so I guess
> nobody saw it.
> 
> However, using cifs, the transfer speed is still twice as high as with
> kio-smb and the 64 KB buffer size, because cifs requests more data before a
> reply comes in (async read-ahead; I think it's in file.c: cifs_readpages).
> And for some reason the kio-slave does that too if the buffer size is
> increased above 64 KB; that's why a buffer size of 1048576 works better,
> even if the transferred packets are only allowed to be 64 KB.

1048576 is the default read/write size for the SMB2 protocol. See the smb.conf man pages or the excerpt taken from those pages at https://lists.samba.org/archive/samba-technical/2011-June/078093.html for the details. The newer CIFS protocol implementations support large transfer sizes beyond 65K. However, kio_smb cannot simply hard code and use such a large buffer size, because it has to support servers that implement the older version of the protocol, and hard coding such a change would likely impact those servers adversely.

> So beyond buffer size increase (which is nice!), read-ahead would be useful
> to reduce waiting time.

No, there is a much easier solution to this, and that is to do the same thing the command line smbclient does. By default, smbclient sets the io buffer size to 64512 (ctx->io_bufsize = 64512). However, you can override that value using a command line option (-b). We can provide the same configuration, such that people can set the read/write or general buffer size globally or per specific host in kio_smbrc. The default will still be 64K, but the user can change that value to their heart's content.
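A user-tunable buffer along those lines could be read with KConfig when the slave starts. A minimal sketch: the file name kio_smbrc follows the proposal above, but the group and key names here are assumptions, not existing settings:

#include <kconfig.h>
#include <kconfiggroup.h>
#include <QString>

// Sketch of the proposed tunable: a global default with an optional
// per-host override, mirroring smbclient's -b option.
static int transferBufferSize(const QString &host)
{
    KConfig cfg("kio_smbrc");
    // Global default: the same 64K smbclient uses (ctx->io_bufsize = 64512).
    const int global = cfg.group("Default").readEntry("BufferSize", 64512);
    // A [<host>] section may override the global value.
    return cfg.group(host).readEntry("BufferSize", global);
}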
Comment 34 Mauro Molinari 2014-04-23 07:47:20 UTC
I also noticed that file transfers through Dolphin using the smb:/ and nfs:/ protocols are noticeably slower than the same file transfers from the same shared folders mounted on the local file system. This is in contrast with the performance I get if I use the ftp:/ protocol to access the same server, which is very fast.
These are the transfer speeds I get on my 802.11g WLAN:
- reading from server with ftp:/ protocol in Dolphin: around 3.9 MB/s
- reading from server with smb:/ protocol in Dolphin: around 2.2 MB/s
- reading from server with nfs:/ protocol in Dolphin: around 1.9 MB/s
- reading from server using a SMB mount point: around 3.5 MB/s
- reading from server using a NFS mount point: around 3.9 MB/s
- reading from server using a curlftpfs mount point: around 500 kB/s (there must be some severe problem with curlftpfs, but it's not important here)

Conclusion: given that 3.9 MB/s should be the maximum speed I can achieve on my WLAN, ftp transfer with the ftp:/ protocol in Dolphin is highly efficient, while the same is not true for the smb:/ and nfs:/ protocols (the latter is the worst one!! And this was a surprise for me).
This is a pity, because accessing shared resources in Dolphin without having to mount them would be very handy, especially for the novice user and when combined with resource bookmarking.

This evening I will run some tests using the wired Gigabit connection instead of WLAN. I'm expecting even bigger differences.

Using Linux Mint 16 KDE.
Comment 35 Mauro Molinari 2014-04-23 07:51:07 UTC
Forgot to say: my transfer tests using mounted NFS/SMB shared folders were still made with Dolphin. So the difference in my case seems to come from the protocol handling itself, not from the application I use to copy files (which was Dolphin in all my tests).
Comment 36 Mauro Molinari 2014-04-23 21:19:43 UTC
I repeated my experiments after connecting the laptop to the Gigabit Ethernet (instead of WLAN). Here are the results:
- copy from NAS using smb:/ protocol in Dolphin: <14 MB/s
- copy from NAS using nfs:/ protocol in Dolphin: <8 MB/s
- copy from NAS using ftp:/ protocol in Dolphin: <70 MB/s
- copy from NAS (using Dolphin) after mounting SMB share: ~60 MB/s
- copy from NAS (using Dolphin) after mounting NFS share: ~21 MB/s (*)

(*) probably capped by the NAS CPU utilization, which was >90% (still surprising for me, since I had expected NFS to be lighter and faster than SMB...)

So, this clearly confirms what was observed by others, too.
Comment 37 Fred Albrecht 2014-08-15 15:35:53 UTC
I've just run tests here on Kubuntu 14.04 with KDE 4.13.3.

I'm copying a 1.4 GiB file from a Synology DS213 NAS which is otherwise idle, and I get around 1 MiB/s whatever the protocol (SMB or FTP), on a quiet Gb Ethernet link.

The machine has a small Windows partition and the speed is considerably higher under that system.

It's interesting (sorry, sad) that I get the same speed through ADSL2+, at about 10 Mb/s, as through Gb Ethernet...
Comment 38 emelenas 2015-10-18 07:59:44 UTC
Although the last entry is a year old, I'd appreciate an update on this. Copying through smb/Dolphin is still slow on my Fedora 22. Any chance of additional work?
Comment 39 C. Priisholm 2016-02-10 07:53:08 UTC
I've been browsing the bugs and this seems to be related to what I am seeing, but with a twist.

When copying to a mounted cifs share, I either get the expected performance (around 100 MB/s on a 1 Gb network) or a much slower transfer speed (around 15-25 MB/s) when copying gigabyte-sized files with Dolphin.

To recreate, I have the source folder in one pane and the parent folder (containing the target folder) in the other pane.
It behaves well if I drag the file on top of the target folder and select copy/move, i.e. insert the file into the target folder.
But if I open the target folder - i.e. show its contents in the pane rather than showing the parent folder - and then drag a file to the pane, the transfer speed drops dramatically.
Comment 40 oliver@openbrackets.net 2016-12-23 12:50:00 UTC
If it's any consolation, they are having exactly the same discussion over at gnome-gvfs:

https://bugzilla.gnome.org/show_bug.cgi?id=776339

I did a bunch of recompiling with different buffer sizes over there. For my case 4MB+ buffers give good read speed (wire speed 100MB/s). I haven't dealt with the write case yet for gvfs, as the code is very different.

The conclusion as far as I can see (for gvfs AND kioslave) is one of these:

1. short-term: make a tunable config variable available so people can grow their buffer sizes to max out their wires. (I totally understand and agree with the devs that just hard coding a huge size is not really acceptable.) Hopefully this works for write as well as read (read is proven).

2. medium-term: this comment:
https://bugs.kde.org/show_bug.cgi?id=291835#c26
sounds the most promising to me, i.e. UNDERSTAND how cifs manages to be properly fast (presumably) without huge buffer sizes (e.g. pre-request 2x 64 kB for read... something similar for write..?)

I am happy to help if the kioslave devs can agree with the above strategy?

Opinions?
Comment 41 Mark 2016-12-23 14:29:11 UTC
Hmm, I have a new idea for this.

The biggest issue we have here is buffer management and finding the right buffer size that satisfies everyone. That is impossible, because everyone (obviously) has a different setup, so a perfect buffer size for me might be horrible for others.

So... let's not do that anymore. Let the operating system itself decide (Linux in this case). Linux has two functions for that: sendfile [1] and splice [2].

Both work about the same. One is newer than the other, and one does copy, the other does not (or should not). Let's ignore the details there for a moment.

Samba internally uses sendfile if it was enabled, but in my experience even that works dreadfully slowly, so if we can bypass that, that would be great! And I think we can.

smbc_open returns a file descriptor, and we have (or can get if we need to) a file descriptor for the file we want to write into or read from. So we have all the basics needed for sendfile/splice to do whatever it thinks is best. Buffer management is then done internally by those functions, and we can quite simply ignore it.

This is hypothetical! I don't know if it would actually work this way. I have a local client/server test piece (totally unrelated to samba) that uses splice (and optionally sendfile) and gets close to maxing out my network connection. Say ~92%, and that consistently.

If this were to work, then the issues described in this long-standing bug report are quite likely gone. kio_smb_file.cpp is probably the file that would have to change the most to make this work. Anyone up for it? Just as a proof of concept to see if this idea actually works.


[1] https://linux.die.net/man/2/sendfile
[2] https://linux.die.net/man/2/splice
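The core of the sendfile idea is tiny. A minimal local-file-to-socket sketch (error handling trimmed); note it assumes both ends are plain kernel file descriptors, which is exactly the open question for libsmbclient:

#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Sketch: push a whole file into a connected socket and let the kernel
// choose its own internal buffer sizes.
ssize_t sendWholeFile(int sockFd, const char *path)
{
    int fileFd = open(path, O_RDONLY);
    if (fileFd < 0)
        return -1;
    struct stat st;
    fstat(fileFd, &st);
    off_t offset = 0;
    while (offset < st.st_size) {
        // sendfile() advances offset by the number of bytes it moved.
        ssize_t sent = sendfile(sockFd, fileFd, &offset, st.st_size - offset);
        if (sent <= 0)
            break;
    }
    close(fileFd);
    return offset;
}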
Comment 42 oliver@openbrackets.net 2016-12-23 17:09:08 UTC
@Mark

That sounds very neat, some questions:

1. The gvfs-smb backend uses libsmbclient, and so does the command line smbget. I assume kioslave-smb is the same? Is the libsmbclient API, which we don't control, compatible with a sendfile/splice "streaming" approach?

2. Given the apparent lack of devs on this long-standing issue, is this rather ambitious approach, while potentially being the "more correct/elegant way", likely to result in more paralysis... i.e. what can we commit/release today?
Comment 43 oliver@openbrackets.net 2016-12-23 17:16:04 UTC
@Mark

one more thought...

If 1. above is OK, i.e. smbclient / smbc_read|write can support an FD streaming approach, then, even though we are getting rid of the problem of determining the buffer size, is that going to solve the throughput issue?

I don't understand the problem in great detail yet, but it seems to me that, because of the way the SMB protocol works, the way the libsmbclient API needs to be called and fed data is quite critical, ref https://bugs.kde.org/show_bug.cgi?id=291835#c26 where wireshark forensics show that smbclient/cifs use a clever strategy of pre-fetching "one block ahead" to get max throughput...

How can sendfile / splice ever understand these semantics?

Would we be getting rid of the "deciding what size buffer to use" problem, while failing to address the actual problem of achieving highly optimised throughput...?
Comment 44 Mark 2016-12-23 18:16:43 UTC
(In reply to oliver@openbrackets.net from comment #43)
> @Mark
> 
> one more thought...
> 
> If 1. above is OK, i.e. smbclient / smbc_read|write can support an FD
> streaming approach, then, even though we are getting rid of the problem of
> determining the buffer size, is that going to solve the throughput issue?
> 
> I don't understand the problem in great detail yet, but it seems to me
> that, because of the way the SMB protocol works, the way the libsmbclient
> API needs to be called and fed data is quite critical, ref
> https://bugs.kde.org/show_bug.cgi?id=291835#c26 where wireshark forensics
> show that smbclient/cifs use a clever strategy of pre-fetching "one block
> ahead" to get max throughput...
> 
> How can sendfile / splice ever understand these semantics?
> 
> Would we be getting rid of the "deciding what size buffer to use" problem,
> while failing to address the actual problem of achieving highly optimised
> throughput...?

For all your questions: i don't know :)

But I do know that sendfile/splice works marvelously on sockets. If I do an iperf benchmark between my computer and my local server I get around 990 Mbit/s. That is very fast for a 1 Gbit connection. It's hitting the maximum throughput; anything more is close to impossible due to TCP overhead.

Now if I use sendfile/splice to copy a file over the same network I get between 900 and 950 Mbit/s. Close enough for me :)

And that is with just those file descriptors and letting sendfile/splice handle whatever they want to handle. I don't know how it internally does its smart stuff, but I do know it's blazingly fast.

If all of this works with samba around it... we will just have to try it out, I guess.
Comment 45 oliver@openbrackets.net 2016-12-23 18:33:46 UTC
I think we can safely assume that the SMB protocol is not just "a socket". Obviously no one wants to re-implement an SMB protocol client; that would be a huge task.

So it's really a question of whether libsmbclient (or a similar lib) can support "just being fed data from an FD". From all I have seen, that is not the case. libsmbclient expects discrete calls to smbc_read|write:

/**@ingroup file
 * Read from a file using an opened file handle.
 *
 * @param fd        Open file handle from smbc_open() or smbc_creat()
 *
 * @param buf       Pointer to buffer to receive read data
 *
 * @param bufsize   Size of buf in bytes
 *
 * @return          Number of bytes read;
 *                  0 upon EOF;
 *                  < 0 on error, with errno set:
 *                  - EISDIR fd refers to a directory
 *                  - EBADF  fd  is  not  a valid file descriptor or
 *                    is not open for reading.
 *                  - EINVAL fd is attached to an object which is
 *                    unsuitable for reading, or no buffer passed or
 *		      smbc_init not called.
 *
 * @see             smbc_open(), smbc_write()
 *
 */
ssize_t smbc_read(int fd, void *buf, size_t bufsize);
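To make the constraint concrete, a complete (if minimal) download with this API looks like the sketch below. The int returned by smbc_open() is a handle managed inside libsmbclient, not a kernel file descriptor, so it cannot simply be handed to sendfile()/splice():

#include <fcntl.h>
#include <libsmbclient.h>
#include <unistd.h>

// No-op auth callback, i.e. anonymous/guest access.
static void authFn(const char *, const char *, char *, int,
                   char *, int, char *, int) {}

int main()
{
    if (smbc_init(authFn, 0) < 0)
        return 1;
    // URL is a placeholder; the returned handle is only meaningful to
    // the smbc_* family of calls.
    int fd = smbc_open("smb://server/share/file.bin", O_RDONLY, 0);
    if (fd < 0)
        return 1;
    int out = open("file.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    char buf[65536];
    ssize_t n;
    // Discrete, blocking calls: each one completes a full request/reply
    // exchange before the next can start.
    while ((n = smbc_read(fd, buf, sizeof(buf))) > 0)
        write(out, buf, static_cast<size_t>(n));
    close(out);
    smbc_close(fd);
    return 0;
}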
Comment 46 Nate Graham 2018-01-25 05:12:47 UTC
Given that we are obviously resource-constrained here, it seems that a major re-architecting of the SMB KIOSlave around some new approach may not bear fruit anytime soon.

I see two other practical options here:
1. Provide a user-tunable knob to change the buffer size (probably only settable by manually editing kiorc)
2. Work on copying what the cifs client does with its clever pre-fetching

Does that sound right?
Comment 47 oliver@openbrackets.net 2018-01-25 09:55:19 UTC
@Nate Graham

Yes, those were my pragmatic conclusions too. 

I think the buffer tunable is possible, but it's a kludge.

The real issue is network latency, and the fact that each 64 kB block takes an amount of time to transfer that is comparable to the network latency:

from the gnome thread: https://bugzilla.gnome.org/show_bug.cgi?id=776339#c21

> Interestingly my ping time to the server is 0.7ms and at 100MByte/s a
> 64kb buffer would theoretically take 0.625ms, so we can see how round
> trip time can quickly become significant

It gets much worse over wireless or other higher latency networks. 

So the next block must be requested and already on its way before the first block finishes transferring. That's the solution, and that's what mount.cifs does.

Since I stopped looking at this 12 months ago, I have been using an fstab mount using mount.cifs for those smb shares where throughput matters.

mount.cifs flies. Always at 95% of wirespeed, or better.
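Back-of-envelope, with the numbers from the quote above, the effective throughput of a strictly serial loop is roughly bufsize / (bufsize/wirespeed + rtt). For 64 KB requests on a 100 MB/s link with a 0.7 ms round trip:

    transfer time  = 65536 B / 104857600 B/s ≈ 0.625 ms
    per-request    = 0.625 ms + 0.7 ms      = 1.325 ms
    effective rate = 65536 B / 1.325 ms     ≈ 49 MB/s

So a serial 64 KB loop tops out at about half of wire speed on that network, no matter how fast the disks are; keeping even one extra request in flight closes most of the gap.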
Comment 48 Christoph Feck 2018-02-14 23:09:19 UTC
libsmbclient does not offer an async I/O API. The only way to get the speed is to use the Linux kernel CIFS VFS, which isn't available via an API, only via a mount.
Comment 49 Nate Graham 2018-02-14 23:46:53 UTC
> The only way to get the speed is to use the Linux kernel CIFS VFS, which isn't available via API, only via a mount.

Which we could do via https://bugs.kde.org/show_bug.cgi?id=75324
Comment 50 Christoph Feck 2018-02-14 23:55:42 UTC
Actually, what that bug proposes is the reverse of what we need here: We would like kio-smb to use a kernel mount to speed up all KDE applications that use KIO.
Comment 51 Mark 2018-02-15 09:19:13 UTC
FWIW in my mind this is by no means a user space vs kernel space thing.

I talked about sendfile/splice before in this very bug; back then I had no numbers to back it up.

Now I do. Here's what I did.
1. I measured the iperf performance between two machines on the same network.
2. That resulted in roughly 940 Mbit/s throughput (even cheap hardware is capable of this). And that is with no special kernel module; all just plain, simple user mode.
3. I took that number and wrote a sendfile/splice application, just to see if copying over TCP/IP on a local network can reach those potential speeds of ~940 Mbit/s.
4. So I made this tool to test just that: https://github.com/markg85/netsplice (just one file, main.cpp)
5. Surprise: it reaches ~930 Mbit/s with sendfile and splice.
6. And yes, I did checksum the results. They match, so the file had been transferred correctly.
7. Take special note that this was tested with an AMD Fusion E350! (SMB most certainly can't get full performance, because it hits 100% CPU usage at around ~55 MB/s in my tests.)

So with that I've proven that copying a file over TCP/IP can saturate your network line just fine. That is something CIFS, SMB, NFS, SSHFS, and FTP (to name a few) can't do, CIFS being the best of them. The difference that remains is protocol overhead and differences in the flags provided (in the case of SMB vs CIFS).

That CIFS is in the kernel has absolutely nothing to do with network throughput performance! Not at gigabit speeds that is.

The mere fact that CIFS can reach *better* speeds (it still won't saturate the network; it gets to about 80% or so in my setups) is very likely due to a difference in the flags being passed. How to figure out the proper flags? Well, wireshark and lots of code reading, I guess.

Anyhow, when running my proof of concept file copy (https://github.com/markg85/netsplice) you can saturate the network, which was the goal. Sure, it misses a gazillion options, but that's not the point :) It merely proves that copying a file can reach close to theoretical network limits.

We just need someone to look at this really closely (the flag difference between SMB and CIFS) and figure out the best flags to pass. Anyone up to the task?
Using sendfile in the KIO SMB code might still be beneficial, but it is also a lot more complicated, so let's stick to comparing flags for the moment.
Comment 52 Christoph Feck 2018-02-15 22:48:21 UTC
Mark, the issue is not whether user-space code can saturate the network. The problem is that there is no async I/O API available in libsmbclient.

If I understand the previous comments correctly, there is a limitation in the specification to 64 KB chunks.

libsmbclient does not allow sending a network request for the next 64KB chunk while an old one is still running. You have to wait for the reply, then send the request for the next chunk. On some networks, this causes a delay of several ms between requests. The faster the transfer speed, the more painful the delay (delay vs. payload ratio).

The CIFS implementation in the kernel does not have this issue; while created by the same (Samba) team, it does not use libsmbclient and always requests the next chunk ahead of arrival of the previous chunk, so that the network is always saturated.

It would be possible to write a network client out-of-kernel that does not have the libsmbclient limitation, but we do not have the power to write one. I do not know if there is any other library or code out there that we could use.

> We just need someone

Not me for anything related to networks or databases ;)
Comment 53 Krzysztof Marczak 2018-02-16 07:26:48 UTC
Maybe look for answers in other software. Gnome Commander also uses SMB and SFTP and achieves the full possible hardware throughput. If you look into the source code of that application, it may be possible to figure out how it should be done.
Comment 54 Mark 2018-02-16 10:32:45 UTC
(In reply to Christoph Feck from comment #52)
> Mark, the issue is not whether user-space code can saturate the network. The
> problem is that there is no async I/O API available in libsmbclient.
> 
> If I understand the previous comments correctly, there is a limitation in
> the specification to 64 KB chunks.
> 
> libsmbclient does not allow sending a network request for the next 64KB
> chunk while an old one is still running. You have to wait for the reply,
> then send the request for the next chunk. On some networks, this causes a
> delay of several ms between requests. The faster the transfer speed, the
> more painful the delay (delay vs. payload ratio).
> 
> The CIFS implementation in the kernel does not have this issue; while
> created by the same (Samba) team, it does not use libsmbclient and always
> requests the next chunk ahead of arrival of the previous chunk, so that the
> network is always saturated.
> 
> It would be possible to write a network client out-of-kernel that does not
> have the libsmbclient limitation, but we do not have the power to write one.
> I do not know if there is any other library or code out there that we could
> use.
> 
> > We just need someone
> 
> Not me for anything related to networks or databases ;)

I'm fairly sure samba internally uses sendfile when "use sendfile = yes" is set (https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html). However, I'm also quite sure that there were bugs with this, and the advice was to have it disabled by default (hence the default is disabled). This must be a bug on the samba side of things, as sendfile alone has worked just fine for years now. Just putting "use sendfile = yes" in the config doesn't help though, not in my experience at least. Samba does some weird things there that lose efficiency.
Comment 55 Nate Graham 2020-01-28 18:05:05 UTC
*** Bug 416832 has been marked as a duplicate of this bug. ***
Comment 56 Nate Graham 2020-02-11 15:33:21 UTC
*** Bug 417358 has been marked as a duplicate of this bug. ***
Comment 57 Harald Sitter 2020-02-18 15:32:28 UTC
I did some research....

Let's start with the important thing: KIO does not dictate the request size.

KIO requests a read from smbc of up to a given amount. smbc internally will break that amount into concurrent network requests (libsmb_context.c, clireadwrite.c). The actual number of requests is calculated based on server capabilities. In other words: smbc may not have an async API, but that doesn't stop it from fulfilling a single client read request with numerous network requests to the server.

Of note here is that the server's capabilities will severely impact the concurrency and thus the throughput. E.g. if you use a server that only speaks SMB1 and/or doesn't have the necessary capabilities, you'll generally see much worse performance, and there is nothing to be done about that.

Looking at the SMB2+ scenarios exclusively, it does however mean that the larger the request size KIO uses, the higher the throughput. If you request a 1G read of a 1G file you may well get that back in a single read call at near-ideal performance. And while that would seem attractive, it isn't. We need progress reporting, and the larger the request size -> the fewer reads -> the fewer updates we can give to progress, i.e. the transfer dialog would be broken. As a result we'd probably want a request size of no more than filesize/100, or maybe filesize/50, so as to update every percent or two.

Indeed, when increasing the buffer size to filesize/50 you'll probably see fairly good performance. In my tests against a windows10 server and a 1G file that's as follows:
Win -> Win: 100-110 MiB/s
Win -> mount: ~108 MiB/s (~9.59s)
Win -> KIO-current: ~58 MiB/s (~18s)
Win -> KIO-dynamic-request-size: 70-85 MiB/s (~12.54s)

So far that doesn't look bad; now the sync API gets in the way, though. Each read effectively is a blocking chain of

- read()
- write()
- emit progress()

Meaning write() will directly impact throughput, because the next batch of read requests cannot be sent to the server until the read loop wraps around. IOW: we do not "queue" the next read request with the server until after we've written the current one. A quick concurrency hack to mitigate that with a threaded r/w queue suggests the impact of this is actually considerable.

Win -> KIO-dynamic-request-size+threaded-write: ~106 MiB/s (~9.74s)

That seems about as efficient as this can get considering we need to drag the data through user space.

So we'll probably want a smarter size calculation (+ a cap at some reasonable value, because this will impact RAM usage) + a circular buffer between read() and write() so we can "buffer" data.
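The quick concurrency hack isn't shown above, but the decoupling idea can be sketched with a bounded queue between a reader thread and the writing main thread. All names below are made up, and the real patch (see the commit in comment 61) uses its own classes; the 4-segment bound matches what that commit describes:

#include <libsmbclient.h>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Bounded queue of file segments between the network reader and the disk
// writer, so the next read is already queued with the server while the
// previous segment is being written out.
struct SegmentQueue {
    std::queue<std::vector<char> > segments;
    std::mutex mutex;
    std::condition_variable notFull, notEmpty;
    bool done = false;
    static const size_t capacity = 4;

    void push(std::vector<char> s) {
        std::unique_lock<std::mutex> lock(mutex);
        notFull.wait(lock, [this] { return segments.size() < capacity; });
        segments.push(std::move(s));
        notEmpty.notify_one();
    }
    bool pop(std::vector<char> &s) {
        std::unique_lock<std::mutex> lock(mutex);
        notEmpty.wait(lock, [this] { return !segments.empty() || done; });
        if (segments.empty())
            return false;
        s = std::move(segments.front());
        segments.pop();
        notFull.notify_one();
        return true;
    }
    void finish() {
        std::lock_guard<std::mutex> lock(mutex);
        done = true;
        notEmpty.notify_all();
    }
};

// Reader thread: issues smbc_read() calls back to back, never waiting
// for the writer unless the queue is full.
void readLoop(int smbFd, size_t segmentSize, SegmentQueue &q)
{
    for (;;) {
        std::vector<char> seg(segmentSize);
        ssize_t n = smbc_read(smbFd, seg.data(), seg.size());
        if (n <= 0)
            break;
        seg.resize(static_cast<size_t>(n));
        q.push(std::move(seg));
    }
    q.finish();
}

// Main thread: std::thread reader(readLoop, smbFd, segSize, std::ref(q));
// then pop() segments, write them to disk, and emit progress per segment.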
Comment 58 Nate Graham 2020-02-18 17:30:33 UTC
Fantastic work, Harald! Really top-notch engineering right there.
Comment 59 Christoph Feck 2020-02-18 21:46:32 UTC
Does "dynamic request size" mean that it starts e.g. at 256KB, then measures the time it needed, and if it was less than 1 second, then it will double the buffer size, measure again, etc. up to (say) 128MB?
Comment 60 Fabby 2020-02-18 23:39:11 UTC
Holy Moly!  round(106/108*100)=98% efficiency and (nearly) 100% speed increase!

Where can I send the bottle of Champagne? (or crate of Belgian beer if that's your preference...)
Comment 61 Harald Sitter 2020-04-03 17:03:38 UTC
Git commit 46b5fb425c148b9a6b02eed3ef25f14feb5996ba by Harald Sitter.
Committed on 03/04/2020 at 17:03.
Pushed by sitter into branch 'master'.

smb: fast copy

Summary:
see https://bugs.kde.org/show_bug.cgi?id=291835#c57 for background

- reading now happens inside a future. should be safe since we don't have
  any other threads doing anything while we wait.
- the future feeds into a buffer from which the main thread will
  take file segments and write them to disk
- buffer has 4 segments and synchronizes the threads via wait conditions
- the size of a segment is determined somewhat dynamically between 64kb
  and 4mb. the larger a file is the more it benefits from larger
  read requests

under perfect conditions this yields approximately mount-level copy
performance, unfortunately those are hard to hit so on average it's usually
less (somewhere in the range of 10 to 20% depending on the actual file
size and server type).

for many tiny files performance is about where it was before. the larger
the files get the greater the gains from this diff though.

specifically here's some samples I've taken:

- for downloads from windows10
  - 1G & 4G file
    - compared to 20.04 is ~77% faster
    - within 10% of windows
  - 8G file
    - compared to 20.04 is ~79% faster
    - within 5% of windows
- uploads to windows10
  - all sizes
    - compared to 20.04 is ~50% faster
    - now comparable performance to windows
- for remote-to-remote file copies from windows10 to smbd 4.11.6
  - 1000 x 5K files
    - no change, dreadfully slow, likely problem in KIO internals
  - 1G file
    - compared to 20.04 is ~45% faster
    - within 8% of windows
  - 4G file
    - compared to 20.04 is ~95% faster
    - and somehow 18% faster than windows (could be a fluke)

I've done transfers for 128M, 256M, 512M, 1G, 4G and partially 8G.
Differences not mentioned are either unchanged, negligible or in line with
documented trends.
FIXED-IN: 20.08

Test Plan:
- fallocate -l 1G file
- copy around
- copy kio-extras around

Reviewers: ngraham, cfeck, #frameworks, #dolphin

Subscribers: mmustac, meven, hallas, anthonyfieroni, asturmlechner, kde-frameworks-devel, kfm-devel

Tags: #dolphin, #frameworks

Differential Revision: https://phabricator.kde.org/D27504

M  +4    -0    smb/CMakeLists.txt
M  +1    -0    smb/autotests/CMakeLists.txt
A  +151  -0    smb/autotests/transfertest.cpp     [License: UNKNOWN]  *
M  +0    -1    smb/kio_smb.h
M  +53   -34   smb/kio_smb_dir.cpp
M  +55   -42   smb/kio_smb_file.cpp
A  +105  -0    smb/transfer.cpp     [License: UNKNOWN]  *
A  +75   -0    smb/transfer.h     [License: UNKNOWN]  *

The files marked with a * at the end have a non valid license. Please read: https://community.kde.org/Policies/Licensing_Policy and use the headers which are listed at that page.


https://commits.kde.org/kio-extras/46b5fb425c148b9a6b02eed3ef25f14feb5996ba
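As a footnote on the "somewhat dynamically" sized segments: the idea amounts to aiming for a fixed number of reads per file (so progress still updates every couple of percent, per comment 57), clamped to the 64 KB and 4 MB bounds. A sketch of that heuristic; the exact calculation in transfer.cpp may differ:

#include <sys/types.h>
#include <algorithm>

// Sketch: target ~50 reads per file, but never request less than 64 KiB
// or more than 4 MiB at a time.
static size_t segmentSizeFor(off_t fileSize)
{
    const off_t minSize = 64 * 1024;
    const off_t maxSize = 4 * 1024 * 1024;
    return static_cast<size_t>(std::min(maxSize, std::max(minSize, fileSize / 50)));
}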