Bug 467561 - Nfs /home client hopeless since linux kernel 6.2
Summary: Nfs /home client hopeless since linux kernel 6.2
Status: RESOLVED UPSTREAM
Alias: None
Product: kde
Classification: I don't know
Component: general
Version: unspecified
Platform: Arch Linux Linux
Importance: NOR grave
Target Milestone: ---
Assignee: Unassigned bugs mailing-list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-19 08:01 UTC by Richard PALO
Modified: 2023-04-10 07:41 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:


Attachments
inxi -FAZ output on LTS (5.25 KB, text/plain)
2023-03-19 08:01 UTC, Richard PALO
mountstats /home/richard on LTS (6.25 KB, text/plain)
2023-03-21 13:35 UTC, Richard PALO
mountstats /home/richard on mainline 6.2 (6.24 KB, text/plain)
2023-03-21 13:35 UTC, Richard PALO
nfsiostats LTS (897 bytes, text/plain)
2023-03-22 21:18 UTC, Richard PALO
nfsiostats mainline (897 bytes, text/plain)
2023-03-22 21:23 UTC, Richard PALO

Description Richard PALO 2023-03-19 08:01:43 UTC
Created attachment 157411 [details]
inxi -FAZ output on LTS

SUMMARY
I am reporting here because it is unclear which particular component is responsible.
I've tried to solicit information via the KDE and EndeavourOS forums:
- https://forum.kde.org/viewtopic.php?f=18&t=177449
- https://forum.endeavouros.com/t/nfs-home-client-hopeless-since-linux-kernel-6-2/38098
(The distro is EndeavourOS, which is Arch-based but isn't listed, so I selected Arch Linux above.)

Primary info reproduced here:
================================
We have been running NFS /home on our Arch-based clients for years and, having more recently upgraded the hardware to Asus PN51-E1s, we moved to EndeavourOS. But for the past week it has been impossible to run the mainline kernel; luckily linux-lts is unaffected.

Currently running KDE/plasma

Operating System: EndeavourOS 
KDE Plasma Version: 5.27.2
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.1.15-1-lts or 6.2.2-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 12 × AMD Ryzen 5 5500U with Radeon Graphics
Memory: 30.7 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: MINIPC PN51-E1
System Version: 0505

The resulting mount (from /etc/fstab) is:
server:/home/richard on /home/richard type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,clientaddr=192.168.0.41,local_lock=none,addr=192.168.0.1)

/etc/fstab follows https://wiki.archlinux.org/title/NFS#Mount_using_/etc/fstab_with_systemd
plus, in /etc/nfsmount.conf:

[ MountPoint "/export/home" ]
background=True

The server is pure Arch x86_64 running 6.1.15-1-lts on a Supermicro H8SGL-F.

What are the symptoms?

Extremely long login times: on LTS login is nearly interactive (<5 seconds), versus a minute or two on mainline.
Using Dolphin to access /home or other NFS shares is excruciating, mostly on first access (long pauses as opposed to nearly interactive), and file accesses are clearly slower, though perhaps not as badly as folder openings.

Booting back and forth between LTS and mainline reproduces the problem.

It seems like perhaps a serious caching problem, or worse.

==================

ADDITIONAL INFORMATION
I don't have much time to spend on this, since running LTS doesn't exhibit the problem, but as of this morning I believe it's possible that Qt is at fault.
That is, Plasma's default file manager, Dolphin, has the problem. So I installed pcmanfm and pcmanfm-qt as alternatives, because access to NFS from a terminal seems just fine.

pcmanfm seems just fine on both LTS and mainline.
pcmanfm-qt is similar to Dolphin: fine on LTS but dog slow on mainline.

Can anybody else verify that, or suggest a way to tune this? Perhaps it needs a larger read_ahead_kb or something.
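
For anyone wanting to check the current readahead value before tuning, here is a minimal sketch. It assumes the kernel exposes the mount's backing-device entry as /sys/class/bdi/<major>:<minor>/read_ahead_kb, keyed on the mount's device number; the helper names are hypothetical and /home/richard is simply the mount from this report.

```python
import os


def bdi_readahead_path(st_dev: int) -> str:
    """Build the sysfs path assumed to hold read_ahead_kb for a device number."""
    return f"/sys/class/bdi/{os.major(st_dev)}:{os.minor(st_dev)}/read_ahead_kb"


def read_ahead_kb(mount_point: str):
    """Return the readahead size in KiB for mount_point, or None if unavailable."""
    try:
        path = bdi_readahead_path(os.stat(mount_point).st_dev)
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        # Mount point missing, sysfs entry not exposed, or unexpected content.
        return None


if __name__ == "__main__":
    # Assumption: /home/richard is the NFS mount of interest; adjust as needed.
    print(read_ahead_kb("/home/richard"))
```

Running this on both LTS and mainline would show whether the two kernels even start from the same readahead setting.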

[added] Maybe it's me, but it seems that on LTS Dolphin loads fast, then asynchronously adds the folder sizes later, a bit at a time. On mainline it looks synchronous, evidently taking a really long time to interrogate the NFS server for all that. Is there possibly a difference in the semantics of the underlying calls?
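
To take the file managers out of the picture, a rough timing of "list a folder and stat every entry" can be compared between the two kernels. The sketch below is only an approximation of what a file manager does on opening a folder, not Dolphin's actual code path; the function name is hypothetical.

```python
import os
import time


def time_listing(path: str):
    """List path and stat every entry; return (entry_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Fetch attributes for each entry; on NFS this may cost a
            # round trip to the server when the attributes are not cached.
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Times the current directory; pass an NFS folder to time_listing() instead.
    n, secs = time_listing(".")
    print(f"{n} entries in {secs:.3f} s")
```

If the numbers come out about the same on LTS and mainline, that would point away from plain directory listing and toward whatever extra calls the Qt-based file managers make.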

BTW, currently linux 6.2.6.arch1-1 is installed and exhibits the same problem (Qt5 5.15.8+kde+r183-1)
and LTS is at 6.1.19-1-lts.

We also have numerous other NFS mounts from the same server, which exhibit the same symptoms as $HOME.

Ultimately, KDE on our platforms is unusable on Linux 6.2. If no resolution is found before LTS moves to 6.2 or higher with similar results, we'll have to roll back to LXDE as our desktop platform.
 
For completeness, I attach the output of `inxi -Faz` from LTS.
Comment 1 Richard PALO 2023-03-21 13:35:00 UTC
Created attachment 157492 [details]
mountstats /home/richard on LTS
Comment 2 Richard PALO 2023-03-21 13:35:30 UTC
Created attachment 157493 [details]
mountstats /home/richard on mainline 6.2
Comment 3 Richard PALO 2023-03-21 13:37:24 UTC
Added mountstats for /home/richard on both kernels, captured as soon as possible after login, once a terminal could be opened.
NB: these are with the option 'fsc' in the NFS mount entries in /etc/fstab.

I find it curious that mainline 6.2 shows over 17x the number of RPC calls compared with LTS.
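
A comparison like this can be broken down per operation with a small script. The sketch below assumes the raw /proc/self/mountstats layout (not the formatted report printed by the mountstats tool), where a "per-op statistics" line is followed by lines of the form "OPNAME: count ..." and the first number is the operation count. The function names are hypothetical.

```python
def per_op_counts(text: str) -> dict:
    """Sum the per-operation call counts found in raw mountstats text."""
    counts = {}
    in_ops = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "per-op statistics":
            in_ops = True
            continue
        if not in_ops:
            continue
        if not stripped or stripped.startswith("device "):
            # A blank line or the next mount entry ends the per-op section.
            in_ops = False
            continue
        name, _, rest = stripped.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            counts[name] = counts.get(name, 0) + int(fields[0])
    return counts


def call_ratio(baseline_text: str, other_text: str) -> float:
    """Return total calls in other_text divided by total calls in baseline_text."""
    baseline = sum(per_op_counts(baseline_text).values())
    other = sum(per_op_counts(other_text).values())
    if baseline == 0:
        raise ValueError("baseline contains no per-op counts")
    return other / baseline
```

Feeding it a copy of /proc/self/mountstats saved on each kernel right after login gives the overall ratio, and comparing the two per_op_counts() dictionaries shows which operations account for the increase. If the file covers several mounts, the counts are summed across all of them.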
Comment 4 Richard PALO 2023-03-22 21:18:00 UTC
Created attachment 157521 [details]
nfsiostats LTS
Comment 5 Richard PALO 2023-03-22 21:23:27 UTC
Created attachment 157522 [details]
nfsiostats mainline

The mainline nfsiostats are rather distressing compared with LTS:
- 5 times more total ops/s
- 5x slower read ops/s and kB/s
- considerable reduction in write ops/s, kB/s and kB/op
Comment 6 Nate Graham 2023-04-04 22:54:19 UTC
Sounds like you have multiple system components affected, and since you said it started with kernel 6.2, I would suggest that it's a kernel issue, and recommend reporting it at https://bugzilla.kernel.org.

In general, in a managed/client-focused environment, I would strongly recommend either using an LTS-style distro like Kubuntu, or having an internal QA process to prevent certain updates from reaching clients. Arch is a very fast-moving distro and it's expected that things will break once in a while, with Arch users being expected to be able to troubleshoot and file informed bug reports. If this doesn't describe your clients, a different distro might be a better fit.

BTW, who is "we"? Can you provide some details about your use of Plasma on these machines? It sounds interesting.
Comment 7 Richard PALO 2023-04-10 07:41:02 UTC
(In reply to Nate Graham from comment #6)
> Sounds like you have multiple system components affected, and since you said
> it started with kernel 6.2, I would suggest that it's a kernel issue, and
> recommend reporting it at https://bugzilla.kernel.org.
> 
> In general, in a managed/client-focused environment, I would strongly
> recommend either using an LTS-style distro like Kubuntu, or having an
> internal QA process to prevent certain updates from reaching clients. Arch
> is a very fast-moving distro and it's expected that things will break once
> in a while, with Arch users being expected to be able to troubleshoot and
> file informed bug reports. If this doesn't describe your clients, a
> different distro might be a better fit.
> 
> BTW, who is "we"? Can you provide some details about your use of Plasma on
> these machines? It sounds interesting.

Sorry, I guess I missed your response.

In general, our site uses linux-lts, but on the clients we tended toward mainline for better hardware support.
This was quite important when the clients were aarch64, but now that we've gone back to x86_64 with Ryzen, it is less of a problem.

The choice of Plasma on the clients was twofold. First, certain tools like the PDF viewer Okular are (in our usage) superior to the usual Evince or equivalent, mainly because we send and receive lots of 'electronically signed' PDFs. Using LibreOffice for signing or verifying signatures on PDFs was a PITA.

Secondly, now that we have more memory on the clients due to moving back to x86_64, the user experience under Plasma is more appreciated, once the initial adjustment of migrating from LXDE/LXQt was overcome [naturally].
These are administrative/finance users in an SME, where NFS home considerably simplifies system administration.

I'll check with the latest 6.2 update to see if there's any change; if not, I'll file an issue upstream with the Linux kernel as suggested.

cheers