Bug 392145

Summary: kactivitymanagerd crash every login. /home on NFS
Product: [Plasma] kactivitymanagerd Reporter: Paul Worrall <p.r.worrall>
Component: general Assignee: Ivan Čukić <ivan.cukic>
Status: RESOLVED FIXED    
Severity: normal CC: j, plasma-bugs, rdieter
Priority: NOR    
Version: 5.12.3   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Paul Worrall 2018-03-21 19:12:56 UTC
Plasma 5.12.3 was working fine on Kubuntu 17.10 with the backports PPA.

On upgrading to Kubuntu 18.04 beta, plasmashell crashes at every login, leaving an empty black screen.

The only clues I can see are:

1. /home is an NFS share

2. .xsession-errors contains:

KActivities: Database can not be opened in WAL mode. Check the SQLite version (required >3.7.0). And whether your filesystem supports shared memory

followed by multiple reports of:
KActivities: FATAL ERROR: Failed to contact the activity manager daemon

Does NFS support shared memory?

If I unmount /home and use a local /home then all seems fine.

sqlite -version says 2.8.17
Comment 1 Paul Worrall 2018-03-27 17:38:33 UTC
paul@desktop:~/.local/share/kactivitymanagerd/resources$ sqlite3 database
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> pragma journal_mode;
Error: disk I/O error
sqlite> pragma journal_mode=wal;
Error: disk I/O error
sqlite>
Comment 2 Paul Worrall 2018-03-28 19:27:04 UTC
Comment 1 was on 18.04 beta.

With 17.10 + Plasma 5.12.3 and the same NFS mounted /home:

paul@paul-BB-64004H:~/.local/share/kactivitymanagerd/resources$ sqlite3 database
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
sqlite> pragma journal_mode;
wal
sqlite> 

which seems to show that sqlite has a problem with NFS on 18.04 (which has a different sqlite version)
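
The interactive check from comments 1 and 2 can also be scripted. This is a minimal sketch using Python's standard sqlite3 module. The function name probe_wal is hypothetical, and an error string returned here only mirrors what the sqlite3 shell printed above; it does not identify the cause.

```python
import sqlite3


def probe_wal(db_path):
    """Ask SQLite to put db_path into WAL mode.

    Returns (journal_mode, None) on success or (None, error_text) on failure.
    probe_wal is a hypothetical helper name, not part of any KDE tool.
    """
    try:
        conn = sqlite3.connect(str(db_path))
        try:
            # Same statement as typed into the sqlite3 shell in comment 1.
            row = conn.execute("PRAGMA journal_mode=WAL").fetchone()
            return row[0], None
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # On the affected 18.04 machine the shell reported "disk I/O error".
        return None, str(exc)
```

Pointing it at ~/.local/share/kactivitymanagerd/resources/database on both the NFS-mounted and the local /home should show whether the difference follows the filesystem. Note that the pragma switches the database to WAL if it is not already in that mode.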
Comment 3 Paul Worrall 2018-03-28 21:33:13 UTC
As a work-around on 18.04: if I make the kactivitymanagerd/resources directory a symlink to a directory on the local (non-NFS) disk then plasma does not crash.
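
A rough sketch of that work-around in Python, for anyone who needs to apply it to several accounts. The function name and both paths are hypothetical; the local target can be any directory on a non-NFS disk. It is presumably safest to run this while kactivitymanagerd is not running.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(src, local_dst):
    """Move directory src to local_dst and leave a symlink at src.

    Hypothetical helper: src would be the NFS-hosted
    ~/.local/share/kactivitymanagerd/resources directory, and local_dst
    a not-yet-existing directory on a local disk.
    """
    src, local_dst = Path(src), Path(local_dst)
    if src.is_symlink():
        return  # already relocated, nothing to do
    if local_dst.exists():
        raise FileExistsError(f"{local_dst} already exists")
    local_dst.parent.mkdir(parents=True, exist_ok=True)
    if src.exists():
        # Moves the whole directory, copying across filesystems if needed.
        shutil.move(str(src), str(local_dst))
    else:
        local_dst.mkdir()
    src.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(local_dst, src, target_is_directory=True)
```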
Comment 4 Rex Dieter 2018-03-28 21:35:31 UTC

*** This bug has been marked as a duplicate of bug 387979 ***
Comment 5 Paul Worrall 2018-04-07 17:39:59 UTC
I do not think this is a duplicate of 387979:

1.  If I mount the same NFS share as /home on a Kubuntu 17.10 machine all is well, so the database isn't corrupt.

2. Deleting ~/.local/share/kactivitymanagerd/ doesn't help like it did for 387979.

3. This bug manifests only if ~/.local/share/kactivitymanagerd/ is on an NFS-mounted folder (see comment 3).
Comment 6 Rex Dieter 2018-04-07 18:06:37 UTC
Strictly, the fix is the same.  A crashing kactivitymanagerd will no longer crash plasmashell.

Remaining issue: kactivitymanagerd is still (sometimes?) crashing (or otherwise failing to start properly) when the db is stored on NFS :(
Comment 7 Jason Tibbitts 2018-04-23 17:31:02 UTC
I just wanted to add that it may be related to NFS, but there is also something else involved which I don't yet understand.

Basically, I have a couple of hundred users, all of whom have ~ on NFS, and this problem occurs for only three of them.  I do not know what is special about those three, nor have I figured out what I can delete to get things to go back to normal.  The three users don't even have their homes on the same NFS server, and other users are fine even when they share the same NFS server or even the same volume on that server.  All of the client machines have the same software: Fedora 27 with all applied updates, which includes sqlite 3.20.1, frameworks 5.44.0 and plasma 5.12.4.
Comment 8 Paul Worrall 2018-04-23 17:57:33 UTC
@Jason Tibbitts:  Is there any difference between users in the fstab line that mounts the NFS share, e.g. mount options?
Comment 9 Ivan Čukić 2018-04-23 18:20:38 UTC
NFS might be the problem: from what I've read in the sqlite3 docs, the 'shared memory' might not work on some NFS setups. I'll probably have to make Kamd disable the database (and everything that uses the database: favourites, recent documents...) on NFS instead of failing to start.
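
Disabling the database on NFS would first need a way to tell whether a path lives on an NFS mount. The following is only a sketch of one way to do that on Linux by reading /proc/mounts. It is not how kactivitymanagerd does or will do it, and the function names are hypothetical.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    "device mount_point fs_type options dump pass".
    Mount points containing spaces are escaped as \\040 in /proc/mounts;
    this sketch does not unescape them.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text=None):
    """True if path is on a mount whose type starts with "nfs" (nfs, nfs4)."""
    if mounts_text is None:
        with open("/proc/mounts") as mounts_file:
            mounts_text = mounts_file.read()
    fs_type = fs_type_for(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

As comment 3 shows, a symlinked resources directory can sit on a different filesystem than ~, so a check like this would have to be run on the resolved database path (for example via os.path.realpath) rather than on the home directory.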
Comment 10 Jason Tibbitts 2018-04-23 23:55:23 UTC
All user home directories are automounted and the mount options are identical between users.

The problem with attributing this to NFS in general is that it does appear to work most of the time.  I mean, I have always seen occasional kactivitymanagerd crashes, and I've never been motivated to dig into them, but only for these three users does it crash in a tight loop: new instances spawn and crash immediately, more instances are spawned in their place, which crash immediately, and so on.  So much so that the machines get quite slow from all of the IO writing core files to the journal.  It wouldn't be so problematic if it just crashed at login and stayed dead.

Really, NFS should be fine.  The only instances where you're going to run into problems are when you're trying to do concurrent writes on an NFS implementation where locking doesn't work.  The Linux NFS client and server together should support what sqlite needs, and in this case there's no concurrency involved because there is access from only one client.  And in any case, Firefox certainly doesn't have any problems with it and it uses sqlite rather extensively.
Comment 11 Jason Tibbitts 2018-04-24 00:05:02 UTC
Also, the NFS version involved here is 4.2; NFS3 is not involved at all.  It shouldn't matter at all, but we use kerberized NFS (sec=krb5p) and the servers are all running RHEL7.4.
Comment 12 Paul Worrall 2018-05-12 08:44:21 UTC
After receiving a recent update to Firefox it can't access its bookmarks.

Turns out ff uses an sqlite database for them, and symlinking the ff profile directory to a local disk like I did for Kamd (comment 3) 'fixes' the problem.

It seems as though some recent change to sqlite or how it's used causes problems with databases on NFS shares.
Comment 13 Jason Tibbitts 2018-05-14 16:04:16 UTC
Oh, great.  This is going to be absolutely terrible.

Does anyone have any reference to an upstream sqlite bug on this?  So far I haven't found anything other than vague mentions that file locking might be subtly broken on some platforms, but that should still only cause problems if multiple processes are accessing the same database, and then it would just result in some kind of database corruption and not an immediate failure when opening the database or setting WAL mode.  But maybe sqlite changed to fail early and didn't update all of their documentation.
Comment 14 Jason Tibbitts 2018-05-14 16:46:45 UTC
I chatted with some of the Linux NFS kernel developers and they know of no problems with sqlite on NFS when used against a Linux server.  I did find some documentation of problems with Solaris servers, but the Firefox developers did get the "unix-excl" VFS pushed into upstream sqlite a few years ago which does internal locking.  I'm not sure what kactivitymanagerd is using.  (I searched the source but don't see anywhere that might be set; I guess it's only using frameworks or qt functions to interact with sqlite.)

I did some very basic testing to see that unix-excl locking works as expected.  I also tested with a few sqlite versions (3.6.20 {doesn't have WAL or unix-excl at all}, 3.7.17, 3.20.1 and 3.22.0) and none had any particular problem opening or creating databases and setting the journal_mode to WAL (except for 3.6.20 which simply doesn't support WAL).
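
For reference, a test along those lines can be written with Python's sqlite3 module, which accepts a vfs parameter in URI filenames. This is a sketch under the assumption that the local SQLite build registers the "unix-excl" VFS (the standard Unix builds do); the helper name is hypothetical, and it says nothing about what kactivitymanagerd itself uses.

```python
import sqlite3
from pathlib import Path


def open_unix_excl(db_path):
    """Open db_path through SQLite's "unix-excl" VFS.

    Hypothetical helper. The VFS is selected with the vfs= URI parameter,
    which requires uri=True on sqlite3.connect().
    """
    uri = Path(db_path).resolve().as_uri() + "?vfs=unix-excl"
    return sqlite3.connect(uri, uri=True)


def wal_mode_via_unix_excl(db_path):
    """Return the journal mode reported after requesting WAL via unix-excl."""
    conn = open_unix_excl(db_path)
    try:
        return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()
```

Running wal_mode_via_unix_excl() on a scratch database inside the NFS-mounted home, and again on a local disk, would give a like-for-like comparison. Whether WAL then works on a particular NFS mount is something to test, not something this sketch guarantees.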

And I still can't find any statement of what's not supposed to work on NFS.  All I can find is vague mentions that some NFS locking implementations might be buggy, but nothing conclusive.  And honestly, Plasma really does need to work (and be fully functional) with NFS home directories.
Comment 15 Ivan Čukić 2018-07-23 08:47:03 UTC
I guess you found the same things regarding NFS and sqlite as I did.

The symlink is a nice workaround. And one that I think is best suited for this situation.

Sadly, there is not much that can be done here:

1. Disable WAL which would completely kill the performance of sqlite and still would not be a guarantee that everything would work on NFS

2. Replace sqlite with another database and deal with the new problems that would arise from that (like nepomuk/baloo did quite a few times without clear benefits)
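
To make option 1 concrete, here is a sketch of what "disable WAL" could look like as a fallback rather than a blanket change: try WAL first, and only drop to a rollback journal when that fails. The function name is hypothetical and this is not code from kactivitymanagerd. As noted above, the fallback is slower and is still not guaranteed to work on NFS.

```python
import sqlite3


def open_with_journal_fallback(db_path):
    """Open db_path, preferring WAL and falling back to a rollback journal.

    Returns (connection, journal_mode). Hypothetical helper for illustration.
    """
    conn = sqlite3.connect(str(db_path))
    try:
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        except sqlite3.OperationalError:
            mode = None  # e.g. the "disk I/O error" seen in comment 1
        if mode != "wal":
            # Fall back to SQLite's default rollback journal.
            mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
        return conn, mode
    except sqlite3.Error:
        conn.close()
        raise
```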
Comment 16 Paul Worrall 2019-04-07 14:29:01 UTC
Thanks for looking into this.  I've rebuilt the NFS server host, upgrading from NFS 4.1 to 4.2, and upgraded the client to Kubuntu 19.04 (beta) and now I can no longer reproduce this problem.  .xsession-errors no longer contains the reported messages about KActivities.