Bug 387341

Summary: Memory leak in plasma shell with 5.10.5
Product: [Plasma] plasmashell Reporter: Nick Coghlan <ncoghlan>
Component: general    Assignee: David Edmundson <kde>
Status: RESOLVED FIXED    
Severity: normal CC: erin-kde, kde, kde, nate, plasma-bugs, rdieter
Priority: NOR    
Version: 5.10.5   
Target Milestone: 1.0   
Platform: Other   
OS: Linux   
Latest Commit:    Version Fixed In: 5.9.4
Sentry Crash Report:
Attachments: Massif memory analysis file ( Plasmashell 4:5.11.4-0neon+16.04+xenial+build77 ; Qt5 5.9.2+dfsg-4 )
Massif memory snapshot file ( Plasmashell 4:5.11.4-0neon+16.04+xenial+build77 ; Qt5 5.9.2+dfsg-4 )

Description Nick Coghlan 2017-11-27 03:13:42 UTC
I'm currently seeing significant memory leaks in plasmashell (reaching 12+ GiB virtual RAM usage and hence triggering the Linux OOM killer) with the following components (initially on Fedora 26, now on Fedora 27):

    $ rpm -qa plasma-desktop qt5-qtbase
    plasma-desktop-5.10.5-1.fc27.x86_64
    qt5-qtbase-5.9.2-5.fc27.x86_64

While it's plasmashell where the memory leak is appearing for me, I suspect the actual bug may be in Qt, since the appearance of the problem is fairly well correlated with the Fedora 26 upgrade from qt5-qtbase 5.7.1 to qt5-qtbase 5.9.2 last week.

(Note: while the leak does seem to be wallpaper slideshow related, I don't believe this is a duplicate of https://bugs.kde.org/show_bug.cgi?id=381000, as the memory leak portion of that was deemed to have already been fixed in Qt 5.9.2, as per https://bugs.kde.org/show_bug.cgi?id=386844 )
Comment 1 Erin Yuki Schlarb 2017-11-30 14:46:24 UTC
I see the same issue on (let's call it) KDE Neon User Debian edition, with
 * libqt5widgets5 5.9.2+dfsg-4
 * plasma-desktop 4:5.11.3-0neon+16.04+xenial+build76

The exact same issue also occurs on Fedora 27 (vanilla).

I'm also 99% certain that this is a problem with GPU buffer management (i.e. some texture is not freed), because on my laptop the huge increase in memory usage is only attributed to the `plasmashell` process if `plasmashell` is started with the `LIBGL_ALWAYS_SOFTWARE=true` environment variable set (which causes all rendering to happen in software inside the process, and also indicates that the amdgpu driver has an accounting problem…).

It should also be noted that there does not seem to be any (noticeable) leak in plasmashell in the default configuration (i.e. when setting `XDG_CONFIG_HOME` to a newly created directory), but the leak can be triggered by enabling the "Media-Frame" plasmoid. Also, once the leak has been triggered (and there seem to be several other things that may trigger it as well), there is a steady loss of about 5-30MB per second on average (and much more with "Media-Frame" enabled).
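For anyone wanting to repeat these two experiments, here is a minimal bash sketch that starts a command with a freshly created, empty `XDG_CONFIG_HOME` and, optionally, with `LIBGL_ALWAYS_SOFTWARE=true`. The `run_isolated` function and its `SOFTWARE_GL` switch are hypothetical helpers for illustration, not part of Plasma. Quit the running shell first (for example with `kquitapp5 plasmashell`), then call `SOFTWARE_GL=1 run_isolated plasmashell`.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a throwaway, empty XDG_CONFIG_HOME (so Plasma
# falls back to its default configuration) and, if SOFTWARE_GL=1, with
# software rendering forced via LIBGL_ALWAYS_SOFTWARE=true.
# run_isolated and SOFTWARE_GL are hypothetical helpers, not Plasma settings.

run_isolated() {
    local cfg
    cfg="$(mktemp -d)" || return 1   # fresh, empty config directory
    if [ "${SOFTWARE_GL:-0}" = 1 ]; then
        # Rendering then happens inside the process, so (per the observation
        # above) leaked buffers show up in the process's own RSS.
        env XDG_CONFIG_HOME="$cfg" LIBGL_ALWAYS_SOFTWARE=true "$@"
    else
        env XDG_CONFIG_HOME="$cfg" "$@"
    fi
}
```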

I can also confirm that the slideshow wallpaper has nothing to do with this. (It neither triggers the memory leak nor increases the rate of leaking.) Probably the issue lies within QtQuick or with its Plasma integration.
Comment 2 David Edmundson 2017-11-30 15:51:19 UTC
Virtual memory is not a measure of anything.

But if there is a leak please get a trace in massif. http://cs.swan.ac.uk/~csoliver/ok-sat-library/internet_html/doc/doc/Valgrind/3.8.1/html/ms-manual.html
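A minimal bash sketch of that workflow, assuming valgrind with the massif tool and `ms_print` is installed. `record_massif` runs plasmashell under massif and prints the report; `massif_peak_heap_bytes` is a hypothetical helper that pulls the largest `mem_heap_B` value out of a massif output file so two runs can be compared quickly. Massif by default only tracks heap allocations (malloc and friends), so memory that a driver maps directly may only show up when recording with `--pages-as-heap=yes`.

```shell
#!/usr/bin/env bash
# Sketch: record a massif heap profile of plasmashell and report its peak.
# Assumes valgrind (massif tool) and ms_print are installed.

# Run plasmashell under massif; the default output file name is our own choice.
record_massif() {
    local out="${1:-massif.out.plasmashell}"
    valgrind --tool=massif --massif-out-file="$out" plasmashell
    # Human-readable graph plus detailed snapshots.
    ms_print "$out"
}

# Print the largest mem_heap_B value (in bytes) found in a massif output file.
# Lines such as mem_heap_extra_B=... do not match the pattern and are ignored.
massif_peak_heap_bytes() {
    awk -F= '/^mem_heap_B=/ { v = $2 + 0; if (v > max) max = v } END { print max + 0 }' "$1"
}
```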
Comment 3 Erin Yuki Schlarb 2017-11-30 21:14:15 UTC
@David Edmundson: I was referring to RSS, not VSS.

I'll generate massif traces tomorrow (judging by the linked documentation, it'll likely take several tries to obtain something usable).
Comment 4 David Edmundson 2017-11-30 21:19:45 UTC
Thanks
Also if you can try what 387128 is saying, that would be very helpful.
Comment 5 Erin Yuki Schlarb 2017-12-02 12:34:38 UTC
@David Edmundson:
Massif report (`ms_print`) of a detailed snapshot taken just before I told `plasmashell` to quit:
https://pastebin.com/XkMgSSJ8

The full and snapshot memory analysis files are attached.

Looking into 387128 now…
Comment 6 Erin Yuki Schlarb 2017-12-02 12:37:26 UTC
Created attachment 109172 [details]
Massif memory analysis file ( Plasmashell 4:5.11.4-0neon+16.04+xenial+build77 ; Qt5 5.9.2+dfsg-4 )
Comment 7 Erin Yuki Schlarb 2017-12-02 12:38:02 UTC
Created attachment 109173 [details]
Massif memory snapshot file ( Plasmashell 4:5.11.4-0neon+16.04+xenial+build77 ; Qt5 5.9.2+dfsg-4 )
Comment 8 David Edmundson 2017-12-04 14:07:01 UTC
Thanks for doing that, it's a stupid bug.

*** This bug has been marked as a duplicate of bug 387128 ***
Comment 9 Erin Yuki Schlarb 2017-12-04 19:26:40 UTC
@David Edmundson: Are you sure that this is a duplicate of 387128? Because from what I can tell they have hardly anything to do with each other (afaik: one is a memory leak caused by a buggy QML script buffering notifications, the other is a bug in the Qt SceneGraph or the QML/QtQuick2 libraries as used by PlasmaShell).

I have seen no evidence at this point to suggest that these issues are related, aside from the leak happening "with QML", but then almost everything in PlasmaShell is QML in one way or another. The memory increases caused by the notification bug are one-time (per notification) events, while this bug is about a slow, but continuous and (apparently) exponential leak of memory. I'm sorry if I didn't make this clear in my last comment.

On another note: Could you arrange for somebody with the nVidia binary driver to see if they can reproduce this issue? I feel it may stem from a specific interaction between Qt and Mesa.

Recap on how to test for this:

 * Make sure you have plasma-desktop 5.10+ and Qt5 5.9.2
 * Start `plasmashell`
 * Add the "Media-Frame" Plasmoid to the desktop and select some folder with pictures to display
 * Wait 5 to 10 minutes while regularly checking Plasma's memory usage and the remaining "available" system memory using `free`
 * If memory usage stabilizes after the initial take-in, you're not affected; if it starts growing quickly (and continues to grow even after you remove the Media-Frame), you are affected
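The "regularly check" step can be scripted. This is a rough bash sketch, Linux only, that logs plasmashell's resident set size next to `MemAvailable` from `/proc/meminfo` (the figure `free` shows in its "available" column). The function names and the 30-second default interval are arbitrary choices, not anything Plasma provides. Run `watch_plasmashell 30` in a terminal while the Media-Frame is active: a steadily climbing `rss` together with a falling `available` matches the affected case above.

```shell
#!/usr/bin/env bash
# Sketch: periodically log plasmashell's RSS and the system's available memory.
# Linux-only: reads /proc directly. Helper names are illustrative.

# Resident set size of a process, in KiB.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# MemAvailable from /proc/meminfo, in KiB.
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Print one sample every INTERVAL seconds (default 30) until plasmashell exits.
watch_plasmashell() {
    local interval="${1:-30}" pid
    pid="$(pgrep -x plasmashell | head -n1)"
    [ -n "$pid" ] || return 1
    while [ -d "/proc/$pid" ]; do
        printf '%s rss=%sKiB available=%sKiB\n' \
            "$(date +%T)" "$(rss_kib "$pid")" "$(mem_available_kib)"
        sleep "$interval"
    done
}
```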

Hopefully that makes it clearer what this bug is and isn't about. (The above also works if you don't receive a single notification during the entire process.)
Comment 10 Christoph Feck 2017-12-31 17:12:51 UTC
David, did you see comment #9?
Comment 11 Erin Yuki Schlarb 2018-01-05 23:23:05 UTC
Meanwhile I also tested this with AMD's "GPU Pro" binary userland driver and, while being an awful experience in a lot of ways, this particular bug does not occur there. I then also retested with Mesa from latest master and 13.x (i.e. a very old version) and the issue occurs on both of these platforms. Using older kernel versions (tested down to 4.8.x) does not change anything either.

Next stop is trying some older Qt versions (if I finally manage to get that crap to compile 😞). We'll see…

PS: Most likely this is not a bug *in* plasmashell, but for lack of a better place I'll keep posting updates here until the actual problem is found.
Comment 12 Nick Coghlan 2018-01-07 12:37:44 UTC
Note: for my case, the problem seems *highly* likely to be specifically in the background image slideshow, as I haven't had any problems whatsoever since I switched the Wallpaper configuration to "Plain Color", whereas on the Slideshow setting I can see the plasmashell resident set size growing every time it switches to a new image (configuring it to switch images every second while displaying 2560x1440 images makes that growth quite obvious).

These are the current component versions that I just repeated that check with:

    $ rpm -qa plasma-desktop qt5-qtbase
    plasma-desktop-5.11.4-1.fc27.x86_64
    qt5-qtbase-5.9.2-6.fc27.x86_64

This is *not* a notifications related bug, and I don't think Alexander is talking about the same bug I'm seeing either, as I run KDE with all default settings, and the deciding factor between "leaking memory" and "not leaking memory" for me is whether or not I have a wallpaper slideshow enabled.
Comment 13 Erin Yuki Schlarb 2018-01-08 17:48:57 UTC
@Nick: Are you using a Mesa/Gallium based userland GL driver? Because I only see this particular issue with those (using either software or hardware rendering) and newer Qt versions. I *also* see a memory leak when using `plasmashell` without the "Media-Frame", but it's a lot slower (several hours until it really becomes noticeable; with Media-Frame I can reach leak speeds of about 500MB per minute after only a couple of minutes).
Comment 14 Nate Graham 2018-02-24 13:47:23 UTC
Looks like the same not-Qt-but-something-else slideshow memory leak issue described in Bug 368838

*** This bug has been marked as a duplicate of bug 368838 ***
Comment 15 Erin Yuki Schlarb 2018-02-24 17:59:15 UTC
Yes, that is the same issue. I see the same symptoms, including the fact that memory seems to go “missing” until you kill plasmashell. Other applications are apparently leaking memory as well, so after about 10 days of usage (I never reboot my system unless I have to, only suspend) the system will crash due to OOM.
Comment 16 Matt Whitlock 2018-02-24 23:25:52 UTC
(In reply to Alexander Schlarb from comment #15)
> I see the same symptoms including that fact
> that memory seems to go “missing” until you kill plasmashell.

@Alexander Schlarb: Have you tried running 'radeontop'? It's the only tool I've found so far that can report on the VRAM usage level. plasmashell definitely leaks VRAM like a sieve when it's running a slideshow. I haven't tried with LIBGL_ALWAYS_SOFTWARE=true yet, but that's a good pointer, so thanks for that.
Comment 17 Erin Yuki Schlarb 2018-02-26 12:27:35 UTC
Continuing this in 368838…
Comment 18 Erin Yuki Schlarb 2018-02-27 11:46:34 UTC
After some discussion on #368838 it turns out that this is a bug in the basic render loop of the Qt Scene Graph component (when interacting with Mesa drivers), which was fixed in Qt 5.9.4 and 5.10.0:

https://codereview.qt-project.org/#/c/200715/
https://codereview.qt-project.org/#/c/202781/
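To tell whether a Fedora install already has a Qt release with these changes, the version check can be sketched in bash as below. It assumes, as stated above, that every qt5-qtbase release at or above 5.9.4 includes the fix (5.10.0 sorts above 5.9.4), relies on GNU `sort -V`, and only parses RPM-style names like the `rpm -qa` output earlier in this report. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: is the installed qt5-qtbase new enough to contain the render loop fix?
# Assumption: every release >= 5.9.4 includes it. Requires GNU sort -V.
# Example (Fedora):
#   has_render_loop_fix "$(qtbase_version_from_rpm "$(rpm -q qt5-qtbase)")"

FIXED_IN="5.9.4"

# Extract the upstream version from an RPM name such as
# qt5-qtbase-5.9.2-5.fc27.x86_64 (prints 5.9.2).
qtbase_version_from_rpm() {
    sed -E 's/^qt5-qtbase-([0-9.]+)-.*$/\1/' <<< "$1"
}

# Succeeds (exit status 0) if version $1 >= FIXED_IN in version-sort order.
has_render_loop_fix() {
    [ "$(printf '%s\n' "$FIXED_IN" "$1" | sort -V | head -n1)" = "$FIXED_IN" ]
}
```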

As a workaround one can start `plasmashell` with a different QSG render loop, like this:

    QSG_RENDER_LOOP=threaded plasmashell

Unfortunately #368838 is not fixed by this.
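Setting the variable on the command line only affects that one invocation. A sketch for making the workaround stick across logins, assuming Plasma 5's session startup sources `*.sh` files from `$XDG_CONFIG_HOME/plasma-workspace/env` (by default `~/.config/plasma-workspace/env`). The helper and the file name `qsg-render-loop.sh` are hypothetical choices; remove the file again once a fixed Qt is installed.

```shell
#!/usr/bin/env bash
# Sketch: persist QSG_RENDER_LOOP=threaded for future Plasma sessions.
# Assumes Plasma 5 sources *.sh from <config dir>/plasma-workspace/env at login.
# An optional argument overrides the config directory (useful for testing).

install_render_loop_workaround() {
    local dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}}/plasma-workspace/env"
    mkdir -p "$dir"
    printf 'export QSG_RENDER_LOOP=threaded\n' > "$dir/qsg-render-loop.sh"
}

# To apply it to the running session without logging out (Plasma 5 tools):
#   kquitapp5 plasmashell && QSG_RENDER_LOOP=threaded kstart5 plasmashell
```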
Comment 19 Erin Yuki Schlarb 2018-02-27 12:37:58 UTC
… or not. Issue is probably still in Qt, but the workaround with QSG_RENDER_LOOP=threaded does work.
Comment 20 Nick Coghlan 2018-03-10 07:51:25 UTC
I just tried to reproduce the slideshow-specific misbehaviour that I was seeing, and I agree that it has been fixed since I last reproduced the problem back in January (https://bugs.kde.org/show_bug.cgi?id=387341#c12)

    $ rpm -qa plasma-desktop qt5-qtbase
    plasma-desktop-5.11.5-1.fc27.x86_64
    qt5-qtbase-5.9.4-4.fc27.x86_64

Now, even after changing my wallpaper back to switching 1440p images every second, plasmashell memory usage remains pretty stable (around the 146M mark), whereas the previous symptom was for it to shoot up by multiple megabytes per second.
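The "stable versus several megabytes per second" distinction can be expressed as a number with a small bash sketch that samples a process's RSS (as reported by `ps -o rss=`, in KiB with procps) twice and prints the average growth in KiB/s. The helper names and the 60-second default are illustrative assumptions. For example, `measure_growth "$(pgrep -x plasmashell)" 60` while the slideshow switches images every second should print a value near 0 on a fixed system, and a value in the thousands when a leak of several MB per second is present.

```shell
#!/usr/bin/env bash
# Sketch: average RSS growth of a process in KiB/s (integer, truncated).
# Helper names are illustrative; ps -o rss= reports KiB with procps.

# Average growth in KiB/s between two samples taken SECONDS apart.
growth_kib_per_s() {
    local start_kib="$1" end_kib="$2" seconds="$3"
    echo $(( (end_kib - start_kib) / seconds ))
}

# Sample PID's RSS now and again after SECONDS (default 60); print KiB/s.
measure_growth() {
    local pid="$1" seconds="${2:-60}" before after
    before="$(ps -o rss= -p "$pid" | tr -d ' ')"
    sleep "$seconds"
    after="$(ps -o rss= -p "$pid" | tr -d ' ')"
    growth_kib_per_s "$before" "$after" "$seconds"
}
```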

So closing again, and marking as "Fixed in 5.9.4".

(https://bugs.kde.org/show_bug.cgi?id=368838#c57 indicates that I *might* be being overly optimistic here, and there may still be a race condition that puts the basic render loop back into the "bad" state. Even if that's the case though, I don't think keeping this issue open will offer any benefit given the more detailed analysis in https://bugs.kde.org/show_bug.cgi?id=368838)