Summary: | Extreme stutters/hangs when using certain desktop effects when "~/.cache" is on slow storage | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | Ritchie Frodomar <alkalinethunder> |
Component: | effects-various | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ales.astone, brodierobertson54321, dannkunt, dashonwwIII, devminer, duha.bugs, fabian.arndt, fabian, jlp, kde.podagric, kde, kde, luluklzde, meven29, miranda, mithras.ftw, mpalys7274, nate, nerumo, ngompa13, oded, postix, vikts, zixaphirmoxphar |
Priority: | NOR | ||
Version: | 6.0.4 | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | https://invent.kde.org/plasma/kwin/-/commit/c747f9c3a7cb9aab74ea07f38eb5de43feb06a2c | Version Fixed In: | 6.1.0 |
Sentry Crash Report: | |||
Attachments: | Logs of different Kwin hangs happening from both Kwin's perspective and the kernel's. |
Wanted to add additional info - I did mention "slow and busy" drives, but I should clarify that how busy the drive is absolutely does affect the hanging. You can observe this even on a moderately fast spinning disk by writing lots of data to it (like using dd if=/dev/urandom of=wherever bs=somethinghuge).

This solved the Alt+Tab animations being slow and hanging for me, even though I already have a sort-of fast PCIe gen 3 SSD. The data loaded from disk on every Alt+Tab seems small enough to just stay in memory, so I would appreciate it if the effects would just stay loaded, *especially* for something like Alt+Tab.

If I understand correctly, the problem is that kwin

This is probably an issue we should fix for the benefit of people with a single slow storage disk, but I have to ask... if you're a technical expert and have multiple storage hardware options, would it not make more sense to put the cache on your fastest one, rather than a known slow one?

Is it throughout the whole .cache folder or a specific subfolder? Any chance I can mount tmpfs for these animations only?

(In reply to Nate Graham from comment #4)
> This is probably an issue we should fix for the benefit of people with a
> single slow storage disk, but I have to ask... if you're a technical expert
> and have multiple storage hardware options, would it not make
> more sense to put the cache on your fastest one, rather than a known slow
> one?

In my case I have my boot and application data on my NVMe drive, as this has the most noticeable effect on system speed, but my home directory is on my bulk storage hard drive. SSDs are much cheaper now, and on my next upgrade I will probably make the swap, but even if I do move to something faster, it doesn't change the issue that KWin is constantly adding wear to the drive with little pings every time a desktop effect is used.

Additionally, Plasma seems to be the only software that is this negatively affected by the cache existing on a hard drive; GNOME, Hyprland and other desktops do not seem to exhibit similar issues.

(In reply to Mithras from comment #5)
> Is it throughout the whole .cache folder or a specific subfolder? Any chance I
> can mount tmpfs for these animations only?

The folder at fault is .cache/kwin/qmlcache as far as I can tell; in early testing, putting it on a tmpfs does not seem to cause additional issues.

Can this also be the reason why Plasma can sometimes completely freeze during heavy IO (like downloading a big game on Steam)?

(In reply to Brodie Robertson from comment #7)
> Additionally, Plasma seems to be the only software that is this negatively
> affected by the cache existing on a hard drive; GNOME, Hyprland and other
> desktops do not seem to exhibit similar issues.

For sure, we can and will fix this. People have complained about similar issues when using HDDs for years.

I'm just saying that if you have faster storage available, and you're enough of a technical expert to put things on different disks according to their performance characteristics, it would make sense to put the cache folder on a performant disk too, until this is fixed (and probably even after this is fixed).

(In reply to Nate Graham from comment #10)
> (In reply to Brodie Robertson from comment #7)
> > Additionally, Plasma seems to be the only software that is this negatively
> > affected by the cache existing on a hard drive; GNOME, Hyprland and other
> > desktops do not seem to exhibit similar issues.
> For sure, we can and will fix this. People have complained about similar
> issues when using HDDs for years.
>
> I'm just saying that if you have faster storage available, and you're
> enough of a technical expert to put things on different disks according to their
> performance characteristics, it would make sense to put the cache folder on
> a performant disk too, until this is fixed (and probably even after this is
> fixed).

I was never affected by this problem since my laptop has two SSDs, but the constant reads and writes cause unnecessary wear on the drive. I also noticed that there are other Plasma cache directories and files outside the affected `~/.cache/kwin/qmlcache` (I could be missing more):

```bash
~/.cache/KDE/
~/.cache/plasmashell/
~/.cache/plasma-systemmonitor/
~/.cache/plasma_theme_internal-system-colors.kcache
~/.cache/plasma_theme_default.kcache
```

These files are small, so wouldn't it be a good idea to use the /tmp directory for this type of cache instead? I don't know whether the KDE cache could grow when using other KDE plugins, but I don't really think anyone is running Plasma 5 or later on a system with less than 2 GB of RAM.

(In reply to Nate Graham from comment #10)
> (In reply to Brodie Robertson from comment #7)
> > Additionally, Plasma seems to be the only software that is this negatively
> > affected by the cache existing on a hard drive; GNOME, Hyprland and other
> > desktops do not seem to exhibit similar issues.
> For sure, we can and will fix this. People have complained about similar
> issues when using HDDs for years.
>
> I'm just saying that if you have faster storage available, and you're
> enough of a technical expert to put things on different disks according to their
> performance characteristics, it would make sense to put the cache folder on
> a performant disk too, until this is fixed (and probably even after this is
> fixed).

In my case, when I first discovered this issue my OS was on NVMe, with my home on an LVM pool (two 4TB HDDs + one 1TB NVMe cache) forming an 8TB volume. Plasma is the only thing on my system that struggles with that configuration, and I don't store anything else in /home that would benefit from an SSD more than it would from the extra space. 8 TB worth of SSDs is still prohibitively expensive for me, and not worth it for what I store on them. Even with gaming, the little extra load time doesn't bother me.

(In reply to Victoria from comment #9)
> Can this also be the reason why Plasma can sometimes completely freeze
> during heavy IO (like downloading a big game on Steam)?

I apologize for making a post unrelated to the bug at hand, but I believe that this kind of lockup is related to the system's dirty_bytes settings. If you are using a distribution that keeps the default Linux kernel values, like Arch Linux, and have a lot of memory, the default values can cause huge blocking synchronous writes. Try looking at the following for guidance:

https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
https://github.com/pop-os/default-settings/blob/2d38675f21a190d6ab2794b8961339c28ce0f7dd/etc/sysctl.d/10-pop-default-settings.conf
https://wiki.archlinux.org/title/sysctl#Small_periodic_system_freezes

I tried to simulate a very slow .cache using the delay device mapper module:

> dd if=/dev/zero of=cachelp bs=1M count=64
> /sbin/mkfs.ext4 cachelp
> sudo losetup -f cachelp
> echo "0 $(sudo blockdev --getsz /dev/loop0) delay /dev/loop0 0 500" | sudo dmsetup create delayed
> sudo mount /dev/mapper/delayed ~testuser/.cache
> sudo chown testuser /home/testuser/.cache

Then I ran as testuser

> dbus-run-session gdb --args kwin_wayland --x11-display $DISPLAY --exit-with-session /usr/bin/konsole

Moved the window around a bit and maximised it, to make sure caches (.qmlc, ksycoca) got generated. Then I restarted kwin and maximised the konsole window and it lagged.
Backtrace:

Thread 1 (Thread 0x7f515559bb00 (LWP 31711) "kwin_wayland"):
#0 0x00007f5158508daa in fdatasync () at /lib64/libc.so.6
#1 0x00007f5158d65d63 in QLockFile::tryLock(std::chrono::duration<long, std::ratio<1l, 1000l> >) () at /lib64/libQt6Core.so.6
#2 0x00007f515b167fd0 in QSGRhiSupport::preparePipelineCache(QRhi*, QQuickWindow*) () at /lib64/libQt6Quick.so.6
#3 0x00007f515b168d0c in QSGRhiSupport::createRhi(QQuickWindow*, QSurface*, bool) () at /lib64/libQt6Quick.so.6
#4 0x00007f515b14a763 in () at /lib64/libQt6Quick.so.6
#5 0x00007f515b14c73d in () at /lib64/libQt6Quick.so.6
#6 0x00007f515963d7e9 in QWindow::event(QEvent*) () at /lib64/libQt6Gui.so.6
#7 0x00007f5159fc2f1e in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib64/libQt6Widgets.so.6
#8 0x00007f5158d8f030 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib64/libQt6Core.so.6
#9 0x00007f51595eda8b in QGuiApplicationPrivate::processExposeEvent(QWindowSystemInterfacePrivate::ExposeEvent*) () at /lib64/libQt6Gui.so.6
#10 0x00007f515964c05c in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt6Gui.so.6
#11 0x00007f515964c1e7 in QWindowSystemInterface::flushWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt6Gui.so.6
#12 0x00007f5159627340 in QPlatformWindow::setVisible(bool) () at /lib64/libQt6Gui.so.6
#13 0x00007f515aa839b4 in () at /lib64/libQt6Qml.so.6
#14 0x00007f515aa97936 in () at /lib64/libQt6Qml.so.6
#15 0x00007f515aa9613d in QQmlBinding::doUpdate(QQmlJavaScriptExpression::DeleteWatcher const&, QFlags<QQmlPropertyData::WriteFlag>, QV4::Scope&) () at /lib64/libQt6Qml.so.6
#16 0x00007f515aa94084 in QQmlBinding::update(QFlags<QQmlPropertyData::WriteFlag>) () at /lib64/libQt6Qml.so.6
#17 0x00007f515ab096c8 in QQmlNotifier::emitNotify(QQmlNotifierEndpoint*, void**) () at /lib64/libQt6Qml.so.6
#18 0x00007f5158de7e88 in () at /lib64/libQt6Core.so.6
#19 0x00007f515ba2479d in KWin::Window::setElectricBorderMaximizing(bool) () at /lib64/libkwin.so.6

Apparently something in Qt's Rhi has a lock file in .cache which needs to be synced to disk (/home/testuser/.cache/kwin/qtpipelinecache-x86_64-little_endian-lp64/qqpc_opengl.lck). Fortunately it's only an fdatasync and not an fsync or even a sync, so it only waits until that specific file's data has made it to disk, but it can still block for a while.

Maybe there's a way to avoid QLockFile use there.

(In reply to Fabian Vogt from comment #14)
> I tried to simulate a very slow .cache using the delay device mapper module:
>
> > dd if=/dev/zero of=cachelp bs=1M count=64
> > /sbin/mkfs.ext4 cachelp
> > sudo losetup -f cachelp
> > echo "0 $(sudo blockdev --getsz /dev/loop0) delay /dev/loop0 0 500" | sudo dmsetup create delayed
> > sudo mount /dev/mapper/delayed ~testuser/.cache
> > sudo chown testuser /home/testuser/.cache
>
> Then I ran as testuser
>
> > dbus-run-session gdb --args kwin_wayland --x11-display $DISPLAY --exit-with-session /usr/bin/konsole
>
> Moved the window around a bit and maximised it, to make sure caches (.qmlc,
> ksycoca) got generated. Then I restarted kwin and maximised the konsole
> window and it lagged.
>
> Backtrace:
>
> Thread 1 (Thread 0x7f515559bb00 (LWP 31711) "kwin_wayland"):
> #0 0x00007f5158508daa in fdatasync () at /lib64/libc.so.6
> #1 0x00007f5158d65d63 in QLockFile::tryLock(std::chrono::duration<long, std::ratio<1l, 1000l> >) () at /lib64/libQt6Core.so.6
> #2 0x00007f515b167fd0 in QSGRhiSupport::preparePipelineCache(QRhi*, QQuickWindow*) () at /lib64/libQt6Quick.so.6
> #3 0x00007f515b168d0c in QSGRhiSupport::createRhi(QQuickWindow*, QSurface*, bool) () at /lib64/libQt6Quick.so.6
> #4 0x00007f515b14a763 in () at /lib64/libQt6Quick.so.6
> #5 0x00007f515b14c73d in () at /lib64/libQt6Quick.so.6
> #6 0x00007f515963d7e9 in QWindow::event(QEvent*) () at /lib64/libQt6Gui.so.6
> #7 0x00007f5159fc2f1e in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib64/libQt6Widgets.so.6
> #8 0x00007f5158d8f030 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib64/libQt6Core.so.6
> #9 0x00007f51595eda8b in QGuiApplicationPrivate::processExposeEvent(QWindowSystemInterfacePrivate::ExposeEvent*) () at /lib64/libQt6Gui.so.6
> #10 0x00007f515964c05c in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt6Gui.so.6
> #11 0x00007f515964c1e7 in QWindowSystemInterface::flushWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt6Gui.so.6
> #12 0x00007f5159627340 in QPlatformWindow::setVisible(bool) () at /lib64/libQt6Gui.so.6
> #13 0x00007f515aa839b4 in () at /lib64/libQt6Qml.so.6
> #14 0x00007f515aa97936 in () at /lib64/libQt6Qml.so.6
> #15 0x00007f515aa9613d in QQmlBinding::doUpdate(QQmlJavaScriptExpression::DeleteWatcher const&, QFlags<QQmlPropertyData::WriteFlag>, QV4::Scope&) () at /lib64/libQt6Qml.so.6
> #16 0x00007f515aa94084 in QQmlBinding::update(QFlags<QQmlPropertyData::WriteFlag>) () at /lib64/libQt6Qml.so.6
> #17 0x00007f515ab096c8 in QQmlNotifier::emitNotify(QQmlNotifierEndpoint*, void**) () at /lib64/libQt6Qml.so.6
> #18 0x00007f5158de7e88 in () at /lib64/libQt6Core.so.6
> #19 0x00007f515ba2479d in KWin::Window::setElectricBorderMaximizing(bool) () at /lib64/libkwin.so.6
>
> Apparently something in Qt's Rhi has a lock file in .cache which needs to be
> synced to disk
> (/home/testuser/.cache/kwin/qtpipelinecache-x86_64-little_endian-lp64/qqpc_opengl.lck).
> Fortunately it's only an fdatasync and not an fsync or even a sync, so it only
> waits until that specific file's data has made it to disk, but it can still
> block for a while.
>
> Maybe there's a way to avoid QLockFile use there.

Relevant documentation: https://doc.qt.io/qt-5/qmldiskcache.html

Another workaround would be to use `export QML_DISK_CACHE_PATH=/tmp/qmlcache`, where /tmp/qmlcache is on a fast drive or a ramdisk.

For some reason (that I don't know, but there is probably one) we don't use `qt_add_qml_module` in plasma and kwin, which pre-compiles the JS/QML/C++ bindings and embeds them in the executable's resources: https://doc.qt.io/qt-6/qt-add-qml-module.html#caching-compiled-qml-sources

We do use it in applications (spectacle, neochat...).

(In reply to Méven Car from comment #15)
> (In reply to Fabian Vogt from comment #14)
> > Apparently something in Qt's Rhi has a lock file in .cache which needs to be
> > synced to disk
> > (/home/testuser/.cache/kwin/qtpipelinecache-x86_64-little_endian-lp64/qqpc_opengl.lck).
> > Fortunately it's only an fdatasync and not an fsync or even a sync, so it only
> > waits until that specific file's data has made it to disk, but it can still
> > block for a while.
> >
> > Maybe there's a way to avoid QLockFile use there.
>
> Relevant documentation:
> https://doc.qt.io/qt-5/qmldiskcache.html
>
> Another workaround would be to use `export
> QML_DISK_CACHE_PATH=/tmp/qmlcache`, where /tmp/qmlcache is on a fast drive or a
> ramdisk.
>
> For some reason (that I don't know, but there is probably one) we don't use
> `qt_add_qml_module` in plasma and kwin, which pre-compiles the
> JS/QML/C++ bindings and embeds them in the executable's resources:
> https://doc.qt.io/qt-6/qt-add-qml-module.html#caching-compiled-qml-sources
>
> We do use it in applications (spectacle, neochat...).

The QML disk cache is completely unrelated. This is the Qt Rhi pipeline cache, which is (unconditionally) enabled by QtQuick: https://github.com/qt/qtdeclarative/blob/d9ceacd4126ff48d5f1e6d5b7b0f3be426d2cf35/src/quick/scenegraph/qsgrhisupport.cpp#L941

For me, strace showed that the second invocation did not touch the QML cache files, as expected. The RHI pipeline cache was hit a lot.

The RHI pipeline cache can be disabled, see: https://doc.qt.io/qt-6/qquickgraphicsconfiguration.html#the-automatic-pipeline-cache

There are env vars for easy profiling. Interestingly, there is a line saying that the automatic pipeline cache naming doesn't handle the same UI in multiple windows in the same app very well, and that's something we use in kwin. So there's definitely something to tweak.

I ran with QCoreApplication::setAttribute(Qt::AA_DisableShaderDiskCache); in main.cpp and it's certainly not visibly worse. Our pipelines aren't exactly complicated, and mesa has a cache anyway, so maybe it isn't worth it.

Turning this into some actual numbers, total time in pipeline creation for both the overview and cube effect with the cache off gave the result: "Total time spent on pipeline creation during the lifetime of the QRhi 0x576293504b00 was 0 ms" on a 5 year old Intel laptop.

> Maybe there's a way to avoid QLockFile use there.
I think so. From my reading of the code it can all be guarded with
#if !QT_CONFIG(temporaryfile)
Depending on build flags, the saving *sometimes* uses QSaveFile, which writes to another file and then does an atomic move. I think we only need the lock file when not using that.
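The reasoning about QSaveFile can be illustrated outside Qt. The following is a minimal sketch in Python, not Qt's implementation: `atomic_write` is a hypothetical helper name, and the sketch assumes a POSIX filesystem where a rename within one directory is atomic. It writes to a temporary file next to the target and then renames it into place, so a reader sees either the old file or the complete new one, which is why a separate lock file should not be needed for consistency. As an assumption for illustration, it deliberately skips fsync/fdatasync, trading durability after a crash for never blocking on a flush, which seems tolerable for a cache that can be regenerated.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory (same filesystem),
    # otherwise the final rename would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace uses rename(2) on POSIX: the name switches atomically.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temporary file if the rename did not happen.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling `atomic_write(cache_file, payload)` twice in a row leaves only the second payload on disk and no temporary files behind.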
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/5802

Git commit f700de56f8ae6c15c7fd7d18bdaf47bf1e21b219 by Vlad Zahorodnii, on behalf of David Edmundson.
Committed on 31/05/2024 at 15:28.
Pushed by vladz into branch 'master'.

core: Disable Qt RHI pipeline cache

The Qt pipeline cache causes a disk sync on every load and save of a QQuickWindow. This causes a stutter under high disk usage.

The gains from this cache are minimal on our simple scenes on PC hardware. Especially given mesa has it's own cache, profiling on my personal laptop showed the pipeline as being 0ms.

There is an upstream patch at https://codereview.qt-project.org/c/qt/qtdeclarative/+/564411 . QSaveFile still has a sync, but that should only be hit for the first non-cached run. I'm also adding a flag to QSaveFile to fix the QML cache and first run case.

Tested via running kwin with `strace -e inject=fdatasync:delay_enter=10000000` to simulate a slow flush.

M +5 -0 src/main_wayland.cpp
M +5 -0 src/main_x11.cpp

https://invent.kde.org/plasma/kwin/-/commit/f700de56f8ae6c15c7fd7d18bdaf47bf1e21b219

Git commit c747f9c3a7cb9aab74ea07f38eb5de43feb06a2c by Vlad Zahorodnii.
Committed on 31/05/2024 at 18:06.
Pushed by vladz into branch 'Plasma/6.1'.

core: Disable Qt RHI pipeline cache

The Qt pipeline cache causes a disk sync on every load and save of a QQuickWindow. This causes a stutter under high disk usage.

The gains from this cache are minimal on our simple scenes on PC hardware. Especially given mesa has it's own cache, profiling on my personal laptop showed the pipeline as being 0ms.

There is an upstream patch at https://codereview.qt-project.org/c/qt/qtdeclarative/+/564411 . QSaveFile still has a sync, but that should only be hit for the first non-cached run. I'm also adding a flag to QSaveFile to fix the QML cache and first run case.

Tested via running kwin with `strace -e inject=fdatasync:delay_enter=10000000` to simulate a slow flush.

(cherry picked from commit f700de56f8ae6c15c7fd7d18bdaf47bf1e21b219)

ac5aeb67 core: Disable Qt RHI pipeline cache

Co-authored-by: David Edmundson <kde@davidedmundson.co.uk>

M +5 -0 src/main_wayland.cpp
M +5 -0 src/main_x11.cpp

https://invent.kde.org/plasma/kwin/-/commit/c747f9c3a7cb9aab74ea07f38eb5de43feb06a2c
Created attachment 169491
Logs of different Kwin hangs happening from both Kwin's perspective and the kernel's.

SUMMARY
If your "~/.cache" directory is stored on slow storage, such as a spinning disk or an LVM pool, using QML-based desktop effects like Tiling Editor and Alt+Tab causes extreme multi-second KWin hangs.

STEPS TO REPRODUCE
1. Add some kind of slow storage to your system (spinning HDD, LVM pool made of HDDs, slow network filesystem, etc.)
2. Move "~/.cache" to the slow storage medium and symlink it. Alternatively, move your entire /home to the slow storage device.
3. Bring up Tiling Editor with Meta+T.

OBSERVED RESULT
Depending on how slow/busy the storage medium is, KWin will hang for at least 2 seconds, sometimes up to 15 in really bad cases. During this hang, the system is completely unresponsive - no mouse or keyboard input whatsoever - and if the hang is long enough, KWin will warn in the logs about DRM pageflips taking too long.

EXPECTED RESULT
The system should stay responsive and KWin shouldn't hang, even if opening Tiling Editor takes slightly longer.

SOFTWARE/OS VERSIONS
Linux: 6.8.9-arch1-2 (64-bit)
KDE Plasma Version: 6.0.4
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.0

ADDITIONAL INFORMATION
So far, known-affected effects are:
- Tiling Editor (Meta+T)
- Window Overview (Meta+W)
- Alt+Tab, when Alt is held down (which brings up the window switcher menu)

It seems to be any effect that uses Qt QML; other than those three, I haven't personally tested many.

This is also distro-independent. Users other than myself have reported the exact same hangs occurring on their systems, with the common factor being a home directory stored on slow storage.

I have attached three logs that show the issue happening. One of them shows what KWin sees when a hang happens. The two dmesg logs were captured with extremely verbose DRM logging enabled, one with two screens plugged in and one with one screen. These were captured with Xaver Hugl's help, before I suspected the issue was disk-related, but maybe there's something useful in there.
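To get a rough idea of whether the storage behind "~/.cache" is slow to flush, one option is to time a small write followed by `fdatasync`, the call at the top of the backtrace in this report. The following is a minimal sketch in Python, assuming Linux; `time_fdatasync` and the probe file prefix are hypothetical names chosen for illustration, and the numbers are only a hint at what KWin itself would experience.

```python
import os
import tempfile
import time


def time_fdatasync(directory: str, size: int = 4096) -> float:
    """Return the seconds spent in fdatasync after writing `size` bytes to a probe file."""
    fd, probe_path = tempfile.mkstemp(dir=directory, prefix=".fdatasync-probe-")
    try:
        os.write(fd, b"\0" * size)
        start = time.perf_counter()
        # Blocks until this file's data has been handed to the storage device.
        os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        # Always clean up the probe file, even if the sync failed.
        os.close(fd)
        os.unlink(probe_path)
```

Running `time_fdatasync(os.path.expanduser("~/.cache"))` a few times, once while the disk is idle and once while something else is writing heavily to it, should show whether flush latency grows with disk load in the way described above.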