497424 – fd leak with explicit sync (nvidia)

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	kwin
Classification:	Plasma
Component:	wayland-generic (other bugs)
Version First Reported In:	master
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	KWin default assignee

Reported:	2024-12-13 19:52 UTC by daron439
Modified:	2025-10-13 02:31 UTC
CC List:	5 users

Attachments
A script to repeatedly send notifications, resulting in plasmashell crashing. (1.40 KB, text/plain) 2025-03-01 20:09 UTC, Steve Therrien

Description daron439 2024-12-13 19:52:35 UTC

SUMMARY
I'm not sure where I should report this (nvidia, kwin, plasmashell or somewhere else). Every notification, and every opening or closing of a plasmoid, leaks sync_file descriptors in plasmashell:

❯ lsof -p $(pidof plasmashell)
...
396r  a_inode               0,16        0      1062 sync_file
397r  a_inode               0,16        0      1062 sync_file
399r  a_inode               0,16        0      1062 sync_file
400r  a_inode               0,16        0      1062 sync_file

And plasmashell eventually crashes with:
plasmashell[2053]: error marshalling arguments for get_icon: dup failed: Too many open files
plasmashell[2053]: Error marshalling request: Too many open files
plasmashell[2053]: The Wayland connection experienced a fatal error: Too many open files
plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION

If I set __NV_DISABLE_EXPLICIT_SYNC=1 in /etc/environment this doesn't happen.

STEPS TO REPRODUCE
1. Open/close Kickoff multiple times

OBSERVED RESULT
sync_file leaks in lsof -p $(pidof plasmashell)

EXPECTED RESULT
No leaks
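
To watch the leak without reading the full lsof listing, the descriptors can be counted straight from /proc. The following is a minimal sketch and not part of the original report. It assumes a Linux /proc filesystem and assumes that the kernel reports these descriptors with a link target ending in `sync_file` (lsof shows them as `a_inode ... sync_file`). The function names are hypothetical.

```python
import os
from collections import Counter


def count_fd_targets(pid="self"):
    """Map each descriptor's link target to the number of fds pointing at it."""
    fd_dir = f"/proc/{pid}/fd"
    targets = Counter()
    for name in os.listdir(fd_dir):
        try:
            targets[os.readlink(os.path.join(fd_dir, name))] += 1
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_sync_files(pid="self"):
    """Count descriptors whose target looks like a sync_file anonymous inode."""
    # Assumption: the target is reported as "anon_inode:sync_file".
    return sum(n for target, n in count_fd_targets(pid).items()
               if target.endswith("sync_file"))
```

Calling `count_sync_files(pid)` with the plasmashell PID before and after opening Kickoff a few times should show whether the count keeps growing.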

SOFTWARE/OS VERSIONS
KDE Plasma Version: 6.2.80
KDE Frameworks Version: 6.9.0
Qt Version: 6.8.1

ADDITIONAL INFORMATION
nvidia driver: 565.77

Comment 1 Zamundaaa 2024-12-17 23:00:49 UTC

Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux/

Comment 2 Roman 2025-01-31 11:15:29 UTC

The bug is still present, and on the NVidia forum it is said to be KDE's fault, not NVidia's. So please fix it. I agree with what is said there: plasmashell has had memory leaks for years and no one has done anything to fix them. Please fix this at least for KDE 6.3. As the owner of an NVidia GPU, I insist that bugs of this sort happen nowhere except in plasmashell, and your software generates no logs that would be useful.

Comment 3 Zamundaaa 2025-01-31 11:22:32 UTC

Plasmashell doesn't have anything to do with explicit sync. This is a driver bug. The fact that no one from NVidia has looked at it is annoying, but there is nothing we can do about it.

Comment 4 Roman 2025-01-31 15:38:09 UTC

The driver simply has to be used correctly. People say Intel has freeze issues too, and plasmashell has had memory leaks for a long time. So an investigation, instead of just dismissing and closing the report, would be welcome. It would be good to finally fix this. As the owner of an nvidia GPU, I can tell you that everything else works perfectly, even games under Wine such as World of Tanks, and nothing crashes or freezes.

Comment 5 Steve Therrien 2025-03-01 20:09:37 UTC

Created attachment 179017 [details]
A script to repeatedly send notifications, resulting in plasmashell crashing.

I experienced regular crashes due to my notification-heavy workflow before finding this bug. It appears that plasmashell leaks descriptors when using the NVIDIA driver with explicit sync.

I don't know whether it's NVIDIA or Plasma that's responsible, but I've attached a script that easily triggers this crash. Maybe it will help someone identify the root cause.

This is a partial output from the script:

[user@fedroa-pc:[~]> ./leak.sh
Explicit sync is enabled. Descriptors should leak.

Notification    PID  Limit  Open descriptors  Until limit
------------  -----  -----  ----------------  -----------
           1   2563   1024               157          867
           2   2563   1024               157          867
           3   2563   1024               168          856
           4   2563   1024               177          847
[snip]
         216   2563   1024              1016            8
         217   2563   1024              1020            4
         218   2563   1024              1017            7
         219   2563   1024              1024            0
plasmashell crashed after 219 notifications

Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: error marshalling arguments for import_timeline: dup failed: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: Error marshalling request: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: The Wayland connection experienced a fatal error: Too many open files
Mar 01 12:45:14 fedroa-pc systemd[1983]: Starting grub-boot-success.service - Mark boot as successful...
Mar 01 12:45:14 fedroa-pc systemd[1983]: Finished grub-boot-success.service - Mark boot as successful.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Failed with result 'exit-code'.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Consumed 23.288s CPU time, 331.5M memory peak.
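
For readers without access to the attachment, the table above can be approximated with a short sketch like the one below. This is not the attached script; it is an illustration under stated assumptions: a Linux /proc filesystem, the `notify-send` command being installed, and hypothetical function names. It reads the soft "Max open files" limit from /proc/<pid>/limits and counts the entries in /proc/<pid>/fd after each notification.

```python
import os
import subprocess
import time


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' value from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Remaining columns: soft limit, hard limit, units.
            soft = line[len("Max open files"):].split()[0]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' row found")


def headroom(pid):
    """Return (soft limit, open descriptors, descriptors left) for a process."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_soft_nofile(f.read())
    used = len(os.listdir(f"/proc/{pid}/fd"))
    return limit, used, (None if limit is None else limit - used)


def notify_and_report(pid, count, delay=0.5):
    """Send `count` notifications and print one table row after each."""
    print(f"{'Notification':>12}  {'Limit':>6}  "
          f"{'Open descriptors':>16}  {'Until limit':>11}")
    for i in range(1, count + 1):
        # Assumption: notify-send is available on PATH.
        subprocess.run(["notify-send", "fd leak test", f"notification {i}"],
                       check=False)
        time.sleep(delay)
        try:
            limit, used, left = headroom(pid)
        except FileNotFoundError:
            print(f"process {pid} exited after {i} notifications")
            return
        print(f"{i:>12}  {str(limit):>6}  {used:>16}  {str(left):>11}")
```

A call such as `notify_and_report(2563, 250)` would use the plasmashell PID from the log above; the PID differs on every session.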


As a workaround, increase plasmashell's open file limit by setting a large `LimitNOFILE` value:

> systemctl edit --user plasma-plasmashell.service

[Service]
# https://access.redhat.com/solutions/1257953
LimitNOFILE=50000

Save the file, then log out and back in so the new limit takes effect.

----

Operating System: Fedora Linux 41
KDE Plasma Version: 6.3.2
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.13.5-200.fc41.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × 11th Gen Intel® Core™ i7-11850H @ 2.50GHz
Memory: 62.6 GiB of RAM
Graphics Processor: NVIDIA RTX A3000 Laptop GPU/PCIe/SSE2
NVIDIA Driver Version: 570.124.04

Comment 6 pallaswept 2025-10-13 02:31:48 UTC

Just in case, I should add that the above workaround is not necessarily safe. From the systemd docs:

https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Process%20Properties
See Table 1:

"Do not use. Be careful when raising the soft limit above 1024, since select(2) cannot function with file descriptors above 1023 on Linux... Typically applications should increase their soft limit to the hard limit on their own, if they are OK with working with file descriptors above 1023, i.e. do not use select(2). "

I'll leave it to a KDE developer to inform us whether this actually affects plasma (does plasma use select()?), and possibly consider implementing the advised means to appropriately modify the limit itself (is there any desire to put this workaround in plasma to prevent crashes temporarily until nvidia fixes the driver?).
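
To make the quoted advice concrete, here is a minimal sketch of what "increase their soft limit to the hard limit on their own" looks like from inside a process, together with the select(2) restriction the docs warn about. It is written in Python for illustration only and says nothing about how plasmashell itself is implemented or whether it uses select(). The function names are hypothetical, and the value 1024 assumes the usual FD_SETSIZE on Linux.

```python
import os
import resource
import select


def raise_soft_nofile_to_hard():
    """Raise this process's soft RLIMIT_NOFILE to its hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Leave an unlimited hard limit alone in this sketch.
    if hard != resource.RLIM_INFINITY and soft != hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def select_accepts(fd):
    """Report whether select() can be asked about this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython rejects descriptor numbers outside the fd_set range
        # (FD_SETSIZE, assumed to be 1024 here).
        return False
    return True
```

Raising the limit inside the process avoids changing it for every child through the unit file, but it is only safe if no code path in that process hands a descriptor numbered 1024 or higher to select().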