Bug 504861 - powerdevil crash from libddcutil, no restart happens
Summary: powerdevil crash from libddcutil, no restart happens
Status: RESOLVED DUPLICATE of bug 500843
Alias: None
Product: plasmashell
Classification: Plasma
Component: Power management & brightness (other bugs)
Version First Reported In: 6.3.90
Platform: Gentoo Packages Linux
: NOR crash
Target Milestone: 1.0
Assignee: Plasma Bugs List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-05-27 13:29 UTC by Martin Bednar
Modified: 2025-06-27 18:41 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
powerdevil log (9.94 KB, text/plain)
2025-05-27 21:54 UTC, Martin Bednar
Details
log with assertion (32.81 KB, text/plain)
2025-06-05 21:52 UTC, Martin Bednar
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Bednar 2025-05-27 13:29:36 UTC
Powerdevil sometimes crashes, and doesn't restart, leaving the laptop with no power management.

This happens most often during resume from suspend. But I also have seen it happen after GPU hangs.

I realize this is probably more a ddcutil bug, and am open to help coordinate with upstream, but one thing that KDE could do is updating the systemd service file to try restarting, because systemctl --user restart plasma-powerdevil.service brings power management back.

kscreen.kwayland: removing output "DP-2"
[ 26306] Removing connected display on bus 14
[ 26306] Emitting DDCA_Display_Status_Event[52910.779:  DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[20], io_path:/dev/i2c-14, ddc working: false]
[ 26306] libddcutil callback thread 0x7ffa300130e0 started
[ 26306] Started 1 event callback thread(s)
[ 26307] (ddc_open_display) Delay before rechecking attribute status: Slept for 1000 millisec
[ 26307] Assertion failed: "bus_info" in file ddc_packet_io.c at line 266
QtDBus: cannot relay signals from parent QObject(0x55d070f60740 "") unless they are emitted in the object's thread QThread(0x55d070efffd0 "Qt mainThread"). Current thread is QThread(0x55d0710bb2b>
[ 26307] Watch thread terminated.
QSocketNotifier: Socket notifiers cannot be enabled or disabled from another thread
QMutex: destroying locked mutex
libddcutil terminating.
g_source_unref_internal: assertion 'source != NULL' failed
systemd[3420] plasma-powerdevil.service: Consumed 18.713s CPU time, 47.3M memory peak.


SOFTWARE/OS VERSIONS
Operating System: Gentoo 2.17
KDE Plasma Version: 6.3.90
KDE Frameworks Version: 6.14.0
Qt Version: 6.8.3
Kernel Version: 6.14.7-gentoo-dist-hardened (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen AI 9 HX 370 w/ Radeon 890M
Memory: 64 GiB of RAM (62.1 GiB usable)
Graphics Processor: AMD Radeon Graphics
Comment 1 Sanfod Rockowitz 2025-05-27 15:22:21 UTC
Regarding PowerDevil recovery from libddcutil failure, see the discussion on https://invent.kde.org/plasma/powerdevil/-/merge_requests/542.    The consensus is that PowerDevil should recover from libddcutil failure by restarting.

Regarding the libddcutil bug, from the line number it appears that you are using libddcutil from release 2.2.0.  Can I get you to build from ddcutil branch 2.2.1-dev and see if the assert() is still triggered?  2.2.1-dev has seen a lot of work in this area.  I'm on the verge of releasing 2.2.1, and it would be helpful to know if this problem still exists.
Comment 2 Martin Bednar 2025-05-27 15:55:43 UTC
installed 2.2.1-dev . all links to libddcutil.so.5.3.0 .
It's kind of rare, so I'm not sure what timeframe to give it. So far I've seen it about once a day.
Comment 3 Martin Bednar 2025-05-27 15:56:42 UTC
(In reply to Sanfod Rockowitz from comment #1)
> Regarding PowerDevil recovery from libddcutil failure, see the discussion on
> https://invent.kde.org/plasma/powerdevil/-/merge_requests/542.    The
> consensus is that PowerDevil should recover from libddcutil failure by
> restarting.

Thanks for this link. I just stumbled over powerdevil having exited with success after the assert...
Comment 4 Sanfod Rockowitz 2025-05-27 16:45:56 UTC
(In reply to Martin Bednar from comment #3)
> (In reply to Sanfod Rockowitz from comment #1)
> > Regarding PowerDevil recovery from libddcutil failure, see the discussion on
> > https://invent.kde.org/plasma/powerdevil/-/merge_requests/542.    The
> > consensus is that PowerDevil should recover from libddcutil failure by
> > restarting.
> 
> Thanks for this link. I just stumbled over powerdevil having exited with
> success after the assert...

I believe the problem was that powerdevil  didn't restart after having exited with success.  Perhaps that has been fixed.
Comment 5 Nate Graham 2025-05-27 17:56:45 UTC
It's fixed in the upcoming ddcutil-2.2.1 release.

*** This bug has been marked as a duplicate of bug 500843 ***
Comment 6 Martin Bednar 2025-05-27 18:24:36 UTC
(In reply to Nate Graham from comment #5)
> It's fixed in the upcoming ddcutil-2.2.1 release.
> 

I disagree, a big part of the issue is that powerdevil doesn't restart when the assertion makes it exit with exit code 0. (Or whichever it was, considered as success by systemd)
Comment 7 Sanfod Rockowitz 2025-05-27 20:31:07 UTC
I've looked more closely at the specific  assertion failure cited.  It's not actually an assert() call.   Rather it's a macro TRACED_ASSERT() that I defined to facilitate remote debugging.  For implementation and cross-platform reasons, I defined it to call exit(1) instead of actually asserting failure, which is why, I expect, powerdevil terminates with "success".  

If this makes sense, I can replace the TRACED_ASSERT() statements with normal assert() calls.  It makes the assert failures less informative, but I can live with that if it's what it takes to make powerdevil terminate with failure instead of success.  

Of course, the assertion shouldn't be triggered in the first place - it indicates inconsistent data structures. It's hard to prove a negative, but hopefully the changes in 2.2.1-dev will have addressed this situation.
Comment 8 Martin Bednar 2025-05-27 20:40:28 UTC
I can't comment on how to fix it in code, not my place.
I can report that I've had 2 GPU resets, and powerdevil didn't stop. But as you say, proving negative...
Comment 9 Martin Bednar 2025-05-27 21:48:42 UTC
Got something. powerdevil survived, to there's that. 
Situation: 2 external displays, laptop in clamshell mode. Laptop idle, displays suspended. Upon resume, powerdevil log:

May 27 23:40:49 ged org_kde_powerdevil[3058]: No traced function stack for current thread
May 27 23:40:49 ged org_kde_powerdevil[3058]: [  3827] (i2c_check_open_bus_alive) Retrying i2c_check_edid_exists_by_dh() (c): Sleeping for 1000 milliseconds
May 27 23:40:49 ged org_kde_powerdevil[3058]: kscreen.kwayland: removing output "DP-2"
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3827] (i2c_check_open_bus_alive) Retrying i2c_check_edid_exists_by_dh() (c): Sleeping for 1000 milliseconds
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3826] Removing connected display on bus 14
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3826] Emitting DDCA_Display_Status_Event[16468.891:  DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc working: false]
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3826] Starting 1 callback threads
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3826] libddcutil callback thread 0x7f0520012ba0 started
May 27 23:40:50 ged org_kde_powerdevil[3058]: [  3826] Started 1 event callback thread(s)
May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 36422] Invoking callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16468.891:  DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc>
May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 36422] Callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16468.891:  DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc working:>
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3827] /dev/i2c-14: Checking EDID succeeded after 4 tries (G)
May 27 23:40:51 ged org_kde_powerdevil[3058]: org.kde.powerdevil: [DDCutilDisplay]: ddca_get_non_table_vcp_value -5
May 27 23:40:51 ged org_kde_powerdevil[3058]: org.kde.powerdevil: [DDCutilDisplay]: ddca_set_non_table_vcp_value -3029
May 27 23:40:51 ged org_kde_powerdevil[3058]: kscreen.kwayland: adding output 78
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] [  3826]Locked bus /dev/i2c-14 after 3 tries
May 27 23:40:51 ged org_kde_powerdevil[3058]: Current call stack:
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: 
May 27 23:40:51 ged org_kde_powerdevil[3058]: No traced function stack for current thread
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] Adding connected display with bus 14
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] Emitting DDCA_Display_Status_Event[16470.362:  DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc working: true]
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] Starting 1 callback threads
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] libddcutil callback thread 0x7f0520014790 started
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3826] Started 1 event callback thread(s)
May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 36489] Invoking callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16470.362:  DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc wo>
May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 36489] Callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16470.362:  DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc working: tr>
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3058] Quiescing libddcutil API...
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3058] Quiesce libddcutil API complete
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3058] Display redetection starting.
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3825] recheck thread terminating because watch thread terminated
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3058] Watch thread terminated.
May 27 23:40:51 ged org_kde_powerdevil[3058]: [  3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds
May 27 23:40:52 ged org_kde_powerdevil[3058]: [  3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds
May 27 23:40:52 ged org_kde_powerdevil[3058]: [  3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds
May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 36493] (check_supported_feature) busno=15, sleep-multiplier =  2.00. Testing for supported feature 0x10 returned Error_Info[DDCRC_RETRIES in ddc_write_read_with_retry, causes: ETIMEDOUT(10)]
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058] Watching for display connection changes, resolved watch mode = Watch_Mode_Xevent, poll loop interval = 100 millisec
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058]                                          extra_stabilization_millisec: 0,  stabilization_poll_millisec: 100
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058] libddcutil recheck thread 0x563fbca6b1c0 started
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058] libddcutil watch thread 0x563fbcbb2d30 started
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058] Display redetection finished.
May 27 23:40:55 ged org_kde_powerdevil[3058]: [  3058] Unquiescing libddcutil API...
May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 36498] (dw_recheck_displays_func) Recheck interval: Sleeping for 200 milliseconds
Comment 10 Martin Bednar 2025-05-27 21:54:59 UTC
Created attachment 181800 [details]
powerdevil log

Complete log from start of instance. Copying apparently truncated some lines.
Wake from idle starts at time 23:40:49.
Also might be relevant: the 2 displays have a very big difference in the time they need to power up from idle.
Comment 11 Sanfod Rockowitz 2025-05-28 13:21:50 UTC
I have made a fix to ddcutil 2.2.1-dev that resolves the issue, at least in the case that Martin documented, where PowerDevil doesn't restart when an "assertion" failed in libddcutil.  The problem was that, for historical reasons, macro TRACED_ASSERT() used exit(1) instead of __failed_assert() to terminate library execution. This has been changed.

Martin, thank you for the log that reported the assert failure.  It changed the problem from "libddcutil fails for some reason and powerdevil doesn't restart" to "libddcutil fails for a specific reason and powerdevil doesn't restart".
Comment 12 Martin Bednar 2025-05-28 19:13:10 UTC
Running 2.2.0 with 9c55c47 and 119df96 applied to confirm it helped. Will report back when it happens.
Comment 13 Martin Bednar 2025-06-03 13:47:17 UTC
I'm sorry, haven't seen the assert happening ever since.
Might be because amdgpu fixes in 6.14.9, maybe because I reseated the USB-C dongle...
I'll keep running it and monitoring, just can't promise it'll happen.
Comment 14 Sanfod Rockowitz 2025-06-05 20:55:40 UTC
Martin, with the current ddcutil branch 2.2.1-dev the problematic assertion has been removed.  Instead the API call that triggered the assertion returns status DDCRC_DISCONNECTED.   So the question has become whether powerdevil properly handles this error code.
Comment 15 Martin Bednar 2025-06-05 21:52:04 UTC
Created attachment 182056 [details]
log with assertion

I'm running 2.2.0 with the commits switching exit(1) to __assert_fail() applied, to confirm hitting an assertion now results in powerdevil being restarted, and guess what just happened? hit the/an assert.
Now if ddcutil asserts, powerdevil exits with 6/ABRT and systemd restarts it. So that part of the bug is fixed. 
Whether the rest is, that's out of my current scope of knowledge...
If you're interested in the log, it's attached.
Comment 16 Martin Bednar 2025-06-05 22:09:33 UTC
rereading my comment, I'm not sure whether I'm clear: 
I was only testing whether the assert behavior change fixed the "powerdevil not restarting automatically after assert in ddcutil" part of the bug.
And that is definitely fixed.
Comment 17 Sanfod Rockowitz 2025-06-06 09:36:19 UTC
Martin, that is useful confirmation.  Do you recall what you were doing/what was happening when the assert was hit?
Comment 18 Martin Bednar 2025-06-06 10:31:49 UTC
(In reply to Sanfod Rockowitz from comment #17)
> Martin, that is useful confirmation.  Do you recall what you were doing/what
> was happening when the assert was hit?

I returned to the PC after > 10 minutes, meaning the displays were waking up from idle.
It's weird: sometimes kscreen doesn't recognize the display, and sets up a new configuration. I then power cycle the display, and the saved configuration gets applied.
Comment 19 Sanfod Rockowitz 2025-06-06 17:18:57 UTC
Now that ddca_open_display(), in branch 2.2.1-dev, returns status DDCRC_DISCONNECTED instead of asserting failure, you might try running the development branch.  Every so often, or when there's a powerdevil failure, look in the powerdevil journal for string "Display_Ref.detail == NULL, but DREF_REMOVED not set".   There will also be a backtrace.  This should give a clue as to how this error state arises, and more importantly whether powerdevil handles the new return code.
Comment 20 Martin Bednar 2025-06-06 20:52:25 UTC
Ok, will create a filter with notification. and report back. It'll take a few days.
Comment 21 Martin Bednar 2025-06-26 09:37:35 UTC
Hi, I haven't forgotten
> journalctl --user -u plasma-powerdevil.service -u systemd  -f| grep "Display_Ref.detail"

hasn't yielded anything in the past 2 weeks... continuing to let it run
Comment 22 Sanfod Rockowitz 2025-06-27 01:03:50 UTC
After a painful upgrade saga, I have a test  environment for this bug and I believe it can be marked resolved. 

Test environment:

laptop: 
- Lenovo Intel Workstation, Gen 1
- openSUSE turmbleweed 
- Plasma 6.4, with lbiddcutil.so.5.2.0 from ddcutil 2.2.0 installed

docking station
- Dell WD 19S
- DisplayPort attached monitor

With libddcutil.so.5.2.0 from ddcutil 2.2.0, I see the failure.
With libddcutil.so.5.3.0 built from the ddcutil 2.2.1 development branch, the failure does not occur.

This was the only significant issue outstanding for ddcutil 2.2.1.  It's time to get it out the door.
Comment 23 Nate Graham 2025-06-27 18:41:03 UTC
Great, thanks!