| Summary: | powerdevil crash from libddcutil, no restart happens | ||
|---|---|---|---|
| Product: | [Plasma] plasmashell | Reporter: | Martin Bednar <martin+kde> |
| Component: | Power management & brightness | Assignee: | Plasma Bugs List <plasma-bugs-null> |
| Status: | RESOLVED DUPLICATE | ||
| Severity: | crash | CC: | martin+kde, nate, rockowitz |
| Priority: | NOR | ||
| Version First Reported In: | 6.3.90 | ||
| Target Milestone: | 1.0 | ||
| Platform: | Gentoo Packages | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
powerdevil log
log with assertion |
||
|
Description
Martin Bednar
2025-05-27 13:29:36 UTC
Regarding PowerDevil recovery from libddcutil failure, see the discussion on https://invent.kde.org/plasma/powerdevil/-/merge_requests/542. The consensus is that PowerDevil should recover from libddcutil failure by restarting. Regarding the libddcutil bug, from the line number it appears that you are using libddcutil from release 2.2.0. Can I get you to build from ddcutil branch 2.2.1-dev and see if the assert() is still triggered? 2.2.1-dev has seen a lot of work in this area. I'm on the verge of releasing 2.2.1, and it would be helpful to know if this problem still exists. installed 2.2.1-dev . all links to libddcutil.so.5.3.0 . It's kind of rare, so I'm not sure what timeframe to give it. So far I've seen it about once a day. (In reply to Sanfod Rockowitz from comment #1) > Regarding PowerDevil recovery from libddcutil failure, see the discussion on > https://invent.kde.org/plasma/powerdevil/-/merge_requests/542. The > consensus is that PowerDevil should recover from libddcutil failure by > restarting. Thanks for this link. I just stumbled over powerdevil having exited with success after the assert... (In reply to Martin Bednar from comment #3) > (In reply to Sanfod Rockowitz from comment #1) > > Regarding PowerDevil recovery from libddcutil failure, see the discussion on > > https://invent.kde.org/plasma/powerdevil/-/merge_requests/542. The > > consensus is that PowerDevil should recover from libddcutil failure by > > restarting. > > Thanks for this link. I just stumbled over powerdevil having exited with > success after the assert... I believe the problem was that powerdevil didn't restart after having exited with success. Perhaps that has been fixed. It's fixed in the upcoming ddcutil-2.2.1 release. *** This bug has been marked as a duplicate of bug 500843 *** (In reply to Nate Graham from comment #5) > It's fixed in the upcoming ddcutil-2.2.1 release. > I disagree, a big part of the issue is that powerdevil doesn't restart when the assertion makes it exit with exit code 0. (Or whichever it was, considered as success by systemd) I've looked more closely at the specific assertion failure cited. It's not actually an assert() call. Rather it's a macro TRACED_ASSERT() that I defined to facilitate remote debugging. For implementation and cross-platform reasons, I defined it to call exit(1) instead of actually asserting failure, which is why, I expect, powerdevil terminates with "success". If this makes sense, I can replace the TRACED_ASSERT() statements with normal assert() calls. It makes the assert failures less informative, but I can live with that if it's what it takes to make powerdevil terminate with failure instead of success. Of course, the assertion shouldn't be triggered in the first place - it indicates inconsistent data structures. It's hard to prove a negative, but hopefully the changes in 2.2.1-dev will have addressed this situation. I can't comment on how to fix it in code, not my place. I can report that I've had 2 GPU resets, and powerdevil didn't stop. But as you say, proving negative... Got something. powerdevil survived, to there's that. Situation: 2 external displays, laptop in clamshell mode. Laptop idle, displays suspended. Upon resume, powerdevil log: May 27 23:40:49 ged org_kde_powerdevil[3058]: No traced function stack for current thread May 27 23:40:49 ged org_kde_powerdevil[3058]: [ 3827] (i2c_check_open_bus_alive) Retrying i2c_check_edid_exists_by_dh() (c): Sleeping for 1000 milliseconds May 27 23:40:49 ged org_kde_powerdevil[3058]: kscreen.kwayland: removing output "DP-2" May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3827] (i2c_check_open_bus_alive) Retrying i2c_check_edid_exists_by_dh() (c): Sleeping for 1000 milliseconds May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3826] Removing connected display on bus 14 May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3826] Emitting DDCA_Display_Status_Event[16468.891: DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc working: false] May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3826] Starting 1 callback threads May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3826] libddcutil callback thread 0x7f0520012ba0 started May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 3826] Started 1 event callback thread(s) May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 36422] Invoking callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16468.891: DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc> May 27 23:40:50 ged org_kde_powerdevil[3058]: [ 36422] Callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16468.891: DDCA_EVENT_DISPLAY_DISCONNECTED, card0-DP-2, dref: DDCA_Display_Ref[2], io_path:/dev/i2c-14, ddc working:> May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3827] /dev/i2c-14: Checking EDID succeeded after 4 tries (G) May 27 23:40:51 ged org_kde_powerdevil[3058]: org.kde.powerdevil: [DDCutilDisplay]: ddca_get_non_table_vcp_value -5 May 27 23:40:51 ged org_kde_powerdevil[3058]: org.kde.powerdevil: [DDCutilDisplay]: ddca_set_non_table_vcp_value -3029 May 27 23:40:51 ged org_kde_powerdevil[3058]: kscreen.kwayland: adding output 78 May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] [ 3826]Locked bus /dev/i2c-14 after 3 tries May 27 23:40:51 ged org_kde_powerdevil[3058]: Current call stack: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: May 27 23:40:51 ged org_kde_powerdevil[3058]: No traced function stack for current thread May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] Adding connected display with bus 14 May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] Emitting DDCA_Display_Status_Event[16470.362: DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc working: true] May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] Starting 1 callback threads May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] libddcutil callback thread 0x7f0520014790 started May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3826] Started 1 event callback thread(s) May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 36489] Invoking callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16470.362: DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc wo> May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 36489] Callback function 0x7f056112ba00 for event DDCA_Display_Status_Event[16470.362: DDCA_EVENT_DISPLAY_CONNECTED, card0-DP-2, dref: DDCA_Display_Ref[4], io_path:/dev/i2c-14, ddc working: tr> May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3058] Quiescing libddcutil API... May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3058] Quiesce libddcutil API complete May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3058] Display redetection starting. May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3825] recheck thread terminating because watch thread terminated May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3058] Watch thread terminated. May 27 23:40:51 ged org_kde_powerdevil[3058]: [ 3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds May 27 23:40:52 ged org_kde_powerdevil[3058]: [ 3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds May 27 23:40:52 ged org_kde_powerdevil[3058]: [ 3058] (i2c_detect_x37) Extra x37 sleep: Sleeping for 400 milliseconds May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 36493] (check_supported_feature) busno=15, sleep-multiplier = 2.00. Testing for supported feature 0x10 returned Error_Info[DDCRC_RETRIES in ddc_write_read_with_retry, causes: ETIMEDOUT(10)] May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] Watching for display connection changes, resolved watch mode = Watch_Mode_Xevent, poll loop interval = 100 millisec May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] extra_stabilization_millisec: 0, stabilization_poll_millisec: 100 May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] libddcutil recheck thread 0x563fbca6b1c0 started May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] libddcutil watch thread 0x563fbcbb2d30 started May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] Display redetection finished. May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 3058] Unquiescing libddcutil API... May 27 23:40:55 ged org_kde_powerdevil[3058]: [ 36498] (dw_recheck_displays_func) Recheck interval: Sleeping for 200 milliseconds Created attachment 181800 [details]
powerdevil log
Complete log from start of instance. Copying apparently truncated some lines.
Wake from idle starts at time 23:40:49.
Also might be relevant: the 2 displays have a very big difference in the time they need to power up from idle.
I have made a fix to ddcutil 2.2.1-dev that resolves the issue, at least in the case that Martin documented, where PowerDevil doesn't restart when an "assertion" failed in libddcutil. The problem was that, for historical reasons, macro TRACED_ASSERT() used exit(1) instead of __failed_assert() to terminate library execution. This has been changed. Martin, thank you for the log that reported the assert failure. It changed the problem from "libddcutil fails for some reason and powerdevil doesn't restart" to "libddcutil fails for a specific reason and powerdevil doesn't restart". Running 2.2.0 with 9c55c47 and 119df96 applied to confirm it helped. Will report back when it happens. I'm sorry, haven't seen the assert happening ever since. Might be because amdgpu fixes in 6.14.9, maybe because I reseated the USB-C dongle... I'll keep running it and monitoring, just can't promise it'll happen. Martin, with the current ddcutil branch 2.2.1-dev the problematic assertion has been removed. Instead the API call that triggered the assertion returns status DDCRC_DISCONNECTED. So the question has become whether powerdevil properly handles this error code. Created attachment 182056 [details]
log with assertion
I'm running 2.2.0 with the commits switching exit(1) to __assert_fail() applied, to confirm hitting an assertion now results in powerdevil being restarted, and guess what just happened? hit the/an assert.
Now if ddcutil asserts, powerdevil exits with 6/ABRT and systemd restarts it. So that part of the bug is fixed.
Whether the rest is, that's out of my current scope of knowledge...
If you're interested in the log, it's attached.
rereading my comment, I'm not sure whether I'm clear: I was only testing whether the assert behavior change fixed the "powerdevil not restarting automatically after assert in ddcutil" part of the bug. And that is definitely fixed. Martin, that is useful confirmation. Do you recall what you were doing/what was happening when the assert was hit? (In reply to Sanfod Rockowitz from comment #17) > Martin, that is useful confirmation. Do you recall what you were doing/what > was happening when the assert was hit? I returned to the PC after > 10 minutes, meaning the displays were waking up from idle. It's weird: sometimes kscreen doesn't recognize the display, and sets up a new configuration. I then power cycle the display, and the saved configuration gets applied. Now that ddca_open_display(), in branch 2.2.1-dev, returns status DDCRC_DISCONNECTED instead of asserting failure, you might try running the development branch. Every so often, or when there's a powerdevil failure, look in the powerdevil journal for string "Display_Ref.detail == NULL, but DREF_REMOVED not set". There will also be a backtrace. This should give a clue as to how this error state arises, and more importantly whether powerdevil handles the new return code. Ok, will create a filter with notification. and report back. It'll take a few days. Hi, I haven't forgotten
> journalctl --user -u plasma-powerdevil.service -u systemd -f| grep "Display_Ref.detail"
hasn't yielded anything in the past 2 weeks... continuing to let it run
After a painful upgrade saga, I have a test environment for this bug and I believe it can be marked resolved. Test environment: laptop: - Lenovo Intel Workstation, Gen 1 - openSUSE turmbleweed - Plasma 6.4, with lbiddcutil.so.5.2.0 from ddcutil 2.2.0 installed docking station - Dell WD 19S - DisplayPort attached monitor With libddcutil.so.5.2.0 from ddcutil 2.2.0, I see the failure. With libddcutil.so.5.3.0 built from the ddcutil 2.2.1 development branch, the failure does not occur. This was the only significant issue outstanding for ddcutil 2.2.1. It's time to get it out the door. Great, thanks! |