Bug 491401

Summary: SMART notification reporting is delayed
Product: [Unmaintained] plasma-disks Reporter: Colin S <bugs.kde.org>
Component: general Assignee: Plasma Bugs List <plasma-bugs-null>
Status: RESOLVED INTENTIONAL    
Severity: normal CC: sitter
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Colin S 2024-08-07 17:04:22 UTC
SUMMARY
My system experienced a disk hardware failure. The disk fell off the bus and then came back after a reboot with SMART failed status. smartd noticed this immediately and triggered its own reporting mechanism, and now emits critical errors to syslog every 30 minutes. plasma-disks took a whole day to notice that there was a SMART error and raise a notification.

Glancing at the source code for plasma-disks, it does look like there is a fixed 24-hour timer in there to scan devices, so I guess that is probably how it finally managed to trigger a check and notify. Restarting the plasma-disks service causes it to report immediately, so I am not sure why it did not trigger on reboot, since the failing device already had to be initialised and on the bus for the system to even boot that far.

STEPS TO REPRODUCE
1. Experience a hardware vendor trying to ruin your day

OBSERVED RESULT
No notification from the OS

EXPECTED RESULT
plasma-disks notifies immediately (or at least shortly thereafter) about the failure

SOFTWARE/OS VERSIONS
Operating System: Debian GNU/Linux 12
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.9.10-amd64 (64-bit)

ADDITIONAL INFORMATION
If that 24-hour timer is indeed the only thing doing re-checks, and there is no easy way or desire to use smartd, it seems like it would be trivial and advisable to reduce the check interval to something like 30 minutes.
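
To illustrate the timing described above, here is a minimal sketch of a polling monitor. It is not the plasma-disks code; the `PollingMonitor` class, the `first_notification` helper and the `check_on_start` flag are all hypothetical names invented for this example. It simulates a disk that is already failing at boot and shows how long the first notification takes, depending on the interval and on whether a check runs at startup.

```python
from typing import Callable, List, Optional

DAY_S = 24 * 60 * 60       # the 24-hour interval discussed in this report
HALF_HOUR_S = 30 * 60      # the 30-minute interval suggested above


class PollingMonitor:
    """Hypothetical SMART poller; an illustration, not plasma-disks' real code."""

    def __init__(self, is_failing: Callable[[], bool],
                 notify: Callable[[float], None],
                 interval_s: float, check_on_start: bool) -> None:
        self.is_failing = is_failing
        self.notify = notify
        self.interval_s = interval_s
        self.check_on_start = check_on_start
        self.next_due: Optional[float] = None  # set on the first tick

    def tick(self, now: float) -> None:
        """Call regularly with the current time in seconds."""
        if self.next_due is None:
            # First tick: either check right away or wait one full interval.
            self.next_due = now if self.check_on_start else now + self.interval_s
        if now >= self.next_due:
            if self.is_failing():
                self.notify(now)
            self.next_due = now + self.interval_s


def first_notification(interval_s: float, check_on_start: bool) -> float:
    """Simulate a disk that is already failing at boot (t = 0) and return
    the time in seconds at which the first notification fires."""
    fired: List[float] = []
    monitor = PollingMonitor(lambda: True, fired.append, interval_s, check_on_start)
    now = 0.0
    while not fired:
        monitor.tick(now)
        now += 60.0  # advance the simulated clock one minute per tick
    return fired[0]
```

Under these assumptions, a 24-hour timer with no startup check delays the first notification by a full day, while either a startup check or a 30-minute interval shortens that delay considerably.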
Comment 1 Harald Sitter 2024-08-07 17:18:10 UTC
24h is the time we settled on.
Comment 2 Colin S 2024-08-07 17:22:14 UTC
Thanks for your quick reply! Could you help me out and provide a link to where this decision was discussed, so I can read that instead of making you write it out here? From my perspective, this is a time-critical situation for end-users, and smartd has no problem polling on a 30-minute interval, so I would like to understand why plasma-disks would decide that 24 hours is a necessary amount of time to wait between checks. Thanks!
Comment 3 Colin S 2024-08-13 19:03:33 UTC
Hi, just wanted to make sure my question didn’t get lost in follow-up. If it is easier, just say what would need to be different to make a lower interval make sense to you. Always happy to understand the thoughts from an expert. Thanks and apologies in advance for the noise!