Bug 428844

Summary: No support for nvme disks
Product: [Plasma] plasma-disks Reporter: Kwon-Young Choi <kwon-young.choi>
Component: generalAssignee: Plasma Bugs List <plasma-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: justin.zobel, sitter
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Manjaro   
OS: Linux   
Latest Commit: Version Fixed In: 5.20.4
Sentry Crash Report:

Description Kwon-Young Choi 2020-11-08 15:06:27 UTC
SUMMARY

I have a nvme ssd PC300 NVMe SK hynix 256GB but it does not appear in the smart monitoring of the info center.

STEPS TO REPRODUCE
1. open info center, go to devices -> SMART Status

OBSERVED RESULT

No S.M.A.R.T devices found

EXPECTED RESULT

Smart status of my nvme disks

SOFTWARE/OS VERSIONS

Operating System: Manjaro Linux
KDE Plasma Version: 5.20.2
KDE Frameworks Version: 5.75.0
Qt Version: 5.15.1
Kernel Version: 5.9.3-1-MANJARO
OS Type: 64-bit
Processors: 8 × Intel® Core™ i7-7820HQ CPU @ 2.90GHz
Memory: 15.2 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 630

ADDITIONAL INFORMATION

$ solid-hardware5 details /org/freedesktop/UDisks2/drives/PC300_NVMe_SK_hynix_256GB_EJ77N46141180792O
udi = '/org/freedesktop/UDisks2/drives/PC300_NVMe_SK_hynix_256GB_EJ77N46141180792O'
  parent = '/org/freedesktop/UDisks2'  (string)
  vendor = ''  (string)
  product = 'PC300 NVMe SK hynix 256GB'  (string)
  description = '238.5 GiB Hard Drive'  (string)
  icon = 'drive-harddisk'  (string)
  Block.major = 259  (0x103)  (int)
  Block.minor = 1  (0x1)  (int)
  Block.device = '/dev/nvme0n1p1'  (string)
  StorageDrive.bus = 'Platform'  (0x5)  (enum)
  StorageDrive.driveType = 'HardDisk'  (0x0)  (enum)
  StorageDrive.removable = false  (bool)
  StorageDrive.hotpluggable = false  (bool)
  StorageDrive.inUse = true  (bool)
  StorageDrive.size = 256060514304  (0x3b9e656000)  (qulonglong)

$ solid-hardware5 details /org/freedesktop/UDisks2/block_devices/nvme0n1p1                                                                           
udi = '/org/freedesktop/UDisks2/block_devices/nvme0n1p1'
  parent = '/org/freedesktop/UDisks2/drives/PC300_NVMe_SK_hynix_256GB_EJ77N46141180792O'  (string)
  vendor = ''  (string)
  product = 'PC300 NVMe SK hynix 256GB'  (string)
  description = '238.5 GiB Hard Drive'  (string)
  icon = 'drive-harddisk-root'  (string)
  Block.major = 259  (0x103)  (int)
  Block.minor = 1  (0x1)  (int)
  Block.device = '/dev/nvme0n1p1'  (string)
  StorageAccess.accessible = true  (bool)
  StorageAccess.filePath = '/'  (string)
  StorageAccess.ignored = true  (bool)
  StorageVolume.ignored = false  (bool)
  StorageVolume.usage = 'FileSystem'  (0x2)  (enum)
  StorageVolume.fsType = 'ext4'  (string)
  StorageVolume.label = ''  (string)
  StorageVolume.uuid = '31563a99-fa20-4ee4-8bea-6c6e005a703f'  (string)
  StorageVolume.size = 256059229184  (0x3b9e51c400)  (qulonglong)
Comment 1 Justin Zobel 2020-11-09 01:53:38 UTC
I have just tested on KDE Neon User and Unstable Edition for my two devices and each has an NVMe and are detected and show the status.
Comment 2 Kwon-Young Choi 2020-11-09 08:01:06 UTC
ok, I don't really know what to do to understand what is not working on my system.

Does plasma-disks has log somewhere I could get ?
Maybe it will tell us what is going wrong ?
Comment 3 Justin Zobel 2020-11-09 08:03:54 UTC
Does smartctl show any info on it?

sudo smartctl --all /dev/nvme0n1
Comment 4 Kwon-Young Choi 2020-11-09 08:13:15 UTC
(In reply to Justin Zobel from comment #3)
> Does smartctl show any info on it?
> 
> sudo smartctl --all /dev/nvme0n1

$ sudo smartctl --all /dev/nvme0n1
[sudo] Mot de passe de kwon-young : 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.3-1-MANJARO] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PC300 NVMe SK hynix 256GB
Serial Number:                      EJ77N46141180792O
Firmware Version:                   20005A00
PCI Vendor/Subsystem ID:            0x1c5c
IEEE OUI Identifier:                0xace42e
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            ace42e 719006ffbc
Local Time is:                      Mon Nov  9 09:04:24 2020 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     88 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.87W       -        -    0  0  0  0        5       5
 1 +     2.40W       -        -    1  1  1  1       30      30
 2 +     1.90W       -        -    2  2  2  2      100     100
 3 -   0.1000W       -        -    3  3  3  3     1000    1000
 4 -   0.0060W       -        -    3  3  3  3     1000    5000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        25 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    5,174,435 [2.64 TB]
Data Units Written:                 20,997,211 [10.7 TB]
Host Read Commands:                 66,438,160
Host Write Commands:                279,794,847
Controller Busy Time:               1,007
Power Cycles:                       1,822
Power On Hours:                     157
Unsafe Shutdowns:                   112
Media and Data Integrity Errors:    0
Error Information Log Entries:      218,352
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               25 Celsius

# stops for a long time here

Read Error Information Log failed: NVME_IOCTL_ADMIN_CMD: Appel système interrompu

Something seems to be wrong at the end.
Do you think it is a problem with smartctl ?
Comment 5 Justin Zobel 2020-11-09 08:16:31 UTC
Looks like Appel système interrompu means System call interrupted.

If it were me I'd check the NVMe diagnostics (if they exist) from your EFI/BIOS.
Comment 6 Kwon-Young Choi 2020-11-09 08:18:34 UTC
(In reply to Justin Zobel from comment #5)
> Looks like Appel système interrompu means System call interrupted.
> 
> If it were me I'd check the NVMe diagnostics (if they exist) from your
> EFI/BIOS.

Ok, I'll try to do that and report back
Comment 7 Kwon-Young Choi 2020-11-09 09:54:59 UTC
I have just run the dell diagnostics on my laptop and the smart, dst short and long test all passed without any problems.
Comment 8 Harald Sitter 2020-11-09 11:55:12 UTC
You'll want to take this upstream https://www.smartmontools.org/#BugReports this is either a bug in smartmontools or, I guess more likely, the drive firmware itself as it fails to respond (in time).

If you could attach a file with the output of `sudo smartctl --all --json /dev/nvme0n1` that'd be helpful though. Depending on its contents we could choose to ignore the return value of smartctl I suppose.
Comment 9 Kwon-Young Choi 2020-11-10 09:34:01 UTC
(In reply to Harald Sitter from comment #8)
> You'll want to take this upstream https://www.smartmontools.org/#BugReports
> this is either a bug in smartmontools or, I guess more likely, the drive
> firmware itself as it fails to respond (in time).
> 
> If you could attach a file with the output of `sudo smartctl --all --json
> /dev/nvme0n1` that'd be helpful though. Depending on its contents we could
> choose to ignore the return value of smartctl I suppose.

I will try to send something upstream.

Here is the output of `sudo smartctl --all --json /dev/nvme0n1`:

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      1
    ],
    "svn_revision": "5022",
    "platform_info": "x86_64-linux-5.9.3-1-MANJARO",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--all",
      "--json",
      "/dev/nvme0n1"
    ],
    "messages": [
      {
        "string": "Read Error Information Log failed: NVME_IOCTL_ADMIN_CMD: Appel système interrompu",
        "severity": "error"
      }
    ],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/nvme0n1",
    "info_name": "/dev/nvme0n1",
    "type": "nvme",
    "protocol": "NVMe"
  },
  "model_name": "PC300 NVMe SK hynix 256GB",
  "serial_number": "EJ77N46141180792O",
  "firmware_version": "20005A00",
  "nvme_pci_vendor": {
    "id": 7260,
    "subsystem_id": 7260
  },
  "nvme_ieee_oui_identifier": 11330606,
  "nvme_controller_id": 0,
  "nvme_number_of_namespaces": 1,
  "nvme_namespaces": [
    {
      "id": 1,
      "size": {
        "blocks": 500118192,
        "bytes": 256060514304
      },
      "capacity": {
        "blocks": 500118192,
        "bytes": 256060514304
      },
      "utilization": {
        "blocks": 500118192,
        "bytes": 256060514304
      },
      "formatted_lba_size": 512,
      "eui64": {
        "oui": 11330606,
        "ext_id": 487747682236
      }
    }
  ],
  "user_capacity": {
    "blocks": 500118192,
    "bytes": 256060514304
  },
  "logical_block_size": 512,
  "local_time": {
    "time_t": 1605000494,
    "asctime": "Tue Nov 10 10:28:14 2020 CET"
  },
  "smart_status": {
    "passed": true,
    "nvme": {
      "value": 0
    }
  },
  "nvme_smart_health_information_log": {
    "critical_warning": 0,
    "temperature": 30,
    "available_spare": 100,
    "available_spare_threshold": 10,
    "percentage_used": 2,
    "data_units_read": 5188835,
    "data_units_written": 21146267,
    "host_reads": 66725315,
    "host_writes": 282306897,
    "controller_busy_time": 1018,
    "power_cycles": 1826,
    "power_on_hours": 158,
    "unsafe_shutdowns": 112,
    "media_errors": 0,
    "num_err_log_entries": 218352,
    "warning_temp_time": 0,
    "critical_comp_time": 0,
    "temperature_sensors": [
      30
    ]
  },
  "temperature": {
    "current": 30
  },
  "power_cycle_count": 1826,
  "power_on_time": {
    "hours": 158
  }
}
Comment 10 Harald Sitter 2020-11-11 10:49:18 UTC
Thanks. There's something weird though...

> "exit_status": 4

shouldn't be filtered right now, so there must be something else going on additionally.

It'd be great if you could get me the output of

`QT_LOGGING_RULES="org.kde.plasma.disks=true" kded5 --replace`
Comment 11 Kwon-Young Choi 2020-11-13 13:03:42 UTC
(In reply to Harald Sitter from comment #10)
> Thanks. There's something weird though...
> 
> > "exit_status": 4
> 
> shouldn't be filtered right now, so there must be something else going on
> additionally.
> 
> It'd be great if you could get me the output of
> 
> `QT_LOGGING_RULES="org.kde.plasma.disks=true" kded5 --replace`

Sorry for the delay.

Here's the output for:

```
$ `QT_LOGGING_RULES="org.kde.plasma.disks=true" kded5 --replace`
kf.kded: Couldn't register name "org.kde.kcookiejar5" with DBUS - another process owns it already!
org.kde.plasma.disks: starting
org.kde.plasma.disks: !!!!  "/org/freedesktop/UDisks2/block_devices/nvme0n1"
org.kde.plasma.disks: evaluating!
org.kde.plasma.disks: true true "org.kde.kded.smart" KAuth::Action::AuthorizedStatus
org.kde.plasma.disks: !!!!  "/org/freedesktop/UDisks2/block_devices/nvme0n1p1"
org.kde.plasma.disks:    bad type Solid::StorageVolume::FileSystem
org.kde.wacomtablet.common: d->name.isEmpty? true
org.kde.wacomtablet.common: d->name.isEmpty? true
bluedevil: Created
Installing the delayed initialization callback.
kcm_touchpad: Using X11 backend
Delayed initialization.
Reloading the khotkeys configuration
Version 2 File!
true
1
Known activities: ("380a83a6-47ae-4f65-87be-08be5d57e561", "4ca8abe1-deed-47b4-9b4b-627cfad25fd9")
Registering "org.kde.StatusNotifierItem-1224-1/StatusNotifierItem" to system tray
Registering "org.kde.StatusNotifierHost-1059" as system tray
bluedevil: Bluetooth operational changed true
bluedevil: ObexManager operational changed true
Registering ":1.167/StatusNotifierItem" to system tray
Registering ":1.42/StatusNotifierItem" to system tray
Registering ":1.48/StatusNotifierItem" to system tray
bluedevil: ObexAgent registered
bluedevil: Agent registered
bluedevil: Requested default Agent
kf.auth: "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
org.kde.plasma.disks: looks like we got no data back for "/dev/nvme0n1" 0 true
Warning from helper: QProcess: Destroyed while process ("smartctl") is still running.
```

So, after contacting the smartmontools team, it turns out that this is a firmware bug of my ssd but fortunately, it is easy to modify smartctl to hide this bug. See the mail thread here: https://listi.jpberlin.de/pipermail/smartmontools-support/2020-November/000544.html

Since we now know that this is an upstream bug, I believe we can close this bug report.
Comment 12 Harald Sitter 2020-11-13 13:42:46 UTC
Kind of. The data we care about is available so we could still show the device, the output you posted suggests our timeout is just too short (well, it's 30 seconds - short is relative here ^^)
Comment 13 Kwon-Young Choi 2020-11-13 13:44:26 UTC
(In reply to Harald Sitter from comment #12)
> Kind of. The data we care about is available so we could still show the
> device, the output you posted suggests our timeout is just too short (well,
> it's 30 seconds - short is relative here ^^)

Yeah, you would need a timeout longer than 120s I think.
Comment 14 Bug Janitor Service 2020-11-13 14:19:45 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/plasma-disks/-/merge_requests/9
Comment 15 Harald Sitter 2020-11-13 14:25:39 UTC
(In reply to Kwon-Young Choi from comment #13)
> (In reply to Harald Sitter from comment #12)
> > Kind of. The data we care about is available so we could still show the
> > device, the output you posted suggests our timeout is just too short (well,
> > it's 30 seconds - short is relative here ^^)
> 
> Yeah, you would need a timeout longer than 120s I think.

How do you figure 120? Looking at the code the timeouts are 60, or does it in fact take 120 for the smartctl to return when you run it manually?
Comment 16 Kwon-Young Choi 2020-11-13 14:27:02 UTC
(In reply to Harald Sitter from comment #15)
> (In reply to Kwon-Young Choi from comment #13)
> > (In reply to Harald Sitter from comment #12)
> > > Kind of. The data we care about is available so we could still show the
> > > device, the output you posted suggests our timeout is just too short (well,
> > > it's 30 seconds - short is relative here ^^)
> > 
> > Yeah, you would need a timeout longer than 120s I think.
> 
> How do you figure 120? Looking at the code the timeouts are 60, or does it
> in fact take 120 for the smartctl to return when you run it manually?

yes, see https://listi.jpberlin.de/pipermail/smartmontools-support/2020-November/000538.html
Comment 17 Harald Sitter 2020-12-16 13:29:46 UTC
Git commit 9ba7806f5a5d99f53d9b4adbcf698bcc055521b3 by Harald Sitter.
Committed on 16/12/2020 at 13:28.
Pushed by sitter into branch 'master'.

bump smartctl timeout in helper

bad firmwares may cause ioctl calls to time out which takes notably
longer than the standard 30 second timeout of qprocess. instead opt for
a 65 second timeout which should be sufficient to allow for internal
smartctl timeouts.

the only downside to a longer timeout is that potentially malfunctioning
firmwares will slow down the refresh checks of other devices from
finishing as well (since queries are sequential because kauth itself
isn't re-entrant). on the plus side we'll then cover devices so long as
they have data
FIXED-IN: 5.20.4

M  +7    -1    src/helper.cpp

https://invent.kde.org/plasma/plasma-disks/commit/9ba7806f5a5d99f53d9b4adbcf698bcc055521b3
Comment 18 Harald Sitter 2020-12-16 13:30:13 UTC
Git commit 3930245874ead2dead32b04b2af9442bf2f04141 by Harald Sitter.
Committed on 16/12/2020 at 13:29.
Pushed by sitter into branch 'Plasma/5.20'.

bump smartctl timeout in helper

bad firmwares may cause ioctl calls to time out which takes notably
longer than the standard 30 second timeout of qprocess. instead opt for
a 65 second timeout which should be sufficient to allow for internal
smartctl timeouts.

the only downside to a longer timeout is that potentially malfunctioning
firmwares will slow down the refresh checks of other devices from
finishing as well (since queries are sequential because kauth itself
isn't re-entrant). on the plus side we'll then cover devices so long as
they have data
FIXED-IN: 5.20.4

M  +7    -1    src/helper.cpp

https://invent.kde.org/plasma/plasma-disks/commit/3930245874ead2dead32b04b2af9442bf2f04141