Created attachment 113834
crash backtrace

Running the kauth git version of partitionmanager on Fedora 29, using the self-compiled RPMs available at https://copr.fedorainfracloud.org/coprs/mattia/Testing/builds/, shows "Invalid cryptographic signature" errors in the console while switching backends.

I've also noticed that after switching from the sfdisk backend to the dummy backend and then back to sfdisk, the device list is not refreshed correctly. In a default installation I have /dev/vda and /dev/fedora (which is a partition of /dev/vda) shown at startup in the devices window. After switching to dummy and back to sfdisk, most of the time I get a broken device list with only /dev/vda or only /dev/fedora, and a partition table list that looks like a mix of the partition tables of the two devices.

Only once did I get a crash backtrace while switching backends; it is in the attached file.
The helper shows the "Invalid cryptographic signature" error when RSA verification of a command execution request fails for some reason. In that case the helper simply ignores the request. It could be that devices are poorly scanned because some commands are now skipped. Scanning works quite reliably on my system (maybe not 100% reliably, but devices are detected at least 99% of the time). I've never seen this happen on my Gentoo system, but I can reproduce it on Fedora. If you find a way to reproduce the crash, could you try to get debug symbols?
I can reproduce this with the QCA OpenSSL backend (qca-ossl). Since I was using botan on my system, I didn't see it. Now the question is whose fault this is when using qca-ossl...
I can't reproduce the crash, it only happened once, so maybe it was a coincidence.

I've updated the Fedora RPMs on COPR with today's snapshot of the kauth branch. The error is still shown, but I've noticed that I get it only when switching backends: if I refresh the device list with F5, rescanning the partition table doesn't trigger that warning in the console.
Very rarely I can see a Qt warning like this:
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247, resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0

Unfortunately, I also noticed that the device list now shows only the LVM device; the real disk (/dev/vda) is not shown anymore.
(In reply to Mattia from comment #3)
> I can't reproduce the crash, it only happened once, so maybe it was a
> coincidence.
>
> I've updated the Fedora RPMs on COPR with today's snapshot of the kauth
> branch. The error is still shown, but I've noticed that I get it only when
> switching backends: if I refresh the device list with F5, rescanning the
> partition table doesn't trigger that warning in the console.
> Very rarely I can see a Qt warning like this:
> qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247,
> resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0
>
> Unfortunately, I also noticed that the device list now shows only the LVM
> device; the real disk (/dev/vda) is not shown anymore.

Hi, yes, I also noticed that it only happens when changing backends, and only with the QCA ossl backend. This left me quite confused and stuck on this issue; nothing should be different when changing backends.

Can you check whether you can see the /dev/vda device in the output of
lsblk --nodeps --paths --sort name --json --output type,name
and that it has "type": "disk"?

There were some RAID changes recently but they shouldn't have broken /dev/vda support...
(In reply to Mattia from comment #3)
> Very rarely I can see a Qt warning like this:
> qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247,
> resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0

Hmm, maybe I'm using qpa incorrectly. I haven't seen this error, but it probably can't appear on my system as I'm not using xcb (plasma wayland here).
(In reply to Andrius Štikonas from comment #4)
>
> There were some RAID changes recently but they shouldn't have broken
> /dev/vda support...

Hmm, yes, I think I found the RAID change that broke this.

I think my GSoC student assumed that if the device model name is empty, the device must be RAID. That is actually false even for one of my USB sticks, which is also missing from the list. I guess that the virtualized /dev/vda doesn't have a model name either.
(In reply to Andrius Štikonas from comment #4)
>
> Can you check whether you can see the /dev/vda device in the output of
> lsblk --nodeps --paths --sort name --json --output type,name
> and that it has "type": "disk"?
>
> There were some RAID changes recently but they shouldn't have broken
> /dev/vda support...

Just tried and yes, /dev/vda is shown:

# lsblk --nodeps --paths --sort name --json --output type,name
{
   "blockdevices": [
      {"type": "disk", "name": "/dev/vda"}
   ]
}

Also note that NTFS partitions are not recognized correctly in the kauth branch (they are shown as "unknown" type).
(In reply to Andrius Štikonas from comment #6)
>
> I think my GSoC student assumed that if the device model name is empty, the
> device must be RAID. That is actually false even for one of my USB sticks,
> which is also missing from the list. I guess that the virtualized /dev/vda
> doesn't have a model name either.

# lsblk --nodeps --paths --sort name --json --output type,name,model
{
   "blockdevices": [
      {"type": "disk", "name": "/dev/vda", "model": null}
   ]
}
(In reply to Mattia from comment #8)
> # lsblk --nodeps --paths --sort name --json --output type,name,model
> {
>    "blockdevices": [
>       {"type": "disk", "name": "/dev/vda", "model": null}
>    ]
> }

Yes, so just as I thought: in the scanDevice function in src/plugins/sfdisk/sfdiskbackend.cpp, the !modelCommand.output().trimmed().isEmpty() check fails. I have told my GSoC student about this, and it should definitely be fixed. You can remove that check if you want to test today.

By the way, yesterday I implemented stopping the kpmcore kauth helper when the main application crashes, so it should now be possible to restart KPM easily without manually killing the kpmcore_externalcommand helper.
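As an illustration of why an empty model string is a poor RAID indicator, here is a minimal sketch that classifies a device from the lsblk "type" field alone. This is not the kpmcore code and not necessarily how the check was fixed: `DeviceKind` and `classifyDevice` are hypothetical names, and the sketch assumes that lsblk reports software RAID devices with a type beginning with "raid" (for example "raid1").

```cpp
#include <cassert>
#include <string>

// Hypothetical classification, for illustration only.
enum class DeviceKind { Disk, SoftwareRaid, Other };

// Decide the device kind from the lsblk "type" column.
// The model string is deliberately ignored: virtual disks such as
// /dev/vda (and some USB sticks) report no model at all.
DeviceKind classifyDevice(const std::string& type, const std::string& model) {
    (void)model; // an empty or null model carries no information here
    if (type == "disk")
        return DeviceKind::Disk;
    if (type.rfind("raid", 0) == 0) // assumption: RAID types start with "raid"
        return DeviceKind::SoftwareRaid;
    return DeviceKind::Other;
}
```

With this kind of check, the /dev/vda entry from comment #8 ("type": "disk", "model": null) would still be treated as a plain disk.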
The "model": null issue is fixed. (Although it probably should have been reported as a separate bug...)
(In reply to Mattia from comment #3)
> I can't reproduce the crash, it only happened once, so maybe it was a
> coincidence.

Possibly some command failed due to the "Invalid cryptographic signature" error, and a nullptr dereference followed.
It could be that "Invalid cryptographic signature" appears due to some race condition.
(In reply to Andrius Štikonas from comment #12)
> It could be that "Invalid cryptographic signature" appears due to some race
> condition.

I think that if I disable the message counter (counter in the app and m_Counter in the helper), I don't see any errors. However, without another fix, removing the counter would allow message replay attacks.
Somehow there are two threads trying to run commands when you change backends; I thought there was just one. On the other hand, when rescanning with F5, there is just one thread. Not sure why things worked fine when I used botan.
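To illustrate how two threads could trip a strictly increasing message counter, here is a minimal sketch. It assumes the helper accepts a request only if it carries the next expected counter value; `CounterHelper` and `accept` are hypothetical names, not the kpmcore implementation. In kpmcore the counter is presumably part of the signed data, which would explain why a mismatch surfaces as a signature error rather than as a counter error.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper-side check, for illustration only.
struct CounterHelper {
    std::uint64_t expected = 0;

    // Accept a request only if its counter is exactly the next expected
    // value. A replayed (old) counter is rejected, but so is a valid
    // request that overtakes another one sent from a second thread.
    bool accept(std::uint64_t counter) {
        if (counter != expected)
            return false;
        ++expected;
        return true;
    }
};
```

If thread A takes counter 0 and thread B takes counter 1, but B's request reaches the helper first, `accept(1)` fails even though the request is legitimate. With a single thread, as with the F5 rescan, requests always arrive in order and the check passes.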
Git commit 938ec7fa8b6084586dd8a006da36c46bff1508ce by Andrius Štikonas.
Committed on 21/07/2018 at 10:03.
Pushed by stikonas into branch 'kauth'.

Make ExternalCommandHelper::getNonce() reentrant.

Store previously generated values of nonce, and remove them from the container when they are used.

M  +11   -7    src/util/externalcommand.cpp
M  +3    -3    src/util/externalcommand.h
M  +25   -10   src/util/externalcommandhelper.cpp
M  +7    -5    src/util/externalcommandhelper.h

https://commits.kde.org/kpmcore/938ec7fa8b6084586dd8a006da36c46bff1508ce
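The idea described in the commit message (remember every issued nonce, and remove it when it is used) can be sketched as follows. This is a standalone illustration, not the code from the commit: `NonceStore`, `issue` and `consume` are hypothetical names, and `std::mt19937_64` is used only to keep the example short; it is not a cryptographically secure generator.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <random>
#include <unordered_set>

// Hypothetical stand-in for the helper's nonce bookkeeping.
class NonceStore {
public:
    // Generate a fresh nonce and remember it as outstanding.
    std::uint64_t issue() {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::uint64_t nonce;
        do {
            nonce = m_dist(m_rng);
        } while (!m_outstanding.insert(nonce).second); // retry on a collision
        return nonce;
    }

    // Accept a nonce at most once: returns true only if it was outstanding,
    // and removes it so that a replayed message is rejected.
    bool consume(std::uint64_t nonce) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_outstanding.erase(nonce) == 1;
    }

private:
    std::mutex m_mutex;
    // Assumption for brevity: NOT a secure random source.
    std::mt19937_64 m_rng{std::random_device{}()};
    std::uniform_int_distribution<std::uint64_t> m_dist;
    std::unordered_set<std::uint64_t> m_outstanding;
};
```

Because each nonce is tracked independently, two callers can hold nonces at the same time and use them in any order, while a nonce that has already been consumed is still refused.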
Mattia, can you test this? Crashes might still occur if some particular command fails, and it would be good to fix those too; maybe another bug should be opened for that. But right now commands shouldn't fail in normal usage, so it should be close to impossible to trigger that crash (which was hard to trigger even with this bug).
Seems to work well, I don't see warnings or crashes anymore. Thanks