Bug 396312 - [Kauth] "Invalid cryptographic signature" error when using qca with openssl backend
Status: RESOLVED FIXED
Alias: None
Product: partitionmanager
Classification: Applications
Component: general (other bugs)
Version First Reported In: Git
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Andrius Štikonas
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-08 12:29 UTC by Mattia
Modified: 2018-07-22 11:23 UTC (History)
CC List: 0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
crash backtrace (9.97 KB, text/plain)
2018-07-08 12:29 UTC, Mattia

Description Mattia 2018-07-08 12:29:05 UTC
Created attachment 113834 [details]
crash backtrace

Running the kauth git version of partitionmanager on Fedora 29, using the self-compiled RPMs available at https://copr.fedorainfracloud.org/coprs/mattia/Testing/builds/, shows "Invalid cryptographic signature" errors in the console while switching backends.

I've also noticed that after switching from the sfdisk backend to the dummy backend and then back to sfdisk, the device list is not refreshed correctly.
In a default installation, I have /dev/vda and /dev/fedora (which is a partition of /dev/vda) shown at startup in the devices window.
After switching to dummy and back to sfdisk, most of the time I get a broken device list with only /dev/vda or only /dev/fedora, and a garbled partition table list that looks like a mix of the partition tables of the two devices...

I managed to get a crash backtrace only once while switching backends; it is in the attached file.
Comment 1 Andrius Štikonas 2018-07-08 13:48:53 UTC
The helper shows the "Invalid cryptographic signature" error when RSA verification of a command execution request fails for some reason. In that case the helper just ignores the request. It could be that devices are poorly scanned because some commands are now skipped. Scanning works quite reliably on my system (maybe not 100% reliably, but devices are detected at least 99% of the time).
I've never seen this happen on my Gentoo system, but I can reproduce it on Fedora.

If you find a way to reproduce the crash, could you try to get debug symbols?
Comment 2 Andrius Štikonas 2018-07-08 17:24:22 UTC
I can reproduce this with the qca openssl backend. Since I was using botan on my system, I didn't see this.

Now the question is whose fault this is when using qca-ossl...
Comment 3 Mattia 2018-07-15 09:32:08 UTC
I can't reproduce the crash. It only happened once, so maybe it was a coincidence.

I've updated the Fedora RPMs on COPR with today's snapshot of the kauth branch. The error is still shown, but I've noticed that I get it only when switching backends: if I refresh the device list with F5, rescanning the partition table doesn't trigger that warning in the console.
Very rarely I can see a Qt warning like this:
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247, resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0

Unfortunately, I also noticed that the device list now contains only the LVM device; the real disk (/dev/vda) is not shown anymore.
Comment 4 Andrius Štikonas 2018-07-15 10:58:14 UTC
(In reply to Mattia from comment #3)
> I can't reproduce the crash, it only happened once, so maybe it was a
> coincidence.
> 
> I've updated Fedora RPMs on COPR with today snapshot of kauth branch. The
> error is still showed, but I've noticed that I get it only switching
> backend: if I refresh the device list with F5, rescanning the partition
> table doesn't trigger that warning in console.
> Very rarely I can see a QT warning like this:
> qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247,
> resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0
> 
> Unfortunately, I also noticed that in device list I get only the LVM device,
> the real disk (/dev/vda) is not showed anymore.

Hi, yes, I also noticed that it only happens when changing backends, and only with the QCA ossl backend. This left me quite confused and stuck on this issue; nothing should be different when changing backends.

Can you check whether the /dev/vda device appears in the output of
lsblk --nodeps --paths --sort name --json --output type,name
and that it has "type": "disk"?

There were some RAID changes recently but they shouldn't have broken /dev/vda support...
Comment 5 Andrius Štikonas 2018-07-15 10:59:49 UTC
(In reply to Mattia from comment #3)
> Very rarely I can see a QT warning like this:
> qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 12247,
> resource id: 35651797, major code: 40 (TranslateCoords), minor code: 0

Hmm, maybe I'm using qpa incorrectly. I haven't seen this error, but it probably can't appear on my system as I'm not using xcb (plasma wayland here).
Comment 6 Andrius Štikonas 2018-07-15 14:23:18 UTC
(In reply to Andrius Štikonas from comment #4)
> 
> There were some RAID changes recently but they shouldn't have broken
> /dev/vda support...

Hmm, yes, I think I found the RAID change that broke this.

I think my GSoC student assumed that if the device model name is empty, the device must be RAID. That is actually false even for one of my USB sticks, which is also missing from the list.
I guess that the virtualized /dev/vda doesn't have a model name either.
Comment 7 Mattia 2018-07-15 14:32:39 UTC
(In reply to Andrius Štikonas from comment #4)
> 
> Can you check if you can see /dev/vda device in
> lsblk --nodeps --paths --sort name --json --output type,name
> and that it has "type": "disk"
> 
> There were some RAID changes recently but they shouldn't have broken
> /dev/vda support...

Just tried and yes, /dev/vda is shown:
# lsblk --nodeps --paths --sort name --json --output type,name
{
   "blockdevices": [
      {"type": "disk", "name": "/dev/vda"}
   ]
}

Also note that NTFS partitions are not recognized correctly in the kauth branch (they are shown as "unknown" type).
Comment 8 Mattia 2018-07-15 14:34:26 UTC
(In reply to Andrius Štikonas from comment #6)
> 
> I think my GSoC student assumed that if the device model name is empty, the
> device must be RAID. That is actually false even for one of my USB sticks,
> which is also missing from the list.
> I guess that the virtualized /dev/vda doesn't have a model name either.

# lsblk --nodeps --paths --sort name --json --output type,name,model
{
   "blockdevices": [
      {"type": "disk", "name": "/dev/vda", "model": null}
   ]
}
Comment 9 Andrius Štikonas 2018-07-15 14:38:34 UTC
(In reply to Mattia from comment #8)
> # lsblk --nodeps --paths --sort name --json --output type,name,model
> {
>    "blockdevices": [
>       {"type": "disk", "name": "/dev/vda", "model": null}
>    ]
> }

Yes, so just as I thought: in the scanDevice function in src/plugins/sfdisk/sfdiskbackend.cpp, the
!modelCommand.output().trimmed().isEmpty() check fails.

Well, I told my GSoC student about this. This should definitely be fixed. You can remove that check if you want to test today.
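
A minimal sketch of why a check on the model name misfires for /dev/vda, using only the standard library. The BlockDevice struct and both function names are hypothetical and are not kpmcore code; the type-based alternative is an assumption made for illustration, not necessarily the fix that was applied.

```cpp
#include <optional>
#include <string>

// Hypothetical record for one lsblk entry; not a kpmcore type.
struct BlockDevice {
    std::string type;                  // e.g. "disk"
    std::string name;                  // e.g. "/dev/vda"
    std::optional<std::string> model;  // std::nullopt when lsblk reports "model": null
};

// Heuristic discussed above: a device counts as a plain disk only if it has a
// non-empty model name. (The real check also trims whitespace; omitted here.)
bool isPlainDiskByModel(const BlockDevice& dev) {
    return dev.model.has_value() && !dev.model->empty();
}

// Assumed alternative: trust the type column reported by lsblk instead.
bool isPlainDiskByType(const BlockDevice& dev) {
    return dev.type == "disk";
}
```

With the values from comment 8 (type "disk", model null), isPlainDiskByModel returns false and the disk is dropped, while isPlainDiskByType returns true.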

By the way, yesterday I implemented stopping the kpmcore kauth helper if the main application crashes, so now it should be possible to restart KPM easily without manually killing the kpmcore_externalcommand helper.
Comment 10 Andrius Štikonas 2018-07-15 15:51:31 UTC
The "model": null issue is fixed.
(Although it probably should have been reported as a separate bug...)
Comment 11 Andrius Štikonas 2018-07-17 19:05:58 UTC
(In reply to Mattia from comment #3)
> I can't reproduce the crash, it only happened once, so maybe it was a
> coincidence.

Possibly some command failed due to "Invalid cryptographic signature", and that may have led to a nullptr dereference.
Comment 12 Andrius Štikonas 2018-07-17 19:13:33 UTC
It could be that "Invalid cryptographic signature" appears due to some race condition.
Comment 13 Andrius Štikonas 2018-07-17 19:24:47 UTC
(In reply to Andrius Štikonas from comment #12)
> It could be that "Invalid cryptographic signature" appears due to some race
> condition.

I think if I disable the message counter (counter in the app and m_Counter in the helper) then I don't see any errors.

Without some other fix, though, removing the counter would allow message replay attacks.
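
A minimal sketch of how a single sequential counter can reject legitimate requests when two callers interleave. CounterHelper is a hypothetical stand-in, not the actual helper class, and it assumes the counter is part of the signed data, so a counter mismatch would surface as a failed signature check.

```cpp
#include <cstdint>

// Hypothetical helper-side state: a request is accepted only if it carries
// exactly the counter value the helper expects next.
class CounterHelper {
public:
    bool accept(std::uint64_t counterInRequest) {
        if (counterInRequest != m_expected)
            return false;  // in the real helper this would look like a bad signature
        ++m_expected;      // each value is accepted once, which blocks replays
        return true;
    }

private:
    std::uint64_t m_expected = 0;
};
```

If two callers prepare requests tagged 0 and 1 and the one tagged 1 arrives first, it is rejected even though it is genuine. Dropping the counter avoids that, but then a captured request could be sent again and would be accepted.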
Comment 14 Andrius Štikonas 2018-07-20 21:56:27 UTC
Somehow there are two threads trying to run commands when you change backends; I thought there was just one.

On the other hand, when rescanning with F5, there is just one thread. Not sure why things worked fine when I used botan.
Comment 15 Andrius Štikonas 2018-07-21 10:05:15 UTC
Git commit 938ec7fa8b6084586dd8a006da36c46bff1508ce by Andrius Štikonas.
Committed on 21/07/2018 at 10:03.
Pushed by stikonas into branch 'kauth'.

Make ExternalCommandHelper::getNonce() reentrant.

Store previously generated values of nonce, and remove them from
the container when they are used.

M  +11   -7    src/util/externalcommand.cpp
M  +3    -3    src/util/externalcommand.h
M  +25   -10   src/util/externalcommandhelper.cpp
M  +7    -5    src/util/externalcommandhelper.h

https://commits.kde.org/kpmcore/938ec7fa8b6084586dd8a006da36c46bff1508ce
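
A minimal sketch of the idea in the commit message (keep the generated nonces in a container and remove each one when it is used), written with the standard library only. NoncePool, consume() and the mutex are assumptions for illustration; the actual changes are in the files listed above, and a real implementation would want a cryptographically secure random source instead of std::mt19937_64.

```cpp
#include <cstdint>
#include <mutex>
#include <random>
#include <unordered_set>

// Hypothetical nonce store; several nonces may be outstanding at once, so
// callers on different threads no longer depend on a shared sequence number.
class NoncePool {
public:
    // Generate a nonce that is not currently outstanding and remember it.
    std::uint64_t getNonce() {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::uint64_t nonce;
        do {
            nonce = m_dist(m_rng);
        } while (!m_outstanding.insert(nonce).second);
        return nonce;
    }

    // Accept a nonce at most once: erase() succeeds only while it is stored,
    // so a replayed request carrying the same nonce is rejected.
    bool consume(std::uint64_t nonce) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_outstanding.erase(nonce) == 1;
    }

private:
    std::mutex m_mutex;
    std::mt19937_64 m_rng{std::random_device{}()};  // NOT cryptographically secure
    std::uniform_int_distribution<std::uint64_t> m_dist;
    std::unordered_set<std::uint64_t> m_outstanding;
};
```

Because each nonce is looked up individually, requests can be consumed in any order, which is the case that tripped up the sequential counter, while a second use of the same nonce still fails.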
Comment 16 Andrius Štikonas 2018-07-21 10:08:08 UTC
Mattia, can you test this?

Crashes might still occur if some particular command fails; it would be good to fix those too. Maybe another bug should be opened for that. But right now commands shouldn't be failing in normal usage, so it would be close to impossible to trigger that crash (which was hard even with this bug).
Comment 17 Mattia 2018-07-22 11:23:49 UTC
Seems to work well, I don't see warnings or crashes anymore.
Thanks