Bug 379205 - luks::mapperName sometimes picks incorrectly
Summary: luks::mapperName sometimes picks incorrectly
Status: RESOLVED FIXED
Alias: None
Product: partitionmanager
Classification: Applications
Component: general
Version: 3.0
Platform: Other Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Andrius Štikonas
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-25 15:33 UTC by kaguru
Modified: 2017-04-27 15:31 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In: 3.1.0


Attachments

Description kaguru 2017-04-25 15:33:53 UTC
I believe that luks encryption is broken for all NVMe devices as of https://github.com/KDE/kpmcore/commit/3bfb0b4b5b6ee4504944433fe3208707f37df82e, so it affects all versions after 2.2.0.

I'm using kpmcore via Calamares 3.1. When I attempt to create a luks-encrypted ext4 or btrfs partition, the process consistently fails when it tries to format the inner filesystem.

> The Installer failed to create file system on partition /dev/nvme0n1p2
> Command: cryptsetup -s 512 --batch-mode --force-password luksFormat /dev/nvme0n1p2
> Command: cryptsetup open /dev/nvme0n1p2 luks-............................
> Command: mkfs.ext4 -qf /dev/mapper/nvme0n1p2


I believe I've determined the root cause:

Notice that the mkfs command is trying to use the name of the original partition, not the newly opened luks container. I believe this is because of the way mapperName() parses lsblk output: the volumes associated with the specified partition are listed, and the function assumes that the encrypted mapper volume will be the second entry. lsblk's default sort is by device number, and unlike hd and sd devices, NVMe partitions have a higher major device number (259) than the device-mapper crypt volume (254), so the mapper entry sorts first. When mapperName() trims the first value, it throws away the name we need and instead returns /dev/mapper/{original_partition}, a path that doesn't even exist.
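
To make the failure mode concrete, here is a minimal standalone sketch (Qt-style C++, like kpmcore) of the order-dependent parsing described above. The helper name naiveMapperName() and the exact lsblk invocation are assumptions for illustration only; this is not the actual kpmcore code.

// Hypothetical sketch of the fragile, order-dependent lookup described above.
// Not the real kpmcore implementation; names and flags are illustrative.
#include <QCoreApplication>
#include <QProcess>
#include <QString>
#include <QStringList>
#include <QTextStream>

static QString naiveMapperName(const QString &deviceNode)
{
    // List the device and anything stacked on it, names only.
    QProcess lsblk;
    lsblk.start(QStringLiteral("lsblk"),
                QStringList() << QStringLiteral("--raw")
                              << QStringLiteral("--noheadings")
                              << QStringLiteral("--output") << QStringLiteral("name")
                              << deviceNode);
    if (!lsblk.waitForFinished())
        return QString();

    QStringList names;
    const QStringList lines =
        QString::fromLocal8Bit(lsblk.readAllStandardOutput()).split(QLatin1Char('\n'));
    for (const QString &line : lines)
        if (!line.trimmed().isEmpty())
            names.append(line.trimmed());

    // Fragile assumption: the partition comes first and the crypt mapper second.
    // lsblk sorts by device number, so this holds for sd* partitions (major 8 < 254)
    // but reverses for NVMe (major 259 > 254), returning the partition name instead.
    if (names.size() >= 2)
        return QStringLiteral("/dev/mapper/") + names.at(1);
    return QString();
}

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    const QString device = argc > 1 ? QString::fromLocal8Bit(argv[1])
                                    : QStringLiteral("/dev/nvme0n1p6");
    QTextStream(stdout) << naiveMapperName(device) << '\n';
    return 0;
}

On NVMe, where the crypt mapper sorts first (see the lsblk output in comment 2), this drops the mapper name and returns the non-existent /dev/mapper/{partition} path that mkfs is then handed.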

I believe that sorting the lsblk output on pkname should solve the problem for me and others (https://github.com/billwanjohi/kpmcore/commit/9abaa89bdd6584e9942e00bfef6f39feeb7b9e1a), but I haven't been able to confirm because I'm not familiar with the build process.

Even better might be to ditch mapperName() altogether, and just reuse the suggestedMapperName() function that's used for the cryptsetup open step.
Comment 1 Andrius Štikonas 2017-04-25 15:48:49 UTC
suggestedMapperName() can't be used: if the luks volume was opened during boot, or manually under some other name, it wouldn't work.

We need to fix the lsblk parsing then. I think I see from your comments what is happening, but can you paste your lsblk output? And also the output of lsblk --raw --noheadings --output name /dev/nvme0n1p2
Comment 2 kaguru 2017-04-25 15:55:18 UTC
This is the output of these commands after a failed luks partition creation via Calamares:

$ lsblk

NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0                                           7:0    0  50.3M  1 loop  /run/miso/sfs/livefs
loop1                                           7:1    0 225.3M  1 loop  /run/miso/sfs/mhwdfs
loop2                                           7:2    0 541.9M  1 loop  /run/miso/sfs/desktopfs
loop3                                           7:3    0 373.4M  1 loop  /run/miso/sfs/rootfs
sda                                             8:0    1  29.3G  0 disk  
├─sda1                                          8:1    1   1.3G  0 part  /run/miso/bootmnt
└─sda2                                          8:2    1    46M  0 part  
nvme0n1                                       259:0    0 238.5G  0 disk  
├─nvme0n1p6                                   259:1    0  81.9G  0 part  
│ └─luks-4953c5dd-99b6-4b9f-9ef4-75a16ed6207d 254:0    0  81.9G  0 crypt 
├─nvme0n1p1                                   259:7    0   500M  0 part  
├─nvme0n1p2                                   259:8    0   128M  0 part  
├─nvme0n1p3                                   259:10   0  47.4G  0 part  
├─nvme0n1p4                                   259:11   0    48G  0 part  
├─nvme0n1p5                                   259:12   0    48G  0 part  
├─nvme0n1p7                                   259:16   0   450M  0 part  
├─nvme0n1p8                                   259:17   0  11.1G  0 part  
└─nvme0n1p9                                   259:18   0     1G  0 part 

$ lsblk --raw --noheadings --output name /dev/nvme0n1p6

luks-4953c5dd-99b6-4b9f-9ef4-75a16ed6207d
nvme0n1p6

$ lsblk --sort pkname --noheadings --output name /dev/nvme0n1p6

nvme0n1p6
luks-4953c5dd-99b6-4b9f-9ef4-75a16ed6207d
Comment 3 Andrius Štikonas 2017-04-25 15:55:56 UTC
Hmm, sorting by pkname doesn't seem to work on my system :(. LVM devices that have pkname dm-# jump to the front... We need some other solution.
Comment 4 kaguru 2017-04-25 16:02:47 UTC
We could output more details (lsblk -O) and awk for the parent…
Comment 5 Andrius Štikonas 2017-04-25 16:03:52 UTC
(In reply to kaguru from comment #4)
> We could output more details (lsblk -O) and awk for the parent…

Looks like somebody already found a solution here:
https://github.com/cmorlok/kpmcore/commit/2bebc0d523069a9d1e110dd43a24243f38c41754
Can you test with it?
Comment 6 kaguru 2017-04-25 17:59:33 UTC
Oh wow, I missed that completely. I can try, but I might need some assistance. My efforts so far to fix it did not go very well.

I tried a number of things (individually, not all at once):

- I overrode lsblk with a function defined in rc.local, but it was ignored
- I made a new lsblk bash executable further up in the $PATH (/usr/bin/local)
- I renamed /usr/bin/lsblk and pointed my /usr/bin/local executable back at it
- I installed kpmcore-git via yaourt based on my branch, and compiled manjaro/calamares with defaults
- I compiled manjaro/calamares with a local checkout of kpmcore in the same git directory

Through it all, I wasn't able to change the behavior of the installer.

Are you at all familiar with AUR? If I installed kpmcore-git, and then partitionmanager-git (accepting all dependencies), would that take advantage of that new fix and be enough for me to attempt creation of a new luks-encrypted volume?
Comment 7 kaguru 2017-04-25 18:01:51 UTC
Ah, and I didn't see it because it's on a fork. So I'll have to change the remote that kpmcore-git is cloning from too (which I had already done in an attempt to test my patch).
Comment 8 Andrius Štikonas 2017-04-25 18:06:51 UTC
(In reply to kaguru from comment #7)
> Ah, and I didn't see it because it's on a fork. So I'll have to change the
> remote that kpmcore-git is cloning from too (which I had already done in an
> attempt to test my patch).

Well, I guess that patch looks fine. I'll just pull it into the main repo.

Yeah, I also didn't know about that fork. It is not nice when fixes are not sent upstream :(.

I'm not really familiar with AUR, but if you replace your libkpmcore.so file in /usr (which kpmcore-git from AUR should do), then the fix will be in. However, recent kpmcore has some ABI incompatibilities compared to 3.0.3, so partitionmanager/calamares also need to be recompiled.
Comment 9 Andrius Štikonas 2017-04-25 18:09:18 UTC
Git commit b10577e1c7ba0c856c96c96a80187b977f88ef34 by Andrius Štikonas, on behalf of Christian Morlok.
Committed on 25/04/2017 at 18:08.
Pushed by stikonas into branch 'master'.

Fix creation of encrypted volumes

We can't rely on the order of lsblk. Check for the type=crypt instead.

M  +4    -5    src/fs/luks.cpp

https://commits.kde.org/kpmcore/b10577e1c7ba0c856c96c96a80187b977f88ef34
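
For readers who don't follow the commit link, here is a rough standalone sketch (again Qt-style C++; not the actual luks.cpp change, and the helper name mapperNameByType() is made up) of the order-independent approach the commit message describes: ask lsblk for both name and type, and pick the entry whose type is crypt, regardless of where it appears in the list.

// Sketch of the order-independent lookup: pick the "crypt" entry by type.
// Illustrative only; not the actual kpmcore code.
#include <QCoreApplication>
#include <QProcess>
#include <QString>
#include <QStringList>
#include <QTextStream>

static QString mapperNameByType(const QString &deviceNode)
{
    QProcess lsblk;
    lsblk.start(QStringLiteral("lsblk"),
                QStringList() << QStringLiteral("--raw")
                              << QStringLiteral("--noheadings")
                              << QStringLiteral("--output") << QStringLiteral("name,type")
                              << deviceNode);
    if (!lsblk.waitForFinished())
        return QString();

    const QStringList lines =
        QString::fromLocal8Bit(lsblk.readAllStandardOutput()).split(QLatin1Char('\n'));
    for (const QString &line : lines) {
        // Each raw line looks like "luks-4953c5dd-... crypt" or "nvme0n1p6 part".
        const QStringList fields = line.trimmed().split(QLatin1Char(' '));
        if (fields.size() >= 2 && fields.at(1) == QLatin1String("crypt"))
            return QStringLiteral("/dev/mapper/") + fields.at(0);
    }
    return QString(); // no opened crypt mapper found on top of this device
}

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    const QString device = argc > 1 ? QString::fromLocal8Bit(argv[1])
                                    : QStringLiteral("/dev/nvme0n1p6");
    QTextStream(stdout) << mapperNameByType(device) << '\n';
    return 0;
}

Because only the device node passed in and the things stacked on it are listed, the sort order no longer matters on NVMe, sd*, or LVM-backed setups.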
Comment 10 kaguru 2017-04-27 15:17:16 UTC
I and others can confirm that this patch fixed the problem: https://github.com/calamares/calamares/issues/697

I could see a potential edge case where someone's creating a luks volume in a luks volume, in which case checking for the expected parent volume would be better, but for the non-hypothetical, things are in good shape.

Any chance we could get a patch version bump, in hopes that we can get it into package managers sooner?
Comment 11 Andrius Štikonas 2017-04-27 15:22:36 UTC
(In reply to kaguru from comment #10)
> I and others can confirm that this patch fixed the problem:
> https://github.com/calamares/calamares/issues/697
> 
> I could see a potential edge case where someone's creating a luks volume in
> a luks volume, in which case checking for the expected parent volume would
> be better, but for the non-hypothetical, things are in good shape.
> 
> Any chance we could get a patch version bump, in hopes that we can get it
> into package managers sooner?

Unlikely before June. I can try to do it then...

luks inside luks is not supported even now; a lot of the other luks code was not designed with it in mind, and in any case luks inside luks doesn't make sense. luks->lvm->luks is supported though, so I will have to test that later. That's why I said I wouldn't have time for a release soon.
Comment 12 Andrius Štikonas 2017-04-27 15:31:10 UTC
Well, we still check the expected parent; after all, we supply deviceNode to lsblk. So I think it would work.