installimage sometimes confuses one disk for another #14

Closed
opened 2019-04-18 12:13:09 +02:00 by fillest · 3 comments
fillest commented 2019-04-18 12:13:09 +02:00 (Migrated from github.com)

Hello.
A PX62-NVMe server has two disks, nvme0n1 and nvme1n1. I need them separate, not in a RAID: the first one split into partitions and the second one kept as a raw device.
I run `installimage -a -r no -p /boot:ext3:512M,/:ext4:20G,/kafka:ext4:all -d nvme0n1 -f no -s en -i /root/.oldroot/nfs/install/../images/Ubuntu-1804-bionic-64-minimal.tar.gz`
Sometimes it works as expected, and sometimes (quite often, apparently) installimage treats nvme0n1 as nvme1n1:

```
# cat /etc/fstab
proc /proc proc defaults 0 0
# /dev/nvme0n1p1 during Installation (RescueSystem)
UUID=ab20d81c-c59b-4ec6-8399-1b79c750f429 /boot ext3 defaults 0 0
# /dev/nvme0n1p2 during Installation (RescueSystem)
UUID=de1eab9c-7dea-44c6-bc77-5ce34d436755 / ext4 defaults 0 0
# /dev/nvme0n1p3 during Installation (RescueSystem)
UUID=c6278d98-3c85-4f04-bb15-86b4ee5cef7e /kafka ext4 defaults 0 0

# ls -lha /dev/disk/by-uuid
lrwxrwxrwx 1 root root  15 Apr 18 09:47 ab20d81c-c59b-4ec6-8399-1b79c750f429 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root  15 Apr 18 09:47 c6278d98-3c85-4f04-bb15-86b4ee5cef7e -> ../../nvme1n1p3
lrwxrwxrwx 1 root root  15 Apr 18 09:47 de1eab9c-7dea-44c6-bc77-5ce34d436755 -> ../../nvme1n1p2
```

It looks like installimage relies somewhere on a fixed device order, which can actually be random.
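
As a workaround idea (not something installimage does itself; a minimal sketch, and the serial below is a placeholder you would first read from your own machine), the target disk can be resolved by its serial number in the rescue system before invoking installimage, so the nvme0n1/nvme1n1 swap no longer matters:

```
# List disks with their serials first:  lsblk -dno NAME,SERIAL
TARGET_SERIAL="S1ZZZZZZZZZZZZ"  # placeholder: serial of the disk to install to

# Resolve the kernel name that disk has on this boot.
TARGET_DEV=$(lsblk -dno NAME,SERIAL | awk -v s="$TARGET_SERIAL" '$2 == s { print $1 }')
[ -n "$TARGET_DEV" ] || { echo "no disk with serial $TARGET_SERIAL" >&2; exit 1; }

installimage -a -r no \
  -p /boot:ext3:512M,/:ext4:20G,/kafka:ext4:all \
  -d "$TARGET_DEV" -f no -s en \
  -i /root/.oldroot/nfs/install/../images/Ubuntu-1804-bionic-64-minimal.tar.gz
```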

fillest commented 2019-04-18 18:52:59 +02:00 (Migrated from github.com)

The random mapping probably comes from the rescue system. I've just booted two hosts into rescue mode and they show different mappings:

```
# lsblk   #on the first
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0    0 953,9G  0 disk
nvme0n1 259:1    0 953,9G  0 disk

# lsblk  #on the second
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0     7:0    0     3G  1 loop
nvme0n1 259:0    0 953,9G  0 disk
nvme1n1 259:1    0 953,9G  0 disk
```
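
That is consistent with how the kernel works: nvmeXn1 names are handed out in probe order, which can differ per boot. The /dev/disk/by-id links, by contrast, embed model and serial and stay stable across boots; a quick way to check which physical disk is which (the second command assumes the nvme-cli package is installed):

```
# by-id names encode model + serial, so they survive reordering:
ls -l /dev/disk/by-id/nvme-*
# with nvme-cli installed, serials are shown next to the device nodes:
nvme list
```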
joaooliveirap commented 2019-04-25 10:44:55 +02:00 (Migrated from github.com)

I'm having the same problem.
I also want to install both disks without RAID (SWRAID 0), and the server uses one disk or the other randomly. That is, after an install and a reboot, the server can boot from nvme0n1 or nvme1n1 (there is no way to mark a disk as primary). I tried an installimage config with only one disk and the same thing happens.
I was using EX62-NVMe and it happened with all five servers I bought.

Example:

```
root@hostname ~ # lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0 953.9G  0 disk
├─nvme0n1p3   259:4    0 948.9G  0 part
│ └─vg0-root  253:0    0 948.9G  0 lvm  /
├─nvme0n1p1   259:2    0     1G  0 part /boot
└─nvme0n1p2   259:3    0     4G  0 part [SWAP]
nvme1n1       259:1    0 953.9G  0 disk

root@hostname ~ #
Broadcast message from root@hostname (Wed 2019-04-24 17:23:53 CEST):

Ansible-triggered Reboot
The system is going down for reboot at Wed 2019-04-24 17:24:53 CEST!

Broadcast message from root@hostname (Wed 2019-04-24 17:24:53 CEST):

Ansible-triggered Reboot
The system is going down for reboot NOW!

Connection to ****** closed by remote host.
Connection to ****** closed.
jpereira$ ssh root@****** -i rsa_key
Warning: Permanently added '******' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.6 LTS (GNU/Linux 4.15.0-48-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

Last login: Wed Apr 24 17:20:56 2019 from ******
root@hostname ~ # lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1       259:1    0 953.9G  0 disk
nvme1n1       259:0    0 953.9G  0 disk
├─nvme1n1p2   259:3    0     4G  0 part [SWAP]
├─nvme1n1p3   259:4    0 948.9G  0 part
│ └─vg0-root  253:0    0 948.9G  0 lvm  /
└─nvme1n1p1   259:2    0     1G  0 part /boot
```

My installimage file:

```
DRIVE1 /dev/nvme0n1
DRIVE2 /dev/nvme1n1
SWRAID 0
BOOTLOADER grub
HOSTNAME hostname
PART /boot ext3 1G
PART swap swap 4G
PART lvm vg0 all
LV vg0 root / ext4 all
IMAGE /root/.oldroot/nfs/install/../images/Ubuntu-1604-xenial-64-minimal.tar.gz
```

I also tried:

```
DRIVE1 /dev/nvme0n1
SWRAID 0
BOOTLOADER grub
HOSTNAME hostname
PART /boot ext3 1G
PART swap swap 4G
PART lvm vg0 all
LV vg0 root / ext4 all
IMAGE /root/.oldroot/nfs/install/../images/Ubuntu-1604-xenial-64-minimal.tar.gz
```
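
(A minimal workaround sketch for the "no way to mark a disk as primary" part, assuming a legacy-BIOS/MBR layout: put GRUB's boot code on both drives from the installed system, so the machine boots no matter which disk the firmware picks.)

```
# Install GRUB on both NVMe drives; which one the BIOS tries first
# then no longer matters.
grub-install /dev/nvme0n1
grub-install /dev/nvme1n1
update-grub
```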

asciiprod commented 2019-09-16 10:06:41 +02:00 (Migrated from github.com)

The kernel does not guarantee the order in which devices are detected, so this is not an installimage issue. For exactly that reason, installimage always uses UUIDs. If UUIDs are also used to reference the second disk, the order should not matter. Nevertheless, converting at least /boot to a RAID1 and installing GRUB on both drives should give a bootable system no matter in which order the disks are detected.
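
A minimal config sketch of that suggestion, using installimage's SWRAID and SWRAIDLEVEL options (note that installimage mirrors every PART entry, not just /boot, so this gives up the two-independent-disks layout):

```
DRIVE1 /dev/nvme0n1
DRIVE2 /dev/nvme1n1
SWRAID 1
SWRAIDLEVEL 1
BOOTLOADER grub
HOSTNAME hostname
PART /boot ext3 1G
PART swap swap 4G
PART lvm vg0 all
LV vg0 root / ext4 all
IMAGE /root/.oldroot/nfs/install/../images/Ubuntu-1604-xenial-64-minimal.tar.gz
```

With software RAID enabled, installimage should install GRUB on all RAID member drives, so the system remains bootable regardless of which disk the firmware tries first.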
