In this article I will show you the steps to perform an online HDD swap in case one of your disk drives is broken.
The hpssacli RPM can be downloaded from the HPE webpage, so for the sake of this article I will assume you have already downloaded and installed it on your blade.
NOTE: hpssacli was recently renamed to ssacli as part of the HPE rebranding and company split. Since I had an older version of hpssacli installed, the commands below use 'hpssacli', but the same commands work with 'ssacli'.
My setup:
- HP Proliant BL460c Gen9
- Two Internal Disks each 900 GB
- Hardware RAID 0 configured as two arrays (each with one disk)
- Software RAID 1 is configured on top of these arrays
Normally the HDD to logical drive mapping looks like below:
Array A -> Logical Drive 1 (/dev/sda) -> Bay 1
Array B -> Logical Drive 2 (/dev/sdb) -> Bay 2
But it is still good to validate the mapping before starting the disk swap, to make sure the correct disk gets replaced. Note that grep needs -E for the alternation pattern to work:
# hpssacli ctrl slot=0 show config detail | grep -E 'Array:|Logical Drive:|Bay:|Disk'
Array: A
Logical Drive: 1
Disk Name: /dev/sda Mount Points: None
Bay: 1
Array: B
Logical Drive: 2
Disk Name: /dev/sdb Mount Points: None
Bay: 2
Reversed Disk Mapping
Array A -> Logical Drive 1 (/dev/sda) -> Bay 2
Array B -> Logical Drive 2 (/dev/sdb) -> Bay 1
In that case the output would look like below:
# hpssacli ctrl slot=0 show config detail | grep -E 'Array:|Logical Drive:|Bay:|Disk'
Array: A
Logical Drive: 1
Disk Name: /dev/sda Mount Points: None
Bay: 2
Array: B
Logical Drive: 2
Disk Name: /dev/sdb Mount Points: None
Bay: 1
How to check if my disk is faulty?
There are multiple locations (logs) where enough evidence can be collected to identify the faulty disk.
The iLO logs will contain a message like the one below:
Right Disk:
Internal Storage Enclosure Device Failure (Bay 1, Box 1, Port 1I, Slot 0)
Left Disk:
Internal Storage Enclosure Device Failure (Bay 2, Box 1, Port 1I, Slot 0)
The OS syslog should contain messages like the ones below (assuming the hp-ams tool is installed, as it reports all hardware-related alarms):
Right Disk:
Aug 27 07:27:31 mylinux hp-ams[12332]: CRITICAL: Internal Storage Enclosure Device Failure (Bay 1, Box 1, Port 1I, Slot 0)
Left Disk:
Aug 27 21:36:29 mylinux hp-ams[12854]: CRITICAL: Internal Storage Enclosure Device Failure (Bay 2, Box 1, Port 1I, Slot 0)
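If you want to pull the failed bay number out of the syslog programmatically, a sed substitution over the hp-ams line is enough. A sketch, with a captured log line standing in for /var/log/messages:

```shell
#!/bin/sh
# Sketch: extract the bay number from an hp-ams device-failure message.
# The variable below holds a captured sample line; on a live system use:
#   log=$(grep 'Internal Storage Enclosure Device Failure' /var/log/messages | tail -1)
log='Aug 27 07:27:31 mylinux hp-ams[12332]: CRITICAL: Internal Storage Enclosure Device Failure (Bay 1, Box 1, Port 1I, Slot 0)'
bay=$(printf '%s\n' "$log" | sed -n 's/.*Device Failure (Bay \([0-9]*\),.*/\1/p')
echo "failed disk in bay $bay"
```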
One can also check the Logical Drive status using the command below:
Logical Drive 1 Failed Status
my-linux-box: # hpssacli ctrl slot=0 ld all show status
logicaldrive 1 (838.3 GB, 0): Failed
logicaldrive 2 (838.3 GB, 0): OK
Logical Drive 2 Failed Status
my-linux-box: # hpssacli ctrl slot=0 ld all show status
logicaldrive 1 (838.3 GB, 0): OK
logicaldrive 2 (838.3 GB, 0): Failed
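The status check lends itself to a small monitoring sketch: parse the `ld all show status` output and report any logical drive whose status is not OK. The heredoc below is the captured text from above; on a live system substitute the real hpssacli call.

```shell
#!/bin/sh
# Sketch: flag any logical drive whose status is not OK.
# On a real system replace the heredoc with:
#   status=$(hpssacli ctrl slot=0 ld all show status)
status=$(cat <<'EOF'
   logicaldrive 1 (838.3 GB, 0): Failed
   logicaldrive 2 (838.3 GB, 0): OK
EOF
)
# The drive number is field 2, the status is the last field.
failed=$(printf '%s\n' "$status" | awk '/logicaldrive/ && $NF != "OK" { print $2 }')
if [ -n "$failed" ]; then
  echo "Failed logical drive(s): $failed"
fi
```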
Logical Drive 1 (/dev/sda) Replacement
First, check the RAID status:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sda8[0](F) sdb8[1]
870112064 blocks super 1.0 [2/1] [_U]
bitmap: 3/7 pages [12KB], 65536KB chunk
md0 : active raid1 sda5[0](F) sdb5[1]
529600 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
md3 : active raid1 sda7[0](F) sdb7[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sda6[0](F) sdb6[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
Now remove the failed RAID partitions:
my-linux-box:~ # mdadm /dev/md0 --remove /dev/sda5
mdadm: hot removed /dev/sda5 from /dev/md0
my-linux-box:~ # mdadm /dev/md1 --remove /dev/sda6
mdadm: hot removed /dev/sda6 from /dev/md1
my-linux-box:~ # mdadm /dev/md3 --remove /dev/sda7
mdadm: hot removed /dev/sda7 from /dev/md3
my-linux-box:~ # mdadm /dev/md2 --remove /dev/sda8
mdadm: hot removed /dev/sda8 from /dev/md2
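The removals above can also be derived from /proc/mdstat itself, by picking out the members marked (F). The sketch below prints the mdadm commands instead of executing them, so they can be reviewed first; the heredoc is a trimmed sample of /proc/mdstat.

```shell
#!/bin/sh
# Sketch: derive "mdadm /dev/mdX --remove /dev/sdaY" commands from the
# (F)-marked members in /proc/mdstat. Commands are printed, not run;
# pipe the output to sh only after reviewing it.
# On a real system replace the heredoc with: mdstat=$(cat /proc/mdstat)
mdstat=$(cat <<'EOF'
md2 : active raid1 sda8[0](F) sdb8[1]
md0 : active raid1 sda5[0](F) sdb5[1]
md3 : active raid1 sda7[0](F) sdb7[1]
md1 : active raid1 sda6[0](F) sdb6[1]
EOF
)
cmds=$(printf '%s\n' "$mdstat" | awk '
  /^md/ { for (i = 1; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
              dev = $i; sub(/\[.*/, "", dev)   # "sda8[0](F)" -> "sda8"
              print "mdadm /dev/" $1 " --remove /dev/" dev
            } }')
printf '%s\n' "$cmds"
```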
Next, check the RAID status to validate that all the failed partitions have been removed:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdb8[1]
870112064 blocks super 1.0 [2/1] [_U]
bitmap: 3/7 pages [12KB], 65536KB chunk
md0 : active raid1 sdb5[1]
529600 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
md3 : active raid1 sdb7[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sdb6[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
Replace the failed disk with the new one; the syslog should contain a message similar to the one below:
Aug 18 15:53:12 my-linux-box kernel: [ 8365.422069] hpsa 0000:03:00.0: added scsi 0:2:0:0: Direct-Access HP EG0900FBVFQ RAID-UNKNOWN SSDSmartPathCap- En- Exp=2 qd=30
Re-enable the logical drive using hpssacli. After re-enabling it, verify the status, which should return “OK”:
my-linux-box: # hpssacli ctrl slot=0 ld 1 modify reenable forced
my-linux-box:# hpssacli ctrl slot=0 ld all show status
logicaldrive 1 (838.3 GB, 0): OK
logicaldrive 2 (838.3 GB, 0): OK
The sdaX partitions are now missing from the RAID arrays, as expected:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdb8[1]
870112064 blocks super 1.0 [2/1] [_U]
bitmap: 5/7 pages [20KB], 65536KB chunk
md0 : active raid1 sdb5[1]
529600 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
md3 : active raid1 sdb7[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sdb6[1]
4200640 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
Now copy the partition table from sdb to sda:
my-linux-box:~ # sfdisk -d /dev/sdb | grep -v ten | sfdisk /dev/sda --force --no-reread
Checking that no-one is using this disk right now ...
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
OK
Disk /dev/sda: 109437 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sda1 * 0+ 109437- 109438- 879054336 f W95 Ext'd (LBA)
/dev/sda2 0 - 0 0 0 Empty
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 0 - 0 0 0 Empty
/dev/sda5 0+ 66- 66- 529664 fd Linux raid autodetect
/dev/sda6 66+ 588- 523- 4200704 fd Linux raid autodetect
/dev/sda7 589+ 1111- 523- 4200704 fd Linux raid autodetect
/dev/sda8 1112+ 109435- 108324- 870112256 fd Linux raid autodetect
New situation:
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
/dev/sda1 * 512 1758109183 1758108672 f W95 Ext'd (LBA)
/dev/sda2 0 - 0 0 Empty
/dev/sda3 0 - 0 0 Empty
/dev/sda4 0 - 0 0 Empty
/dev/sda5 1024 1060351 1059328 fd Linux raid autodetect
/dev/sda6 1060864 9462271 8401408 fd Linux raid autodetect
/dev/sda7 9462784 17864191 8401408 fd Linux raid autodetect
/dev/sda8 17864704 1758089215 1740224512 fd Linux raid autodetect
Warning: partition 1 does not end at a cylinder boundary
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
Erase possible leftover RAID configuration data (from a reused disk):
my-linux-box:~ # mdadm --zero-superblock /dev/sda5
my-linux-box:~ # mdadm --zero-superblock /dev/sda6
my-linux-box:~ # mdadm --zero-superblock /dev/sda7
my-linux-box:~ # mdadm --zero-superblock /dev/sda8
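The four calls are identical apart from the partition number, so they can be expressed as a loop. The sketch below echoes the commands rather than running them; drop the echo once the partition list has been double-checked.

```shell
#!/bin/sh
# Sketch: the four --zero-superblock calls as a loop. The commands are
# only printed here; remove the echo to actually execute them.
cmds=$(for part in /dev/sda5 /dev/sda6 /dev/sda7 /dev/sda8; do
  echo "mdadm --zero-superblock $part"
done)
printf '%s\n' "$cmds"
```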
Afterwards the partitions can be added back to the software RAID arrays:
my-linux-box:~ # mdadm /dev/md0 --add /dev/sda5
mdadm: added /dev/sda5
my-linux-box:~ # mdadm /dev/md1 --add /dev/sda6
mdadm: added /dev/sda6
my-linux-box:~ # mdadm /dev/md3 --add /dev/sda7
mdadm: added /dev/sda7
my-linux-box:~ # mdadm /dev/md2 --add /dev/sda8
NOTE: Add the next RAID partition only after the previously added one shows as [UU] in /proc/mdstat.
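Waiting for the [UU] state can be automated. The sketch below factors the check into a small function so it can be exercised against sample /proc/mdstat text; on a live system it would poll the real file, as shown in the trailing comment.

```shell
#!/bin/sh
# Sketch: report whether a given md array shows [UU] in mdstat-style text.
array_in_sync() {  # usage: array_in_sync <mdname> <mdstat-text>
  printf '%s\n' "$2" | awk -v md="$1" '
    $1 == md                 { found = 1 }        # entered the wanted stanza
    found && /\[UU\]/        { ok = 1; exit }     # fully synced
    found && /^md/ && $1 != md { exit }           # next stanza, stop looking
    END { exit ok ? 0 : 1 }'
}

sample='md0 : active raid1 sda5[2] sdb5[1]
      529600 blocks super 1.0 [2/2] [UU]'
if array_in_sync md0 "$sample"; then
  echo "md0 in sync"
fi
# On a live system, wait before adding the next partition with e.g.:
#   until array_in_sync md0 "$(cat /proc/mdstat)"; do sleep 10; done
```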
How to install GRUB on the disk?
Once md0 has synchronised, GRUB should be installed again on both disks by calling the GRUB installer. The grub-install command should install GRUB on both disks (hd0 and hd1) without any error messages:
# grub-install
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,4)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.
succeeded
Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,4)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
grub> setup --stage2=/boot/grub/stage2 --force-lba (hd1) (hd1,4)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 17 sectors are embedded.
succeeded
Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd1) (hd1)1+17 p (hd1,4)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
grub> quit
Finally, validate the RAID status:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sda8[2] sdb8[1]
870112064 blocks super 1.0 [2/2] [UU]
bitmap: 6/7 pages [24KB], 65536KB chunk
md0 : active raid1 sda5[2] sdb5[1]
529600 blocks super 1.0 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md3 : active raid1 sda7[2] sdb7[1]
4200640 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sda6[2] sdb6[1]
4200640 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
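As a final sanity check, one can assert that no array in /proc/mdstat still shows a degraded member (an underscore in the status field). A sketch, with a trimmed sample standing in for the real file:

```shell
#!/bin/sh
# Sketch: count mdstat lines that still contain a degraded marker ("_").
# On a real system replace the sample with: mdstat=$(cat /proc/mdstat)
mdstat='md2 : active raid1 sda8[2] sdb8[1]
      870112064 blocks super 1.0 [2/2] [UU]
md0 : active raid1 sda5[2] sdb5[1]
      529600 blocks super 1.0 [2/2] [UU]'
degraded=$(printf '%s\n' "$mdstat" | grep -c '_' || true)
if [ "$degraded" -eq 0 ]; then
  echo "all arrays in sync"
fi
```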
The disk replacement for the second logical drive can be performed in the same way.
Related Articles:
Collect Virtual Connect Support Dump of HP c-Class Blade Enclosures
How to collect/generate "Show All" report for HP c-Class Blade Enclosures
How to downgrade HP Emulex CNA NIC card firmware version