How to replace a single hard drive in an RSA NetWitness Platform appliance
Issue
Need to replace a hard disk in either an appliance head unit or a Direct Attached Capacity (DAC) array.
Resolution
Notes:
- Steps 1 to 7 may be done remotely either via SSH or DRAC Console.
- If you need to replace more than one disk, then replace them in the following order of preference:
- Disks on-board the appliance head unit (versus a DAC)
- Disks that currently have a physical state of Failed, Offline, Unknown or Missing
- Disks that currently have a physical state of Unconfigured or Hotspare
- If you are replacing the OS drive on an appliance (on Series 4 and Series 4S appliances that are NOT Hybrids, AIO or Warehouse Nodes, these show as 136.732 GB or 160 GB in nwraidutil.pl output), shut down the appliance operating system first. Before shutting down, it is recommended to stop capture on network/log decoders and aggregation on concentrators, in either the NetWitness UI or the REST interface, and then stop the services.
For example, on CentOS 6: stop nwappliance && stop nwconcentrator
- Warning: If you need to replace more than one disk, do not replace them all at the same time. If the previously replaced disk was a member of a virtual disk (VD)/RAID volume, wait for any RAID rebuild to complete before replacing the next one (see step 6 for the command).
- For SA Warehouse Nodes (MAPR SAW), please refer to the article entitled Security Analytics | How to replace faulty disk on RSA Security Analytics Warehouse (SAW) node in RSA Security Analytics 10.3 and higher instead of this article for disk replacement.
Steps for Replacing Single Disk
- Confirm the logical location of the disk.
Use nwraidutil.pl to confirm the adapter number and whether the disk is on board the appliance head unit or in a DAC.
The disks in the appliance head unit are typically shown as either 136.732 GB (146 GB) or 931.512 GB (1 TB). For Series 4 / S4S appliances, Adapter 0 is usually a Dell PERC H700 or H710.
The disks in a JBOD/DAC are typically in enclosures attached to Adapter 1. For Series 4 / S4S appliances, Adapter 1 is usually a Dell PERC H800 or H810.
nwraidutil.pl - How to Download and Use the RSA NetWitness RAID Utility
Note 1: In RSA NetWitness 11.x, a version of nwraidutil.pl has been included in rsa-sa-tools RPM which installs to /opt/rsa/saTools/nwraidutil.pl. This may not be the newest version of this script.
Note 2: In RSA NetWitness 11.3 and above, nwraidutil.pl has been deprecated in favor of /opt/rsa/saTools/nwraidtool.py, which uses perccli instead of MegaCli64. In this case, refer to the hard disk drive replacement guide at https://community.rsa.com/docs/DOC-57956.
Record the enclosure number and slot number of the disk to be replaced, if not already known. For example, Enclosure 15 Slot 2 is referred to as [15:2] in the commands below.
Please see Example 2 in the notes below.
In some rare instances, Linux reverses the adapter numbering, so the on-board disks appear on Adapter 1.
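To record the [ENCLOSURE:SLOT] address of every VD member in one pass, the Logical Disk Information section of nwraidutil.pl output can be filtered with awk. This is a minimal sketch, assuming the `Adapter: n - ...` and `PD: n Enclosure: e Slot: s (state) ...` line layout shown in Example 1; the function name `list_pd_addresses` is hypothetical. Hotspares and unconfigured disks are not listed in that section in Example 1, so check the Physical Disk Information for those.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads nwraidutil.pl output on stdin and prints, for each
# "PD: <n> Enclosure: <e> Slot: <s> (<state>) ..." line, the adapter number,
# the [ENCLOSURE:SLOT] address used by MegaCli64 and the state token.
# Assumes the Logical Disk Information layout shown in Example 1.
list_pd_addresses() {
  awk '
    $1 == "Adapter:" { adapter = $2 }   # e.g. "Adapter: 1 - PERC H810 Adapter"
    $1 == "PD:" && $3 == "Enclosure:" && $5 == "Slot:" {
      printf "a%s [%s:%s] %s\n", adapter, $4, $6, $7
    }
  '
}
```

Example usage, assuming the script location from Note 1: `perl /opt/rsa/saTools/nwraidutil.pl | list_pd_addresses`. A line such as `a1 [15:13] (O)` means Adapter 1, Enclosure 15, Slot 13, Online.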
- Confirm the current role of the disk: whether it is part of a Virtual Disk (VD), a hotspare, or Unconfigured/Failed.
Use nwraidutil.pl and examine the Physical Disk Information.
- If the disk to be replaced shows as GEI or ID-X (for example, ID-0) in the Physical Disk Information, it is a hotspare.
- If the disk to be replaced shows as (U) - Unconfigured, (X) - Offline or (!) - Failed in the Physical Disk Information, it is currently neither part of a VD nor a hotspare.
Please see Example 2 in the notes below.
Use nwraidutil.pl and examine the Logical Disk Information.
- If the disk to be replaced shows as (O) in the Logical Disk Information, it is part of a VD set.
Please see Example 1 in the notes below.
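The role checks above can be summarized as a small lookup. This is a hedged sketch: the function name `disk_role` and its output labels are hypothetical, and the mapping only encodes the rules stated in this step, not any official nwraidutil.pl documentation. Pass it the state token exactly as nwraidutil.pl prints it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a nwraidutil.pl state token to the disk role
# described in step 2. Single quotes keep "!" and "()" literal.
disk_role() {
  case "$1" in
    *GEI*|*ID-[0-9]*)    echo "hotspare" ;;    # global (GEI) or VD dedicated (ID-X) hotspare
    '(U)'|'(X)'|'(!)')   echo "unassigned" ;;  # Unconfigured, Offline or Failed: not in a VD
    '(O)'|'(OS)')        echo "vd-member" ;;   # Online in Logical Disk Information (S = SSD)
    *)                   echo "unknown" ;;     # anything else, e.g. (R): check manually
  esac
}
```

For example, `disk_role '(GEI)'` prints `hotspare`, which tells you that steps 4 and 5 can be skipped for that disk.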
- Take Disk Offline
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>
Example - Replacing the 14th disk in the 1st DAC
The disk state may change to (U) for Unconfigured.
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv[15:13] -a1
Example Output:
Adapter: 1: EnclId-15 SlotId-13 is marked Offline.
Exit Code: 0x00
- Mark drive as missing [optional]
Note: This does not need to be done if the disk is not currently part of a VD and so does not show as (O) in Logical Disk Information.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdMarkMissing -physdrv[ENCLOSURE:SLOT] -a<adapter_num>
Example - Replacing the 14th disk in the 1st DAC
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdMarkMissing -physdrv[15:13] -a1
Example Output:
Adapter: 1: EnclId-15 SlotId-13 is marked Missing.
Exit Code: 0x00
- Prepare for removal.
Note: This does not need to be done if the disk is not currently part of a VD and so does not show as (O) in Logical Disk Information.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -physdrv[ENCLOSURE:SLOT] -a<adapter_num>
Example - Replacing the 14th disk in the 1st DAC
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -physdrv[15:13] -a1
Example Output:
Prepare for removal Success
Exit Code: 0x00
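Steps 3 to 5 can be chained so that a failure stops the sequence. This is a minimal sketch: the function names and the `DRY_RUN` switch are hypothetical, the MegaCli64 options are the ones used above, and it assumes MegaCli64 returns a non-zero exit status when a command fails. With `DRY_RUN=1` it only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-5 for one disk. DRY_RUN=1 prints the commands instead of
# running them. Function names and DRY_RUN are hypothetical.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

run_megacli() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$MEGACLI $*"
  else
    "$MEGACLI" "$@"
  fi
}

# Usage: prepare_disk_removal ENCLOSURE SLOT ADAPTER IS_VD_MEMBER(yes|no)
prepare_disk_removal() {
  local pd="[$1:$2]" adapter="-a$3" vd_member="$4"
  run_megacli -PDOffline "-PhysDrv$pd" "$adapter" || return 1   # step 3
  # Steps 4 and 5 only apply when the disk shows as (O) in Logical Disk Information.
  if [ "$vd_member" = "yes" ]; then
    run_megacli -PdMarkMissing "-physdrv$pd" "$adapter" || return 1   # step 4
    run_megacli -PdPrpRmv "-physdrv$pd" "$adapter" || return 1        # step 5
  fi
}
```

For the example disk, `DRY_RUN=1 prepare_disk_removal 15 13 1 yes` prints the three commands shown in steps 3 to 5; drop `DRY_RUN=1` to run them.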
- Show progress of rebuild [optional].
If there is a hotspare disk available and a member of the VD is taken offline, then the hotspare disk state will change to R - Rebuild once step 3 is run.
This is not applicable if the disk taken offline was not previously (O) in the Logical Disk Information.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>
Example - Watching the rebuild of a hotspare into a member of the VD
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[15:14] -a1
Example Output:
Rebuild Progress on Device at Enclosure 15, Slot 14 Completed 3% in 7 Minutes.
Exit Code: 0x00
- Show the physical location of the drive to be replaced.
Start flashing the drive's amber LED (you may need to remove the DAC faceplate to see the drive LEDs properly):
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[ENCLOSURE:SLOT] -a<adapter_num>
Stop flashing the drive's LED:
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[ENCLOSURE:SLOT] -a<adapter_num>
- Physically replace the drive.
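The locate commands can be wrapped around the physical swap so the LED flashes only while the drive is being replaced. A hedged sketch: the function name `locate_while_replacing` is hypothetical, and it assumes someone at the console presses Enter once the new drive is seated. If the script is interrupted, run the -PdLocate -stop command manually.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flash the drive LED, wait for the operator to swap the
# drive, then stop flashing. Uses only the -PdLocate options shown above.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Usage: locate_while_replacing ENCLOSURE SLOT ADAPTER
locate_while_replacing() {
  local pd="-physdrv[$1:$2]" adapter="-a$3"
  "$MEGACLI" -PdLocate -start "$pd" "$adapter" || return 1
  # The prompt is only shown when stdin is a terminal.
  read -r -p "Swap the drive at [$1:$2] (LED flashing), then press Enter: " _
  "$MEGACLI" -PdLocate -stop "$pd" "$adapter"
}
```

For the example disk, run `locate_while_replacing 15 13 1` on the appliance, swap the flashing drive, then press Enter.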
- Get the missing drive's array and row (required for the next step) [optional if Step 4 was skipped or failed].
This step is optional if Step 4 was skipped or failed but the physical disk status shows as (U) - Unconfigured.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdGetMissing -a<adapter_num>
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdGetMissing -a1
Example Output 1:
Adapter 1 - No Missing Drive is Found.
Exit Code: 0x00
Example Output 2:
Adapter 1 - Missing Physical drives
No. Array Row Size Expected
1 1 13 139392 MB
Exit Code: 0x00
If the output shows "No Missing Drive is Found", you can skip Step 10.
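The Array and Row values from this step feed directly into the Step 10 command. A minimal sketch, assuming the `No. Array Row Size Expected` table layout in Example Output 2; the function name `missing_to_replace_cmd` is hypothetical, and it only prints the command for review. If several missing drives are listed, it prints one command per row, so pick the row that matches the disk you replaced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads -PdGetMissing output on stdin and prints the
# -PdReplaceMissing command for the new disk at ENCLOSURE:SLOT on ADAPTER.
# Data rows are assumed to start with three numeric columns: No., Array, Row.
# Usage: ... -PdGetMissing -aN | missing_to_replace_cmd ENCLOSURE SLOT ADAPTER
missing_to_replace_cmd() {
  awk -v e="$1" -v s="$2" -v a="$3" '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      printf "/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[%s:%s] -Array%s -row%s -a%s\n", e, s, $2, $3, a
    }
  '
}
```

With Example Output 2 and disk [15:13] on Adapter 1, this prints the same command as the Step 10 example; with Example Output 1 it prints nothing.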
- Replace Missing Drive.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[ENCLOSURE:SLOT] -Array<N> -row<N> -a<adapter_num>
Note: Arrays are numbered from 0, so the first array is Array0.
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[15:13] -Array1 -row13 -a1
Example Output:
Adapter: 1: Missing PD at Array 1, Row 13 is replaced.
Exit Code: 0x00
- If the disk shows as (X) - Offline in nwraidutil.pl output, change it back to Online.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[15:13] -a1
Example Output:
EnclId-15SlotId-13 state changed to OnLine.
Exit Code: 0x00
-
- Show progress of rebuild (not applicable if replaced disk will become hotspare)
See Step 6 above for command.
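To block until a rebuild has finished (for example, before replacing the next disk, as the warning in the Notes requires), the Step 6 command can be polled. This is a hedged sketch: the function names are hypothetical, the default 300-second interval is arbitrary, and it assumes that once the rebuild ends the -ShowProg output no longer contains a "Completed N%" line. It also stops if MegaCli64 cannot be run, so always confirm the final state with nwraidutil.pl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for waiting on a RAID rebuild.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Prints N from a "... Completed N% in M Minutes." line on stdin; nothing if absent.
progress_pct() {
  sed -n 's/.*Completed \([0-9][0-9]*\)%.*/\1/p' | head -n 1
}

# Usage: wait_for_rebuild ENCLOSURE SLOT ADAPTER [INTERVAL_SECONDS]
# Assumption: no "Completed N%" line means the rebuild is no longer running.
wait_for_rebuild() {
  local pct
  while :; do
    pct="$("$MEGACLI" -PDRbld -ShowProg "-PhysDrv[$1:$2]" "-a$3" | progress_pct)"
    [ -z "$pct" ] && break
    echo "Rebuild of [$1:$2] at ${pct}%"
    sleep "${4:-300}"
  done
}
```

For the Step 6 example, `wait_for_rebuild 15 14 1` would report 3% on the first poll and keep polling every five minutes.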
OR
- If the disk was previously a hotspare (or there is currently no hotspare present in the enclosure), then a hotspare needs to be configured.
nwraidutil.pl should show the Physical Disk state as (U) - Unconfigured.
Example Output:
15 14 (U) 0 1.819 TB HITACHI HUS72302CLAR2000C1D6YGKNKHTK
If the disk is on board the head unit or comes from the 1st DAC on an S4/S4S/SA appliance, it is usually set as a global hotspare with enclosure affinity.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set enclaffinity physdrv[ENCLOSURE:SLOT] -a<adapter_num>
After running this command, the disk status will show as GEI.
For disks in enclosures after the 1st DAC, it is usually recommended to configure the hotspare as a Virtual Disk dedicated hotspare.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set dedicated Array<N> physdrv[ENCLOSURE:SLOT] -a<adapter_num>
After running this command, the disk status will show as ID-X.
Example: Setting the 14th SATA disk in the 1st DAC of a packet concentrator as a VD dedicated hotspare
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set dedicated Array1 physdrv[15:13] -a1
The above example creates a hotspare for the 2nd VD (Array1), so the disk status will show as ID-1.
Notes on setting hotspare:
- In order for a hotspare to take over the role of a disk in an array, it needs at least the same capacity as the disk it is replacing and must be the same drive type. As a result, a SATA disk cannot be a hotspare for the array of Solid State Disks (SSD) found in concentrator DACs.
- If you are creating a hotspare which is not the last disk in the DAC, consider adding the parameter -nonRevertible to make the hotspare non-revertible.
- If a disk in the VD is still in (R) - Rebuild state, setting a hotspare for that VD will likely fail until the rebuild has completed.
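The rule of thumb for choosing the hotspare type can be captured in one helper that prints the matching command for review. A hedged sketch: the function name `hotspare_cmd` and its location labels (`head`, `dac1`, `dac`) are hypothetical; the MegaCli64 commands are the ones given above. It does not add `-nonRevertible`; append that manually if it applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: head-unit and 1st-DAC disks become global hotspares with
# enclosure affinity; disks in later DACs become VD dedicated hotspares.
# Only prints the command so it can be checked before running.
# Usage: hotspare_cmd LOCATION(head|dac1|dac) ENCLOSURE SLOT ADAPTER [ARRAY]
hotspare_cmd() {
  local mc=/opt/MegaRAID/MegaCli/MegaCli64
  case "$1" in
    head|dac1)
      echo "$mc pdhsp set enclaffinity physdrv[$2:$3] -a$4" ;;
    dac)
      [ -n "$5" ] || { echo "ARRAY number required for a dedicated hotspare" >&2; return 1; }
      echo "$mc pdhsp set dedicated Array$5 physdrv[$2:$3] -a$4" ;;
    *)
      echo "unknown location: $1" >&2; return 1 ;;
  esac
}
```

For instance, `hotspare_cmd dac 15 13 1 1` prints the dedicated-hotspare example command above.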
- Show progress of rebuild (not applicable if replaced disk will become hotspare)
- If the hotspare was previously a VD dedicated hotspare, note that this hotspare type is revertible: once the bad disk is replaced, the hotspare goes back to being a hotspare. This is done using copyback, during which the replaced drive may show in nwraidutil.pl output with a state such as (?) or (C), for example:
15 14 (?) 0 2.728 TB SEAGATE ST3000NXCLAR3000GS18Z1Y1H6A1
15 14 (C) 0 2.728 TB SEAGATE ST3000NXCLAR3000GS18Z1Y1H6A1
Once the old hotspare is a member of the VD with (O) status, the replacement drive will be in copyback status while it copies the contents of the old hotspare, in preparation for that drive reverting to a hotspare.
There are two options:
- Wait for copyback to complete (which will take somewhere in the region of 7 hours)
This can be monitored through the RAID adapter log.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 adpeventlog getsinceshutdown -f <output_filename.log> -a<adapter_num>
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 adpeventlog getsinceshutdown -f raid_events_since_boot.log -a1
See Example 3 in the Notes below for an example of the RAID adapter event log.
Copyback progress can also be monitored using the following command.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk showprog physdrv[ENCLOSURE:SLOT] -a<adapter_num>
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk showprog physdrv[15:13] -a1
Example Output:
Copyback Progress on Device at Enclosure 15, Slot 13 Completed 5% in 21 minutes
Exit Code: 0x00
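A rough remaining-time figure can be extrapolated from a progress line. This is a hedged sketch: the function name `remaining_minutes` is hypothetical, it assumes progress is roughly linear (an approximation only), and it only understands lines reported in minutes. For the example above, 5% after 21 minutes extrapolates to about 399 more minutes, a total of about 420 minutes, in line with the roughly 7 hours mentioned earlier. The same parsing works on the rebuild progress line from Step 6.

```shell
#!/usr/bin/env bash
# Hypothetical helper: linear estimate of remaining minutes from a
# "... Completed P% in M minutes" progress line read on stdin.
remaining_minutes() {
  sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) [Mm]inutes.*/\1 \2/p' |
    awk 'NR == 1 && $1 > 0 { printf "%d\n", $2 * (100 - $1) / $1 }'   # truncates to whole minutes
}
```

Example usage: `/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk showprog physdrv[15:13] -a1 | remaining_minutes`.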
- Stop the copyback and manually set the hotspare using the command in Step 11b.
Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk stop physdrv[ENCLOSURE:SLOT] -a<adapter_num>
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk stop physdrv[15:13] -a1
If you have any questions, concerns, or feedback about this article, please contact RSA Support quoting this KB number.
Notes
Assumptions: This article assumes the Security Analytics/NetWitness appliance hardware is a Series 4, S4S or Series 5 appliance.
Some of the output may be incorrect for some larger disk types on Series 6 hardware.
For Series 3 appliances which have reached End of Product Support (EOPS), you may need to exchange
in the above commands.
Example 1: Example of Logical Disk Information in nwraidutil.pl output from a packet concentrator appliance
Logical Disk Information
------------------------
Physical Drive State Legend
-------------------------------------------------------------------------------------
B Unconfigured(Bad) O Online
D Dedicated Hotspare and associated virtual drive number R Rebuild
E Hotspare prefers same enclosure S Solid-State Drive
F Foreign U Unconfigured(Good)
G Global hotspare X Offline
I Hotspare is revertible ! Failed
M Missing ? Unknown state
-------------------------------------------------------------------------------------
NOTE: 'E' does not prohibit a hotspare from being used in another enclosure, it is merely a preference
Logical Drive State Legend
------------------------------------------------
D Degraded X Offline
O Optimal ! Failed
P Partially Degraded ? Unknown state
R Rebuild
------------------------------------------------
Adapter: 0 - PERC H710P Mini
Virtual Disk: 0 (O) - Found 2 of 2 {Raid Level 1, 136.125 GB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
PD: 0 Enclosure: 32 Slot: 0 (O) 136.732 GB SEAGATE ST9146853SS YS0A6XM3H6HP
PD: 1 Enclosure: 32 Slot: 1 (O) 136.732 GB SEAGATE ST9146853SS YS0A6XM3H7DG
Virtual Disk: 1 (O) - Found 2 of 2 {Raid Level 1, 931.0 GB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
PD: 0 Enclosure: 32 Slot: 2 (O) 931.512 GB SEAGATE ST91000640SS AS099XG5SYBG
PD: 1 Enclosure: 32 Slot: 3 (O) 931.512 GB SEAGATE ST91000640SS AS099XG5SXBT
Adapter: 1 - PERC H810 Adapter
Virtual Disk: 0 (O) - Found 7 of 7 {Raid Level 5, 10.913 TB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
PD: 0 Enclosure: 15 Slot: 7 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YFKV9MJK
PD: 1 Enclosure: 15 Slot: 8 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YFKUYZDK
PD: 2 Enclosure: 15 Slot: 9 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YFKTAXWK
PD: 3 Enclosure: 15 Slot: 10 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YGKU290K
PD: 4 Enclosure: 15 Slot: 11 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YFKV6S9K
PD: 5 Enclosure: 15 Slot: 12 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YGKNP24K
PD: 6 Enclosure: 15 Slot: 13 (O) 1.819 TB HITACHI HUS72302CLAR2000C1D6YGKNKHTK
Virtual Disk: 1 (O) - Found 7 of 7 {Raid Level 5, 1.087 TB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
PD: 0 Enclosure: 15 Slot: 0 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVVDY5A
PD: 1 Enclosure: 15 Slot: 1 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVVZHMA
PD: 2 Enclosure: 15 Slot: 2 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVUX9RA
PD: 3 Enclosure: 15 Slot: 3 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVVZ9LA
PD: 4 Enclosure: 15 Slot: 4 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVVW9RA
PD: 5 Enclosure: 15 Slot: 5 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVW1VEA
PD: 6 Enclosure: 15 Slot: 6 (OS) 186.310 GB HITACHI HUSRL402 CLAR200C190XTVVAG6A
No logical disk problems found.
Example 2: Example of Physical Disk Information in nwraidutil.pl output from a packet concentrator appliance
Physical Disk Information
-------------------------
Physical Drive State Legend
-------------------------------------------------------------------------------------
B Unconfigured(Bad) O Online
D Dedicated Hotspare and associated virtual drive number R Rebuild
E Hotspare prefers same enclosure S Solid-State Drive
F Foreign U Unconfigured(Good)