Although the actual disk fault management process will vary between organizations, depending on the policies, tools and personnel expertise available, there are some common elements of the disk replacement process that Windows administrators can follow.
First, you need to identify the faulty disk. Windows Server 2012 R2 offers several sources of disk fault and identification data, including Event Viewer logs, the Physical Disks report in Server Manager, the alerts dialog in System Center Operations Manager (SCOM) and Windows PowerShell queries. Whereas tools such as SCOM can report the specific location of a disk fault -- slot, tray and position -- other tools identify a failed disk only by its physical disk number or globally unique identifier (GUID). GUIDs can be translated into physical disk numbers with the PowerShell Get-PhysicalDisk cmdlet.
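The Python sketch below shows one rough way to do the identifier-to-disk-number lookup, working from an inventory exported from PowerShell. It makes several assumptions. The export was produced with something like Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber, UniqueId, DeviceId, HealthStatus, OperationalStatus | ConvertTo-Json. The identifier reported by your alerting tool matches the disk's UniqueId value. DeviceId carries the physical disk number. Status fields serialize as strings. The helper names and sample values are hypothetical.

```python
import json

# Hypothetical inventory, assumed to be exported on the server with:
#   Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber, UniqueId,
#     DeviceId, HealthStatus, OperationalStatus | ConvertTo-Json
# All values below are made up for illustration.
SAMPLE_EXPORT = """
[
  {"FriendlyName": "PhysicalDisk1", "SerialNumber": "SN-0001",
   "UniqueId": "5000C50011112222", "DeviceId": "1",
   "HealthStatus": "Healthy", "OperationalStatus": "OK"},
  {"FriendlyName": "PhysicalDisk2", "SerialNumber": "SN-0002",
   "UniqueId": "5000C50033334444", "DeviceId": "2",
   "HealthStatus": "Unhealthy", "OperationalStatus": "Lost Communication"},
  {"FriendlyName": "PhysicalDisk3", "SerialNumber": "SN-0003",
   "UniqueId": "5000C50055556666", "DeviceId": "3",
   "HealthStatus": "Warning", "OperationalStatus": "OK"}
]
"""


def load_disks(text):
    """Parse the export; ConvertTo-Json emits a bare object when only one disk exists."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def faulty_disks(disks):
    """Return every disk whose HealthStatus is anything other than 'Healthy'."""
    return [d for d in disks if d.get("HealthStatus") != "Healthy"]


def disk_number_for(disks, unique_id):
    """Map an identifier reported by a monitoring tool to a physical disk number.

    Assumes the identifier matches UniqueId (compared case-insensitively) and
    that DeviceId holds the disk number; returns None when nothing matches.
    """
    wanted = unique_id.strip().lower()
    for d in disks:
        if str(d.get("UniqueId", "")).lower() == wanted:
            return int(d["DeviceId"])
    return None


disks = load_disks(SAMPLE_EXPORT)
for d in faulty_disks(disks):
    print(d["FriendlyName"], d["SerialNumber"], d["HealthStatus"])
print(disk_number_for(disks, "5000c50033334444"))
```

The serial number is printed next to each suspect disk so a technician can match it against the label on the physical drive.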
After determining which disk has failed, locate it in the storage array enclosure. Many storage arrays provide LEDs that blink when the corresponding disk fails. If the array lacks such indicators, technicians will need extra time to find the correct physical disk, for example by matching its serial number.
Next, many technicians start by checking the disk connections: they reseat the troubled disk in its slot or reseat its cable connections. If this works, clear the blinking LED by resetting the physical disk's usage or by removing the disk from the storage pool with a PowerShell PhysicalDisk cmdlet. If disk problems persist, replace the disk using the instructions for the particular storage array. Typical best practice is for the new disk's characteristics to match those of the failed disk, because performance mismatches can cause storage problems later. Replace the physical disk before removing the failed disk from any storage pool configuration, and give the new disk a chance to rebuild; otherwise, data may be lost.
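You can script the "match the failed disk" check before installing a replacement. The Python sketch below compares a candidate disk against the failed one. It uses fields named after Get-PhysicalDisk properties: MediaType, BusType, Model and Size. Requiring exact equality on those four fields is an assumed policy, not a vendor rule. The disk records and the function name are hypothetical.

```python
# Characteristics to compare; the names mirror Get-PhysicalDisk properties,
# but which fields matter (and how strictly) is a policy assumption.
MATCH_FIELDS = ("MediaType", "BusType", "Model", "Size")


def replacement_mismatches(failed, candidate, fields=MATCH_FIELDS):
    """Return (field, failed_value, candidate_value) for each characteristic that differs."""
    return [
        (field, failed.get(field), candidate.get(field))
        for field in fields
        if failed.get(field) != candidate.get(field)
    ]


# Hypothetical records, e.g. parsed from Get-PhysicalDisk output exported as JSON.
failed = {"FriendlyName": "PhysicalDisk2", "MediaType": "HDD", "BusType": "SAS",
          "Model": "EXAMPLE-4TB", "Size": 4000787030016}
same_model = dict(failed, FriendlyName="PhysicalDisk9")
ssd = {"FriendlyName": "PhysicalDisk10", "MediaType": "SSD", "BusType": "SAS",
       "Model": "EXAMPLE-SSD", "Size": 3840755982336}

for candidate in (same_model, ssd):
    issues = replacement_mismatches(failed, candidate)
    print(candidate["FriendlyName"], "OK" if not issues else issues)
```

An empty result means the candidate matches on every compared field. Anything else lists exactly which characteristics would differ from the failed disk.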
Make sure that all identical disks in a group or array use the same firmware version. Once the new disk is in place, update its firmware to the latest accepted version running on the other disks in the group or the wider array. Remember that each new firmware version can change disk timing and access behavior. While these changes are meant to improve the disk itself, firmware differences between disks can also introduce performance differences that trigger unexpected or intermittent storage errors. Tools such as Server Manager or Windows PowerShell can report disk firmware versions, and updates should follow the disk manufacturer's instructions.
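To check firmware consistency across many disks at once, the Python sketch below groups an exported inventory by Model. It then flags disks whose FirmwareVersion differs from a baseline. The field names follow Get-PhysicalDisk properties. Two things are assumptions for illustration: the baseline rule (an approved version per model if you supply one, otherwise the most common version in the group) and the sample data. The sketch only reports mismatches; the update itself should still follow the manufacturer's instructions.

```python
from collections import Counter, defaultdict


def firmware_outliers(disks, approved=None):
    """Group disks by Model and report those not running the baseline firmware.

    The baseline is the approved version for that model when one is supplied;
    otherwise it falls back to the most common version in the group (an assumption).
    Returns {model: (baseline, [FriendlyName, ...])} for groups with outliers.
    """
    approved = approved or {}
    by_model = defaultdict(list)
    for d in disks:
        by_model[d["Model"]].append(d)

    report = {}
    for model, group in by_model.items():
        baseline = approved.get(model)
        if baseline is None:
            baseline = Counter(d["FirmwareVersion"] for d in group).most_common(1)[0][0]
        odd = [d["FriendlyName"] for d in group if d["FirmwareVersion"] != baseline]
        if odd:
            report[model] = (baseline, odd)
    return report


# Hypothetical inventory, e.g. from Get-PhysicalDisk exported as JSON.
inventory = [
    {"FriendlyName": "PhysicalDisk1", "Model": "EXAMPLE-4TB", "FirmwareVersion": "A1"},
    {"FriendlyName": "PhysicalDisk2", "Model": "EXAMPLE-4TB", "FirmwareVersion": "A1"},
    {"FriendlyName": "PhysicalDisk3", "Model": "EXAMPLE-4TB", "FirmwareVersion": "A1"},
    {"FriendlyName": "PhysicalDisk9", "Model": "EXAMPLE-4TB", "FirmwareVersion": "B2"},
    {"FriendlyName": "PhysicalDisk5", "Model": "EXAMPLE-SSD", "FirmwareVersion": "C3"},
    {"FriendlyName": "PhysicalDisk6", "Model": "EXAMPLE-SSD", "FirmwareVersion": "C3"},
]
print(firmware_outliers(inventory))
```

Supplying an approved version per model is the safer choice. A majority vote only shows which disk is the odd one out, not which version is actually validated.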
At this point, use Server Manager or Windows PowerShell to add the new physical disk to the storage pool, and then retire the old disk and remove it from the pool. After a complete disk failure, the failed disk should already have been retired automatically. If the disk is being replaced preemptively -- such as in response to intermittent problems -- retire it first through PowerShell.
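A short script can capture the order of operations in this step, including the extra step for a disk that was not retired automatically. The Python sketch below only builds a list of PowerShell commands for review; it does not run them. Add-PhysicalDisk, Set-PhysicalDisk, Get-StoragePool, Get-VirtualDisk, Repair-VirtualDisk and Remove-PhysicalDisk are real Storage-module cmdlets. The pool and disk names are placeholders. The Repair-VirtualDisk pass before removal is our assumption about how to let data rebuild; the text does not prescribe that exact command.

```python
def replacement_plan(pool, new_disk, old_disk, old_usage):
    """Return, in order, PowerShell commands for swapping a disk in a storage pool.

    pool, new_disk and old_disk are placeholder FriendlyName values; old_usage
    is the old disk's current Usage value (e.g. "Auto-Select" or "Retired").
    """
    old = f'(Get-PhysicalDisk -FriendlyName "{old_disk}")'
    new = f'(Get-PhysicalDisk -FriendlyName "{new_disk}")'
    # 1. Bring the replacement into the pool first.
    steps = [f'Add-PhysicalDisk -StoragePoolFriendlyName "{pool}" -PhysicalDisks {new}']
    if old_usage != "Retired":
        # 2. Preemptive replacement: the old disk was not retired automatically.
        steps.append(f'Set-PhysicalDisk -FriendlyName "{old_disk}" -Usage Retired')
    # 3. Rebuild affected virtual disks; watch progress with Get-StorageJob before continuing.
    steps.append(f'Get-StoragePool -FriendlyName "{pool}" | Get-VirtualDisk | Repair-VirtualDisk')
    # 4. Only then take the old disk out of the pool.
    steps.append(f'Remove-PhysicalDisk -StoragePoolFriendlyName "{pool}" -PhysicalDisks {old}')
    return steps


for line in replacement_plan("Pool1", "PhysicalDisk9", "PhysicalDisk2", "Auto-Select"):
    print(line)
```

Generating the commands instead of running them keeps a human in the loop. The printed plan can be checked against the array's own replacement instructions before anything touches the pool.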
As a final step in disk fault management, technicians can run a storage health test to verify the storage pool or cluster, and then dismiss any alerts.
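Before dismissing alerts, a quick scripted sanity check alongside the storage health test might look like the Python sketch below. It assumes inventories exported from Get-PhysicalDisk and Get-VirtualDisk, with status fields as strings. It treats three conditions as reasons to hold off: any non-healthy physical disk, any virtual disk that is not Healthy and OK, or a retired disk still in the pool. Those criteria, the function name and the sample records are illustrative assumptions.

```python
def post_replacement_problems(physical_disks, virtual_disks):
    """List anything that should be resolved before alerts are dismissed.

    Inputs are hypothetical records shaped like Get-PhysicalDisk and
    Get-VirtualDisk output (HealthStatus, OperationalStatus, Usage as strings).
    """
    problems = []
    for d in physical_disks:
        name = d["FriendlyName"]
        if d.get("HealthStatus") != "Healthy":
            problems.append(f"{name}: physical disk health is {d.get('HealthStatus')}")
        if d.get("Usage") == "Retired":
            problems.append(f"{name}: retired disk is still in the pool")
    for v in virtual_disks:
        name = v["FriendlyName"]
        if v.get("HealthStatus") != "Healthy" or v.get("OperationalStatus") != "OK":
            problems.append(
                f"{name}: virtual disk is {v.get('HealthStatus')}/{v.get('OperationalStatus')}"
            )
    return problems


pool_disks = [
    {"FriendlyName": "PhysicalDisk1", "HealthStatus": "Healthy", "Usage": "Auto-Select"},
    {"FriendlyName": "PhysicalDisk9", "HealthStatus": "Healthy", "Usage": "Auto-Select"},
]
vdisks = [{"FriendlyName": "Data1", "HealthStatus": "Healthy", "OperationalStatus": "OK"}]
print(post_replacement_problems(pool_disks, vdisks) or "Pool looks healthy")
```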