Windows Server How-To
Using PowerShell To Rebuild a Windows Storage Spaces Array
Disk failures happen. Here's how to get back up and running with minimal effort.
Recently, I had a hard disk failure in one of my production servers. That particular server uses Windows Storage Spaces to combine four physical disks into a single parity virtual disk. The process of replacing a failed hard disk is normally pretty straightforward. In fact, this server started out running Windows Server 2012. Back then, a hard disk failed and I was able to replace the disk with minimal effort.
Today, this server is running Windows Server 2012 R2, and the process of swapping hard disks proved to be a lot more challenging. In fact, an operating system bug prevented me from being able to complete the process through the GUI. I had to delve into PowerShell to fix the disk. But I'm actually getting a little bit ahead of myself. Let me start at the beginning.
My production environment consists of two physical servers, both of which run Hyper-V. I have one production virtual machine that is being replicated at the Hyper-V level to the second server, which acts as a standby spare.
When the hard disk failed, the first thing I had to do was switch over to my spare server. Sure, the VM was still running, but I didn't want to risk data loss if additional drives were to fail. Moving the virtual machine was easy. I opened the Hyper-V Manager, shut down the virtual machine, and then right-clicked on the virtual machine and chose the Replication | Planned Failover option from the shortcut menu. Once the failover had completed, I reversed the replication process and restarted the virtual machine. Now the virtual machine was running on my healthy server and treating the server with the disk problem as a replica.
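For anyone who prefers to script the failover, the same sequence can be sketched with the Hyper-V PowerShell module. This is only a rough outline, not what I actually ran: the VM name ProdVM and the host names HV1 and HV2 are hypothetical placeholders, and the sketch assumes that replication is already healthy between the two hosts.

```powershell
# Hypothetical names: replace with your own VM and Hyper-V hosts.
$vmName  = 'ProdVM'   # the replicated virtual machine
$primary = 'HV1'      # server with the failed disk
$replica = 'HV2'      # healthy standby server

# A planned failover requires the VM to be shut down on the primary.
Stop-VM -Name $vmName -ComputerName $primary

# Send any unreplicated changes to the replica and prepare for failover.
Start-VMFailover -Prepare -VMName $vmName -ComputerName $primary

# Complete the failover on the replica server.
Start-VMFailover -VMName $vmName -ComputerName $replica

# Reverse replication so that the old primary becomes the replica.
Set-VMReplication -Reverse -VMName $vmName -ComputerName $replica

# Start the VM on the healthy server.
Start-VM -Name $vmName -ComputerName $replica
```

The failover cmdlets ask for confirmation by default, which is probably what you want when you are moving a production workload.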
When I was ready to replace the failed disk, I paused the replication process on both servers and shut down the server with the failed disk. I then replaced the disk, powered the server back up, and resumed replication. Now, I just had to tell Windows to make use of the new hard disk.
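Pausing and resuming replication can also be done from PowerShell. The following is a minimal sketch that assumes the hypothetical VM name ProdVM and host names HV1 and HV2. Depending on how long the server is offline, replication may need a resynchronization rather than a simple resume, so check the reported state before assuming that everything is back in sync.

```powershell
# Hypothetical names: replace with your own VM and Hyper-V hosts.
$vmName = 'ProdVM'
$hosts  = 'HV1', 'HV2'

# Pause replication on both servers before taking one of them down.
foreach ($h in $hosts) {
    Suspend-VMReplication -VMName $vmName -ComputerName $h
}

# ... shut down the server, swap the disk, and power it back up ...

# Resume replication once both servers are running again.
foreach ($h in $hosts) {
    Resume-VMReplication -VMName $vmName -ComputerName $h
}

# Review the replication state and health on each server.
foreach ($h in $hosts) {
    Get-VMReplication -VMName $vmName -ComputerName $h |
        Format-Table VMName, State, Health, Mode -AutoSize
}
```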
When I booted the server after replacing the drive, Windows still reported the bad disk as being a part of the storage pool, as shown in Figure 1. The new disk appeared within the primordial pool. Therefore, the trick was to remove the bad disk from the storage pool and replace it with the new disk.
The normal method for performing this task is to go to the list of physical disks within the storage pool and then choose the Add Physical Disk command from the Tasks drop-down list, as shown in Figure 2. At this point, Windows should display the new disk that exists within the primordial pool.
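The same picture is available from PowerShell. As a rough sketch, the first query below lists the disks that belong to any user-created (non-primordial) pool, and the second lists the disks that are still unassigned and eligible for pooling. The column selection is just my own choice of properties worth looking at.

```powershell
# Physical disks that belong to user-created (non-primordial) pools.
Get-StoragePool -IsPrimordial $false |
    Get-PhysicalDisk |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, OperationalStatus, HealthStatus, Usage, Size -AutoSize

# Disks that are not yet in a pool and are eligible to be added to one.
Get-PhysicalDisk -CanPool $true |
    Format-Table FriendlyName, OperationalStatus, HealthStatus, Size -AutoSize
```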
Select the check box next to the disk, verify that Allocation is set to Automatic, as shown in Figure 3, and then click OK. Under normal circumstances, the bad disk's Usage should change to reflect a status of Retired. Soon, the retired disk should disappear from the storage pool. At least that's the way that things have worked for me in the past.
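Adding the disk can also be scripted. The sketch below assumes a storage pool with the hypothetical friendly name Pool1 and assumes that the replacement drive is the only disk currently eligible for pooling. As far as I can tell, the AutoSelect usage is the PowerShell counterpart of the Automatic allocation setting in the GUI.

```powershell
# 'Pool1' is a placeholder: use your own storage pool's friendly name.
$poolName = 'Pool1'

# Collect the disks that can be pooled. Filter this list further if more
# than one unassigned disk is present in the server.
$newDisks = Get-PhysicalDisk -CanPool $true

# Add the new disk(s) to the pool and let Windows allocate them automatically.
Add-PhysicalDisk -StoragePoolFriendlyName $poolName -PhysicalDisks $newDisks -Usage AutoSelect
```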
In this case, the new disk was successfully added to the storage pool. Since the old disk's status didn't change, I right-clicked on the disk and chose the Remove Disk command from the shortcut menu, as shown in Figure 4. Upon doing so, I received the prompt shown in Figure 5, asking if I wanted to remove the disk. I clicked Yes and Windows displayed the message shown in Figure 6, indicating that the disk was being rebuilt and that I shouldn't make any changes. So far, Windows appeared to be doing what it was supposed to.
It would seem that everything had worked out fine, but the message in Figure 6 was actually deceptive. Not only did I have to manually launch a repair of the virtual disk, but after the repair completed, the virtual disk was still in a warning state. Attempting to manually remove the failed hard disk produced the error message shown in Figure 7. Even though I was replacing a failed 3TB hard disk with another 3TB disk, Windows was essentially saying that there were no disks in the server large enough to take the place of the failed disk. This appears to be a bug, since the old disk and the new disk are the same size.
The solution to this problem is to repair the disk through PowerShell. To do so, you must begin by retrieving the names of your physical and virtual disks. You can accomplish this by using the following cmdlets:
Get-PhysicalDisk
Get-VirtualDisk
As you can see in Figure 8, the friendly name of my failed physical disk is PhysicalDisk-1. My virtual disk is named Prod2 Virtual Disk.
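On a server with a lot of drives, it can help to narrow the output down to the disks that Windows considers unhealthy. This is a small sketch that does just that. It assumes the failed disk reports something other than Healthy, which will not necessarily be true of a disk that has disappeared entirely.

```powershell
# Show only the physical disks whose health is anything other than Healthy.
Get-PhysicalDisk |
    Where-Object { $_.HealthStatus -ne 'Healthy' } |
    Format-Table FriendlyName, OperationalStatus, HealthStatus, Usage, Size -AutoSize

# Show the health of each virtual disk.
Get-VirtualDisk |
    Format-Table FriendlyName, ResiliencySettingName, OperationalStatus, HealthStatus -AutoSize
```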
Now that I have this information, I can work through the rest of the process. Unfortunately, I can't do this for real because Windows is repairing my virtual disk at the moment, but here are the commands that I would use if I were performing the procedure from PowerShell:
Set-PhysicalDisk -FriendlyName 'PhysicalDisk-1' -Usage Retired
Repair-VirtualDisk -FriendlyName 'Prod2 Virtual Disk'
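# Remove-PhysicalDisk has no -FriendlyName parameter; 'YourPoolName' is a placeholder for your pool's friendly name
$failedDisk = Get-PhysicalDisk -FriendlyName 'PhysicalDisk-1'
Remove-PhysicalDisk -PhysicalDisks $failedDisk -StoragePoolFriendlyName 'YourPoolName'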
You will notice that I used the friendly names that had been assigned to my disks. Substitute the friendly names that have been assigned to your own disks.
The first line of code above sets the usage of the failed disk to Retired. The second line repairs the virtual disk. This repair process produces the screen shown in Figure 9, and it can take a long time to complete, depending on the size of your virtual disk. Once the repair has finished, the final command removes the retired disk from the storage pool.
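Because the repair can run for hours, it is handy to watch its progress before attempting to remove the old disk. The following sketch polls the storage jobs once a minute and then reports the health of the virtual disk. The one-minute interval is an arbitrary choice, and the virtual disk name is the one from my server.

```powershell
# Poll once a minute until no storage jobs are still running.
do {
    $running = @(Get-StorageJob | Where-Object { $_.JobState -eq 'Running' })
    if ($running.Count -gt 0) {
        $running | Format-Table Name, JobState, PercentComplete -AutoSize
        Start-Sleep -Seconds 60
    }
} while ($running.Count -gt 0)

# Check whether the virtual disk has returned to a healthy state.
Get-VirtualDisk -FriendlyName 'Prod2 Virtual Disk' |
    Format-Table FriendlyName, OperationalStatus, HealthStatus -AutoSize
```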
It is worth noting that some people have reported that the Remove-PhysicalDisk cmdlet failed after the virtual disk was repaired. Oddly enough, in these situations, the physical disk could usually be removed through the GUI.
As you can see, replacing a failed disk within a storage pool isn't always as easy a process as it should be. However, Windows PowerShell can help you to get the job done, even when the GUI fails.