Posey's Tips & Tricks
Coping with a Battery Backup Failure
Brien discusses his process of replacing a backup battery and gives some insight on how to avoid this issue in the future.
Even though my organization is essentially a one-man operation, I have always tried to take advantage of any available safeguards to keep my systems up and running and my data protected. One of those safeguards is a battery backup for every device on my network.
A few days ago, one of my backup batteries failed. The failed battery made an obnoxious beeping noise that I couldn't mute, so I immediately went to the store to get a replacement battery. Upon my return, I checked to see which battery backup had failed. It turned out to be the only battery on my network that was supporting two devices -- my network switch and my wireless access point. This is where things got interesting.
Even though the old battery backup had failed, the devices that were plugged into it were still receiving power. Unfortunately, every device on my entire network was connected either to the switch or to the wireless access point, both of which were plugged into the failed battery. Hence, replacing the battery would mean losing connectivity across my entire network.
In a way, I was lucky because I work out of my home and nobody uses the network except my wife and me. Even so, various logistical factors related to network usage made it impossible to replace the battery for three days (that was three days of the old battery beeping constantly).
Before I swapped the battery, I took a few precautions. First, I paused the replication process on some Hyper-V servers so that the loss of connectivity would not result in replication errors. I also have a Hyper-V cluster that I knew would lose quorum as soon as the network switch went offline, so I took the cluster offline and shut down the nodes.
It took me less than an hour to take everything offline, replace the battery and bring everything back up. Even so, the whole event left me thinking about what I could do differently in the future to avoid the type of situation I just described.
By and large, this type of failure is a non-issue in enterprise-class organizations. Enterprise-class networking hardware often contains redundant power supplies, which means that a single switch could be plugged into two separate batteries. That way, if one battery were to fail, it could be replaced without having to take the switch offline.
The problem with this approach is that I work out of my home. Even though most of the hardware I use is much higher-end than what the typical home user would purchase, enterprise-class hardware would be overkill: it's hard to justify the cost when you only have two users on your network. This made me stop and think about what a small- to mid-sized business might do to prevent the type of situation I described earlier.
Rather than using high-end hardware with redundant power supplies, a smaller organization could achieve the same end result by using redundant switches, each plugged into a separate battery. Of course, for this concept to work, each network device would need at least two connections: one to each switch. That way, if a switch failed (or had to be taken offline), there would still be a path that network packets could use to reach their destination.
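To make the dual-connection idea concrete, here is a minimal Python sketch that models a network as a list of cables and checks whether every pair of hosts can still communicate when one switch is taken offline. It assumes that only switches forward traffic (hosts are never used as intermediate hops) and that the device names h1 to h3, sw1 and sw2 are hypothetical; it is an illustration of the reasoning, not a network design tool.

```python
from collections import deque
from itertools import combinations


def build_links(cables):
    """Turn a list of (device, device) cable pairs into an adjacency map."""
    links = {}
    for a, b in cables:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, offline, src, dst, switches):
    """Breadth-first search from src to dst that skips offline devices.

    Only switches forward traffic, so hosts are never expanded as
    intermediate hops.
    """
    if src in offline or dst in offline:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt in offline or nxt in seen:
                continue
            if nxt == dst:
                return True
            seen.add(nxt)
            if nxt in switches:
                queue.append(nxt)
    return False


def survives_outage(cables, hosts, switches, failed_switch):
    """True if every pair of hosts can still talk with failed_switch offline."""
    links = build_links(cables)
    offline = {failed_switch}
    return all(
        reachable(links, offline, a, b, switches)
        for a, b in combinations(hosts, 2)
    )


# Hypothetical devices: three hosts and two switches.
hosts = ["h1", "h2", "h3"]

# Single switch: every host has one cable to sw1.
single = [(h, "sw1") for h in hosts]

# Redundant switches: every host has one cable to each switch.
dual = [(h, sw) for h in hosts for sw in ("sw1", "sw2")]

print(survives_outage(single, hosts, {"sw1"}, "sw1"))       # False
print(survives_outage(dual, hosts, {"sw1", "sw2"}, "sw1"))  # True
print(survives_outage(dual, hosts, {"sw1", "sw2"}, "sw2"))  # True
```

The single-switch layout loses all connectivity as soon as sw1 goes dark, which is exactly the battery-swap scenario described earlier, while the dual-homed layout survives either switch being taken offline.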
Windows Server 2012 can help a great deal with making this type of design work. I have seen a couple of redundant switched networks in the past, and the problem with such networks is that getting network devices to use an alternate path when a failure occurs is sometimes tricky. For whatever reason, some devices keep trying to use the old network path, even though an alternate path is available.
With Windows Server 2012, you can use the NIC teaming feature to make a network switchover seamless when a failure occurs. A NIC team can be designed to include a standby NIC, which remains unused unless another NIC in the team fails. This means that you could connect a standby NIC on each server to a switch whose only job is to provide an alternate communications path in the event that the primary switch goes offline. The nice thing about this approach is that, if configured correctly, it lets you get away with using commodity network hardware.
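In Windows Server 2012 itself, a team is configured through Server Manager or the NetLbfo PowerShell cmdlets rather than written as code. To illustrate the active/standby behavior, here is a simplified Python model, assuming a two-member team in which NIC1 is cabled to the primary switch and NIC2 is a standby member cabled to the backup switch. The adapter and switch names are hypothetical, and the promotion rule below is a simplification for illustration, not Microsoft's implementation.

```python
from dataclasses import dataclass


@dataclass
class TeamMember:
    name: str               # hypothetical adapter name, e.g. "NIC1"
    switch: str             # hypothetical switch the adapter is cabled to
    standby: bool = False   # True if the adapter is held in reserve
    link_up: bool = True


def active_members(team):
    """Return the members that should carry traffic.

    Simplified model: non-standby members carry traffic while their link
    is up; standby members are brought into service only when at least
    one non-standby member has lost its link.
    """
    primaries = [m for m in team if not m.standby]
    active = [m for m in primaries if m.link_up]
    if len(active) < len(primaries):
        active += [m for m in team if m.standby and m.link_up]
    return active


def take_switch_offline(team, switch):
    """Mark the link as down on every member cabled to the given switch."""
    for m in team:
        if m.switch == switch:
            m.link_up = False


team = [
    TeamMember("NIC1", "primary-switch"),
    TeamMember("NIC2", "backup-switch", standby=True),
]
print([m.name for m in active_members(team)])  # ['NIC1']

take_switch_offline(team, "primary-switch")    # e.g. while swapping its battery
print([m.name for m in active_members(team)])  # ['NIC2']
```

The point of the model is that the server never has to "decide" to use the backup switch: as soon as the primary path disappears, the standby member takes over, which is the behavior that makes a battery swap on the primary switch non-disruptive.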
As you can see, something as simple as a battery backup failure can have a major impact on your network. It's tough to do anything about the disruptive nature of such a failure after it happens, but with a little advance planning, you can avoid a major outage while you replace a battery.
Brien Posey is a 16-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.