Posey's Tips & Tricks
Business Continuity and Disaster Recovery Lessons Learned
Test, test and test some more.
As someone who is self-employed, I spend a lot of time thinking about disaster recovery and business continuity. After all, if I were to suffer a major IT disaster and had no contingency plan, I could very quickly find myself out of business and without a source of income. Needless to say, I have always invested heavily in data backups and IT redundancy.
Earlier this week, I had a chance to test my business continuity plan. Unfortunately, this was no scheduled test. I suffered a disaster that rendered my entire server rack useless (it's a long story). That single incident wiped out both my production environment and my lab environment.
As bad as this particular incident was, things could have been a lot worse. My data backups did exactly what they were supposed to do, and I did not lose any data. The server hardware, on the other hand, was a total loss.
Fearing that such an incident might one day occur, I have long made it a point to keep spare hardware on hand. That spare hardware not only worked; it's what I am running on right now as I wait for replacement hardware to arrive.
In many ways, my business continuity plan worked beautifully. After all, I was offline for only a few hours and I did not suffer any data loss. In spite of these successes, there were definitely some lessons learned. I want to share those lessons in case they might benefit someone else.
The first lesson I learned is that, going forward, I need to do a better job of keeping my spare hardware up to date. Even though I have always made it a point to keep extra hardware on hand for use in an emergency, I haven't done the best job of making sure that spare hardware stays current.
There was a time when, if I needed a server, I would order two identical servers and keep one of them in the box, to be used only in an emergency. The problem with that approach, of course, is that it is expensive. It meant spending good money on hardware that might never be used. As such, I haven't been super diligent about replacing spare hardware as part of my planned hardware refreshes.
All of this is to say that although I am running on spare hardware right now, that hardware is older than what I had been using, and its performance isn't nearly as good as what I had become accustomed to. To put it another way, the spare hardware is getting the job done, but I wouldn't want to rely on it for an extended period of time.
This leads me to the next lesson learned. My server hardware was insured, and I had an emergency fund that could be used to cover the cost of replacement hardware. What I hadn't taken into account, however, were the ongoing supply chain issues and their impact on hardware availability.
When I realized that I was going to have to replace my server hardware, I went to a vendor's website, picked out some hardware and got ready to place an order. The problem, however, was that the disaster occurred in July, and some of the servers I had picked out would not be available until October. That being the case, I had no choice but to look elsewhere. I was eventually able to find a reseller who could get me what I needed within a few days, but I had to make some compromises and wasn't able to get exactly what I wanted.
A third lesson I learned through this whole series of events is that it's not always in your best interest to economize on operating system licenses. At least some of my servers had been running on OEM Windows Server licenses. That seemed like a good deal at the time because the hardware vendor had bundled the license with the hardware at a cost lower than that of a retail license. The problem is that OEM licenses are tied to the hardware they were sold with and are non-transferable. That meant I had to purchase new licenses for my new servers rather than reusing the licenses I already had.
Ultimately, there was no real harm done. I didn't lose any data, and thanks to the spare hardware, the outage didn't last very long. Even so, my contingency plan wasn't perfect, and I am sure I will be rethinking the way that I do things as a result of the lessons learned from this disaster.
About the Author
Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his website.