Posey's Tips & Tricks
Where Air Gapped Backups Actually Fail, Part 1
Air gapped backups can provide critical ransomware resilience, but restore testing gaps, documentation drift and media rotation mistakes can leave organizations unsure whether they can actually recover.
Creating air gapped backups is one of the easiest and most cost-effective ways for an organization to add an extra layer of resiliency to its backups. After all, air gapped backups are physically disconnected from the system once completed, meaning that there is no exploit path for attackers. Even if your entire system were to become encrypted by ransomware, your air gapped backups would remain unaffected (assuming that you do not mount them until after the infection has been removed) and would represent one of your best options for recovering.
Unfortunately, air gapped backups can and sometimes do fail. Ironically, these failures usually don't stem from the backup becoming compromised by an attacker, but rather from simple human error.
The Untested Restore Problem
The single biggest reason why air gapped backups fail is a lack of restoration testing. Unlike restoring a conventional backup, restoring an air gapped backup tends to be a slow and possibly even manual process. The restoration may even require physical access to multiple machines. Admins can easily fall into the trap of skipping testing because it's a pain, and of assuming that because the backup worked last time, it's probably fine.
I recently had someone challenge me on this particular point. Their reasoning was something along the lines of, "just because a backup hasn't been tested doesn't automatically mean that the backup is invalid. Nine times out of ten, the backup is probably fine."
This is actually a fair point, so let's delve into it a bit. Let's pretend for a moment that for years, an organization has been diligently supplementing its primary backups with air gapped backups. Now, something has happened and the air gapped backup needs to be restored. Let's also pretend that the air gapped backup is absolutely fine, even though it hasn't been tested in ages.
The problem at that point is that because nobody has tested the air gapped backup in a really long time, there is a good chance that nobody actually remembers how to perform the restoration. Sure, they might have a general idea of how the restore works, but they probably won't recall all of the details.
Some might be quick to point out that situations like this are the reason why documentation exists. However, procedures tend to drift over time. Depending on how long it has been since an air gapped backup has been restored, there is a good chance that the documentation is going to be at least partially wrong.
So does this mean that the backup can't be restored? No, of course not. It does, however, mean that the recovery process will probably end up taking a lot longer than it would have if the backup team had been in the habit of testing the air gapped backup (which would have familiarized them with the recovery procedure).
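One way to make routine restore tests less of a pain is to script the verification step. What follows is only a minimal sketch in Python, not a description of any particular backup product. It assumes that a manifest of SHA-256 hashes was recorded at backup time and that a test restore has been written to a scratch folder. The function names (sha256_of, build_manifest and verify_restore) are hypothetical and were made up for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> dict[str, list[str]]:
    """Compare a test restore against the manifest recorded at backup time."""
    restored = build_manifest(restored_root)
    return {
        # Files the manifest lists but the restore did not produce.
        "missing": sorted(set(manifest) - set(restored)),
        # Files present in both places whose contents differ.
        "changed": sorted(
            name
            for name in manifest.keys() & restored.keys()
            if manifest[name] != restored[name]
        ),
        # Files the restore produced that the manifest never listed.
        "unexpected": sorted(set(restored) - set(manifest)),
    }
```

A report with three empty lists suggests that the test restore matches what was backed up. Anything else gives the team something concrete to investigate, and simply running the check on a schedule keeps the restore procedure fresh in everyone's mind.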
Rotation Drift
Another reason why air gapped backups sometimes fail has to do with rotation drift. Rotation drift is what happens when people forget which media they should be writing backups to. This can happen when labeling becomes inconsistent over time, or when a disk gets skipped "just this once," or when a backup admin goes on vacation, leaving someone else to create the backups.
The end result is that the air gapped backup exists, but nobody is quite sure which backup copy is the current one. This type of failure doesn't result in data loss, but it can lead to ambiguity and a lack of trust.
Admittedly, I really struggled with rotation drift in my own organization at one time. I write air gapped backups to removable hard drives and would occasionally lose track of which drives had been written to most recently. My solution was actually really simple, yet very effective.
There is a company named ORICO that makes protective storage boxes for hard drives (you can get them on Amazon). I simply ordered storage boxes in various colors and I use the box color as a means of keeping track of my disk rotation scheme. I won't go into the full rotation scheme, but I always use the purple case to designate the disk that is next to be overwritten. Similarly, I use the blue case for the most recently created backup.
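For those who would rather keep a written record alongside (or instead of) a physical cue, the same idea can be sketched in a few lines of Python. This is not my actual rotation scheme. It assumes a simple oldest-first rotation, and the RotationLedger class and the disk labels are hypothetical names invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RotationLedger:
    """Tracks removable disks in the order they were last written.

    Assumes a strict oldest-first rotation and at least one disk.
    """

    # Leftmost label holds the oldest backup, so it is the next to overwrite.
    disks: deque = field(default_factory=deque)
    history: list = field(default_factory=list)

    def next_to_overwrite(self) -> str:
        return self.disks[0]

    def most_recent(self) -> str:
        return self.disks[-1]

    def record_backup(self, label: str, when: date) -> None:
        """Record a completed backup, refusing a disk that is out of turn."""
        expected = self.next_to_overwrite()
        if label != expected:
            raise ValueError(f"expected {expected!r}, got {label!r}")
        # The disk that was just written moves to the 'most recent' end.
        self.disks.rotate(-1)
        self.history.append((when.isoformat(), label))


# Hypothetical labels: disk-A is the oldest copy, disk-C the newest.
ledger = RotationLedger(deque(["disk-A", "disk-B", "disk-C"]))
ledger.record_backup("disk-A", date(2024, 1, 5))
```

After that call, disk-A is the most recent backup and disk-B is next in line. Because the ledger raises an error when a disk is written out of turn, a skipped disk or a substitute admin's mistake shows up immediately rather than months later.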
In part 2 of this series, I want to talk about some other ways that air gapped backups sometimes fail and some things that you can do to improve your odds of creating backups that you can actually use.
About the Author
Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.