Posey's Tips & Tricks
Data Hoarding: The Backup Problem that Nobody Wants to Admit To
Letting data pile up may feel safer than deleting it, but unchecked accumulation can make backups slower, costlier and harder to recover when something goes wrong.
A few weeks ago, I had something go wrong with my backup system and had to reseed the backup. That essentially meant restoring my latest backup and then synchronizing everything that had changed since the time that the backup was created. It sounds simple enough, but the reseeding process took days to complete. As I waited for the restoration to complete, I began to realize just how much data I had accumulated. The restore operation was taking forever because I had allowed data growth to occur without restraint. That being the case, I wanted to take a super honest look at why data hoarding is a thing.
Let me begin by saying that data hoarding is not irrational. It has been said that data hoarding is what happens when the powers that be punish deletion more than accumulation. There are no immediate consequences for hanging on to unnecessary data, but deleting something important can have immediate and painful consequences. In other words, the potential for losing something important feels worse than the burden of hanging on to too much data. People rarely get into trouble for keeping too much data, but people absolutely get in trouble for deleting something that they shouldn't have.
Another reason for data hoarding is that it is really difficult to prove that a file is useless and that it will never be needed again. Think about that one for a second... It can be really easy to prove that you need to hang on to a file. As an example, I just finished filing my taxes and I know that I need to keep a copy of the tax return and all of the receipts that go along with it just in case my return were ever to be called into question. In other words, it is easy to justify keeping important data.
On the other hand, if you have data that seems to be unimportant, how can you ever know for sure that the data will never be needed? A file that seems unimportant today might later become useful as evidence in a dispute, supply data for a report, or even contain something that becomes personally meaningful at some point in the future.
If the idea of a random file becoming personally meaningful later on seems strange, let me give you an example. I have a Microsoft Word document saved on my file server. It has been there for 31 years! The document is just a walkthrough explaining how to use some of the tools that are built into Windows 95 to diagnose memory problems. For a while, I kept this document because I try to store an archival copy of everything that I write (you wouldn't believe how often I have to refer back to old articles). What makes that particular piece special is that it was the first thing that I ever had published. I didn't really think all that much about it at first, but as time went on, I began to realize that the article had personal meaning.
Another reason I think that data hoarding is a thing is that people (myself included) sometimes become personally invested in their data. Data might represent work that someone has done. It might be a reflection of decisions that someone has made, or it might even be a piece of their work history (like my 31-year-old Word document).
Deleting this type of data can feel as though you are losing a part of yourself. After all, you are erasing things that you had previously put effort into creating. And you are not only erasing work, you are erasing any proof that the work ever existed. If I am to be brutally honest, I still have a copy of a business plan that I wrote way back in 2001. I seriously doubt that I will ever need it again, but I likely spent hundreds of hours doing research, compiling data, writing, and editing the business plan. It's hard to delete something that you have worked so hard on, even if it was a long time ago.
Still another reason people tend to keep data is mistrust. At some point, we have probably all had a file go missing or a restore operation fail. That being the case, many of us have been conditioned to "keep an extra copy, just in case."
Finally, I think that the illusion of cheap storage sometimes comes into play. In my case, for example, I have three enterprise-grade storage arrays in my server rack, and I have barely made a dent in the available capacity. When I find myself considering whether to keep a file, I sometimes catch myself thinking, "Why is this even a question? I have plenty of space."
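One way to push back on the "plenty of space" instinct is to measure how much of the stored data has not been touched in years, since all of it still has to be backed up and restored. The following is only a minimal sketch, not a description of any particular backup product. The function name `summarize_by_age`, the three age buckets, and the use of last-modified time as a stand-in for "still in use" are all assumptions made for illustration.

```python
import os
import time

# Assumption: a year is approximated as 365 days for bucketing purposes.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def summarize_by_age(root, now=None):
    """Total the size in bytes of files under root, grouped by time since last modification."""
    now = time.time() if now is None else now
    # Hypothetical buckets; adjust the cutoffs to match your own retention thinking.
    totals = {"under 1 year": 0, "1 to 5 years": 0, "over 5 years": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            age_years = (now - info.st_mtime) / SECONDS_PER_YEAR
            if age_years < 1:
                bucket = "under 1 year"
            elif age_years < 5:
                bucket = "1 to 5 years"
            else:
                bucket = "over 5 years"
            totals[bucket] += info.st_size
    return totals


if __name__ == "__main__":
    # Scan the current directory and report each bucket in decimal gigabytes.
    for bucket, size in summarize_by_age(".").items():
        print(f"{bucket}: {size / 1e9:.2f} GB")
```

A report like this does not say which files are safe to delete. It only shows how much of the backup footprint comes from data that nobody has modified in a long time.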