Random Access
Decoding Modern Data Storage Features
For those who might not have been paying close attention to the ever-changing storage landscape lately, here are some key terms to keep in mind.
- By Scott D. Lowe
- 05/21/2013
If you're new to the world of storage or if you haven't traversed the enterprise storage landscape lately, get prepared for a hike! There are quite a few features available in enterprise-class storage, and their business value can sometimes be hard to grasp. This column explains some of the most popular storage features and attempts to place some business value on each.
Deduplication
Deduplication is a feature found in many enterprise-class arrays sold today, and its business value is among the easiest to quantify. The technology looks for commonality in the data on the array and, when it finds duplicates, discards them and writes, in their place, a pointer to the original copy of the data. In the old days, this was called "single instance storage," and it simply means that data is written only once to the array; subsequent copies simply reference the master.
Deduplication can yield significant capacity savings, though the exact impact is workload dependent. If you're running systems with little data commonality, the impact will be smaller.
It's also important to note that there are a couple of common types of deduplication; a short sketch of the underlying hash-and-pointer idea follows the list:
- Inline: With inline deduplication, data is analyzed as it makes its way through the storage array and, if possible, deduplicated on the fly. The benefit of this method is that the data on the array is always deduplicated, so the capacity savings are maximized. The trade-off is that it imposes an additional burden on array processing resources and might introduce additional latency into the I/O stream. That said, with today's overpowered processors, deduplication is becoming much less of a performance hit.
- Post-process: With post-process deduplication, data is initially written to the array as-is; it's not modified in any way. Later, on a defined schedule or when array processing resources become available, the array scans the new data to determine whether any of it can be deduplicated. If it can, the process deletes the duplicate data and writes a pointer to the original data in its place. This approach imposes no constant burden on ongoing I/O operations; the overhead is incurred only while the deduplication process is running. From a capacity perspective, however, the array must have enough space to house all of the data copies until the dedupe cycle runs.
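To make the hash-and-pointer idea concrete, here's a minimal Python sketch of inline deduplication. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints to spot duplicates; real arrays use far more sophisticated (and faster) techniques, often at variable granularity.

```python
import hashlib

class DedupStore:
    """Toy inline deduplication: each unique block is stored once,
    keyed by its content hash; duplicates become pointers."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # hash -> block bytes, stored exactly once
        self.volume = []   # logical volume: a list of pointers (hashes)

    def write(self, data):
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if its content hasn't been seen before.
            self.blocks.setdefault(digest, block)
            self.volume.append(digest)

    def read(self):
        # Follow each pointer back to the single stored copy.
        return b"".join(self.blocks[h] for h in self.volume)

store = DedupStore()
store.write(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)
print(len(store.volume) * store.block_size)        # logical: 16384 bytes
print(sum(len(b) for b in store.blocks.values()))  # physical: 8192 bytes
```

In this example, 16 KB of logical data lands on disk as just 8 KB, because two of the four blocks are duplicates that reduce to pointers.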
The business benefit for deduplication is pretty clear. With it, companies can conserve space on their arrays and extend the life of what is generally a pretty expensive investment.
Compression
Another data reduction feature, compression, is an additional way for storage devices to conserve space. Whereas deduplication works across an entire volume, compression operates at the file level, identifying and eliminating redundant data inside each file. In that way, compression is much like deduplication; the primary difference between the two technologies is one of scope. Deduplication operates across many files at once, while compression reduces the size of individual files.
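As a rough illustration, here's a Python sketch using zlib's DEFLATE algorithm as a stand-in for whatever algorithm a given array implements. Highly redundant data shrinks dramatically; already-compressed data such as JPEGs or video barely shrinks at all.

```python
import zlib

# Highly redundant content, like repeated log lines, compresses well.
original = b"the same log line repeats over and over\n" * 1000
compressed = zlib.compress(original, 6)

print(len(original), len(compressed))  # e.g. 40000 vs. a few hundred bytes
print(f"compression ratio: {len(original) / len(compressed):.1f}:1")

# Compression is lossless: decompressing returns the exact original.
assert zlib.decompress(compressed) == original
```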
As is the case with deduplication, compression's business value lies in its ability to help organizations extend the life of their storage investments.
Thin Provisioning
Thin provisioning -- another way to conserve storage space -- is an increasingly common way for organizations to extend the life of their storage investments. Thin provisioning enables administrators to assign to systems all of the storage they believe those systems will require over their lifetimes, while not consuming space on the array until it's actually needed. Think of it like this: if an administrator creates a 200 TB volume just in case it's needed at some point, but, at the outset, the volume holds only 5 TB of data, 195 TB of capacity just sits there, basically wasted. What if, while it was not being used, you could assign that 195 TB of "free space" to other services? That's the thinking behind thin provisioning.
As with all good things, thin provisioning does have a dark side. Because space might be "in use" by multiple systems, tracking "real" available space can sometimes be difficult. Administrators run the risk of overprovisioning their storage, and if capacity eventually runs out, the array can crash, resulting in possible data loss. Therefore, it's important to keep an eye on actual usage when using thin provisioning.
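Here's a toy Python sketch of both ideas, using made-up numbers: volumes promise capacity up front but draw from the shared pool only when data is written, and the overcommit ratio is the figure an administrator has to watch.

```python
class ThinVolume:
    """Toy thin-provisioned volume: capacity is promised up front,
    but pool space is consumed only when data is actually written."""

    def __init__(self, name, provisioned_gb):
        self.name = name
        self.provisioned_gb = provisioned_gb
        self.used_gb = 0

    def write(self, gb):
        self.used_gb += gb  # pool space is drawn only now, on write

pool_capacity_gb = 100 * 1024  # a hypothetical 100 TB pool

vols = [ThinVolume("db", 200 * 1024), ThinVolume("files", 50 * 1024)]
vols[0].write(5 * 1024)   # the 200 TB volume holds only 5 TB so far
vols[1].write(2 * 1024)

provisioned = sum(v.provisioned_gb for v in vols)
used = sum(v.used_gb for v in vols)
print(f"overcommit ratio: {provisioned / pool_capacity_gb:.1f}x")  # 2.5x
print(f"pool actually used: {used / pool_capacity_gb:.0%}")        # 7%

# The risk described above: 250 TB is promised against a 100 TB pool,
# so actual usage must be monitored before the pool fills up.
```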
On the business side, thin provisioning is a third method by which capacity can be maximized.
Encryption
I saved for last the feature I think is the most important one: encryption. Data security is of paramount concern to many organizations as breaches continue to make headlines. If a hard drive or array is lost or stolen, and the data on it is not encrypted, anyone can simply read the contents of that device. Those contents can become public and do great damage to the company and to the individuals whose information is stolen.
Many enterprise customers encrypt the contents of their storage devices while the data is at rest. This means that, while the data resides on the array, it's encrypted and unreadable without a decryption key. A thief could steal all of the hard drives from the array and not be able to recover the data.
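As a minimal illustration of encryption at rest, here's a Python sketch using the cryptography package's Fernet recipe (AES-based symmetric encryption). Real arrays typically rely on self-encrypting drives or controller-level AES, with keys held in a separate key manager rather than on the array itself.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# The key must live somewhere other than the drives it protects
# (e.g., a key management server); here it's generated in memory.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"customer records living on the array"
ciphertext = cipher.encrypt(plaintext)  # what actually lands on disk

# A thief who steals the drives sees only ciphertext; without the
# key, the data cannot be recovered.
assert cipher.decrypt(ciphertext) == plaintext
```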
The business benefit here is clear. The company stays out of the news and doesn't have to go into PR panic mode and pay out millions of dollars for a data breach.
About the Author
Scott D. Lowe is the founder and managing consultant of The 1610 Group, a strategic and tactical IT consulting firm based in the Midwest. Scott has been in the IT field for close to 20 years and spent 10 of those years filling the CIO role for various organizations. He's also authored or co-authored four books and is the creator of 10 video training courses for TrainSignal.