How To Get Started with Deduplication in Windows Server 2012

Reduce duplicate data and free up space quickly with this Windows Server feature.

Deduplication is a feature introduced in Windows Server 2012 that can be enabled without expensive hardware such as a SAN or NAS. It reduces your storage needs by eliminating duplicate data and takes just a few minutes to set up. Duplicate blocks of data are identified and a single copy of each is kept; every duplicate block is replaced by a reference that points to the original copy. Even though storage space is cheap these days and getting cheaper, you may still find a need for deduplication in your organization. It has advantages and disadvantages, which I've outlined here.
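To make the idea of "one stored copy plus references" concrete, here is a minimal Python sketch. The fixed chunk size, the SHA-256 hashing and the function names are assumptions chosen for illustration; this is not a description of how Windows Server actually chunks or identifies data.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, refs): store maps a chunk's SHA-256 digest to its bytes,
    and refs lists the digests in their original order.
    """
    store = {}
    refs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        refs.append(digest)              # every occurrence becomes a reference
    return store, refs


def rebuild(store, refs) -> bytes:
    """Follow the references to reassemble the original data."""
    return b"".join(store[digest] for digest in refs)


if __name__ == "__main__":
    data = b"ABCDABCDABCDWXYZ"
    store, refs = deduplicate(data)
    print(len(refs), "chunks referenced,", len(store), "stored")
    assert rebuild(store, refs) == data
```

In this toy input, four chunks are referenced but only two are stored, which is where the space saving comes from.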

Disadvantages:

  • Deduplication can put a load on your processor; if it runs on a production server, that server's performance may suffer. One way to avoid this is to offload the deduplication process to an appliance or a backup target.
  • Data that is unique gains nothing from deduplication. Certain file types, such as MP3, MP4, CAB and JPEG, have already had much of their redundancy removed and also benefit very little.
  • Copying a single file that is large can take longer on a volume that is deduplicated.
  • Hard quotas are not supported on a volume that is running data deduplication.
  • Data deduplication cannot coexist with Single Instance Storage (SIS), the predecessor to deduplication that was available on Windows Storage Server.

Advantages:

  • Reduces overall usage of hard disk space.
  • There should be no noticeable differences for users who are opening an Office document. 
  • If the deduplication filter notices a corruption, it logs the error in the event log.
  • If a block of data is referenced 100 times, an extra copy of it is made, so heavily shared data does not depend on a single stored copy.
  • There is a weekly scrubbing job that checks the event log for instances of logged corruptions and fixes the data from alternate copies if they exist.
  • A volume that has deduplication enabled can be backed up and restored to another Windows Server 2012 machine. All of the deduplication settings are maintained on the volume, and the deduplication filter picks them up automatically when the volume is mounted.
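The extra copy for heavily referenced blocks and the scrubbing job work together, and a small Python sketch may help show how. The `ChunkStore` class, its method names and the use of SHA-256 checksums are hypothetical; the sketch also verifies checksums directly rather than reading an event log, which is a simplification of the behavior described in the list above.

```python
import hashlib

HOT_THRESHOLD = 100  # reference count at which an extra copy is kept (from the article)


class ChunkStore:
    """Toy chunk store; all names here are illustrative assumptions."""

    def __init__(self):
        self.primary = {}   # digest -> chunk bytes
        self.backup = {}    # digest -> extra copy of heavily referenced chunks
        self.refcount = {}  # digest -> number of references

    def add_reference(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        self.primary.setdefault(digest, chunk)
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        if self.refcount[digest] >= HOT_THRESHOLD:
            self.backup.setdefault(digest, chunk)  # keep a second copy
        return digest

    def scrub(self):
        """Check every chunk against its digest and repair from the backup.

        Returns (repaired, unrecoverable) lists of digests.
        """
        repaired, unrecoverable = [], []
        for digest, chunk in list(self.primary.items()):
            if hashlib.sha256(chunk).hexdigest() == digest:
                continue  # chunk is intact
            if digest in self.backup:
                self.primary[digest] = self.backup[digest]
                repaired.append(digest)
            else:
                unrecoverable.append(digest)
        return repaired, unrecoverable
```

A chunk that has reached the threshold can be repaired after corruption, while a rarely referenced chunk has no alternate copy to fall back on.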

Installing and Configuring Data Deduplication on Windows Server 2012
Installing and configuring deduplication is a simple process. After you log in to your server as an administrator, launch Server Manager. Deduplication is enabled through the Add Roles and Features wizard. After launching the wizard, you will find the feature under File and Storage Services/File and iSCSI Services/Data Deduplication (see Figure 1).

Figure 1.

After the installation wizard completes, you will want to configure deduplication on the server. To do so, open the File and Storage Services section in Server Manager, right-click your volume and select Configure Data Deduplication. On the Deduplication Settings page, a checkbox enables the feature (see Figure 2).

Figure 2.
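If you would rather script these two wizard steps, the following Python sketch assembles the equivalent PowerShell commands. It assumes the `Install-WindowsFeature` and `Enable-DedupVolume` cmdlets and the `FS-Data-Deduplication` feature name, none of which appear in the steps above, so check them against Microsoft's documentation before use. The `E:` volume and the helper function names are placeholders.

```python
import subprocess


def dedup_setup_commands(volume: str = "E:"):
    """Return PowerShell commands that mirror the two wizard steps.

    The cmdlet and feature names are assumptions, not taken from the article.
    """
    return [
        "Install-WindowsFeature -Name FS-Data-Deduplication",
        f"Enable-DedupVolume -Volume {volume}",
    ]


def run_powershell(command: str) -> None:
    """Run one command with powershell.exe; raises if it exits with an error."""
    subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        check=True,
    )


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in dedup_setup_commands("E:"):
        print(command)
```

The sketch only prints the commands by default; calling `run_powershell` on each one from an elevated session on the server would be the step that actually applies them.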

By default, files must be five days old before they are deduplicated. You can change this value, but doing so is not necessarily recommended: if you reduce it to one day, you may end up deduplicating temporary files that will be deleted shortly afterward. You can also exclude certain file extensions or selected folders at this point. As mentioned above, file types such as MP3, MP4, CAB and JPEG gain little from deduplication, so you may want to exclude them here.
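The three settings described here (minimum file age, excluded extensions and excluded folders) amount to a simple filter. The Python sketch below shows one way such a filter could be expressed. The function name and its parameters are hypothetical, and measuring age from the last-modified time is an assumption rather than something the settings page states.

```python
from datetime import datetime, timedelta
from pathlib import PureWindowsPath


def is_candidate(path, last_modified, now, min_age_days=5,
                 excluded_extensions=(), excluded_folders=()):
    """Decide whether a file would be picked up for deduplication."""
    p = PureWindowsPath(path)

    # Skip excluded extensions, ignoring case and any leading dot.
    excluded = {ext.lower().lstrip(".") for ext in excluded_extensions}
    if p.suffix.lower().lstrip(".") in excluded:
        return False

    # Skip files anywhere below an excluded folder.
    for folder in excluded_folders:
        if PureWindowsPath(folder) in p.parents:
            return False

    # Only files at least min_age_days old qualify.
    return now - last_modified >= timedelta(days=min_age_days)
```

With the default of five days, a document modified six days ago qualifies while one modified yesterday does not, regardless of its type.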

Once you have indicated the extensions and folders to exclude, you can click the Set Deduplication Schedule button. The schedule lets you set the days of the week and the time at which deduplication runs. It is a good idea to run it after hours if possible and to make sure it does not conflict with any other after-hours activity you already have scheduled, such as backups. Each deduplication job verifies that there is enough memory to perform the task; if there is not, the job stops and tries again at the next scheduled interval (see Figure 3).

Figure 3.
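The scheduling rules above (chosen weekdays, a start time and a memory check before the job runs) can be sketched in a few lines of Python. The function names, the duration parameter and the memory figures are hypothetical, and the sketch assumes a window never lasts longer than 24 hours.

```python
from datetime import datetime, time, timedelta


def in_window(moment, days, start, duration_hours):
    """True if moment falls in a window that opens at `start` on one of `days`.

    days uses datetime.weekday() numbering (Monday == 0).
    """
    for back in (0, 1):  # a window opened yesterday may run past midnight
        day = moment.date() - timedelta(days=back)
        opened = datetime.combine(day, start)
        closes = opened + timedelta(hours=duration_hours)
        if opened.weekday() in days and opened <= moment < closes:
            return True
    return False


def should_run(moment, days, start, duration_hours, free_mb, required_mb):
    """Run only inside the window and only when enough memory is free.

    Returning False models the job waiting for the next scheduled interval.
    """
    return in_window(moment, days, start, duration_hours) and free_mb >= required_mb
```

The same `in_window` check can be applied to a backup window to see whether the two after-hours jobs would overlap at a given time.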

In order to take advantage of data deduplication, you must have one computer or virtual machine running Windows Server 2012 R2 or Windows Server 2012 with at least one data volume. As an option, you can also have another computer or virtual machine running Windows Server 2012 R2 or Windows Server 2012 that is connected to the server over a network.

About the Author

Troy Thompson has worked in network administration for over 25 years, serving as a network engineer and Microsoft Exchange administrator in the Department of Defense, and writing technology articles, tutorials and white papers as well as performing technical edits. Troy is a Cisco Certified Academy Instructor (CCAI) and holds numerous other certifications, including CCNA, MCSE+I, Network+, A+ and Security+. Troy has also traveled the world playing music as the guitarist for the band Bride. Contact information is [email protected].
