One of the most significant changes Microsoft has made to Hyper-V-based virtual desktop infrastructure in Windows Server 2012 R2 is allowing native data deduplication on volumes running virtual desktops.
Still, taking advantage of native Windows data deduplication for VDI requires a little bit of planning. It is common for Hyper-V-based VDI deployments to be constructed with clustered Hyper-V servers, where each cluster node is connected to a storage area network (SAN)-based Cluster Shared Volume (CSV). Although this basic architecture can be used in conjunction with native file system deduplication, one additional component is required.
With Windows Server data deduplication, the Hyper-V servers cannot be connected directly to the physical storage. They must connect to the file server, which in turn provides access to the physical storage. Microsoft requires the storage to be managed by a physical file server that runs Windows Server 2012 R2 and that does not run Hyper-V. There aren't any requirements for the underlying physical storage beyond those that exist for any other file server. File servers can make use of local storage, or the file servers can be connected to a SAN.
Keep in mind that you do not want the file server to become a single point of failure, so it should exist as a cluster. Once constructed, you need to configure the file server to host a volume that will serve as a CSV for Hyper-V. In other words, a Hyper-V cluster connects to a CSV that is hosted by a clustered file server.
Requirements for native data deduplication
Microsoft introduced the data deduplication feature in Windows Server 2012. In that release, enabling deduplication involved a few steps. First, you had to make sure that the volume you were planning to deduplicate was formatted with the NTFS file system. The feature did not support volumes formatted with other file systems (including the new ReFS file system).
Then you had to install the Data Deduplication role service, which is part of the File and Storage Services role and is not installed by default. Finally, you could enable deduplication for the volume by using the Enable-DedupVolume PowerShell cmdlet.
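As a minimal sketch of the installation step described above, the role service can be added from PowerShell instead of Server Manager. This assumes an elevated PowerShell session on the file server; FS-Data-Deduplication is the feature name Windows Server uses for the Data Deduplication role service.

```powershell
# Install the Data Deduplication role service (run in an elevated session).
Install-WindowsFeature -Name FS-Data-Deduplication

# Confirm that the role service now reports as installed.
Get-WindowsFeature -Name FS-Data-Deduplication
```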
All these requirements are the same in Windows Server 2012 R2. However, if you are planning to deduplicate a volume that you will use for VDI, you must tell Windows how the volume will be used. You can do so through a new cmdlet parameter called UsageType. In a Microsoft VDI environment, the usage type must be set to HyperV (written as one word, without a hyphen).
Also, be aware of how CSVs are exposed to the operating system. Regardless of where the CSV actually resides, it is exposed as a subdirectory of C:\ClusterStorage. With that in mind, suppose you want to enable deduplication for a set of virtual desktops whose virtual hard disks reside on a CSV exposed as C:\ClusterStorage\Volume1. In this situation, the PowerShell command that you would use to enable data deduplication is:
Enable-DedupVolume -Volume C:\ClusterStorage\Volume1 -UsageType HyperV
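To check that deduplication is enabled on the volume and see which settings are in effect, you can query the volume afterward. This is a quick sanity check rather than a required step; the path is the example CSV used in this article.

```powershell
# Show the deduplication settings and current savings for the example CSV.
Get-DedupVolume -Volume C:\ClusterStorage\Volume1 | Format-List
```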
Manually enabling data deduplication
Enabling deduplication for VDI is a fairly straightforward process. However, there is one last thing to think about. If you are planning to use deduplication, there is a good chance that your CSV will not be large enough to accommodate all the virtual desktops in their non-deduplicated form. After all, why would you allocate disk space that you aren't going to need?
Windows uses post-processing deduplication. This means that all the data (in this case, the virtual desktops) must first be written to the volume in non-deduplicated form and is optimized only afterward. So, what do you do if your CSV is too small?
The solution to this problem is to copy a few virtual desktops to the CSV, then initiate the deduplication process manually. Once the process completes, you can copy a few more virtual desktops, then run another manual deduplication. Repeat the process until all the virtual desktops are copied to the CSV. Keep in mind that the deduplication process requires some workspace of its own, so you should always keep at least 10 GB of disk space free.
To manually start the deduplication process, use the following command:
Start-DedupJob -Volume C:\ClusterStorage\Volume1 -Type Optimization
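The copy-then-optimize cycle described above could be scripted along the following lines. Treat this as a rough sketch, not a tested procedure: the staging folder D:\StagedDesktops, the batch size of five and the .vhdx filter are hypothetical values chosen for illustration, and the script assumes the volume's deduplication policy allows newly copied files to be optimized. It stops before any batch that would push free space below the 10 GB workspace mentioned earlier.

```powershell
# Hypothetical values; adjust for your environment.
$source    = 'D:\StagedDesktops'          # assumed staging location of the virtual hard disks
$volume    = 'C:\ClusterStorage\Volume1'  # example CSV path used in this article
$batchSize = 5                            # "a few" desktops per pass (assumption)
$minFree   = 10GB                         # workspace to keep free on the CSV

$disks = @(Get-ChildItem -Path $source -Filter *.vhdx)

for ($i = 0; $i -lt $disks.Count; $i += $batchSize) {
    # Select the next batch of virtual hard disks.
    $last  = [Math]::Min($i + $batchSize, $disks.Count) - 1
    $batch = $disks[$i..$last]

    # Stop if copying this batch would eat into the reserved workspace.
    $needed = ($batch | Measure-Object -Property Length -Sum).Sum
    $free   = (Get-DedupVolume -Volume $volume).FreeSpace
    if (($free - $needed) -lt $minFree) {
        Write-Warning "Not enough free space for the next batch; stopping."
        break
    }

    # Copy the batch, then run an optimization job and wait for it to finish.
    $batch | Copy-Item -Destination $volume
    Start-DedupJob -Volume $volume -Type Optimization -Wait | Out-Null
}
```

The -Wait switch makes Start-DedupJob block until the job completes, so each batch is optimized before the next copy begins.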
This was first published in November 2013