One of the most powerful storage technologies today is deduplication. Deduplication stores a piece of data once and references it wherever it recurs, instead of storing it multiple times. This allows for dramatic data reduction. For instance, if you had 600GB of data with a 6:1 dedupe ratio, only 100GB would actually be stored! Dedupe within vSAN is done at the 4KB block level.
Compression is another great technology, which allows you to squeeze more data into the same footprint.
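To make the ratio arithmetic concrete, here is a minimal Python sketch. The function names are my own for illustration and are not part of any vSAN tooling; the sketch only assumes that an N:1 ratio means logical size divided by N.

```python
def stored_size_gb(logical_gb: float, dedup_ratio: float) -> float:
    """Physical capacity consumed for a logical size at an N:1 dedupe ratio."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_gb / dedup_ratio


def savings_percent(dedup_ratio: float) -> float:
    """Share of logical capacity that never has to be stored, in percent."""
    return (1 - 1 / dedup_ratio) * 100


# The example from the text: 600GB of data at a 6:1 ratio.
print(stored_size_gb(600, 6))         # 100.0
print(round(savings_percent(6), 1))   # 83.3
```

Real ratios depend heavily on how similar your data is, so treat any ratio you plug in as an estimate.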
Deduplication and Compression in vSAN
How do deduplication and compression work within vSAN? The process begins with the cache tier. To achieve the best performance, vSAN keeps the most frequently referenced data on the cache device so that hot data is readily available; however, deduplication isn’t done in the cache tier, but rather in the capacity tier. Once data is no longer being referenced in cache it becomes cold, and vSAN moves it to the capacity tier, where deduplication actually occurs. vSAN fingerprints each 4KB block using SHA-1 hashing and looks for an existing block with the same fingerprint. If it finds a match, no new copy is written; the data simply references the existing block. If there is no match, the new fingerprint is recorded and the block is kept. The last step is compression. If vSAN can compress the deduped 4KB block to 2KB or less, it stores the compressed version. If not, the block is kept at its original size. This allows for even further data reduction.
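The flow described above can be sketched as a toy model in Python. This is only an illustration of the idea, not vSAN's actual implementation or on-disk format: the `DedupStore` class and its fields are hypothetical, and `zlib` stands in for LZ4 because LZ4 is not in the Python standard library.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096          # 4KB dedupe granularity
COMPRESS_THRESHOLD = 2048  # keep the compressed form only if it is 2KB or less


class DedupStore:
    """Toy model of a single dedupe domain (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> stored bytes (compressed or raw)
        self.refcount = {}  # fingerprint -> number of logical references

    def write(self, data: bytes) -> list:
        """Split data into 4KB blocks, dedupe them, and return fingerprints."""
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            # Pad a short final block so every block is exactly 4KB.
            block = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            fp = hashlib.sha1(block).hexdigest()
            if fp in self.blocks:
                # Match: reference the existing block, store nothing new.
                self.refcount[fp] += 1
            else:
                # No match: record the fingerprint, then try to compress.
                packed = zlib.compress(block)  # stand-in for LZ4
                if len(packed) <= COMPRESS_THRESHOLD:
                    self.blocks[fp] = packed
                else:
                    self.blocks[fp] = block    # keep original size
                self.refcount[fp] = 1
            refs.append(fp)
        return refs

    def physical_bytes(self) -> int:
        """Bytes actually held after dedupe and compression."""
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
repetitive = b"A" * BLOCK_SIZE        # compresses very well
random_block = os.urandom(BLOCK_SIZE)  # effectively incompressible
store.write(repetitive * 3 + random_block)
print(len(store.blocks))               # 2 unique blocks for 4 logical blocks
print(store.physical_bytes() < 4 * BLOCK_SIZE)  # True
```

Three of the four logical blocks are identical, so only two blocks are stored. The repetitive block is kept compressed, while the random block fails the 2KB test and is kept at its full 4KB.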
So, as you can see, this technology can allow for great space savings within your environment. How do we get started? Fortunately, vSAN has made enabling this feature super simple. How simple? It’s as simple as checking a box. The space savings alone can make a huge impact on the amount of storage you need, which can save you a significant amount of storage-related dollars.
Important facts regarding Dedupe/Compression within vSAN
- Must be enabled together; you can’t enable each separately
- When enabled, it applies to all disks within the same disk group
- Dedupe is done on the 4KB block level
- If vSAN can compress the deduped 4KB block to 2KB or less, it stores the compressed version. If not, the block is kept at its original size
- SHA-1 hashing is used for deduplication
- LZ4 is the algorithm used for compression
- Single device failures will make the entire disk group appear unhealthy
- Deduplication/Compression is done at the disk group level
With that said, this is something that should be strongly considered when deploying vSAN; however, it’s important to consider the performance implications. The first requirement is that your vSAN cluster must be all flash, which means not only must your cache devices be flash, but your capacity devices must also be flash. This requirement is due to the performance required to dedupe data. Special consideration should be given to the flash devices chosen, specifically their endurance. Since dedupe and compression will keep the disks busy, you want to make sure you select flash devices with the characteristics needed for greater longevity (important to consider since a single device failure will make the entire disk group appear unhealthy).
Another important aspect to consider is data locality. When Deduplication/Compression is enabled, it’s enabled at the cluster level; however, dedupe and compression are performed at the disk group level. This means only data within the same disk group can dedupe against each other. In other words, if you have two identical VMs on separate disk groups, they won’t be able to dedupe against each other, but if the two identical VMs are within the same disk group, they will.
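To illustrate why placement matters, here is a simplified Python sketch that treats each disk group as its own dedupe domain. The `stored_blocks` helper and the `placement` layout are hypothetical, and the sketch assumes each VM image lands entirely on one disk group, which is a simplification of how vSAN actually places components.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB dedupe granularity


def unique_blocks(data: bytes) -> set:
    """SHA-1 fingerprints of the distinct 4KB blocks in data."""
    return {
        hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    }


def stored_blocks(placement: dict) -> int:
    """Count blocks stored when each disk group dedupes only against itself.

    placement maps a disk group name to the list of VM images placed on it.
    """
    total = 0
    for images in placement.values():
        fingerprints = set()  # one fingerprint set per disk group
        for image in images:
            fingerprints |= unique_blocks(image)
        total += len(fingerprints)
    return total


# A made-up VM image consisting of 8 distinct 4KB blocks.
vm_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))

same_group = {"dg1": [vm_image, vm_image], "dg2": []}
split_groups = {"dg1": [vm_image], "dg2": [vm_image]}

print(stored_blocks(same_group))    # 8: the second copy dedupes away
print(stored_blocks(split_groups))  # 16: each disk group keeps its own copy
```

The same two VMs consume twice as many blocks when they sit on different disk groups, because the fingerprints are never compared across groups.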
Now that we have a good understanding of the feature, we can head to the lab and see how simple it is to enable. As my previous posts have mentioned, this is something that VMware has designed with simplicity in mind. To show you how easy it is to enable, let’s begin.
First, as I mentioned earlier, we will need to be all flash to enable this feature. Since this is a lab setting, I’ve marked the capacity disks as flash in order to test this feature. For demonstration purposes, I tore down my disk groups and will recreate them as all flash this time. This section is a repeat of Wednesday’s blog, except that all disks are marked as flash. Follow the steps from Wednesday’s blog before moving to the next step, but instead of marking your disks as “HDD”, mark them all as “Flash”. I am just going to insert the pictures, since my previous blog post covers this.
*Note, this step is not needed if your disk groups are already all flash.
As you can see, we now have all flash devices for our cache and capacity.
Now that we have repeated the steps in order to create new disk groups, we can move forward with enabling dedupe and compression.
To continue, select edit under “configure” on your vSAN cluster.
Under Deduplication and Compression, select the “disabled” drop down and select “enabled”.
“Enabled” is now selected under Deduplication and Compression.
**Extremely important to note!** This change will require a rolling reformat of all disks in the vSAN cluster. It is best to set this up at the beginning, since it takes some time to migrate the data, format the disks, and then move the data back. As always, it’s also best to have a good backup if you plan on enabling this with live data! While no data loss should occur, always prepare for the worst!
The reconfiguration will now take place. You will now see disk groups removed/added back, formatted, etc.
That’s it, you have just successfully enabled deduplication and compression. It really was just that easy!