Becoming a vSAN Specialist: Section 2 – vSAN Fundamentals


In this section, I will go over the following objectives found within the VMware vSAN Specialist Blueprint: Section 2 – vSAN Fundamentals

Objective 2.1 – Provide a high-level description of vSAN

Introduction to vSAN

vSAN is an enterprise-class software-defined storage solution built directly into the VMware platform. It runs on commodity x86 hardware or vSAN ReadyNodes. What does this mean? Instead of a separate software product controlling the storage, the ESXi hosts, together with vSphere, have the vSAN technology built directly into the kernel. This software then uses the commodity hardware (compute, storage, network) within the host/appliance to create the perfect marriage of virtualization and software-defined storage. It uses storage policies to intelligently place VM objects on the underlying local storage. This is the special sauce that makes vSAN so great. It automates storage on many levels, which in turn significantly simplifies how storage is provisioned and managed.

Why is this important? Instead of having to buy separate software, you can use what is already a part of vSphere. Combined with local disks installed in x86 hardware, this makes vSAN a truly modern, software-defined solution that reduces costs and complexity.

Summary and benefits

Back in the day, mainframes had everything in one rack. We have moved from that to complicated infrastructures that include hosts, storage, and network. These are all separate silos throughout the data center. This has led to increased costs and complexity. Fast forward to today, and we have come full circle. Instead of separate silos, we are now seeing the benefits of having one unified silo under the software stack, also known as the “single pane of glass” 🙂

For example, today if you need more storage, you have to buy a new storage array and set it up alongside the existing array. Then you have to carve out new LUNs, migrate the VMs, and finally decommission the old storage array. This all adds up to one word… Expense! What if I told you that if you needed more storage you could just bolt on another “block”? Need more compute? There’s a block for that. Instead of separate verticals, you now have combined verticals, which can lead to increased system efficiencies. Instead of making large investments on the front end (over-provisioning), you now grow incrementally. This is radically different from today’s traditional setups, and is one of the many reasons why vSAN is quickly gaining popularity.

Objective 2.2 – Describe vSAN requirements

vSAN requirements will vary depending on each deployment scenario. Below are the minimum requirements to run vSAN. Ensure all hardware used within vSAN is listed on the VMware HCL.

Hardware Requirements (Cluster/Memory/Storage)

  1. Minimum of three ESXi hosts for standard deployments, or a minimum of two ESXi hosts and a witness host for the smallest deployment (ROBO). In a standard deployment, each of the three ESXi hosts must contribute its local storage.
  2. Minimum of 32GB of memory per ESXi host in order to use the maximum number of disk groups (5) and capacity devices (7 per disk group).
  3. VMware vCenter server
  4. One device for cache tier. Cache device can be either SSD (SAS/SATA) or PCIe Flash
  5. One device for capacity tier. Capacity device can be either SSD Flash (SAS/SATA), Magnetic Disk (SAS/NL-SAS), or PCIe Flash.
  6. One disk controller utilizing Pass through mode, which is also known as JBOD mode. This method is best practice and most recommended; however, RAID-0 can be utilized if necessary.
  7. One Flash Boot Device. USB/SD/SATADOM devices can all be used to install ESXi on. This frees up your local disks to be used for vSAN. Ensure you have a device over 4GB. If your ESXi host has more than 512GB of memory, you will need to use a local disk or SATADOM device that’s at least 16GB in size (to ensure enough space is present for logs/dumps). If using SATADOM, ensure it’s SLC or better.

Network Requirements

  1. Dedicated network port for vSAN traffic.
  2. 10GbE (dedicated or shared) highly recommended, and required for all-flash deployments
  3. 1GbE dedicated is the minimum for hybrid setups. Real-world environments would suffer with 1GbE (ROBO deployments aside)
  4. vSAN VMkernel port required on each ESXi host, even if it isn’t contributing storage.
  5. ESXi hosts within a vSAN cluster must all be connected over a Layer 2 or Layer 3 network

License Requirements

The below graphic was taken from official VMware documentation.

vSAN License

Source: vSAN Licensing Documentation

Objective 2.3 – Understand how vSAN stores and protects data

VM Namespace

Within vSAN, each VM has its information stored in a VM namespace directory. This directory/container is comprised of files, metadata and other components. When you create a VM on the vSAN datastore, the VM’s “objects” are placed here. This is what enables vSAN to break up a VM and protect it based on Storage Policies.

Storage Policy-Based Management (SPBM)

According to official VMware documentation, VM storage policies are specific requirements placed upon the underlying storage to ensure desired characteristics are met.

Within vSAN, VM storage policies are not applied to the underlying storage, but to the VMs/VMDKs themselves. This is a huge step forward within the Software Defined Datacenter. The biggest benefit is flexibility: a VM can move and still keep its policies. This in turn creates greater availability and protection, since you can now surgically create policies and assign them to specific VMs. For instance, you could create a Storage Policy for Mission Critical VMs to ensure VM availability after x number of failures within your vSAN cluster. The best part is that the policy lives with the VM, which increases automation and reduces risk.

For example, say you have a Test VM that isn’t mission critical. If you want it protected with failures to tolerate=1 within vSAN, you would create a policy with that setting and assign it to the VM. Once you have created the policy and deployed the VM with it, vSAN is responsible for maintaining those requirements on the Test VM. Another great feature is the ability to change these policies with no downtime or migrations needed. This leads to increased flexibility, since the VMs themselves dictate which storage they can be on, based on their assigned policy. Listed below are the capabilities that can be utilized to create vSAN specific Storage policies. This list was taken from my book Essential Virtual SAN (VSAN).

VM Storage Policies within vSAN 6.2

  • Number of failures to tolerate
  • Number of disk stripes per object
  • Failure tolerance method
  • IOPS limit for object
  • Disable object checksum
  • Flash read cache reservation
  • Object space reservation
  • Force provisioning
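
To make the “policy lives with the VM” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration of the capability list above: the class name, field names, values, and VM names are all hypothetical, and this is not how vSAN or any VMware API represents policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VsanStoragePolicy:
    """Hypothetical container mirroring the capability list above."""
    name: str
    failures_to_tolerate: int = 1
    disk_stripes_per_object: int = 1
    failure_tolerance_method: str = "RAID-1"    # or "RAID-5/6"
    iops_limit: int = 0                          # 0 means "no limit" in this sketch
    disable_object_checksum: bool = False
    flash_read_cache_reservation_pct: float = 0.0
    object_space_reservation_pct: float = 0.0
    force_provisioning: bool = False


# Two illustrative policies: one for a test VM, one for a critical VM
test_policy = VsanStoragePolicy(name="test-vm-ftt1")
critical_policy = VsanStoragePolicy(name="mission-critical", failures_to_tolerate=2)

# The policy is attached to the VM, not to a LUN or datastore
assignments = {"test-vm-01": test_policy, "erp-db-01": critical_policy}
```

The point of the sketch is simply that the mapping is keyed by VM: wherever the VM goes, the same set of requirements goes with it.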

*Re-Synchronization, Important Note Regarding Storage Policies*

While Storage Policies can be changed in real time with no downtime, use caution when making many changes at once. When you change an existing storage policy, the cluster will initiate re-synchronization. During this time, vSAN copies the necessary data to ensure compliance is met. Once completed, the original is discarded. Re-synchronization also occurs during certain host failure scenarios. It’s important to note the performance impact this could cause during production hours. Additionally, because the old and new copies exist side by side until the process finishes, it can cause a temporary increase in consumed capacity. This is important when sizing out your environment. Make sure you factor this in!

Failures to tolerate

Failures to tolerate (FTT) is a method vSAN employs to protect against outages. The greater the FTT, the lower the risk. The trade-off is added expense in terms of extra copies of data. What counts as a failure can vary greatly, from an individual disk to an entire host or rack. If primary failures to tolerate (PFTT) is set to one within a vSAN cluster, then the cluster can withstand a single failure (host, disk, etc.). If PFTT=2, then there can be two failures without impacting availability, assuming fault domains are set up correctly with the appropriate number of ESXi hosts. Fault domains are critical to ensuring vSAN objects are not stored within the same host or rack. This ensures availability because protection components are stored within separate fault domains.

Fault domains and Rack Awareness

A fault domain is a group of ESXi hosts that share a common point of failure, and vSAN spreads a VM’s components across fault domains. By default, each host acts as its own fault domain. However, if you have multiple hosts in the same rack, you are at risk of unavailability if the rack itself is impacted. Protection can be further increased with rack awareness, which defines fault domains per rack so that components span many racks. This in turn allows an entire rack to fail without impacting VM availability within the vSAN environment. If enabled, vSAN will satisfy the VM’s storage policy against fault domains as opposed to individual ESXi hosts. Since each fault domain maps to a rack, components are placed across separate racks. Per VMware documentation, it’s best to utilize at least four fault domains. While three can be used, it is not recommended because there is a risk of not having the necessary capacity to protect data during a data evacuation event. In simple terms, ensure there are enough fault domains to satisfy the number of failures to tolerate. The below image helps visualize VM components placed within different racks as separate fault domains.
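
Here is a small Python sketch of the rack awareness idea described above. It is a toy model under my own assumptions: the function name, host names, and rack names are made up, and real vSAN placement also weighs capacity, load, and witness components. It only shows the core rule that each component lands in a different fault domain.

```python
from collections import defaultdict


def place_components(host_to_fault_domain, copies):
    """Pick one host from each of `copies` distinct fault domains.

    Toy placement only: it ignores capacity and load entirely.
    """
    by_domain = defaultdict(list)
    for host, domain in sorted(host_to_fault_domain.items()):
        by_domain[domain].append(host)
    if len(by_domain) < copies:
        raise ValueError(
            f"need {copies} fault domains, only {len(by_domain)} available"
        )
    # Take the first host (alphabetically) from the first `copies` domains
    chosen_domains = sorted(by_domain)[:copies]
    return {domain: by_domain[domain][0] for domain in chosen_domains}


hosts = {
    "esxi-01": "rack-a", "esxi-02": "rack-a",
    "esxi-03": "rack-b", "esxi-04": "rack-b",
    "esxi-05": "rack-c", "esxi-06": "rack-c",
}

# FTT=1 with mirroring: two replicas plus a witness, so three components
placement = place_components(hosts, copies=3)
```

With six hosts in three racks, the three components end up in three different racks, so losing any single rack leaves two of the three components available. Asking for four components with only three racks raises an error, which is the “enough fault domains for your FTT” rule in code form.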

vSAN Rack Awareness

Image Source: vcallaway.com

RAID 5/6

This is considered a form of fault tolerance and also provides space efficiency. Also known as erasure coding, this form of fault tolerance dramatically improves your space utilization as compared to RAID-1. Instead of keeping an exact copy, which requires twice the capacity (RAID-1), erasure coding uses parity to protect data with less capacity overhead. RAID-5 can withstand one disk or one ESXi host failure by default, whereas RAID-6 can withstand up to two failures total by default. These numbers may vary if PFTT is set to 2.

RAID 1

This is considered a form of fault tolerance. If RAID-1 is utilized, then an exact replica of the components is kept. VM availability remains as long as no more than one ESXi host or one disk fails. Anything greater than one will impact VM availability. These numbers may vary if PFTT is set to 2.

The below graphic is taken from VMware’s official vSAN documentation. It shows the capacity required (Mirror or Parity) with relation to each RAID configuration.

vSAN RAID

Finally, this is a great chart from vSAN 6.2 Essentials that shows the number of ESXi hosts required based on FTT/FTM/Object Configuration

 

Raid Config
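
The host counts and capacity overheads discussed above can be written down as a short Python sketch. The function names are my own, and the numbers follow the commonly documented rules for vSAN 6.2 as I understand them (mirroring needs 2n+1 hosts for FTT=n, RAID-5 is 3 data + 1 parity, RAID-6 is 4 data + 2 parity). Treat it as a sizing aid to sanity-check against the charts, not as an official calculator.

```python
def min_hosts(ftm, ftt):
    """Minimum hosts (or fault domains) for a failure tolerance method."""
    if ftm == "RAID-1" and ftt >= 1:
        return 2 * ftt + 1      # ftt+1 replicas, plus witnesses to break ties
    if ftm == "RAID-5/6" and ftt == 1:
        return 4                # RAID-5: 3 data + 1 parity
    if ftm == "RAID-5/6" and ftt == 2:
        return 6                # RAID-6: 4 data + 2 parity
    raise ValueError(f"unsupported combination: {ftm}, FTT={ftt}")


def raw_capacity_gb(usable_gb, ftm, ftt):
    """Raw capacity consumed to store `usable_gb` of VM data."""
    if ftm == "RAID-1" and ftt >= 1:
        return usable_gb * (ftt + 1)    # one full copy per replica
    if ftm == "RAID-5/6" and ftt == 1:
        return usable_gb * 4 / 3        # 1.33x overhead
    if ftm == "RAID-5/6" and ftt == 2:
        return usable_gb * 6 / 4        # 1.5x overhead
    raise ValueError(f"unsupported combination: {ftm}, FTT={ftt}")


# A 100GB VMDK under three different policies
mirror = raw_capacity_gb(100, "RAID-1", 1)
raid5 = raw_capacity_gb(100, "RAID-5/6", 1)
raid6 = raw_capacity_gb(100, "RAID-5/6", 2)
```

For the 100GB example, mirroring with FTT=1 consumes 200GB of raw capacity, RAID-5 about 133GB, and RAID-6 150GB, which is where the space savings of erasure coding come from. The trade-off is the higher minimum host count (4 and 6 versus 3).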

Objective 2.4 – Describe vSAN space efficiency features

Compression

Compression is another great technology, which allows you to squeeze more data into the same footprint.

Deduplication and Compression in vSAN

How do Deduplication and Compression work within vSAN? The process begins with the cache tier. In order to achieve the best performance, vSAN keeps the most frequently referenced data within the cache device so that the most utilized data is hot and available on request; however, deduplication isn’t done in the cache tier, but rather in the capacity tier. Once data is no longer being referenced in cache it becomes cold, and vSAN moves this data to the capacity tier, which is where deduplication actually occurs. vSAN fingerprints each 4KB block using SHA-1 hashing. If the fingerprint matches an existing hash, the data is not written again; instead, a reference to the existing block is recorded. If there is no match, the new fingerprint is recorded and the block is kept. Finally, the last step is compression. If vSAN can compress the deduped 4KB block down to 2KB or less, it stores the compressed version. If not, the block is kept at its original size. This allows for even further data reduction.
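
The flow described above (fingerprint, dedupe, then compress only if it pays off) can be sketched in a few lines of Python. This is a simplified model, not vSAN’s implementation: the function names and the in-memory `store` dictionary are hypothetical, and `zlib` is used purely as a stand-in because LZ4 is not in the Python standard library.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096       # dedupe granularity from the text (4KB)
COMPRESS_LIMIT = 2048   # keep the compressed form only at 2KB or less


def destage(data, store):
    """Dedupe, then compress, `data` into `store`; return block fingerprints.

    `store` maps a SHA-1 fingerprint to (is_compressed, payload).
    """
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in store:
            # New block: try to compress it (zlib is a stand-in for LZ4)
            packed = zlib.compress(block)
            if len(packed) <= COMPRESS_LIMIT:
                store[fingerprint] = (True, packed)
            else:
                store[fingerprint] = (False, block)
        # Known or new, the object only keeps a reference to the fingerprint
        refs.append(fingerprint)
    return refs


def read_back(refs, store):
    """Rebuild the original bytes from the list of fingerprints."""
    out = []
    for fingerprint in refs:
        is_compressed, payload = store[fingerprint]
        out.append(zlib.decompress(payload) if is_compressed else payload)
    return b"".join(out)


store = {}
zero_block = bytes(BLOCK_SIZE)          # highly compressible
random_block = os.urandom(BLOCK_SIZE)   # effectively incompressible
data = zero_block * 3 + random_block
refs = destage(data, store)
```

In this example, four blocks are written but only two are stored: the three identical zero blocks dedupe to one entry, which also compresses well, while the random block is unique and stays at its original size because compression can’t get it under the 2KB limit.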

So, as you can see, this technology can allow for great space savings within your environment. How do we get started? Fortunately, vSAN has made enabling this feature super simple. How simple? It’s as simple as checking a box. The space savings alone can make a huge impact on the amount of storage you need to buy, which can save you a significant amount of storage-related dollars.

Important facts regarding Dedupe/Compression within vSAN

  • Must be enabled together, you can’t enable each separately
  • When enabled, it applies to all disks within the same disk group
  • Dedupe is done at the 4KB block level
  • If vSAN can compress the deduped 4KB block down to 2KB or less, it compresses. If not, its original size is kept
  • SHA-1 hash is used for deduplication
  • LZ4 is the algorithm used for compression
  • Single device failures will make the entire disk group appear unhealthy
  • Deduplication/Compression is done at the disk group level

With that said, this is something that should be strongly considered when deploying vSAN; however, it’s important to consider the performance implications. The first requirement is that your vSAN cluster must be all flash, which means not only must your cache devices be flash, but your capacity devices must also be flash. This requirement is due to the performance required to dedupe data. Special consideration should be given to the flash device chosen, specifically flash endurance. Since dedupe and compression will keep the disks busy, you want to make sure you select a flash device that has the proper characteristics needed for greater longevity (important to consider since a single device failure will make the entire disk group appear unhealthy).

Another important aspect to consider is data locality. Deduplication/Compression is enabled at the cluster level; however, dedupe and compression are performed at the disk group level. This means only items within that specific disk group will be able to dedupe against each other. In other words, two identical VMs on separate disk groups won’t be able to dedupe against each other, but two identical VMs within the same disk group will.
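
A quick Python sketch shows why the disk group boundary matters. The function name, disk group names, and the eight-block “VM image” are all made up for illustration. The only thing it models is that each disk group keeps its own set of fingerprints.

```python
import hashlib

BLOCK_SIZE = 4096


def stored_blocks(disk_groups):
    """Count blocks kept when dedupe is scoped to each disk group."""
    total = 0
    for blocks in disk_groups.values():
        # Each disk group only dedupes against its own fingerprints
        total += len({hashlib.sha1(block).digest() for block in blocks})
    return total


# A pretend VM image made of 8 distinct 4KB blocks
vm_image = [bytes([i]) * BLOCK_SIZE for i in range(8)]

# Two identical VMs in the same disk group vs. split across two
same_group = {"dg-1": vm_image + vm_image, "dg-2": []}
split_groups = {"dg-1": vm_image, "dg-2": vm_image}

same_total = stored_blocks(same_group)
split_total = stored_blocks(split_groups)
```

When both copies of the VM sit in the same disk group, only 8 blocks are stored. When they land in different disk groups, all 16 are stored, because neither disk group can see the other’s fingerprints.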

Conclusion

As you can see vSAN is a very powerful platform. It is very important to scope out exactly what you want to accomplish before setting up vSAN. Many features are specific to all flash, etc. It’s important to ensure you calculate the needed capacity for the various offerings vSAN has. Your deployment decisions and protection methods used will have a direct impact on all components that encompass vSAN.