Maintenance Mode within vSAN

Before building out your vSAN cluster, one thing you should think about is maintenance activities. Some time ago, someone told me “Anyone can make something work, but few take the time to think of the long term implications (years from now) of setting things up a certain way”.

In other words, administrators all too often make it their short term goal to fix something in the moment (“making something work”), but they don’t take into account how these “one offs” can make life very difficult for co-workers or future administrators. Of course, you don’t find out about these until something goes terribly wrong. Then you realize “someone” did something to “make it work” that, more often than not, goes against best practices. The result is a ticking time bomb that normally rears its ugly head at the most inopportune time.

Ok, enough already. What does this have to do with vSAN, and more importantly, maintenance mode? Well, what if someone, unbeknownst to you, had configured the cluster with settings that put your data at risk? For example, what if you had a disk failure? Normally you would just pull the drive and replace it, right? With vSAN that is sometimes the case and sometimes not. Depending on your failures to tolerate setting, among other things, pulling a disk has the potential to impact VM availability. With that said, it is very important to have a complete understanding of your configured fault tolerances. The settings you select today will have a direct impact on what happens when you replace failed devices or place a host into maintenance mode.
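To put some rough numbers behind that, here is a small Python sketch. It assumes the commonly documented minimum host counts for a standard (non-stretched) vSAN cluster: 2 x FTT + 1 hosts for RAID 1, 4 hosts for RAID 5 (FTT=1) and 6 hosts for RAID 6 (FTT=2). The function names are made up for illustration and are not part of any VMware API, so check the numbers against the documentation for your vSAN version.

```python
# Rough sketch only: minimum host counts are assumptions based on the
# commonly documented vSAN requirements for a standard cluster.
# Function names are hypothetical, not a VMware API.

def min_hosts(ftt: int, method: str = "RAID-1") -> int:
    """Return the minimum number of hosts a storage policy needs."""
    if method == "RAID-1":
        if ftt not in (1, 2, 3):
            raise ValueError("RAID-1 supports failures to tolerate of 1, 2 or 3")
        return 2 * ftt + 1          # replicas plus witness components
    if method == "RAID-5/6":
        if ftt == 1:
            return 4                # RAID 5
        if ftt == 2:
            return 6                # RAID 6
        raise ValueError("RAID-5/6 supports failures to tolerate of 1 or 2")
    raise ValueError(f"unknown method: {method}")


def can_stay_compliant(total_hosts: int, ftt: int, method: str = "RAID-1",
                       hosts_in_maintenance: int = 1) -> bool:
    """True if the remaining hosts can still hold a fully protected object."""
    remaining = total_hosts - hosts_in_maintenance
    return remaining >= min_hosts(ftt, method)


if __name__ == "__main__":
    # 3-host cluster, FTT=1 mirror: one host in maintenance leaves only 2 hosts.
    print(can_stay_compliant(3, 1, "RAID-1"))   # False
    # 4-host cluster, same policy: 3 hosts remain, which is enough.
    print(can_stay_compliant(4, 1, "RAID-1"))   # True
```

The takeaway: a cluster sized at exactly the minimum for its policy has nowhere to rebuild data while one host is out, so every maintenance window runs with reduced protection.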

Simply put, vSAN is easy to set up and configure, but special consideration needs to be given to your fault tolerance method and the amount of resources within your vSAN cluster. Resources, in this regard, means the number of hosts and disks. This will not only dictate the Failure Tolerance Methods available to you (RAID 1 or 5/6), but also certain risks when performing maintenance activities. Ready for me to finally get to the point? Ok, here it is.

This brings me to my next point, which is Maintenance Mode!

All VMware administrators are familiar with maintenance mode. You have multiple “Maintenance Mode” iterations now: Hosts, Datastores and now vSAN hosts (Hosts and Datastore). When you place a vSAN host into maintenance mode, you are migrating some or all of the data residing within that host to other hosts within the vSAN cluster. Since this is different from your traditional maintenance modes, it’s important to understand the potential risks involved when performing maintenance activities. With that in mind, we need to go over the process of placing a vSAN node into maintenance mode.

When you place a vSAN host into maintenance mode, you are greeted with a window asking which data migration method to use.


Available Data Migration Methods

  • Ensure accessibility
  • Full data migration
  • No data migration

 


Each method is designed for specific circumstances, and each involves a certain level of risk. It’s therefore the administrator’s job to understand these risks and use logic and reasoning to make the best decision possible. Let’s dive a little deeper into each one.

  1. Ensure accessibility: This is the default option when placing a vSAN host into maintenance mode. It takes the least amount of time since it evacuates just enough data to maintain data availability when the host goes down. Best practice is to use this option for short periods of time, such as a host upgrade or reboot. The reason is that you are operating at an increased risk level while the host is down. If you were to have another failure (component or host) within that maintenance timeframe, it has the potential to impact VM data availability.
  2. Full data migration: With this method, all VM data stored on that particular host is migrated off its local disks. Best practice is to use this method when decommissioning hosts or when your maintenance activity will continue for an extended period of time. This method is safer than “Ensure accessibility”, but it could take some time to complete. However, vSAN will take advantage of replica copies within the cluster to speed up recomposing the data, which can greatly decrease your migration time. The duration also depends on other factors, such as networking and the number of available hosts within the vSAN cluster.
  3. No data migration: This method speaks for itself. No object data will be migrated. Best practice is to use this method only if VMware support instructs you to.
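The guidance above can be boiled down to a small decision helper. The Python sketch below is just an illustration: the function names and the 60 minute cutoff for a "long" maintenance window are my own placeholders, not anything VMware prescribes. The mode strings are the values that, as far as I know, `esxcli system maintenanceMode set` accepts for its `--vsanmode` option, so verify them against your ESXi version before relying on them.

```python
from enum import Enum


class VsanMode(Enum):
    # Values assumed to match the esxcli --vsanmode option.
    ENSURE_ACCESSIBILITY = "ensureObjectAccessibility"
    FULL_DATA_MIGRATION = "evacuateAllData"
    NO_DATA_MIGRATION = "noAction"


def suggest_mode(decommissioning: bool,
                 expected_downtime_minutes: int,
                 support_said_no_migration: bool = False,
                 long_window_minutes: int = 60) -> VsanMode:
    """Suggest a data migration method (hypothetical helper).

    long_window_minutes is an arbitrary placeholder; pick a cutoff
    that matches your own risk tolerance.
    """
    if support_said_no_migration:
        # Only when VMware support explicitly instructs you to.
        return VsanMode.NO_DATA_MIGRATION
    if decommissioning or expected_downtime_minutes >= long_window_minutes:
        return VsanMode.FULL_DATA_MIGRATION
    # Short tasks such as a reboot or host upgrade.
    return VsanMode.ENSURE_ACCESSIBILITY


def esxcli_command(mode: VsanMode) -> str:
    """Build the command string to run on the host itself (not executed here)."""
    return f"esxcli system maintenanceMode set --enable true --vsanmode {mode.value}"


if __name__ == "__main__":
    mode = suggest_mode(decommissioning=False, expected_downtime_minutes=15)
    print(esxcli_command(mode))
```

A helper like this does not replace judgment. It only encodes the rule of thumb: short and routine gets "Ensure accessibility", long or permanent gets "Full data migration", and "No data migration" is reserved for when support asks for it.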

This concludes today’s blog post. Something as simple as placing a host into maintenance mode can potentially impact VM data availability if the potential risks aren’t fully understood. It’s the administrator’s responsibility to understand these risks and use logic and reasoning when making their decisions. For instance, applying updates or rebooting a host shouldn’t require a full data migration, since that would tax the network and impact VM I/O (especially with large amounts of data). Either way, it’s the administrator’s responsibility to be aware of the situation, weigh the risks against the costs, and stand by their decision. The end result is making the best decision not only for the end users, but for everyone involved.
