APD/PDL in vSphere 6 with VMAX and VPLEX

Recently I have seen a number of internal emails here at EMC asking about our recommendations around All Paths Down (APD) and Permanent Device Loss (PDL) in vSphere 6 on our storage arrays – mostly VPLEX, to be specific. There have been significant changes to APD/PDL handling in vSphere 6, which is why the questions are being asked. It just so happened that I was updating my VMAX TechBook on this topic, and as I won’t publish until June, I thought I could preview it here.

Let’s start with a quick primer here on the terms.

All paths down, or APD, occurs on an ESXi host when a storage device is removed from the host in an uncontrolled manner (or the device fails) and the VMkernel core storage stack does not know how long the loss of device access will last. VMware, however, assumes the condition is temporary. A typical way of getting into APD is removing the zoning.

Permanent device loss, or PDL, is similar to APD (which is why VMware initially could not distinguish between the two), except that it represents an unrecoverable loss of access to the storage. VMware assumes the storage is never coming back. Removing the device from the storage group could cause this.
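The practical difference between the two is what the host hears back. With APD, no path answers at all; with PDL, the array still answers but returns SCSI sense data saying the device is gone, which is how later vSphere releases tell the two apart. The sketch below is a hypothetical classifier for illustration, not VMware code. The sense-code set reflects the PDL codes VMware documents as I understand them; treat it as an assumption and verify it against VMware's documentation for your release.

```python
from enum import Enum
from typing import Optional, Tuple


class DeviceState(Enum):
    OK = "ok"
    APD = "all paths down"
    PDL = "permanent device loss"


# (sense key, ASC, ASCQ) triples treated as PDL in this sketch.
# Assumption: based on VMware's documented PDL sense codes; verify for your release.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x0),  # LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x4C, 0x0),  # LOGICAL UNIT FAILED SELF-CONFIGURATION
    (0x4, 0x3E, 0x3),  # LOGICAL UNIT FAILED SELF-TEST
    (0x4, 0x3E, 0x1),  # LOGICAL UNIT FAILURE
}


def classify(paths_responding: int,
             sense: Optional[Tuple[int, int, int]] = None) -> DeviceState:
    """Hypothetical classifier: silence on every path is APD, a PDL sense code is PDL."""
    if paths_responding == 0:
        # Nothing answers (e.g., zoning removed): the host cannot know how long this lasts.
        return DeviceState.APD
    if sense in PDL_SENSE_CODES:
        # The array answered and reported the device gone (e.g., removed from the storage group).
        return DeviceState.PDL
    return DeviceState.OK


print(classify(0))                    # DeviceState.APD
print(classify(2, (0x5, 0x25, 0x0)))  # DeviceState.PDL
```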

VMware’s approach to APD/PDL has evolved over the vSphere versions, from not being able to distinguish between the two in vSphere 4 to vSphere 6 today, which introduces a new capability called VM Component Protection. As this blog is about vSphere 6, I’m going to go straight to that, though the VMAX TechBook has more information on the previous vSphere versions (and will have more in June) if you want to review it (http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf).

vSphere 6 offers some new capabilities around APD and PDL for HA clusters that allow automated recovery of VMs. The capabilities are enabled through a new feature in vSphere 6 called VM Component Protection, or VMCP. When VMCP is enabled, vSphere can detect datastore accessibility failures, APD or PDL, and then recover affected virtual machines. VMCP lets the user determine the response that vSphere HA will make, ranging from the creation of event alarms to virtual machine restarts on other hosts.

VMCP is enabled in the vSphere HA edit screen of the vSphere Web Client. Note that, like all new vSphere 6 features, this functionality is not available in the thick client. Navigate to the cluster, then the Manage tab and the Settings sub-tab. Highlight vSphere HA under Services and select Edit. Check the box for “Protect against Storage Connectivity Loss”. Here is how that looks:

[Screenshot: enabling “Protect against Storage Connectivity Loss” in the vSphere HA settings]

Once VMCP is enabled, storage protection levels and virtual machine remediations can be chosen for APD and PDL conditions as shown below:

[Screenshot: VMCP storage protection levels and VM remediation settings for APD and PDL]

The PDL settings are the simpler of the two failure conditions to configure because there are only two choices: vSphere can issue events, or it can power off the VMs and restart them on the surviving host(s). As the purpose of HA is to keep the VMs running, the choice should always be to power off and restart. Once either option is selected, the table at the top of the edit settings screen is updated to reflect that choice.

[Screenshot: PDL response selection and the updated settings table]
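Because PDL has only the two responses described above, the decision reduces to a small lookup. This is a hypothetical model of the choices in the HA dialog, not a vSphere API; the enum and function names are made up for illustration.

```python
from enum import Enum
from typing import List


class PdlResponse(Enum):
    # Hypothetical names mirroring the two choices in the vSphere HA dialog
    ISSUE_EVENTS = "issue events"
    POWER_OFF_RESTART = "power off and restart VMs"


def pdl_actions(response: PdlResponse) -> List[str]:
    """What HA does for a VM whose datastore enters PDL, per the chosen response."""
    if response is PdlResponse.ISSUE_EVENTS:
        return ["issue event"]
    # No grace period: PDL is assumed permanent, so the VM moves to a host that still sees storage.
    return ["power off VM", "restart VM on a surviving host"]


print(pdl_actions(PdlResponse.POWER_OFF_RESTART))
```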

Because APD events are by nature transient, not a permanent condition like PDL, VMware provides more nuanced control of the behavior within VMCP. Essentially, however, there are still two options: vSphere can issue events, or it can power off the VMs and restart them on the surviving host(s), either aggressively or conservatively. The resulting entry in the settings table is similar to the PDL one.

[Screenshot: APD response selection]

If issue events is selected, vSphere will do nothing more than notify the user through events when an APD event occurs, so no further configuration is necessary. If, however, either aggressive or conservative restart of the VMs is chosen, additional options become available to further define how vSphere behaves. The formerly grayed-out option “Delay for VM failover for APD” is now available, and a value in minutes can be set, after which the restart of the VMs proceeds. Note that this delay is in addition to the default 140-second APD timeout.

The difference between the two restart approaches is straightforward. If the outcome of the VM failover is unknown, say in the case of a network partition, the conservative approach will not terminate the VM, while the aggressive approach will. Note that if the cluster does not have sufficient resources, neither approach will terminate the VM.
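The timing and the conservative/aggressive distinction can be sketched as follows. This is a simplified model of the behavior described above, not VMware's implementation; the function names are hypothetical, and `restart_possible` stands in for whatever HA actually knows about the cluster.

```python
from typing import Optional

APD_TIMEOUT_S = 140  # default APD timeout described above


def seconds_until_restart(delay_minutes: int = 3, apd_timeout_s: int = APD_TIMEOUT_S) -> int:
    """The VMCP delay is added on top of the APD timeout, not counted inside it."""
    if delay_minutes < 0:
        raise ValueError("delay must be non-negative")
    return apd_timeout_s + delay_minutes * 60


def terminates_vm(policy: str, restart_possible: Optional[bool]) -> bool:
    """policy is 'conservative' or 'aggressive'.

    restart_possible: True if HA knows the VM can be restarted elsewhere,
    False if the cluster lacks resources, None if unknown (e.g., network partition).
    """
    if policy not in ("conservative", "aggressive"):
        raise ValueError(f"unknown policy: {policy}")
    if restart_possible is False:
        return False  # neither approach kills a VM it cannot restart
    if restart_possible is None:
        return policy == "aggressive"  # only aggressive gambles on an unknown outcome
    return True


print(seconds_until_restart())  # 320, i.e. 5 minutes 20 seconds with the defaults
```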

In addition to setting the delay for the restart of the VMs, the user can choose whether vSphere should take action if the APD condition resolves before the user-configured delay period is reached. If the setting “Response for APD recovery after APD timeout” is set to “Reset VMs” and APD recovers before the delay is complete, the affected VMs will be reset, which recovers the applications that were impacted by the I/O failures. This setting has no effect if vSphere is only configured to issue events in the case of APD. VMware and EMC recommend leaving this set to disabled so as not to disrupt the VMs unnecessarily. The additional settings are noted below:

[Screenshot: additional APD settings (delay for VM failover, response for APD recovery)]
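The effect of “Response for APD recovery after APD timeout” depends on when storage returns relative to the APD timeout and the failover delay. The sketch below is a hypothetical model of that window, with `reset_on_recovery` standing in for the “Reset VMs” choice; it is not VMware's implementation.

```python
APD_TIMEOUT_S = 140  # default APD timeout, per the text above


def reaction_when_apd_clears(cleared_after_s: float,
                             delay_minutes: int = 3,
                             reset_on_recovery: bool = False,
                             apd_timeout_s: int = APD_TIMEOUT_S) -> str:
    """Hypothetical model of what happens when storage returns.

    cleared_after_s is measured from the moment all paths went down.
    reset_on_recovery mirrors "Response for APD recovery after APD timeout" = "Reset VMs".
    """
    failover_at = apd_timeout_s + delay_minutes * 60
    if cleared_after_s < apd_timeout_s:
        return "none"  # APD never timed out, so there is nothing to react to
    if cleared_after_s < failover_at:
        return "reset VMs" if reset_on_recovery else "none"
    return "failover delay expired"  # the restart policy has already taken over


print(reaction_when_apd_clears(200, reset_on_recovery=True))  # reset VMs
```

Leaving `reset_on_recovery` at its default of `False` matches the recommendation to keep this setting disabled.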

As previously stated, since the purpose of HA is to maintain the availability of VMs, VMware and EMC recommend setting APD to power off and restart the VMs conservatively. Depending on business requirements, the 3-minute default delay can be adjusted higher or lower.
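The recommendations in this post can be captured as a simple checklist. This is a hypothetical sketch: the keys and values are descriptive labels invented here, not vSphere API fields, and you would have to fill `current` from your own cluster. The failover delay is deliberately left out because it depends on business requirements.

```python
from typing import Dict, Tuple

# Hypothetical labels summarizing the VMware/EMC recommendations above
RECOMMENDED: Dict[str, str] = {
    "pdl_response": "power off and restart VMs",
    "apd_response": "power off and restart VMs (conservative)",
    "apd_recovery_response": "disabled",
}


def deviations(current: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {setting: (current value, recommended value)} for every mismatch."""
    return {
        key: (current.get(key, "<unset>"), wanted)
        for key, wanted in RECOMMENDED.items()
        if current.get(key) != wanted
    }


cluster = dict(RECOMMENDED, apd_response="issue events")
print(deviations(cluster))
```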

If you want to see whether APD/PDL has been detected, VMware has a new view under the cluster’s Monitor tab, vSphere HA sub-tab. If either condition occurs, an entry will appear like so:

[Screenshot: APD/PDL entry in the cluster’s Monitor > vSphere HA view]

Note that this is a “current” view and does not record historical occurrences. I’ve noticed that PDL is sometimes tough to catch in this view. If you open the vmkernel.log file and search for “PDL”, though, you’ll see an entry like this:

[Screenshot: PDL entry in vmkernel.log]
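On the host itself, `grep PDL /var/log/vmkernel.log` does that search. If you collect logs from several hosts, a small script can do the same over copies of vmkernel.log, for example files pulled from a support bundle. This is a minimal sketch that just mirrors the plain-text search for “PDL”; it does not parse vmkernel.log entries, and the script name in the usage comment is hypothetical.

```python
import sys
from typing import Iterable, List


def pdl_lines(lines: Iterable[str], needle: str = "PDL") -> List[str]:
    """Return the lines containing `needle`, like a plain text search for "PDL"."""
    return [line.rstrip("\n") for line in lines if needle in line]


def scan_file(path: str) -> List[str]:
    # errors="replace" keeps the scan going if the log contains stray non-UTF-8 bytes
    with open(path, encoding="utf-8", errors="replace") as fh:
        return pdl_lines(fh)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python pdl_scan.py vmkernel.log [more logs...]
    for log_path in sys.argv[1:]:
        for hit in scan_file(log_path):
            print(f"{log_path}: {hit}")
```

Remember that a match only shows the condition was logged; the HA view above is still where you see what is current.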

I hope that was helpful to those running an HA cluster with vSphere 6. Remember, the recommendations hold true for both VMAX and VPLEX. I will note one specific setting that VMware has gone back and forth on over this past year. VMware’s original best practice for the host parameter Disk.AutoremoveOnPDL was to disable it for vMSC. VMware has since reversed course and recommends leaving it at the default, enabled (as we do on VMAX). The KB is here, though it makes no mention of VMware’s previous position: PDL AutoRemove. In any case, since enabled is now the default, nothing needs to be done.
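If you want to confirm a host is still at the default, `esxcli system settings advanced list -o /Disk/AutoremoveOnPDL` shows the current and default values. The helper below parses that text once you have captured it; it assumes esxcli prints `Key: value` lines including `Int Value` and `Default Int Value`, so check the output on your build before relying on it. A value of 1 means enabled.

```python
from typing import Dict


def parse_esxcli_fields(output: str) -> Dict[str, str]:
    """Parse 'Key: value' lines as printed by esxcli (output format assumed)."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def autoremove_on_pdl_is_default(output: str) -> bool:
    """True when the current value matches the default and both are 1 (enabled)."""
    f = parse_esxcli_fields(output)
    return f.get("Int Value") == f.get("Default Int Value") == "1"


# Illustrative text in the assumed layout; capture the real output from the host.
sample = """   Path: /Disk/AutoremoveOnPDL
   Type: integer
   Int Value: 1
   Default Int Value: 1
"""
print(autoremove_on_pdl_is_default(sample))  # True
```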

24 thoughts on “APD/PDL in vSphere 6 with VMAX and VPLEX”

virtualizer says:

August 19, 2015 at 9:24 am

Reblogged this on Peter van den Bosch Blogsite and commented:
Great blog about the consequences of upgrading to vSphere 6 and new enhanced functionality.
1. Drew Tonnesen says:
  
  August 19, 2015 at 10:06 am
  
  Thanks, glad you found it helpful.
Elad says:

November 12, 2015 at 8:45 am

I'm just moving to a Metro cluster (WAN COM) under XIO and found this article very helpful.
BTW,
do I have to be on ESXi 6 in order to initiate the process, or is ESXi 5.5 compatible as well?

Regards
1. Drew Tonnesen says:
  
  November 12, 2015 at 11:09 am
  
  Thanks. The whole concept of VMCP I talk about is a new feature in vSphere 6, so you won’t see it in vSphere 5.5. There are some APD/PDL capabilities in vSphere 5.5 that you can take advantage of, however. Though my TechBook is VMAX-based, the APD/PDL discussion is not array-specific, so it should help guide you: http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf (start on page 94).
kurimargo says:

April 11, 2016 at 2:41 pm

Disabling the Disk.AutoremoveOnPDL option is interesting. VMware itself suggests keeping it enabled for vSphere Metro Storage Cluster (vMSC) environments:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2059622
1. Drew Tonnesen says:
  
  April 11, 2016 at 2:54 pm
  
  It actually depends on where you look: http://www.vmware.com/resources/techresources/10482
  
  I chose to go with Duncan’s paper, as he is considered the expert on this, and I had no problems disabling the setting in the VPLEX environment. Conversely, with SRDF/Metro and vMSC you have to leave the default (enabled) to prevent issues when suspending the pairs. Leaving the default on VPLEX, however, is not going to be a problem if you prefer to follow the KB over the vMSC paper.
2. Drew Tonnesen says:
  
  April 19, 2016 at 2:06 pm
  
  I had a discussion with the vSphere architects and they told me they have reversed their recommendation to align with the new KB article.
Cezar L D says:

June 12, 2017 at 10:37 pm

Hi, very good article. I am looking for some best practices on deploying vCenter on a VPLEX Metro cluster.
1. Drew Tonnesen says:
  
  June 12, 2017 at 11:37 pm
  
  I have a paper on VPLEX Metro which follows best practices – http://www.emc.com/collateral/white-papers/h11767-vmware-vplex-rp-srm-wp.pdf. You can also check out VMware’s KB article on kb.vmware.com – 2007545.
Cezar L D says:

June 13, 2017 at 11:41 pm

Hi Drew, I’m looking for some documentation that talks about implementing the vCenter Server Appliance in a VPLEX Metro environment. What is the best practice? Linked mode? I have an environment that is in the acceptance-testing phase, and I did not find anything on this from EMC or from VMware. The PDL test I performed worked with normal VMs, but VCSA did not behave well. Of course, if everything fails, it all goes down together; my concern is only with this VCSA VM. What should I do?
1. Drew Tonnesen says:
  
  June 14, 2017 at 12:35 am
  
  Well, in a VPLEX Metro environment you normally would not use linked mode (or enhanced linked mode, as it is known) unless you are using VMware SRM, which requires multiple vCenters. For a typical VPLEX Metro setup you use a single vCenter – that’s the idea behind active/active, where all hosts are controlled within the same vCenter but are attached to different arrays/clusters. Whether you use VCSA or a Windows vCenter is up to you. Remember, ideally your vCenter should be in a third location apart from either Metro cluster. I am guessing from your comment that you tried a PDL test against VCSA itself to see how it would respond. I have not specifically tested VCSA for PDL, but the loss of a vCenter doesn’t impact your ESXi hosts, so the other VMs would not be impacted. I’ll see if I can find anything about your particular use case – please explain what bad behavior you are seeing. Also be sure (depending on your VMware version) you are not hitting this issue: https://drewtonnesen.wordpress.com/2016/02/11/vmware-nmp-pdl-bug/
Cezar L D says:

June 14, 2017 at 12:57 am

Thanks for the support! In my case I’m facing exactly what you described. Yes, it is a typical VPLEX Metro environment, and I do not have the third site. So, what is the best practice? This VMware KB does not mention anything about it, but in its illustration vCenter appears to be at a third site: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007545

The bad behavior I see in my simulation is that when a total disruption of the WAN COM occurs, everything goes down and HA does not work; I can only recover by intervening manually.
I’ve seen your post, but I’m using 6.0.4192238 (Update 2), so it would not apply.

If you have something about it, that would be great! Thank you so much
1. Drew Tonnesen says:
  
  June 14, 2017 at 2:26 pm
  
  Best practice is the vCenter at a third site.
Cezar L D says:

June 14, 2017 at 1:04 am

To clarify the “simulation”: I vMotioned VCSA and one VM to site 2 and intentionally applied a total interruption of communication, isolating the second site. The VPLEX Witness keeps site 1 serving I/O, but VCSA and the other VMs remain orphaned on the other site and do not recover on site 1.
Cezar L D says:

June 14, 2017 at 1:06 am

If I vMotion only one VM to site 2 and intentionally apply a total interruption of communication, isolating the second site, the VPLEX Witness keeps site 1 serving I/O and the VM recovers on site 1 successfully.

Do you see what I mean?
1. Drew Tonnesen says:
  
  June 14, 2017 at 2:31 pm
  
  Well, I’m not sure I got it exactly. Let me start with what you are doing in the test. Below you said you are running a PDL test, but here you are talking about dropping the link between VPLEX clusters, which is not a PDL test; rather, it sounds like you are doing a regular VPLEX Metro test. A PDL test would be if you removed the distributed device from one of the cluster views. Can you clarify what you are trying to test?
Cezar L D says:

June 14, 2017 at 3:02 pm

Yes, it is similar to what you say. I am simulating a VPLEX disaster. I/O is suspended on the second site, and vCenter does not come up on the first site.
1. Drew Tonnesen says:
  
  June 14, 2017 at 3:09 pm
  
  And the vCenter has no relationship to either VPLEX cluster, correct? Is it just a VCSA you installed, or does it control some unrelated environment?
Cezar L D says:

June 14, 2017 at 3:24 pm

Yes, it is just a VCSA to manage our hosts. Nothing apart from that.
1. Drew Tonnesen says:
  
  June 14, 2017 at 5:11 pm
  
  I’ve researched this quite a bit and cannot find any similar issues. I assume you are not using DRS affinity rules, or, if you are, that you have not prevented VCSA from restarting on a node that is not local. Are there any errors reported by the vCenter that is controlling the VPLEX hosts? When VMs fail to fail over properly, there are usually events recorded indicating the reason. At the very least, is there an event reporting that an HA incident occurred, so we know things work at that level? You may need a VMware SR to examine what is failing. There are other options for VCSA, like fault tolerance, but all things being equal, there is no difference from VMware’s perspective between HA for VCSA and HA for a different type of VM.
Cezar L D says:

June 16, 2017 at 8:42 pm

Hi Drew, I talked with the VMware guys and they helped me with some things. Now I am just facing a curious little issue: when the volumes come back, I always need to run a refresh on my datastores to make things normal. Any trick for that?
Drew Tonnesen says:

June 16, 2017 at 8:48 pm

A refresh or a rescan?
1. Cezar L D says:
  
  June 16, 2017 at 9:01 pm
  
  Refresh. The hosts stay in an alarm state, and if I run a refresh they return to normal.
  1. Drew Tonnesen says:
    
    June 16, 2017 at 9:16 pm
    
    No, I can’t say I know a way around that.
