Back in September of last year I did a post on a new VPLEX SRA which supported SRM stretched storage (VPLEX SRM stretched). What can I say, they beat us to the punch, so credit where credit is due. But today is a new day as we announce the SRDF SRA 6.1 and support for SRM 6.1 in general, and more importantly in an SRDF/Metro environment. Along with the SRDF SRA 6.1 we of course have our complementary (and complimentary) software, the SRDF Adapter Utilities (SRDF-AU) 6.1, to modify the XML files in an easy-to-use GUI. As my VPLEX post was extensive in explaining the VMware setup, I don’t want to rehash it all; instead I’ll try to keep this more in outline form, and if anything is unclear, feel free to refer to the VPLEX post. Hopefully the demos will fill in any gaps. I have also updated my SRDF/Metro paper to include an appendix on the SRM stretched storage configuration. It does assume the reader is quite familiar with using the SRDF SRA with SRM, but it includes things I will not cover below. You can find it here: SRDF/Metro with vMSC.
Let’s start by acknowledging that SRM performs the same functions in an active/active or active/passive configuration: Testing, Planned Migration, or Failover. Customers may have different reasons for deploying each configuration, but once set up, SRM automates/orchestrates the movement of resources from one site to the other. The differences come into play in how the initial setup is done. And even that can be boiled down to (almost) one thing: Storage Policies. SRM stretched storage uses tag-based storage policies in the construction of protection groups, rather than datastore groups.
Now I took a little liberty in my consolidation down to storage policies, hence my “almost”. If you are using SRDF/Metro in a vMSC configuration you know that despite having multiple locations, you have a single vCenter. SRM doesn’t have a role in a single vCenter, so in order to use SRM with stretched storage, the ESXi hosts must be split up by location and placed in their own vCenters. That new configuration must use Enhanced Linked Mode (ELM) – the sharing of a Single Sign-On (SSO) domain located in the Platform Services Controller (PSC). When both vCenters share the PSC, they are listed together no matter which one you log into:
So assuming you have your SRDF/Metro set up using Bias (the SRDF SRA 6.1 does NOT support Witness), the vCenters in Enhanced Linked Mode, and SRM and the SRDF SRA installed, you can proceed with the SRM setup. You will:
- Pair the sites
- Configure the array managers
- Configure the inventory mappings (save for Storage Policy Mappings)
The “new” steps you will need to complete:
- Create a new tag and category (just one of each)
- Assign that tag to every SRDF/Metro datastore on each site
- Create two new storage policies – one for each site (vCenter) – which are based on the new tag. One caveat here: storage policies, unlike datastore groups, do not support the use of RDMs
- Configure the Storage Policy Mappings (if the policies do not share the same name, you must map them manually)
- Apply each storage policy to the VMs at each site
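To make the tag-and-policy relationship concrete, here is a minimal Python sketch of the logic behind the steps above. All names (category, tag, policy, datastore, and VM names) are hypothetical, and in practice this is all done in the vSphere Web Client – the sketch only models how a single tag drives both site policies and, in turn, which VMs the protection group picks up:

```python
# Illustrative model of tag-based storage policy selection (not a vSphere API).

# One category and one tag, shared across both sites (hypothetical names).
TAG = ("SRDF-Metro", "MetroDatastores")  # (category, tag)

# Every SRDF/Metro datastore on each site carries the tag.
datastores = {
    "metro_ds_01": {TAG},
    "metro_ds_02": {TAG},
    "local_ds_01": set(),   # non-Metro datastore, untagged
}

# Two storage policies, one per vCenter, both based on the same tag.
policies = {
    "MetroPolicy-SiteA": TAG,
    "MetroPolicy-SiteB": TAG,
}

def datastores_for_policy(policy_name):
    """A tag-based policy matches every datastore carrying its tag."""
    tag = policies[policy_name]
    return sorted(ds for ds, tags in datastores.items() if tag in tags)

# VMs are protected by the policy applied to them,
# not by the datastore group they happen to live in.
vms = {"app01": "MetroPolicy-SiteA", "db01": "MetroPolicy-SiteA"}

def protection_group_members(policy_name):
    return sorted(vm for vm, p in vms.items() if p == policy_name)

print(datastores_for_policy("MetroPolicy-SiteA"))   # ['metro_ds_01', 'metro_ds_02']
print(protection_group_members("MetroPolicy-SiteA"))  # ['app01', 'db01']
```

Note that both policies resolve to the same set of stretched datastores – that is the point of using one tag: the protection group follows the policy, and the policy follows the tag.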
The final steps will be to create the protection group and associated recovery plan. The protection group will require you to select the new radio button in SRM 6.1, Storage policies, and then simply select the appropriate policy for the protection site:
OK, enough prelude. Let’s begin with a quick demo where I navigate through the setup we just spoke about. It will provide the basis for the other two demos – test and planned migration. Yes, we do have a test capability (sorry VPLEX). Note: it is best to view all videos at 720p.
Hopefully that provided a good overview of the setup with SRDF/Metro and SRM 6.1. Now that the protection group and recovery plan are in place, let’s start with the test functionality. I am using SnapVX as it is the most efficient and preferred TimeFinder technology on the VMAX3. I have created all my devices and presented them to the R2 side (dsib2117 if you recall from the setup demo). Now I am going to take advantage of the SRDF-AU 6.1.
The test functionality is where the SRDF-AU comes in mighty handy when setting up the pairs. If you’ve used the SRDF-AU before, there is nothing new for SRDF/Metro, save the support itself. SRDF/Metro pairs will show as “active”. I did an auto-pairing and then downloaded the file to my recovery site in the appropriate directory of the SRA. The SRDF-AU requires that the vCenter be on Windows; it does not support the vCenter Server Appliance (VCSA). In my environment the vCenter and SRM are on the same Windows host. Here is a shot of the SRDF-AU and my pairings.
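For reference, the pairings end up in the SRA’s test failover options XML file. The fragment below is purely illustrative – the element names and device IDs are hypothetical, not the SRA’s actual schema – but it conveys the shape of what the SRDF-AU generates for you so you don’t have to hand-edit it:

```xml
<!-- Illustrative only: hypothetical shape of a device-pairing entry. -->
<CopyInfo>
  <ArrayId>000197000001</ArrayId>
  <CopyMode>SNAPVX</CopyMode>
  <DevicePair>
    <Source>00123</Source>  <!-- R2 device -->
    <Target>004AB</Target>  <!-- SnapVX target presented to the recovery host -->
  </DevicePair>
</CopyInfo>
```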
Just remember that testing in SRDF/Metro is really only testing those VMs from the protection site and not the recovery site. What I mean by that is if you are using your SRDF/Metro to its full advantage, you will be running VMs on hosts in both vCenters. Hence when you run a test failover to the recovery site, the only VMs that will enter a test state are those on hosts at the protection site. The ones at the recovery site will not be duplicated, but of course they are running. The demo should make that clear. I’ve also included a quick look at the SRDF-AU.
Earlier I alluded to the fact that stretched storage SRM might be used for reasons different than an active/passive SRM. One such reason would be to enable an automated failover when conducting site maintenance. So if I had to take down a component of my protection site, I could do a planned migration, which will vMotion the VMs over to the recovery site and then suspend the SRDF/Metro pairs. Once the maintenance was complete, I would run a reprotect, which I could then follow with another planned migration to do maintenance on the recovery site. This demo will walk you through the initial planned migration and reprotect.
I know some of those transitions in the video may have been rushed for those not familiar with SRM, but if you have questions even after viewing it you are welcome to leave a comment/question.
I would be remiss not to mention two VMware bugs I found during stretched storage testing (I’m running vSphere 6.0 U2). One impacts SRM; the other seemingly does not. The more serious of the two bugs concerns the amount of free space in the datastore. When VMware does a cross-vMotion of the VMs from one vCenter to another, it checks the amount of free space on the stretched datastores. If the amount of free space is not 2x the size of the vmdks being moved, the vMotion will not proceed. You will get an insufficient disk space error like this:
But, you say, why in the world would VMware check for space in a shared datastore as if it were doing a Storage vMotion? And I would say: good question, which is why I opened an SR to ask VMware. Unfortunately VMware doesn’t seem to know. They do know where in the code this happens, and what they have told me is that the location touches so many other functions that a fix is not forthcoming. This bug does have a workaround, albeit at a cost: you need to increase the amount of free space in your datastores. In my lab environment this was easier said than done; though I work for a storage company, I don’t have all the disk in the world and my datastores are typically packed. Further complicating the situation, SRDF/Metro is not currently capable of on-the-fly changes to the environment like the one required here, so it took some time to expand the devices and re-instantiate SRDF/Metro. But now that I have warned you, hopefully you can take care of it before it becomes an issue. BTW, there is an easy way to find out if you will hit this problem in a planned migration without manually running cross-vMotions on all the VMs: simply run a test failover. SRM will include a warning about the space if it will impact a planned migration (sorry about the color of the text – that is all VMware):
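The check behind the error can be modeled in a few lines of Python. This is only a sketch of the behavior as I observed it in testing (the 2x factor comes from my results above; the function and parameter names are mine, not VMware’s):

```python
# Model of the cross-vCenter vMotion free-space check that triggers the
# "insufficient disk space" error on stretched datastores (vSphere 6.0 U2).

def cross_vmotion_space_ok(datastore_free_bytes: int, vmdk_bytes: int) -> bool:
    """The vMotion proceeds only if free space is at least 2x the vmdks moved."""
    return datastore_free_bytes >= 2 * vmdk_bytes

GB = 1024 ** 3
print(cross_vmotion_space_ok(500 * GB, 200 * GB))  # True:  500 >= 400
print(cross_vmotion_space_ok(350 * GB, 200 * GB))  # False: 350 < 400, vMotion fails
```

In other words, even though the vmdks never actually move between datastores, you have to size the stretched datastores as if they did.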
VMware apparently did fix this bug in 6.0 U3, despite claiming they would not. The release notes are here. Note I have not had a chance to test this, so I am taking their word for it. The key entry is here:
The second bug also impacts cross-vMotion. This one, based on my testing, does not affect SRM which is surprising (but good). If your VM has vmdks in more than one datastore (as most complex VMs do), when you attempt a manual cross-vMotion VMware will fail to recognize the remote cluster/ESXi hosts as compatible. I have an example below from my environment where my application server has one datastore and sees both clusters while my Oracle DB server has multiple datastores and only sees the local cluster.
As I said, it is surprising that SRM does not hit this problem during a planned migration since it is going to vMotion both these VMs over; however, during my testing the only problem I ever hit was the disk space one. So whatever checks SRM does before the vMotion, it apparently skips the step that causes the failure to find the remote hosts when running a manual vMotion. Note, however, that if you do a lot of SRM testing like I do, this bug is still a pain, since after a planned migration test you will want to get the VMs separated between sites again. If you can’t vMotion the VM, then you have to shut down the VM, unregister it, and then re-register it on the other side.
VMware engineering has finally responded to the bug concerning cross-vMotion. They say VMware will only allow you to migrate a VM’s compute resource alone if the VM is on a single datastore. In the case of multiple datastores the computation is too hard and vSphere cannot handle it. Therefore you are required to select “Change both compute resource and storage” when doing a cross-vMotion with multiple datastores, even if those datastores are shared (SRDF/Metro). This means you have to manually note which datastore each of the vmdks is located in, and then use the advanced mapping capability in the migrate wizard to select the same datastore for each vmdk when migrating from one vCenter to the other. Not only is it not user-friendly, it is very time consuming.
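Since each vmdk must be mapped back to the very same shared datastore it already lives on, the manual work amounts to building an identity mapping. A small Python sketch of that bookkeeping (the VM and datastore names are hypothetical):

```python
# The migrate wizard's advanced mapping forces you to pick a target datastore
# per vmdk; for shared SRDF/Metro datastores the target is simply the source.

def identity_disk_mapping(vmdk_locations):
    """Map each vmdk to the same datastore it already occupies."""
    return {vmdk: ds for vmdk, ds in vmdk_locations.items()}

oracle_db = {
    "db01.vmdk": "metro_ds_01",
    "db01_1.vmdk": "metro_ds_02",   # multiple datastores trigger the issue
}
print(identity_disk_mapping(oracle_db))
# {'db01.vmdk': 'metro_ds_01', 'db01_1.vmdk': 'metro_ds_02'}
```

A mapping like this is exactly what you end up reproducing by hand, vmdk by vmdk, in the wizard – which is why it is so tedious for complex VMs.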
One more update…VMware now says this is not a bug but rather an unsupported configuration. They have updated their release notes for 6.0 U2 with the following:
And they will not fix the storage issue where you need free space of more than 2x the size of your VM’s vmdks.
So that does it. The SRDF SRA and SRDF-AU can be found below.
The VMware certification is shown here:
Besides the product documentation, there is my whitepaper mentioned above. I have not yet updated the SRDF SRA TechBook for this release – that will definitely come this quarter.
Also a final reminder that SRDF/Metro cannot be part of a 3-site implementation. There is no support to put an asynchronous leg onto either site. That is planned for a future release, however.