VPLEX Metro and VMware SRM 6.1

With the release of VMware vCenter Site Recovery Manager 6.1, VMware adds a new capability to its disaster recovery software: support for stretched storage.  SRM has been around for quite a while now, offering the ability to create a DR site for your VMware infrastructure.  The data replication under the covers can be handled by VMware with vSphere Replication or, more typically, by the storage array using, for example, SRDF on the VMAX.  Depending on the software doing the replication, the arrays at the two sites need not even be the same type, as is the case with RecoverPoint.  SRM handles the orchestration and automation of the solution and, when used in conjunction with storage replication software, communicates with the array through a Storage Replication Adapter (SRA).  EMC storage has a number of SRAs for our various replication software solutions, though until now there was none for VPLEX.  If you had VPLEX and wanted to use SRM, you would implement RecoverPoint on top of VPLEX and use its SRA.  What is different about SRM 6.1, however, is that it supports stretched storage, a configuration inherent to VPLEX Metro.  With this support EMC was able to create an SRA for VPLEX.

So anyone reading this who is familiar with SRM and VPLEX Metro might be scratching their head a bit, trying to understand why SRM would support stretched storage, which by definition is already available on both sites.  It’s a fair question in my mind.  After all, VPLEX Metro supports the vSphere Metro Storage Cluster (vMSC) and is generally presented as an HA solution.  I wrote a whitepaper about such a configuration and integrated SRM at a third site with RecoverPoint (SRM RP WP).  In a VPLEX Metro vMSC, all the hosts are in the same vCenter and the VMs can easily be vMotioned from one host to another, one site to another.  In the event of a failure, rules can be configured which tell the VMs where to go.

So why SRM with stretched storage?  Well, the vMSC setup can be a bit complicated.  The rules setup in particular can be tricky to get right.  VMware looked at all this and spoke with customers, and the desire was there for better orchestration and automation, and what does that better than SRM?  What SRM 6.1 offers, in conjunction with the VPLEX SRA, is the same capability it offers today but with stretched storage.  It will take care of bringing up all the VMs on the other side, performing any customizations (e.g. an IP change) as required.  In a DR event, rather than relying on rules or user intervention (re-inventory), SRM will take care of it all.  And because it is shared storage, it can do this much faster than traditional SRM implementations, as the storage SRA does not need to perform the typical replication tasks and datastores do not need to be resignatured and mounted.

I think the other compelling feature for stretched storage is planned migration.  In a VPLEX Metro this would allow customers to do maintenance on one side of the cluster after migrating the VMs over to the other cluster using SRM.  Now of course this can be done today, but it is a manual or scripted process, not fully automated.

Well, enough with the overview and explanations.  I want to show you how the SRM 6.1 setup is different for stretched storage.  I’m going to go through some of the important requirements and steps with screenshots and then include some demos.  I am going to assume SRM knowledge on the part of readers, as I will skip some of the setup.

Requirements:

  • VPLEX SRA 6.1.0.87 (VPLEX SRA binary, VPLEX SRA Release Notes)
  • VPLEX Metro GeoSynchrony 5.4+.  In my configuration I used a witness, which is preferable though not required.  The VPLEX administrator’s guide explains how a witness works as well as all failover scenarios.
  • VMware vCenter SRM 6.1 (VMware SRM 6.1)
  • vCenter Server 6.x in Enhanced Linked Mode (explained below)
  • Customized cluster site names are NOT supported.  If your cluster is not currently named “cluster-1” and “cluster-2” you must rename them back to the default.
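
The requirements above lend themselves to a quick pre-flight check before installing anything.  What follows is only a minimal sketch in Python: the `Environment` record and `check_requirements` function are hypothetical names invented for illustration, not part of SRM, the VPLEX SRA, or any EMC/VMware tool, and you would have to fill in the values by hand from your own environment.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical, hand-filled description of a planned deployment."""
    geosynchrony: tuple          # VPLEX GeoSynchrony version as (major, minor)
    srm: tuple                   # SRM version as (major, minor)
    vcenter_major: int           # vCenter Server major version
    enhanced_linked_mode: bool   # are the two vCenters in Enhanced Linked Mode?
    cluster_names: tuple         # VPLEX cluster names as currently configured


def check_requirements(env):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if env.geosynchrony < (5, 4):
        problems.append("VPLEX Metro GeoSynchrony must be 5.4 or later")
    if env.srm[:2] != (6, 1):
        problems.append("VMware vCenter SRM must be 6.1")
    if env.vcenter_major != 6:
        problems.append("vCenter Server must be 6.x")
    if not env.enhanced_linked_mode:
        problems.append("vCenters must be in Enhanced Linked Mode")
    # Customized cluster names are not supported; only the defaults are accepted.
    if set(env.cluster_names) != {"cluster-1", "cluster-2"}:
        problems.append("VPLEX clusters must be named cluster-1 and cluster-2")
    return problems


if __name__ == "__main__":
    lab = Environment((5, 4), (6, 1), 6, True, ("NY", "NJ"))
    for problem in check_requirements(lab):
        print("FAIL:", problem)
```

The check only mirrors the list as written; it does not query VPLEX or vCenter, so it is a checklist aid rather than a validation tool.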

The setup for VPLEX Metro with SRM adheres to the SRM requirements in that there are 2 vCenters, not 1 as with vMSC.  The difference comes with the aforementioned Enhanced Linked Mode (ELM, my abbreviation).  ELM allows cross-vCenter migrations of virtual machines, which is why it is required for the stretched storage SRM setup.  The way ELM works is this:  With vSphere 6 vCenter there are two components, a Platform Services Controller (PSC) and the vCenter itself.  The PSC contains the SSO domain.  Each vCenter in an SRM setup that supports stretched storage must share a PSC.  This PSC could be external to each vCenter, or included with one vCenter and then shared with the other vCenter (NB: In more recent vSphere versions embedded PSC is not supported).  The latter is my setup.  I installed the VCSA on the protected site with the PSC included.  Then on the recovery site I installed just the vCenter component and pointed it to the protected site PSC.  Remember, even if you use the VCSA you still need a Windows x64 server for SRM and the VPLEX SRA.  I installed each of those components on their own server, pointing to the PSC and the appropriate vCenter with which to register.  The VPLEX SRA does not require any additional software.  Here is what the vCenters look like in ELM.

[Screenshot: the two vCenters in Enhanced Linked Mode]
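
To make the shared-PSC requirement concrete, here is a small Python sketch.  It assumes you record, by hand, which PSC each vCenter is registered with; the `VCenter` class, the `share_psc` helper, and the host names are all hypothetical and are not taken from any VMware API.

```python
from dataclasses import dataclass


@dataclass
class VCenter:
    """Hypothetical record of one vCenter and the PSC it points to."""
    name: str
    site: str
    psc_host: str  # host name of the Platform Services Controller it is registered with


def share_psc(a, b):
    """True when both vCenters are registered with the same PSC (case-insensitive)."""
    return a.psc_host.strip().lower() == b.psc_host.strip().lower()


if __name__ == "__main__":
    # Mirrors the setup in the post: the protected-site VCSA carries the PSC,
    # and the recovery-site vCenter points to it. Host names are made up.
    protected = VCenter("vc-ny", "NY", "vc-ny.example.local")
    recovery = VCenter("vc-nj", "NJ", "VC-NY.example.local")
    print("Shared PSC:", share_psc(protected, recovery))
```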

 

With the vCenters and the SRM/VPLEX SRA set up, I then paired my sites.  Here you can see the Monitor tab where the SRAs are listed.  Note the new column at the bottom that says “Stretched Storage”.  If the SRA supports that, it will say, oddly enough, “Supported”.

[Screenshot: paired sites, Monitor tab listing the SRAs]

The next step is to add the array managers.  This proceeds the same as in any SRM configuration.  After choosing the VPLEX SRA, I supply the information in the screenshot.  As this VPLEX serves many purposes in the lab, you can see I included the optional filter for the consistency group.  In the following screen I added the other array and then enabled the storage.

[Screenshot: array manager configuration for the VPLEX SRA]

Once complete, the enabled clusters appear along with the VPLEX distributed devices.  Note how the local and remote devices have the same name.

[Screenshot: enabled array pairs and VPLEX distributed devices]

I should mention that there are some rules about the device presentation that you’ll find in the VPLEX SRA release notes.  Most important, I think, is that the Detach Rule of the distributed consistency group must be set up to have the protected site (cluster-1 in my setup) as the winner.  Note that in the event of a failover, the SRA will change the detach rule to the other site, as it will now be considered the protected site.

[Screenshot: detach rule of the distributed consistency group]
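
The detach rule requirement described above can be modeled in a few lines.  This is only an illustrative Python sketch: `ConsistencyGroup`, `detach_rule_ok`, and `after_failover` are hypothetical names, and nothing here talks to VPLEX or reproduces what the SRA actually executes.  It simply captures the rule that the winner must be the protected site, and that the winner moves when the protected site moves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyGroup:
    """Hypothetical model of a distributed consistency group."""
    name: str
    detach_winner: str  # cluster that keeps I/O running if the sites lose contact


def detach_rule_ok(cg, protected_cluster):
    """The winner of the detach rule must be the protected site's cluster."""
    return cg.detach_winner == protected_cluster


def after_failover(cg, new_protected_cluster):
    """Model of the state after failover: the winner follows the new protected site."""
    return ConsistencyGroup(cg.name, new_protected_cluster)


if __name__ == "__main__":
    cg = ConsistencyGroup("Test", "cluster-1")
    print(detach_rule_ok(cg, "cluster-1"))          # protected site is cluster-1
    cg = after_failover(cg, "cluster-2")
    print(detach_rule_ok(cg, "cluster-2"))          # protection has moved
```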

So far, so good.  The SRM setup up to this point should be familiar to anyone who has used this product.  Our next logical step is to create the protection group and then the recovery plan.  Here, however, is where the paradigm changes when using stretched storage.  VMware uses tag-based storage profiles to set up the protection groups.  If you have used the VASA 1 integration with VMAX or another platform, you might remember creating profiles based on disk type (for a reminder VASA WP), or perhaps you have used them in general to help VMware users decide which datastores to use for particular VMs (also critical with upcoming VVols).  BTW VMware also uses them for Storage DRS (KB 2108196) in conjunction with SRM.  For SRM these profiles are tag-based, so the first thing we need to do is create a tag.  In addition to the tag, we assign it a category, which allows grouping of multiple tags.  Once created, the tag is assigned to the VPLEX datastores.  So first we’ll create the tag and the category, then the storage profile, and finally assign the storage profile to the VMs in the datastores.  Here is a graphical step-by-step of tag/category creation.  The image is not quite crystal clear but I think the steps are.

[Screenshot: tag and category creation steps]

In Enhanced Linked Mode, create only a single tag on the protected site and use it for both storage policies.  As with other objects in SRM such as networks and folders, the storage policies must be mapped.  Here is the new mapping tab:

[Screenshot: storage policy mappings tab]

The second part is to create the storage profile.  Once again I have presented a step-by-step screenshot.

[Screenshot: storage profile creation steps]

Because I named my profiles differently on each site, I actually had to do a manual pairing:

[Screenshot: manual storage policy pairing]

Finally we need to assign the VMs in the VPLEX datastores to the new storage profile.

[Screenshot: assigning the storage profile to VMs]
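
If the relationship between tag, category, storage profile, datastore, and VM is hard to keep straight, the following Python sketch may help.  It is a toy model only: every class and function name is hypothetical, the tag, datastore, and VM names are made up, and it does not call vCenter or SRM.  It shows the idea that a VM ends up in a profile-based protection group when it is assigned the profile and sits on a datastore carrying the profile's tag.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Tag:
    category: str  # the category groups related tags
    name: str


@dataclass
class Datastore:
    name: str
    tags: set = field(default_factory=set)


@dataclass
class StorageProfile:
    name: str
    required_tag: Tag

    def compatible(self, datastore):
        """A datastore matches the profile when it carries the required tag."""
        return self.required_tag in datastore.tags


@dataclass
class VM:
    name: str
    datastore: Datastore
    profile: Optional[StorageProfile] = None


def protected_vms(vms, profile):
    """Names of VMs assigned to the profile and living on a matching datastore."""
    return [vm.name for vm in vms
            if vm.profile == profile and profile.compatible(vm.datastore)]


if __name__ == "__main__":
    tag = Tag(category="SRM", name="VPLEX-Stretched")          # made-up names
    stretched = Datastore("VPLEX_DS_1", {tag})
    local = Datastore("LOCAL_DS_1")
    profile = StorageProfile("VPLEX-Profile", tag)
    vms = [VM("app01", stretched, profile), VM("app02", stretched), VM("app03", local, profile)]
    print(protected_vms(vms, profile))
```

Only `app01` satisfies both conditions here, which is why both the datastore tagging and the VM assignment steps are needed before creating the protection group.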

OK, with the datastores tagged, the mapping and storage profile created, and the VMs assigned, it is time for the protection group.  I have the 4-step process in the image below.  Note in particular the new radio button in the second screen for the storage profiles.

[Screenshot: protection group creation, four steps]

I’m not going to include the creation of the recovery plan since that has not changed.  You still select the protection group from which to build the recovery plan.  Let’s move on to an actual run of the recovery plan.  There is an important thing to understand before you begin “testing” your VPLEX SRM environment: VPLEX has no inherent ability to take a copy of a device, distributed or otherwise, for presentation to the remote site.  This implementation is not like RecoverPoint integrated with VPLEX and SRM.  If you try to run a test failover with SRM, it is going to fail.  Therefore you have two options when you run a test: planned migration or failover.  Because of the “finality” of failover, it is preferable to run a planned migration.  The migration would then be followed by a reprotect, a second planned migration, and another reprotect to return the environment to its original state.

If this were a typical SRM configuration I would agree with any statement that said “no test ability” sounds awful, but since this is stretched storage, we never lose access to our environment.  Of course that does not negate the potential performance impact while the VMs are vMotioned to the other vCenter, or the need to run the migration during either a slow period or a maintenance window, but it is nothing like a typical SRM planned migration where an outage is expected.  I won’t make you suffer through any more screenshots, so here are the demos.

For the two demos, initially I have not included an audio track.  The videos are quite short and I did not think any additional information was necessary beyond the callouts given the content of the blog.  I will add audio in a few days though for those not directed from the blog post.  The first demo is the planned migration from NY to NJ followed by a second demo walking through how to reverse the migration (reprotect, planned migration, and reprotect) and get us back to NY as the protected site.

Planned Migration (use HD 720p for best viewing)

 

Reverse Migration (use HD 720p for best viewing)
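
For readers who prefer to see the round trip written down, here is a small Python sketch of the sequence covered by the two demos: planned migration, reprotect, planned migration, reprotect.  It is a toy state model under my own assumptions, with hypothetical names (`SrmState`, `planned_migration`, `reprotect`, `round_trip`); it does not drive SRM and says nothing about what SRM does internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SrmState:
    """Hypothetical snapshot of which site plays which role."""
    protected_site: str
    recovery_site: str
    vms_running_at: str


def planned_migration(state):
    """Move the VMs to the recovery site; roles stay as they are until reprotect."""
    if state.vms_running_at != state.protected_site:
        raise RuntimeError("run a reprotect before migrating again")
    return replace(state, vms_running_at=state.recovery_site)


def reprotect(state):
    """Swap the protected and recovery roles after a migration."""
    if state.vms_running_at != state.recovery_site:
        raise RuntimeError("nothing to reprotect")
    return SrmState(protected_site=state.recovery_site,
                    recovery_site=state.protected_site,
                    vms_running_at=state.vms_running_at)


def round_trip(state):
    """Planned migration, reprotect, planned migration, reprotect."""
    for step in (planned_migration, reprotect, planned_migration, reprotect):
        state = step(state)
    return state


if __name__ == "__main__":
    start = SrmState("NY", "NJ", "NY")
    print(planned_migration(start))   # VMs now run in NJ, NY is still the protected site
    print(round_trip(start) == start)  # the full cycle returns to the original state
```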


28 thoughts on “VPLEX Metro and VMware SRM 6.1”

    • After the cost associated with vSphere hosts in each site, SAN storage in each site, VPLEX, NSX or some technology to ensure L2 adjacency to support the vMotion across both sites, and the network links across & networking gear within each site, I find a $5k list price for vCenter as being prohibitive to pursuing this solution.

  1. Would you happen to know whether this will ever be supported on the VNX-series? We have a metro-cluster and are using SRM. It would be very nice to have this possibility… In fact, we have everything needed except a supported SRA 🙂

    • I’ll be honest I have never heard of a VNX Metro cluster, though admittedly I don’t work with that platform. Can you provide detail of the supported configuration of the VNX stretched storage implementation? I assume you are using vSphere Replication. I can then research it with the product and development groups to see if there is a roadmap. Thanks.

  2. I have yet to see an SRDF SRA for SRM 6.1. Are VPLEX and RecoverPoint the only options or will the 6.0 SRDF SRA work with SRM 6.1?

  3. Hi Drew, we are seeing an error Operation Error: Protection group ‘test-vm-vplex’ is protecting the consistency group ‘Test’ which does not have the site preference.

    We have the Detach Rule of the distributed consistency group setup as per your screenshot. The other thing I noticed was in our array pairs the status is ‘Failover Complete (Stretched Storage)’ as opposed to what you have in your failover demo ‘ <- Site Preference'

    Any ideas?

    Thanks
    Tony

    • Hi Tony,

      So if your status is Failover Complete I assume you are trying to run a reprotect? When a failover is run, the devices are unpresented from the primary site and the site preference moves to the failed over site. From there you run the reprotect back to the primary, then a second failover and reprotect.

  4. Hi Drew,

    thanks for the reply

    This was the status of the Array devices as soon as we added the Array Pair to SRM. All of our existing VPLEX devices are showing up with status ‘Failover Complete (Stretched Storage)’

    Once the protection group was added the consistency group error appeared regarding site preference not being present – Which I don’t know how to resolve. So was wondering if the Array status is the root cause of the issue.

    Thanks
    Tony

    • I see. Well that is very strange. Yes until the array reads properly concerning site preference it will not work properly. Can you confirm you have a single consistency group with the distributed devices, a view on each site and that the detach rule is set for wherever you are creating the protection group? You also probably want to delete the protection group anyway until the array pairs are reading properly as in my screenshots.

      • Thanks again Drew, we are getting there now. Have sorted the site preferences, it was a misconfiguration in VPLEX and I can now fail over and fail back. But instead of a live vMotion the VM shuts down first, then starts up on the other side, which I can’t understand. I am not seeing any vMotion errors; the vMotion networks of the 2 sites are on different subnets but I have configured them with gateways so they are routable.

      • Are you running a failover or a planned migration? Only the planned migration is going to vMotion them.

  5. Hi Drew

    Sorry meant to say it’s only shutting down when failing back with planned migration. Seems to be VPLEX storage related as I am seeing this warning in the job log but I am not sure what it means:

    Consistency Group “Test”:
    Warning: Cannot process consistency group ‘Test’ with role ‘target’ when expected consistency group with role ‘source’.
    Cannot process consistency group ‘Test’ with role ‘target’ when expected consistency group with role ‘source’.

    • So at this point I think Tony you should open an SR if you haven’t already. I am by no means a VPLEX expert and we could spin our wheels for a while rather than getting you directly to the developers on the VPLEX side.

  6. Pingback: SRDF SRA 6.1, SRDF-AU 6.1, SRDF Metro and SRM Stretched Storage

  7. Hi Drew, great article.
    Have a couple of questions which i hope you can help me with.
    1. vMSC seems to be pushed more with VPLEX Metro cluster, but this is described by VMware as an always-on failover solution rather than a DR solution, and it does not provide the orchestration that SRM provides. How relevant do you think the VPLEX Metro and SRM solution still is today? SRM is an active/passive architecture where VPLEX provides active/active?
    2. How well does your solution explained in this blog play when a VM does a cross-vCenter vmotion? Is this even supported?

    thanks
    Johann

    • Hi Johann,

      1. Even with SRM I consider this more of an HA solution given that the distance for Metro is so short. The benefit of the orchestration with SRM is for things like planned maintenance. Without SRM you have to do all the vMotions manually. Though SRM is active/passive you can still split your workload (VMs) across both sites and set up the groups accordingly.

      Now as to whether customers use this solution, I’d say it is not very popular since it is not true DR. In the VMAX space (my most recent post), we offer a third site which makes SRM more attractive as a solution with SRDF/Metro.

      2. You can still do cross vCenter vMotions, but do you mean if the VM is protected by SRM?

      • Hi Drew, thanks for getting back to me so quickly.
        On #2. sorry yes i meant for VMs that are protected by SRM. I guess you would just perform this task through SRM by initiating a planned migration on a recovery plan that makes use of vmotion. This would mean i would have to create separate recovery plans for each VM or group of VM’s you want to migrate?

        Some follow-up questions on #2. Once the VM is moved do you have to initiate a reprotect? Also would you have to change the storage policy for this VM to now be associated with the other site? Assume that tag stays the same since using enhanced linked mode.

        So with vMSC or SRM with VPLEX Metro being an HA solution, I am trying to figure out which will better suit stretched layer 2 sites with the following requirements:
        Be able to migrate VMs between sites on demand
        Easily perform site maintenance
        Least amount administrative overhead on maintaining this environment.

      • So in a planned migration with Metro and SRM, SRM will do a vMotion of the VMs before failing over to the other site. Once failover is complete, you then need to run a reprotect – SRM handles all the policy stuff because you have to have that setup ahead of time (you create the association in SRM). Once maintenance is done, you run another planned migration back and another reprotect. I think I showed all that in the videos for VPLEX.

        The last part of your question is harder to answer. I think it depends on the number of VMs you are running. Large environments might benefit from SRM since you set it up once, and then you can use it when needed without much knowledge (point and click). Moving all the VMs manually could be cumbersome. On the other hand, if it is a medium-sized environment and maintenance is minimal, doing it manually (which of course could mean scripting it too) might not be a huge task. If you don’t already own SRM there is the licensing cost to consider too. I think you have to weigh the options.

        Although I created this whitepaper for SRDF/Metro, all the VMware pieces would be the same as VPLEX if you want to take a look at it. Focus on the setup pieces, rather than the Oracle side: http://www.emc.com/collateral/white-paper/h14577-srdf-metro-oracle-ebus-rac.pdf
