3-site SRDF with VMware SRM

As there is a 500-page TechBook on SRDF with SRM, it might seem redundant to cover this topic again, and perhaps it is. But in working with customers and support over many months (and years), I have seen a few issues recur. That is usually when I figure a post might help, just to emphasize what is already in the documentation; after all, you can’t be faulted for not reading it cover to cover (honestly, I try to avoid it, too).

Our SRDF SRA, version 5.8 and higher, supports most 3-site SRDF topologies, including both Star (3 sites with interwoven relationships) and non-Star. Starting with SRA 6.2 we also support SRDF/Metro in 3-site non-Star configurations. Putting aside Star, for which you really want the TechBook, I want to focus on a typical non-Star 3-site SRDF environment with SRM.
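To make the version rules concrete, here is a minimal Python sketch that encodes only what this post states: 3-site support starts at SRA 5.8, and SRDF/Metro in 3-site needs SRA 6.2 and is non-Star only. The function names and the "star"/"non-star" labels are hypothetical, and the TechBook and release notes remain the authority, since "most" topologies is not "all".

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, int]:
    """Turn "6.2" or "6.2.1" into (6, 2) so versions compare numerically."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def sra_supports(sra_version: str, topology: str, metro: bool = False) -> bool:
    """Check a 3-site topology against the SRA-version rules in this post.

    topology: "star" or "non-star" (hypothetical labels for this sketch).
    metro: True when the A-B leg is SRDF/Metro.
    """
    if topology not in ("star", "non-star"):
        raise ValueError(f"unknown topology: {topology!r}")
    version = parse_version(sra_version)
    if version < (5, 8):
        return False  # 3-site support starts at SRA 5.8
    if metro:
        # SRDF/Metro in 3-site needs SRA 6.2 or later and is non-Star only
        return topology == "non-star" and version >= (6, 2)
    return True
```

For example, `sra_supports("6.2", "non-star", metro=True)` is true, while the same call with `"6.1"` is not.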

A non-Star 3-site environment consists of three VMAX arrays; let’s call them VMAX A, VMAX B, and VMAX C. SRDF is first configured between VMAX A and VMAX B, with pairs (R1, R2) set up to replicate synchronously (SYNC) or asynchronously (ASYNC). (The SRA does not support adaptive copy.) Once this first relationship exists, a second replication leg is configured on the same devices. There are two supported types: concurrent and cascaded. In a concurrent setup the R1 is also replicated to VMAX C, while in a cascaded setup the R2 is replicated to VMAX C. Here’s a mock-up of the two types.
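As an illustration (not SRA code), the sketch below models the two layouts as a pair of SRDF legs and derives the device personality each array ends up with from how many legs leave or arrive at it. The array names A, B, C and the mode strings are hypothetical labels; the third leg is fixed to ASYNC because that is the only mode SRM supports for it.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    source: str  # array whose devices replicate out on this leg
    target: str  # array receiving the replica
    mode: str    # "SYNC", "ASYNC" or "METRO" (hypothetical labels)


def three_site_legs(topology: str, first_mode: str) -> list[Leg]:
    """Build the two SRDF legs for a non-Star 3-site layout of arrays A, B, C."""
    first = Leg("A", "B", first_mode)
    if topology == "concurrent":
        third = Leg("A", "C", "ASYNC")  # the R1 on A also replicates to C
    elif topology == "cascaded":
        third = Leg("B", "C", "ASYNC")  # the R2 on B replicates on to C
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    return [first, third]


def personalities(legs: list[Leg]) -> dict[str, str]:
    """Derive each array's device personality from its outgoing/incoming legs."""
    outgoing = Counter(leg.source for leg in legs)
    incoming = Counter(leg.target for leg in legs)
    roles = {}
    for array in sorted(set(outgoing) | set(incoming)):
        out_n, in_n = outgoing[array], incoming[array]
        if (out_n, in_n) == (2, 0):
            roles[array] = "R11"  # source on both legs
        elif (out_n, in_n) == (1, 1):
            roles[array] = "R21"  # target on one leg, source on the other
        elif (out_n, in_n) == (1, 0):
            roles[array] = "R1"
        elif (out_n, in_n) == (0, 1):
            roles[array] = "R2"
        else:
            raise ValueError(f"unsupported leg layout at array {array}")
    return roles
```

Concurrent yields A as an R11, while cascaded yields B as an R21, matching the personalities described in the list that follows.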

There are some important differences I need to point out between a regular 3-site environment and one that will use SRM:

  • The only replication mode supported to the 3rd site is asynchronous. The SRA will not work with adaptive copy or synchronous on that leg.
  • In concurrent setups, the R1 becomes an R11; in cascaded setups, the R2 becomes an R21.
  • Because of the way the SRA was coded, in cascaded setups VMAX A must see VMAX C (this is not required outside of SRM). In other words, the arrays must be zoned and an (empty) SRDF group created between them. If this is not done, VMAX A and VMAX C will not be available as an array pair to enable in SRM. For many customers this is not an option, and unfortunately, until this issue is addressed in a future SRA, a concurrent setup is their only choice.
    • SRDF/Metro is supported in 3-site configurations, but cascaded setups still have this requirement.
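The caveats in the list above can be captured as a small pre-flight checklist. This is a hedged sketch, not a Dell tool: the topology and mode strings are hypothetical labels, and `a_to_c_group` simply records whether you have zoned VMAX A to VMAX C and created the empty SRDF group.

```python
from __future__ import annotations

# Hypothetical mode labels; adaptive copy (e.g. "ADAPTIVE_COPY") is never valid.
FIRST_LEG_MODES = {"SYNC", "ASYNC", "METRO"}


def srm_config_problems(topology: str, first_mode: str, third_mode: str,
                        a_to_c_group: bool) -> list[str]:
    """Return the SRM caveats a non-Star 3-site layout violates (empty list = OK).

    a_to_c_group: True if VMAX A and VMAX C are zoned and an (empty) SRDF
    group exists between them.
    """
    if topology not in ("concurrent", "cascaded"):
        raise ValueError(f"unknown topology: {topology!r}")
    problems = []
    if first_mode not in FIRST_LEG_MODES:
        problems.append(f"A-B leg mode {first_mode} is not supported by the SRA")
    if third_mode != "ASYNC":
        problems.append(f"third-site leg must be ASYNC, not {third_mode}")
    if topology == "cascaded" and not a_to_c_group:
        # Required by the SRA even though SRDF itself does not need it
        problems.append("cascaded: zone VMAX A to VMAX C and create an empty SRDF group")
    return problems
```

A cascaded SRDF/Metro layout without the A-to-C group returns exactly one problem, which is the Metro sub-bullet in practice.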

With these caveats noted, what usually trips up customers is that SRM must be configured differently depending on whether VMAX B or VMAX C is the recovery site. So here is a quick rundown of the differences.

First, the SRM protection site must be the R1 or R11. Some customers will have R1s on both VMAX A and VMAX B and use different protection groups, and that’s fine, but most will use each array for only one purpose. My point is that you can’t use an R2 as the starting point of protection. So in our picture above, I can’t set up SRM to fail over from VMAX B to VMAX C (a common mistake). Where it can get most confusing is SRDF/Metro: because the configuration is active-active, it might appear to make no difference which device you use at the protection site, BUT it does. You must use the R1. It is worth mentioning that this does not affect the paths you present to the hosts. In particular, if you run a cross-connected SRDF/Metro configuration, you will have both the R1 and the R2 presented to the same host(s). That’s perfectly fine, but it does not change the requirement that you use the R1 when configuring SRM.
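The rule reduces to a simple check on the devices you intend to put in a protection group. The sketch below is illustrative only: the device IDs are hypothetical, and it assumes you already know each device’s SRDF personality. Note that an R21 is rejected too, since it is the target of the A-B leg, which is exactly the "fail over from VMAX B to VMAX C" mistake.

```python
from __future__ import annotations

PROTECTION_ROLES = {"R1", "R11"}


def invalid_protection_devices(devices: dict[str, str]) -> list[str]:
    """Return IDs of devices that cannot be the starting point of SRM protection.

    devices maps a (hypothetical) device ID to its SRDF personality.
    R2 and R21 devices are rejected even when SRDF/Metro makes them
    read/write to hosts, because SRM protection must start from the R1.
    """
    return sorted(dev for dev, role in devices.items() if role not in PROTECTION_ROLES)
```

With a cross-connected SRDF/Metro host that sees both sides, only the R1 devices pass this check.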

Now that we’ve established that the R1, or VMAX A in our picture, is the protection site, what are the options with SRM in 3-site? You have two. The first is to set up a typical R1-to-R2 failover. In this configuration you designate VMAX B as the DR site, and the replication is either asynchronous or synchronous. Basically, you configure the environment as if VMAX C were not there; if you ever wanted to fail over to VMAX C, you would do so manually. The second option is to set up SRM to fail over to VMAX C. This is where problems usually arise, so I’ll expand on it.

The most important step when you want to use VMAX C as the DR site is to modify the global options file (EmcSrdfSraGlobalOptions.xml) that the SRA uses to determine its behavior. There is a global setting created just for this scenario called FailoverToAsyncSite. By default it is set to “No”, but if you want to fail over to VMAX C you must set it to “Yes”, and you must do so at both the protection and recovery sites. Do this before you even start configuring SRM; if you don’t, you’ll never get it to work.
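If you would rather script the change than edit the file by hand, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`. Assumptions: the sample XML is a simplified, assumed excerpt (the real file has more options, and its root element name may differ, so the helpers search by option name anywhere in the document); the helpers work on strings because this post does not give the file’s location. ElementTree drops XML comments and the XML declaration on a round trip, so keep a backup and diff the result.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Simplified, assumed excerpt of EmcSrdfSraGlobalOptions.xml.
SAMPLE = """<SrdfSraGlobalOptions>
  <FailoverToAsyncSite>No</FailoverToAsyncSite>
  <AutoTargetDevice>No</AutoTargetDevice>
</SrdfSraGlobalOptions>"""


def get_option(xml_text: str, name: str) -> str:
    """Return the text of the first <name> element anywhere below the root."""
    elem = ET.fromstring(xml_text).find(f".//{name}")
    if elem is None or elem.text is None:
        raise KeyError(name)
    return elem.text.strip()


def set_option(xml_text: str, name: str, value: str) -> str:
    """Return a copy of the document with every <name> element set to value."""
    if value not in ("Yes", "No"):
        raise ValueError("this sketch only handles Yes/No options")
    root = ET.fromstring(xml_text)
    matches = list(root.iter(name))
    if not matches:
        raise KeyError(name)
    for elem in matches:
        elem.text = value
    return ET.tostring(root, encoding="unicode")


def ready_for_async_failover(*site_files: str) -> bool:
    """True only if FailoverToAsyncSite is Yes in every site's options file."""
    return all(get_option(text, "FailoverToAsyncSite") == "Yes" for text in site_files)
```

Passing both the protection-site and recovery-site file contents to `ready_for_async_failover` catches the common case where the option was changed on only one side.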

The second step concerns the array manager setup. Issues here usually stem from a misunderstanding of how and where devices are presented to hosts. In a 3-site configuration you will have either two or three vCenters. In a two-vCenter setup, customers may choose to have no vCenter (or compute resources) at the VMAX B site (sometimes referred to as the “bunker” site). Since VMAX B is not being used as a failover location, you really don’t need a vCenter there, but you can of course have one, which makes it a three-vCenter setup. In SRDF/Metro environments you may have only one vCenter for both VMAX A and VMAX B, and a second one at VMAX C. When you set up the array managers, then, use one Solutions Enabler environment that sees only VMAX A as local, and another that sees only VMAX C as local. You do not have a Solutions Enabler environment for VMAX B (or at least not one for SRM). Your recovery site is the vCenter to which the VMAX C devices (concurrent or cascaded) are presented. If this step is done properly, you will be presented with the VMAX A to VMAX C array pair to enable. If you don’t see the array pair you expect, you have either hit the cascaded issue above or you are not using the correct Solutions Enabler setup.

Here is a screenshot of my VMAX A (103), VMAX B (104), and VMAX C (062). Note in particular that the only pair I can enable is VMAX A to VMAX C. Since I have no Solutions Enabler environment for VMAX B (104), I cannot enable that pair. This is as it should be.
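One rough way to reason about which array pair SRM will offer is sketched below, using the array IDs from my lab (103, 104, 062). This is a simplified model, not the SRA’s actual discovery logic: it assumes a pair is offered only when one array is local to the protection-site Solutions Enabler, the other is local to the recovery-site Solutions Enabler, and an SRDF group links them directly. It ignores FailoverToAsyncSite and device state.

```python
from __future__ import annotations


def enableable_pairs(protection_local: set[str], recovery_local: set[str],
                     srdf_groups: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """List (protection, recovery) array pairs that an SRDF group links directly."""
    links = {frozenset(group) for group in srdf_groups}
    return sorted(
        (p, r)
        for p in protection_local
        for r in recovery_local
        if p != r and frozenset((p, r)) in links
    )


# SRDF groups in a concurrent and a cascaded layout (array IDs from the lab)
concurrent = [("103", "104"), ("103", "062")]
cascaded = [("103", "104"), ("104", "062")]
```

With Solutions Enabler seeing 103 as local at the protection site and 062 at the recovery site, the concurrent layout yields only the 103 to 062 pair, and 104 never appears because no Solutions Enabler sees it as local. The cascaded layout yields nothing until the empty 103 to 062 SRDF group is added, which mirrors the cascaded caveat.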

If anything needs clarification, feel free to drop me a comment. I hope this helps the next time you need to configure 3-site SRDF with SRM.


8 thoughts on “3-site SRDF with VMware SRM”


  1. How does the reprotect option in SRM work in a 3-site cascaded SRDF configuration with Metro between the R1 and R21, and Async between the R21 and R2?

    1. Reprotect is not supported in that particular configuration. It is covered in the TechBook and you can also find it mentioned in the SRA Release Notes as a known issue: “Reprotect workflow not supported for SRDF Metro 3 site, Failover to Async configuration. Reprotect must be done manually which may include downtime.”

  2. It’s very well explained. I have a query about SRDF/Metro with a single SRDF/A. Suppose I have 3 PowerMax 2000s: 2 configured in Site-A as Metro, and 1 in Site-B as an SRDF Async DR site. All data paths to and from each array are working fine. I understand that if I need to fail over from PMAX-A to PMAX-C, I need to modify EmcSrdfSraGlobalOptions.xml as you explained above. For testing, do I need to change the EmcSrdfSraTestFailoverConfig.xml file, where I provide the DR array ID, source devices, and target devices, or do I only need to change the EmcSrdfSraGlobalOptions.xml file? Is there anything special I need to take care of on the SRM or SRA side?

    1. There are two options for testing. You can manually create target devices on PMAX-C and then modify EmcSrdfSraTestFailoverConfig.xml to match the R2 devices with the newly created target devices, or you can use the recommended auto creation by changing this global parameter in EmcSrdfSraGlobalOptions.xml from No to Yes: <AutoTargetDevice>No</AutoTargetDevice>. This post explains the options in detail: https://drewtonnesen.wordpress.com/2019/09/11/srdf-sra-and-srm-test-failover-review/

  3. Hi, I’m currently working on concurrent replication with PowerMax 2000 and SRM. We have three vCenters and two pairs of SRM appliances, forming two site pairs: production site A with recovery site B, and production site A with recovery site C.
    We have a protection group and recovery plan defined on the first site pair, between A and B. As soon as I define the concurrent SRDF link between A and C, the replicated devices disappear from the replicated device list of the A/B array pair. From a storage perspective everything looks good.
    Do you have any suggestions? (I don’t have access to the TechBook link.)
    Many thanks
    Regards

    1. Hi Mauro,

      I’m not sure I fully understand the setup, so you may have to detail it a little more. SRM is always a 2-site solution, even if you are using 3 sites with PowerMax. You have 2 SRM servers, so one is at site A and the other is at site B or site C. You have an SRDF pair between A and B, and then you add a concurrent leg (let’s say SRDF/A) to site C. If you want to fail over to site C with your concurrent setup, you have to change the parameter FailoverToAsyncSite to Yes at both site A and site C. This also means your management servers (Solutions Enabler) must discover array A as local and array C as local. If you are failing over to site B instead, that parameter should be No, and you should have array A as local and array B as local. It can only be one or the other. If I have misunderstood what you are doing, there are more advanced configs; I have one detailed here: https://drewtonnesen.wordpress.com/2017/01/13/adv-srm-srdf-sra/. In general, the failure to discover the array pairs you want comes down to the array managers configured and the parameter I mentioned. The SRA can discover either the direct pair (e.g., SYNC) or the concurrent pair (e.g., ASYNC), and that is all based on the discovered array pairs.

      BTW, the TechBook link is under the Library (https://drewtonnesen.wordpress.com/important-docs/). Any links in blog posts are likely dead due to internal server changes.

  4. Hi Drew
    Thank you for your prompt answer. In the meantime, I found the parameter you mentioned by reading your document h17000-srdf-sra-srm-tb_v10.pdf, and we have been able to set up a concurrent three-site configuration with three vCenter Servers and 4 SRM appliances.
    We successfully tested the recovery plan in both directions, A to B and A to C.
    Thank you for your support and for the link
    Regards
