Online SRDF device expansion – SRDF/Metro

As I mentioned in my post on SRDF online device expansion, PowerMaxOS releases prior to 5978.444.444 do not support online expansion of SRDF/Metro devices. However, there are some SRDF enhancements in PowerMaxOS that can get you pretty close, so I thought it would be worthwhile to explain how to use them.

We now support moving SRDF/Metro pairs between SRDF groups: from an Active group to a Sync or Adaptive Copy group, and back. So how exactly does that help with online device expansion in a VMware environment? Well, SRDF/Metro is an active/active solution, so both the R1 and R2 are read/write. It is typically deployed in a vSphere Metro Storage Cluster (vMSC), where a single vCenter manages ESXi hosts in two different datacenters. In the preferred (non-uniform) configuration, each host sees only the R1 or the R2, yet all hosts access the same datastore. VMs can therefore move freely between hosts with vMotion even though the underlying storage is on two physically distinct arrays. In this type of setup, we can take advantage of SRDF online device expansion with the following steps:

  1. If the cluster uses automatic DRS, disable it so that vSphere does not move VMs between hosts.
  2. vMotion all VMs off the ESXi hosts that access the R2 onto hosts that access the R1.
  3. Move the SRDF/Metro pairs in question from the Active group to a Sync group. During this move the pairs change from Active to Suspended, but the R1s remain accessible the whole time, which is why we moved the VMs to the R1 hosts.
  4. Now that the pairs are in a Sync group, we can expand the devices on both sides through Unisphere for PowerMax or Solutions Enabler.
  5. Once the devices are expanded, move the pairs from the Sync group back to the Active group. The pairs remain suspended until all the invalid tracks on the R2 (from any writes the VMs made to the R1 in the meantime) are synced. Once synced, the pairs change back to Active (bias or witness).
  6. Rescan the HBAs on the ESXi hosts.
  7. Run the expansion wizard in the vSphere Client (or the ESXi Host Client if you are on vSphere 6.7, because of the bug explained in my previous post) to increase the datastore size.
  8. Move the VMs back to their original hosts and re-enable automatic DRS (if applicable). If desired, this step can be done before the datastore expansion in steps 6 and 7, as long as the pairs are back to Active.
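The ordering constraints in steps 3 to 5 can be captured in a small Python toy model of a single pair. Everything here is hypothetical: the `MetroPair` class, its methods, and the simplified "Active"/"Suspended" labels are illustrative only and are not Solutions Enabler or Unisphere objects or exact pair-state names. The model encodes just what the steps describe: expansion happens while the pair sits in the Sync group, R1 writes during the suspension become invalid tracks on the R2, and the pair returns to Active only once those tracks have drained.

```python
from dataclasses import dataclass


@dataclass
class MetroPair:
    """Toy model of one SRDF/Metro pair (hypothetical, not a real API object)."""
    group: str = "active"   # which SRDF group the pair is in: "active" or "sync"
    state: str = "Active"   # simplified pair state: "Active" or "Suspended"
    size_gb: int = 100
    r2_invalid_tracks: int = 0

    def move_to_sync_group(self) -> None:
        # Step 3: the pair suspends; only the R1 keeps serving I/O.
        self.group, self.state = "sync", "Suspended"

    def write_to_r1(self, tracks: int) -> None:
        # VM writes land on the R1; while suspended, the R2 falls behind.
        if self.state == "Suspended":
            self.r2_invalid_tracks += tracks

    def expand(self, new_size_gb: int) -> None:
        # Step 4: in this model, expansion is only allowed in the Sync group.
        if self.group != "sync":
            raise RuntimeError("move the pair to the Sync group before expanding")
        if new_size_gb <= self.size_gb:
            raise ValueError("the new size must be larger than the current size")
        self.size_gb = new_size_gb

    def move_to_active_group(self) -> None:
        # Step 5: back in the Active group, but still suspended until resynced.
        self.group = "active"

    def resync_tick(self, tracks_per_tick: int) -> None:
        # Invalid tracks drain to the R2 only after the pair is back in the
        # Active group; the pair returns to Active when none remain.
        if self.group != "active" or self.state == "Active":
            return
        self.r2_invalid_tracks = max(0, self.r2_invalid_tracks - tracks_per_tick)
        if self.r2_invalid_tracks == 0:
            self.state = "Active"


pair = MetroPair()
pair.move_to_sync_group()
pair.write_to_r1(5000)        # activity from the VMs running against the R1
pair.expand(200)
pair.move_to_active_group()
ticks = 0
while pair.state != "Active":
    pair.resync_tick(1000)
    ticks += 1
print(pair.size_gb, pair.state, ticks)  # prints: 200 Active 5
```

The number of loop iterations grows with how much the VMs wrote while the pair was suspended, which is exactly the exposure window the post warns about.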

That’s it. I created a video of the process using vSphere 6.7. I apologize that it is a bit long, but I wanted you to be able to follow the steps without jumping too quickly between screens. As noted in the steps above, my setup is non-uniform (preferred) and uses DRS. I move the SRDF pairs between groups with Solutions Enabler and expand the devices with Unisphere for PowerMax. I use a virtual witness, but bias works too. I ran IOMETER to show that the VMs stay up during the whole process.

It is important to remember that, unlike expansion with the other SRDF modes, SRDF/Metro leaves you at risk during part of this process because the R2 is not consistent with the R1. Depending on how heavy the write activity is during the expansion, resynchronization can take time, so it is best to do this during a period of reduced activity. However, I think this provides a nice alternative to moving all the VMs off the storage completely (Storage vMotion) before expansion, particularly if you have many VMs and they are large.
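To get a feel for how long the R2 stays inconsistent, here is a rough Python estimate of resynchronization time. It is a sketch under stated assumptions, not a sizing tool: the 128 KB track size is my assumption for PowerMax FBA devices (pass your own value), and the model assumes the backlog drains only at the link bandwidth left over after ongoing R1 writes, ignoring compression and however the array actually prioritizes new writes against the backlog. The function name and parameters are hypothetical.

```python
def estimate_resync_seconds(invalid_tracks: int, track_kb: int,
                            link_mb_per_s: float,
                            new_writes_mb_per_s: float) -> float:
    """Back-of-envelope time for the R2 to catch up after the pair is moved back.

    Assumption: the backlog drains at whatever link bandwidth remains after
    the ongoing R1 writes; compression and array-side prioritization are ignored.
    """
    backlog_mb = invalid_tracks * track_kb / 1024
    drain_mb_per_s = link_mb_per_s - new_writes_mb_per_s
    if drain_mb_per_s <= 0:
        return float("inf")  # the backlog never shrinks at this write rate
    return backlog_mb / drain_mb_per_s


# 80,000 invalid tracks of an assumed 128 KB (10,000 MB of backlog),
# a 250 MB/s link, and 50 MB/s of new writes still arriving on the R1
print(estimate_resync_seconds(80_000, 128, 250, 50))  # prints: 50.0
```

The infinite case is the practical takeaway: if writes during the resync approach the link bandwidth, the pairs may stay suspended for a long time, which is why a quiet period is recommended.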

Note that if you employ a uniform configuration (cross-connect), theoretically you don’t need to move any VMs, since the R2 paths would become inaccessible and only the R1 paths would remain active; however, I don’t recommend doing it that way. Instead, disable the paths to the R2 beforehand and avoid the path timeouts.
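On hosts using NMP, one way to disable the R2 paths is `esxcli storage core path set --path <runtime name> --state off` (and `--state active` to bring them back afterwards). The Python sketch below only builds those command strings; it assumes you have already gathered each path's runtime name and target WWPN (for example from `esxcli storage core path list`), normalized them to one format, and know the front-end WWPNs of the R2 array. The helper name and the WWPNs in the example are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def r2_path_off_commands(paths: Iterable[Tuple[str, str]],
                         r2_target_wwpns: Set[str]) -> List[str]:
    """Return esxcli commands that turn off every path leading to an R2 target port.

    paths: (runtime name, target WWPN) pairs collected from one ESXi host.
    r2_target_wwpns: front-end port WWPNs of the R2 array, supplied by you.
    Matching is case-insensitive; both inputs must use the same WWPN format.
    """
    wanted = {w.lower() for w in r2_target_wwpns}
    return [
        f"esxcli storage core path set --path {runtime_name} --state off"
        for runtime_name, target_wwpn in paths
        if target_wwpn.lower() in wanted
    ]


host_paths = [
    ("vmhba1:C0:T0:L12", "50:00:09:73:00:aa:aa:01"),  # hypothetical R1 port
    ("vmhba1:C0:T1:L12", "50:00:09:73:00:bb:bb:01"),  # hypothetical R2 port
]
for cmd in r2_path_off_commands(host_paths, {"50:00:09:73:00:BB:BB:01"}):
    print(cmd)
```

Generating the commands rather than running them lets you review the list per host before touching any paths.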

If you are using PowerPath/VE (PP/VE, recommended for either configuration), you could still disable the paths, but it is less important because PP/VE is far more responsive than the native multipathing plugin (NMP).

The other thing I should point out is that there are a number of caveats and conditions around moving pairs between SRDF groups, so please read the documentation on support.EMC.com before trying this out, and always test, test, test!


20 thoughts on “Online SRDF device expansion – SRDF/Metro”


  1. Hi Drew, another great post!!!

    One question about PowerPath/VE: we are in a project to implement vMSC and we have PowerPath/VE, but until now we have not considered using PP/VE in the VMware environment, to avoid extra complexity. After reading your post, I’ll consider it, since we already have the licenses included in the FX package. Could you tell me the key benefits of using PowerPath/VE compared with the default VMware multipathing?
    Many thanks

    1. I’m not really the person for questions like this. Off the top of my head:

      • I/O load balancing – host-by-host basis using internal algorithms to ensure the most underutilized path is used
      • Statistics gathering, with the statistics accessible to users for analysis
      • Path failover and recovery – automated re-direct
      • Proactive path testing on failures
      • Supports both virtual and physical environments

      You should direct further inquiries to your Dell team who can do a far better job illuminating the benefits than I can.
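PowerPath/VE’s load-balancing algorithms are proprietary and are not shown here. As a generic illustration of what sending I/O down the most underutilized path means, the Python sketch below implements a simple least-queued-I/O policy; the function names and path names are hypothetical, and I/O completions are not modeled.

```python
from typing import Dict, List


def pick_least_queued(outstanding: Dict[str, int]) -> str:
    """Pick the path with the fewest outstanding I/Os; ties go to the first name in sort order."""
    return min(sorted(outstanding), key=lambda path: outstanding[path])


def dispatch(outstanding: Dict[str, int], n_ios: int) -> List[str]:
    """Send n_ios new I/Os, each to the currently least-loaded path."""
    chosen = []
    for _ in range(n_ios):
        path = pick_least_queued(outstanding)
        outstanding[path] += 1  # the new I/O is now queued on that path
        chosen.append(path)
    return chosen


queues = {"vmhba1:C0:T0:L1": 4, "vmhba1:C0:T1:L1": 1, "vmhba2:C0:T0:L1": 2}
order = dispatch(queues, 3)
print(order)
```

A plain round-robin policy would instead send these three I/Os to each path in turn regardless of how busy it is; the least-queued choice steers work away from the path that already has four I/Os outstanding.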

  2. Thanks Drew, I’m considering using PowerPath/VE.

    One question about SRDF/Metro and storage groups:

    What is the best way to protect a storage group of Oracle ASM LUNs?
    We constantly have to add LUNs to the ASM storage group. What should I do to protect these new volumes and keep them organized in storage groups?
    Currently I do it this way for VMware LUNs, but for Oracle I’d like to know if there is a better way that avoids disruption:
    1 – Change the SRDF pair to suspended or synchronous mode
    2 – Create the new volumes on both sides
    3 – Create a new pair in the SRDF group, selecting the new volume, and add it to the correct storage group at the local site
    4 – Associate the new volume with the correct storage group at the remote site
    5 – Establish the RDF pair again

    Another question: do you know if Unisphere 9.0 has limitations for creating SRDF pairs through the GUI? I’m having trouble finding the remote site in the GUI for SRDF operations, while SYMCLI has been working fine.

    Many thanks again

    1. Last question: what is the right way to change an SG from “Metro” to “sync” to perform the expansion? I only succeeded in changing it to suspended. Trying to change directly to sync mode, I got the error below:
      #symrdf set mode sync
      “An RDF Set ‘Synchronous Mode’ operation execution is in progress

      Operation is not allowed while device is part of an RDF/Metro Configuration”

      1. The process I documented and put in the demo worked for me. Definitely open an SR and ask them to investigate why it is failing for you.

    2. Depending on your version of PowerMaxOS, you can add volumes dynamically to SRDF/Metro groups, so I’m not sure why you are doing step 1 above unless you are expanding devices; otherwise, your process for adding devices is fine.

      There are some RDF tasks you cannot do in Unisphere but they are the more advanced ones. Specifically you cannot do SRDF/Star tasks.

      1. Sure, it was my mistake. I tried without step 1 and it works fine in the SRDF/Metro group. Now it is much better 🙂
        Only for SRDF/Sync did I have to suspend the replication. Is there a way to add volumes to a Sync SRDF group without suspending?
        About the SRDF tasks, it was an issue with my embedded Unisphere. I opened a ticket as you suggested, the services in the Unisphere container were restarted, and now it is working perfectly.

        One question: a Sync RDF group can have multiple storage groups, right? Can a Metro SRDF group have only one? I tried to replicate another storage group using the same SRDF group, but it wasn’t shown to me, so I had to create another one.

        Many Thanks

      2. SRDF/S allows dynamic pair adding also. You should not have to suspend.
        You can have many SRDF groups of all types (Metro, Sync, Async), but a volume can only be in one group.
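Because, as the reply above states, a volume can only be in one SRDF group, a plan for adding pairs can be sanity-checked before anything is run. This Python sketch works on a plain dictionary you would build yourself (group name to volume IDs); the group names, volume IDs, and the function are hypothetical, and nothing is read from the array.

```python
from collections import defaultdict
from typing import Dict, List


def volumes_in_multiple_groups(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each volume that appears in more than one SRDF group to those groups."""
    membership: Dict[str, List[str]] = defaultdict(list)
    for group, volumes in groups.items():
        for volume in volumes:
            membership[volume].append(group)
    return {vol: grps for vol, grps in membership.items() if len(grps) > 1}


plan = {
    "metro_10": ["00A1", "00A2"],
    "sync_20": ["00B1", "00A2"],  # 00A2 is listed twice by mistake
}
print(volumes_in_multiple_groups(plan))  # prints: {'00A2': ['metro_10', 'sync_20']}
```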

  3. In an existing SRDF/S group, when I’m creating a new pair, I have the option to add the new volume to an existing SG at the local site, but at the remote site the SGs are hidden. So to add the new volumes to an SG for host access, I have to suspend the whole SRDF group and then add the volume to the correct SG; without suspending, I’m unable to perform the operation. Should associating a volume with an SG at the remote site during pair creation work, or am I doing this the wrong way?

    Many thanks again
    After all your tips, I’m more comfortable with Unisphere and VMAX as well!!!

  4. Hi Drew, do you know how I can configure alerts to notify me when a storage group protected by SRDF changes status from synchronous/Metro to suspended? I researched the Unisphere alerts guide but didn’t find this option. If there’s no way to do it in Unisphere, what tool or approach would you suggest for monitoring and notifying on SRDF group status changes?

    Many thanks

    1. No, there is no such alert. You could script a solution using the REST API and run it, say, every 5 minutes, checking the RDF groups of interest. The other option is to use a log aggregator like Log Insight and then set up alerts there based on fields from the log entries. I have a whitepaper in the docs section here that explains how.
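The REST approach could look like the Python sketch below, which polls each RDF group of interest and raises an alert when a group moves from a healthy state to anything else. `get_state` is a placeholder for your own call to the Unisphere for PowerMax REST API (the endpoint is deliberately not shown; check the REST API documentation for your Unisphere version), `notify` is whatever sends the alert, and the healthy state names are assumptions based on Solutions Enabler pair states. All function names are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable

# Pair states treated as healthy (assumed names); adjust to what your groups report.
HEALTHY = {"ActiveActive", "ActiveBias", "Synchronized"}


def check_once(groups: Iterable[str], get_state: Callable[[str], str],
               last_seen: Dict[str, str], notify: Callable[[str], None]) -> None:
    """Poll each group once and notify on a healthy -> unhealthy transition."""
    for group in groups:
        state = get_state(group)
        previous = last_seen.get(group)
        if previous in HEALTHY and state not in HEALTHY:
            notify(f"RDF group {group}: {previous} -> {state}")
        last_seen[group] = state


def run_forever(groups: Iterable[str], get_state: Callable[[str], str],
                notify: Callable[[str], None], interval_s: int = 300) -> None:
    """Poll every interval_s seconds (300 s matches the 5 minutes suggested above)."""
    groups = list(groups)
    last_seen: Dict[str, str] = {}
    while True:
        check_once(groups, get_state, last_seen, notify)
        time.sleep(interval_s)


# Dry run with a fake state source instead of the REST API
fake_states = {"metro_10": "ActiveActive"}
alerts = []
seen = {}
check_once(["metro_10"], fake_states.__getitem__, seen, alerts.append)
fake_states["metro_10"] = "Suspended"
check_once(["metro_10"], fake_states.__getitem__, seen, alerts.append)
print(alerts)  # prints: ['RDF group metro_10: ActiveActive -> Suspended']
```

The first poll only records a baseline, so a group that is already suspended when the script starts is not reported; add that check if you need it.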

  5. Nice blog and description. Could you also make a video explaining how to expand a boot LUN that is also part of an SRDF/Metro setup in an active/active state?

    1. Well, the SRDF expansion process is the same if you are booting from SAN, though I don’t typically see that configuration. If your hosts also have paths to the remote array (a cross-connect configuration), I recommend adding a step to disable the R2 paths, since my post assumes a non-uniform configuration. The key difference for expanding the ESXi partition is at step 7 in my post, where I explain how to use the vSphere Client to expand the datastore. You can’t do that with the boot partition; you have to follow this KB: https://kb.vmware.com/s/article/2002461. In summary, follow the process (though it is unlikely you have any VMs running on the boot datastore to move) and then use the KB to expand the datastore.
