As I mentioned in my post on SRDF online device expansion, we do not support online expansion of SRDF/Metro devices until 5978.444.444. There are some SRDF enhancements in PowerMaxOS, however, which can get you pretty close, so I thought it might be worthwhile to explain how to do that.
We now support moving SRDF/Metro pairs between SRDF groups: from Active to Sync or Adaptive Copy, and back. So how exactly does that help with online device expansion in a VMware environment? Well, SRDF/Metro is an active/active solution, so both the R1 and R2 are read/write. It is typically deployed in a VMware Metro Storage Cluster (vMSC), where a single vCenter contains ESXi hosts from two different datacenters. In the preferred (non-uniform) configuration, each host sees only the R1 or only the R2, yet they are all accessing the same datastore. The VMs can therefore move freely between the hosts with vMotion, despite the underlying storage being on two physically distinct arrays. In this type of setup, we can take advantage of SRDF online device expansion by executing the following steps:
- Disable automatic DRS on the cluster (if applicable) to prevent VMware from moving VMs between the hosts.
- vMotion all VMs on ESXi hosts that are accessing the R2, to hosts accessing the R1.
- Move the SRDF/Metro pairs in question from an Active group to a Sync group. During this action, the SRDF pairs will change from Active to Suspended, but the R1s will always remain accessible, which is why we moved the VMs.
- Now that the pairs are in a Sync group, we can expand the devices on both sides through Unisphere for PowerMax or Solutions Enabler.
- Once expanded, move the SRDF pairs from the Sync group back to the Active group. The devices will remain suspended until all the invalid tracks on the R2 (from any activity generated on the R1 where the VMs are) are synced. Once synced, the pairs will change back to Active (bias or witness).
- Rescan the HBAs on the ESXi hosts.
- Run the expansion wizard in the vSphere Client (or ESXi client if using vSphere 6.7 due to the bug explained in the previous post) and increase the datastore size.
- Move the VMs back to their original host and re-enable automatic DRS (if applicable). This step could be done before the expansion of the devices if desired.
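The ordering of the steps above matters, so here is a rough sketch of it as a toy Python model. Nothing in it talks to an array or to vCenter: the `MetroDatastore` class, its fields, and the function names are all made up for illustration, and the two guards only encode the ordering rules stated in the steps (VMs off the R2 hosts before the move to Sync, and the devices expanded only while the pairs are in the Sync group).

```python
from dataclasses import dataclass, field


@dataclass
class MetroDatastore:
    """Toy model of one SRDF/Metro-backed datastore; every field is illustrative."""
    rdf_group: str = "active"    # SRDF group the pairs currently sit in
    pair_state: str = "Active"   # "Active" or "Suspended"
    device_gb: int = 1024        # size of the R1 and R2 devices
    datastore_gb: int = 1024     # size of the VMFS datastore on top of them
    vms_on_r2_hosts: int = 4     # VMs running on hosts that only see the R2
    drs_automatic: bool = True
    log: list = field(default_factory=list)


def move_pairs_to_sync(ds):
    # The R2 is not usable while the pairs are suspended, so no VM may still run there.
    if ds.vms_on_r2_hosts:
        raise RuntimeError("vMotion the VMs off the R2 hosts first")
    ds.rdf_group, ds.pair_state = "sync", "Suspended"
    ds.log.append("move pairs: Active group -> Sync group")


def expand_devices(ds, new_gb):
    # In this procedure the devices are expanded only while the pairs are in the Sync group.
    if ds.rdf_group != "sync":
        raise RuntimeError("move the pairs to the Sync group first")
    if new_gb <= ds.device_gb:
        raise ValueError("new size must be larger than the current size")
    ds.device_gb = new_gb
    ds.log.append("expand R1 and R2")


def move_pairs_to_active(ds):
    # The model skips the wait for invalid tracks on the R2 to sync.
    ds.rdf_group, ds.pair_state = "active", "Active"
    ds.log.append("move pairs: Sync group -> Active group")


def run_expansion(ds, new_gb):
    drs_before, vms_before = ds.drs_automatic, ds.vms_on_r2_hosts
    ds.drs_automatic = False                 # disable automatic DRS
    ds.vms_on_r2_hosts = 0                   # vMotion VMs to the R1 hosts
    move_pairs_to_sync(ds)
    expand_devices(ds, new_gb)
    move_pairs_to_active(ds)
    ds.log.append("rescan HBAs")
    ds.datastore_gb = ds.device_gb           # expansion wizard grows the datastore
    ds.vms_on_r2_hosts = vms_before          # move the VMs back
    ds.drs_automatic = drs_before            # restore the original DRS setting
    return ds
```

Calling `run_expansion(MetroDatastore(), 2048)` walks the whole sequence, while calling `move_pairs_to_sync` on a fresh `MetroDatastore()` raises, because VMs are still running on the R2 hosts.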
That’s it. I created a video of the process using vSphere 6.7. I apologize it is a bit long, but I wanted you to be able to follow the steps and not jump too quickly between screens. I use a non-uniform (preferred) configuration, as I have noted in the steps above, and I am using DRS. The movement of the SRDF pairs between groups is done through Solutions Enabler and the expansion with Unisphere for PowerMax. I use a virtual witness, but bias is fine also. I used IOMETER just to prove the VMs stay up during the whole process.
Now it is important to remember that, unlike expansion with the other SRDF modes, during part of this process for SRDF/Metro you are at risk because the R2 is not consistent with the R1. Depending on how heavy the activity is during the expansion, the synchronization process can take time, so it would be best to run this during a period of reduced activity. However, I think this provides a nice alternative to moving all the VMs off the storage completely (Storage vMotion) before expansion, particularly if you have many and they are large.
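To get a feel for how long that exposure window might last, here is a back-of-the-envelope sketch. It assumes a 128 KB track size and a steady replication link rate, and it treats ongoing host writes as simply eating into that rate. The function name and the numbers in the example are hypothetical, not measurements from my environment.

```python
TRACK_KB = 128  # assumption: 128 KB per track on the array


def resync_seconds(invalid_tracks, link_mb_per_s, host_write_mb_per_s=0.0):
    """Rough time to clear the R2 invalid tracks, ignoring protocol overhead."""
    effective = link_mb_per_s - host_write_mb_per_s
    if effective <= 0:
        raise ValueError("host writes outpace the link; the R2 would never catch up")
    backlog_mb = invalid_tracks * TRACK_KB / 1024
    return backlog_mb / effective


# Hypothetical example: 800,000 invalid tracks is 100,000 MB of backlog.
print(resync_seconds(800_000, 500))        # 200.0 seconds on an idle datastore
print(resync_seconds(800_000, 500, 100))   # 250.0 seconds with 100 MB/s of new writes
```

The second call shows why a quiet period helps: every MB/s of new writes is a MB/s not spent draining the backlog.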
Note that if you employ a uniform configuration (cross-connect), theoretically you don’t need to move any VMs, since the R2 paths would become inaccessible and only the R1 paths would be active; however, I don’t recommend doing it that way. Instead, disable the paths to the R2 before moving the pairs, so the hosts do not have to wait on path timeouts. Here is an example of how to do that:
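If you would rather script it, the following is a minimal Python sketch that builds the `esxcli storage core path set` commands for every path whose target port belongs to the R2 array. It only produces the command strings; it does not run them. The path names and WWPNs are made up, and the `(runtime_name, target_wwpn)` input format and the `path_commands` helper are assumptions for illustration, so check the generated commands against your own path listing before using them.

```python
def path_commands(paths, r2_target_wwpns, state="off"):
    """Build esxcli commands that set every path to an R2 target port to `state`.

    paths: iterable of (runtime_name, target_wwpn) tuples (hypothetical format).
    state: "off" to disable the paths, "active" to re-enable them afterwards.
    """
    if state not in ("off", "active"):
        raise ValueError("state must be 'off' or 'active'")
    r2_ports = {w.lower() for w in r2_target_wwpns}
    return [
        f"esxcli storage core path set --state={state} --path={name}"
        for name, wwpn in paths
        if wwpn.lower() in r2_ports
    ]


# Made-up example: two paths to the R1 array and two to the R2 array.
paths = [
    ("vmhba1:C0:T0:L5", "50:00:09:73:00:00:00:01"),
    ("vmhba2:C0:T0:L5", "50:00:09:73:00:00:00:02"),
    ("vmhba1:C0:T1:L5", "50:00:09:73:00:00:00:A1"),
    ("vmhba2:C0:T1:L5", "50:00:09:73:00:00:00:A2"),
]
r2_ports = ["50:00:09:73:00:00:00:a1", "50:00:09:73:00:00:00:a2"]

for cmd in path_commands(paths, r2_ports):
    print(cmd)
```

Running the same function with `state="active"` gives the commands to turn the R2 paths back on once the pairs are Active again.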
If you are using PowerPath/VE (PP/VE), which is recommended for either configuration, you could still disable the paths, but it is less important as PP/VE is far more responsive than NMP.
The other thing I should point out is there are a bunch of caveats and conditions around moving between SRDF groups so please read the documentation on support.EMC.com before trying this out and always test, test, test!