One of the new features we released with the latest PowerMaxOS is known as SRDF/Metro Smart DR, or MetroDR for short. MetroDR expands the previous functionality of a third leg off the R1 or R2 of an SRDF/Metro pair by allowing either the R1 or R2 to update the third leg. In essence this is similar to our SRDF/Star configurations. The way this works is that only the R1 of an SRDF/Metro pair updates the remote asynchronous leg (mode can be adaptive copy also), but in the event of a failure or reconfiguration, the R2 can become the R1 and thus take over the updating of the remote leg. The feature has been fully integrated into Unisphere for PowerMax and Solutions Enabler. There is a separate section in Unisphere just for MetroDR:
Before I provide an example of using this functionality, I want to note that the SRDF SRA for VMware SRM (the SRA enabling the use of SRDF array replication with SRM) does NOT support MetroDR. There is no timeline for support, so if you were planning on using this feature with VMware SRM, you would have to create a completely manual solution without SRM.
SRDF/Metro Smart DR Example
I thought I’d run through a simple example of MetroDR just to give an idea of how it can work in a VMware environment. If you understand SRDF, I don’t think it will be difficult to follow along.
PowerPath Virtual Edition (PP/VE)
Before I began the setup, I installed PP/VE 7.0 P01, which supports vSphere 7, on the 2 ESXi hosts in my cluster. Using PP/VE is our best practice, after all. Note that the installation does require a reboot.
Generally I like CLI when working with SRDF simply because I’m not very good about working within the constraints of Unisphere (that and my lab is such a mess it’s usually impossible), but unlike SRDF/Star, SRDF/Metro Smart DR is fully integrated into Unisphere for PowerMax so I’m going to use it. In this example I started with 3 arrays (obviously) all at the new PowerMaxOS 5978 Q3 2020 (5978.669.669) release which is a requirement. Unlike other SRDF relationships which can go between different code levels, MetroDR cannot. Arrays 357 and 358 will be in the SRDF/Metro relationship while array 355 will serve as the asynchronous leg.
- 000197600357 (Metro R1)
- 000197600358 (Metro R21)
- 000197600355 (Async R2)
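To make the roles concrete, here is a minimal Python sketch of the takeover behavior described earlier: only the R1 updates the DR leg, and if the R1 is unavailable the surviving Metro leg is promoted and carries on. The `Array` class and `dr_updater` function are names I made up for illustration. Nothing here talks to Solutions Enabler or Unisphere, and the real arbitration happens inside the arrays, so treat this as a mental model only.

```python
from dataclasses import dataclass


@dataclass
class Array:
    serial: str
    role: str            # "R1" or "R21" for the two Metro legs
    online: bool = True


def dr_updater(metro_legs):
    """Return the Metro array that currently updates the DR leg (illustrative model)."""
    online = [a for a in metro_legs if a.online]
    if not online:
        raise RuntimeError("both Metro legs are offline; nothing can update the DR leg")
    # Normal case: only the R1 sends changes to the asynchronous leg.
    for array in online:
        if array.role == "R1":
            return array
    # The R1 is gone: the surviving leg becomes the R1 and takes over.
    survivor = online[0]
    survivor.role = "R1"
    return survivor


r1 = Array("000197600357", "R1")
r21 = Array("000197600358", "R21")
print(dr_updater([r1, r21]).serial)   # array 357 feeds array 355

r1.online = False                     # simulate losing the R1
print(dr_updater([r1, r21]).serial)   # array 358 has taken over
```

The sketch deliberately says nothing about what the failed array is relabeled as, since that is not something the model needs.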
To start, I’ve created a storage group srdf_metro_smart_dr_sg with 2 x 100 GB devices and placed that in a masking view to the ESXi hosts in my VMware Metro Storage Cluster shown above. Then I created 2 datastores on the two devices. Remember best practice is to present the R1 first, create the datastore, then you can present the R2, whether nonuniform or uniform configuration.
Unisphere Protect Wizard
OK, now we’re ready to protect the storage group. The wizard in Unisphere works the same way as it does when protecting any storage group with SRDF or TimeFinder. You’ll see a new option, “Setup high availability with DR using SRDF/MetroDR”. The first selection is for Metro, which should be familiar. The second selection is for the asynchronous site (the mode can also be adaptive copy). Remember you’ll need connectivity from each Metro leg to the asynchronous leg. I’ve used Automatic for SRDF groups but you can certainly manually select the groups if you’ve pre-created them.
Here is some detail of the steps.
Next, I present the storage group from my Metro R2 to the vMSC cluster. As I’m using PP/VE, I’ve set up Autostandby, so while technically I have a uniform setup, I am running it as non-uniform. My dsib0180 ESXi host has 2 active paths to array 357 and 2 standby paths to 358, and the ESXi host dsib0182 has the opposite. You can see the command to update the configuration and the result below.
In the event of a problem with R1 paths, PP/VE will immediately direct IO to the R2 paths. If you need detail on PP/VE with Autostandby I have a post on that here.
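As a rough illustration of that path behavior, here is a small Python sketch of the layout on host dsib0180. It is a toy model, not PowerPath’s actual algorithm: the `Path` class and `usable_paths` function are hypothetical names, and the only rule modeled is “use active paths while any are alive, otherwise fall back to standby paths.”

```python
from dataclasses import dataclass


@dataclass
class Path:
    array: str           # short array ID the path leads to
    mode: str            # "active" or "standby"
    alive: bool = True


def usable_paths(paths):
    """Prefer live active paths; fall back to live standby paths (toy model)."""
    alive = [p for p in paths if p.alive]
    active = [p for p in alive if p.mode == "active"]
    return active or alive


# Host dsib0180: 2 active paths to array 357, 2 standby paths to array 358.
dsib0180 = [
    Path("357", "active"), Path("357", "active"),
    Path("358", "standby"), Path("358", "standby"),
]
print({p.array for p in usable_paths(dsib0180)})   # IO goes to 357

# Simulate losing every path to the R1 side.
for p in dsib0180:
    if p.array == "357":
        p.alive = False
print({p.array for p in usable_paths(dsib0180)})   # IO moves to 358
```

Host dsib0182 would simply have the modes swapped between the two arrays.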
Finally, I present the storage group at the DR site (array 355) to the DR vCenter.
Here is a visual view of the configuration in Unisphere for PowerMax. Note that by hovering over one of the connection lines, you can get some more detail.
But since I can’t resist, here is the CLI view. In this case I am looking at the R2 Metro leg (which is really an R21) and the Async leg. Note the -metrodr switch so I only see that type of configuration.
In this configuration, if either Metro side (R1, R21) fails, then the other will take over the data transmission. If it is the R1 that fails, the R21 will become the R1 (as it would in any failure that uses a witness). But for the purposes of this example, let’s assume I want to fail over to my DR site for a test. Again, it is best to use Unisphere for this, and since it is a test, we’ll be able to do it cleanly. Before failing over, I do the following:
- Shut down all VMs
- Unregister VMs
- Unmount datastores
Within Unisphere, we can click through the wizard to fail over to the asynchronous site.
And the CLI shows our Metro pair is suspended and our Async leg is failed over.
The devices are now read/write at the DR vCenter, so we can go through the process of mounting them, just like SRM would do (which, again, we don’t support). Then we would register the VMs and be good to go.
And then, using the wizard, we can fail back and return the environment to its starting position.
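Since there is no SRM to orchestrate this, the order of operations is on you. The sketch below is one hedged way to capture the test failover from this post as an ordered runbook in Python. The step names come from the walkthrough above, but `run_runbook` and the `execute` callback are hypothetical: in practice each step would be whatever manual action or tooling you use, and the runner simply refuses to continue once a step reports failure.

```python
# Ordered steps of the test failover, as walked through in this post.
FAILOVER_TEST = [
    "shut down all VMs",
    "unregister VMs",
    "unmount datastores",
    "fail over to the asynchronous site (Unisphere wizard)",
    "mount datastores at the DR vCenter",
    "register VMs at the DR vCenter",
]


def run_runbook(steps, execute):
    """Run steps in order; stop at the first one whose execute() returns False."""
    done = []
    for step in steps:
        if not execute(step):
            raise RuntimeError(f"step failed: {step!r}; completed so far: {done}")
        done.append(step)
    return done


def confirm(step):
    # Placeholder: a real version would perform or verify the step.
    print(f"OK: {step}")
    return True


run_runbook(FAILOVER_TEST, confirm)
```

The point of the hard stop is that you never want the array failover to run while datastores are still mounted at the production site.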
The MetroDR feature is a vast improvement over the current 3 or 4 site SRDF/Metro functionality which must treat each leg as independent. The newly built intelligence of this feature will protect customers against an HA failure while never losing the ability to transmit changes to the DR site. Now I wish I could say the entire ecosystem and features surrounding MetroDR are also there, but alas not yet. Without further ado, the caveats.
- All 3 arrays must be running PowerMaxOS 5978 Q3 2020 (5978.669.669).
- The Dell EMC SRDF SRA for VMware SRM does not support MetroDR (yes the third time I’ve told you). When it comes to 3-site SRDF/Metro configurations, the SRA only supports concurrent configurations (single leg off the R1). I’m sorry but I have no insight into when that support might be forthcoming. It requires some code changes within PowerMaxOS in addition to those in the SRA.
- Online device expansion is not supported while the devices are in a MetroDR configuration. You can, however, disable MetroDR, expand the asynchronous device first, then expand the Metro pair from the R1, and then re-enable MetroDR, all without pausing replication.
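If you want to sanity-check a planned environment against these caveats, a simple checklist function is enough. The following Python sketch is only that: `metrodr_preflight` and its parameters are hypothetical, the code levels are typed in by hand rather than queried from the arrays, and the exact-match comparison reflects the requirement as stated in this post (it makes no claim about later releases).

```python
REQUIRED_LEVEL = "5978.669.669"   # PowerMaxOS 5978 Q3 2020, per the caveat above


def metrodr_preflight(array_levels, uses_srdf_sra=False, needs_online_expansion=False):
    """Return a list of caveats the planned setup would run into (empty means none)."""
    problems = []
    if len(array_levels) != 3:
        problems.append(f"expected 3 arrays, got {len(array_levels)}")
    for serial, level in sorted(array_levels.items()):
        if level != REQUIRED_LEVEL:
            problems.append(f"{serial} is at {level}, not {REQUIRED_LEVEL}")
    if uses_srdf_sra:
        problems.append("the SRDF SRA for VMware SRM does not support MetroDR")
    if needs_online_expansion:
        problems.append("online device expansion requires disabling MetroDR first")
    return problems


lab = {
    "000197600357": "5978.669.669",
    "000197600358": "5978.669.669",
    "000197600355": "5978.669.669",
}
print(metrodr_preflight(lab))                      # no caveats hit
print(metrodr_preflight(lab, uses_srdf_sra=True))  # flags the SRA caveat
```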