I have worked with a number of companies whose DR requirement is that their production environment switch sites every six months (for the sake of this post we are speaking of a traditional two-site configuration). Think of this as taking the place of regular testing at the DR site. These companies effectively have two production sites, one primary and one secondary. VMware SRM is a good solution for these customers because of its dual-site ability. We (and I include myself) generally speak of the SRM protection and recovery sites as fixed from the start, meaning a particular location like protection in LA, recovery in Boston. But protection and recovery sites are really defined at the protection-group or VM level. I could very well choose to run half my environment in LA and half in Boston, with each location acting as both a protection and a recovery site depending on the protected VMs. Granted, most customers I work with do skew to the traditional definition and then use the recovery site as a test/dev site so as not to waste hardware; but they don't have to. The companies I wrote about in the first sentence are more traditional, except that they switch sites every six (or fewer) months via a planned migration in SRM.
When these companies do the planned migration (and yes, planned, not disaster), the desire is to have SRDF replication resume to the new production site as soon as possible. In SRM, that is what the reprotect function accomplishes. But what if you want to be sure replication resumes as soon as humanly possible, in case someone forgets to hit that reprotect? Well, the SRDF SRA has an answer for that.
Default Planned Migration
I’ll use a typical configuration of SRDF/A – two devices in a single protection group and recovery plan. You can see I have my consistency group created (srdfa) on each site. I am replicating from array 302 to array 341.
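If you want to check the same state from the command line rather than the screenshots, a quick SYMCLI sketch follows. The group name srdfa comes from this post; the exact SID arguments will vary by environment, so treat these as illustrative invocations rather than copy/paste commands.

```
# List the locally visible arrays (the ...302 and ...341 pairs in this post)
symcfg list

# Show the RDF pair state for the consistency group
symrdf -cg srdfa query

# Succeeds only when all pairs in the group are in the Consistent state
symrdf -cg srdfa verify -consistent
```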
The array pairs are in the required Consistent state. Note that the view below is from array 341, the write-disabled R2 site.
Now I issue a planned migration.
During the planned migration, the array pairs first move to Suspended and then to Failed Over. The location of the R1 and R2 does not change.
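The SRA drives these transitions itself during the SRM workflow; the sketch below only shows the roughly equivalent manual SYMCLI steps, to map the states you see in the screenshots to familiar commands (this is my approximation, not the SRA's literal call sequence).

```
# Pairs move to Suspended (replication halted, R1/R2 roles unchanged)
symrdf -cg srdfa suspend

# Pairs move to Failed Over (the R2 devices are made read/write)
symrdf -cg srdfa failover
```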
And this is the end state after planned migration completes.
Now I run a reprotect and the pairs return to a Consistent state. Note how array 341 is now the R1, or read/write, side.
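Again as a rough manual equivalent (the SRA performs this for you during reprotect, so this sketch is only to illustrate what "reprotect" means at the SRDF level):

```
# Exchange the R1/R2 personalities so array 341 becomes the R1
symrdf -cg srdfa swap

# Resume replication in the new direction; pairs return to Consistent
symrdf -cg srdfa establish
```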
Now let’s look at how we can automate returning the pair to a Consistent state as part of the planned migration rather than the reprotect.
Modified Planned Migration
So back to the issue at hand, how can we tell SRDF to reverse replication as soon as the failover takes place? We’ve got a parameter for that! Within the EmcSrdfSraGlobalOptions.xml file you’ll find the parameter ReverseReplicationDuringRecovery. I have it highlighted below.
The parameter, as its name suggests, performs the SRDF tasks during planned migration that are normally done during reprotect. So first we change the file (via the download/upload process) on the recovery site, the site we are migrating to.
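For reference, the relevant fragment of EmcSrdfSraGlobalOptions.xml looks something like the following. This is a sketch: the element nesting and surrounding options are omitted and may differ by SRA version, so edit the copy of the file you downloaded rather than pasting this in wholesale.

```xml
<!-- Fragment of EmcSrdfSraGlobalOptions.xml on the recovery site.
     Setting this to Yes makes the SRA reverse SRDF replication as part
     of the planned migration instead of waiting for reprotect. -->
<ReverseReplicationDuringRecovery>Yes</ReverseReplicationDuringRecovery>
```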
Now we run the planned migration again. Within SRM it all looks the same, because VMware is unaware of what the SRDF SRA is doing.
If we watch the array pairs, rather than moving from Suspended to Failed Over as before, they now move from Invalid to Partitioned and finally to Consistent. Note in particular how the R1 becomes an R2 as replication is reversed.
If at this point we ran a reprotect, the SRDF SRA would tell VMware that the devices are already in the correct state and therefore do nothing. VMware would then do its part of the job, switching the protection group/recovery plan direction.
For the customers of this example, immediately reversing replication means that even if they forget to run the reprotect after the planned migration, replication continues unabated on the array. Just another safety measure for those who require it.
The parameter applies only to planned migration; it has no bearing on disaster recovery.