Yes, another SRM post. I guess I do write a lot of them, to be fair. I’m not actively seeking them out, I assure you. As is the case the vast majority of the time, this one is drawn from a customer experience. I had not given much thought to the differences between the scenarios I am going to cover, since all three are acceptable configurations; but two are used far more than the third, and perhaps that is due to a lack of understanding. So when in doubt, I blog about it.
SRDF
Let’s start with SRDF. This post focuses solely on consistency in SRDF Asynchronous (SRDF/A) environments. Those of you who run SRDF/A will no doubt be aware of the special relationship device pairs have with their SRDF group: all device pairs in an SRDF group (SRDFG) must act as a single entity. If my SRDFG has 3 pairs, AB, CD, and EF, and I wish to fail over pair AB, I can’t do so without failing over CD and EF (those who use SRDF/Metro should also be familiar with this). Conversely, if my mode were synchronous (SRDF/S), I could fail over just AB even though CD and EF are also part of the group. In SRDF/S the pairs are always consistent; they have to be. In database terms it is akin to a 2-phase commit: a write to the local device, A, is not acknowledged until the remote device, B, confirms it has the data. With SRDF/A, data is sent across in delta sets that cover the entire SRDFG, and thus all of its devices; there is no separate delta set per device. Therefore I cannot take an action against only one of the pairs in the group, as they are in a shared relationship. Anyway, if you use SRDF/A this is probably a review you didn’t need, but it is the foundation of our topic here, so thanks for your indulgence.
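The group semantics can be captured in a toy Python model. This is a sketch for illustration only, not SRDF or Solutions Enabler code, and the class and method names are hypothetical: an SRDF/A group accepts a failover request only if it covers every pair in the group, while an SRDF/S group accepts any subset.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SYNC = "SRDF/S"
    ASYNC = "SRDF/A"


@dataclass
class SrdfGroup:
    """Toy model of an SRDF group: a replication mode plus named device pairs."""
    mode: Mode
    pairs: set[str]
    failed_over: set[str] = field(default_factory=set)

    def failover(self, requested: set[str]) -> None:
        unknown = requested - self.pairs
        if unknown:
            raise ValueError(f"pairs not in group: {sorted(unknown)}")
        # SRDF/A ships delta sets for the whole group, so a partial failover
        # would split a shared cycle: only the full group may move.
        if self.mode is Mode.ASYNC and requested != self.pairs:
            raise PermissionError("SRDF/A: all pairs in the group must fail over together")
        # SRDF/S pairs are each consistent on their own, so any subset is allowed.
        self.failed_over |= requested


sync_group = SrdfGroup(Mode.SYNC, {"AB", "CD", "EF"})
sync_group.failover({"AB"})  # allowed: just AB moves

async_group = SrdfGroup(Mode.ASYNC, {"AB", "CD", "EF"})
try:
    async_group.failover({"AB"})
except PermissionError as err:
    print(err)  # SRDF/A: all pairs in the group must fail over together
async_group.failover({"AB", "CD", "EF"})  # allowed: the whole group moves
```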
Typical Configurations
When customers use VMware SRM with SRDF/A, they typically have one of two configurations. In the first, they place all their devices in the same SRDFG and thus the same Protection Group. This is a well-trodden path, so there is no need to go into detail. If you know all the devices need to be at the same recovery point, it is easiest to put them in the same SRDFG.
The second typical setup is multiple SRDF/A groups. A customer has multiple applications, and each one is in its own SRDF/A group. Take the following example, where I have two SRDF groups, 24 and 28. SRDFG 28 has 4 devices, 4 datastores, and 4 VMs. Similarly, SRDFG 24 has 2 devices, 2 datastores, and 2 VMs. I set up 2 device groups (async1 and async2) on each array manager (Solutions Enabler), one for each SRDFG, and discover the pairs in SRM.
Now I create 2 Protection Groups, ASYNC1 and ASYNC2. I’ll start with ASYNC1. Note below how SRM helpfully separates out the datastores by the device groups I created.
Both Protection Groups are added to a single Recovery Plan.
Now if I run this Recovery Plan, SRM will cycle through the Protection Groups. The SRDF SRA treats each set of devices (i.e. each SRDFG) independently, exactly as I have asked it to with my device groups. I should mention that the SRDF SRA works through the DGs/CGs one at a time as it performs its operations.
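To make that independence concrete, here is a toy Python model (not SRM or SRA code): each Protection Group is backed by one device group, the plan visits them in turn, and each lands at its own SRDFG's last consistent point. The pairing of async1 with SRDFG 28, the device names, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DeviceGroup:
    """Hypothetical stand-in for a device group backed by exactly one SRDFG."""
    name: str
    srdf_group: int
    devices: tuple[str, ...]
    # Illustrative: time of the last SRDF/A cycle applied on the R2 side.
    last_consistent_point: datetime


def run_recovery_plan(protection_groups: dict[str, DeviceGroup]) -> dict[str, datetime]:
    """Visit each Protection Group in turn and record the point it recovers to.

    Nothing coordinates the groups, so each recovers to its own SRDFG's point.
    """
    recovery_points: dict[str, datetime] = {}
    for pg_name, dg in protection_groups.items():
        recovery_points[pg_name] = dg.last_consistent_point
    return recovery_points


plan = {
    # Assumed mapping: ASYNC1 -> async1 -> SRDFG 28, ASYNC2 -> async2 -> SRDFG 24.
    "ASYNC1": DeviceGroup("async1", 28, ("dev1", "dev2", "dev3", "dev4"),
                          datetime(2024, 1, 1, 12, 0, 30)),
    "ASYNC2": DeviceGroup("async2", 24, ("dev5", "dev6"),
                          datetime(2024, 1, 1, 12, 0, 12)),
}

points = run_recovery_plan(plan)
mutually_consistent = len(set(points.values())) == 1
print(mutually_consistent)  # False: the two groups recovered to different points
```

For unrelated applications that is exactly what you want; it only becomes a problem when the applications must line up with each other.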
One of those 2 configurations is usually what I see. But how about something a little more complex?
Composite Consistency Groups
There are cases where a customer wants to separate their applications into multiple SRDF/A groups but nonetheless needs them to be consistent with each other. A multi-tier application probably comes to mind first; let’s say a database tier and an application tier. If I want to fail over or test them at the same point in time, I could put all the devices in the same SRDFG; but in this example I also need to be able to test my database independently outside of SRM for business reasons, and including the application devices in that test is a waste.
Therefore, I will create 2 SRDF/A groups like I did for our independent applications shown above. But now, instead of 2 device groups on my array manager, I am going to create a single composite group that contains the devices from both SRDF groups. So here are my 2 SRDF/A groups, 3 and 17, each with 2 pairs.
Now I create the composite group on each array manager: on the protected side it contains all the R1s, and on the recovery side it contains all the R2s.
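The resulting layout can be modeled as plain data. This is a hypothetical Python sketch, not Solutions Enabler syntax: the device IDs are made up, and the `type` field only loosely mirrors the RDF1/RDF2 composite group types. It builds one CG definition per array manager from the pairs in SRDFGs 3 and 17, so each side ends up with a single group spanning both SRDFGs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    srdf_group: int
    r1: str  # device on the protected array (hypothetical ID)
    r2: str  # device on the recovery array (hypothetical ID)


def build_composite_groups(name: str, pairs: list[Pair]) -> dict[str, dict]:
    """One CG per array manager: all R1s on the protected side, all R2s on the recovery side."""
    srdf_groups = sorted({p.srdf_group for p in pairs})
    return {
        "protected": {"name": name, "type": "RDF1", "srdf_groups": srdf_groups,
                      "devices": [p.r1 for p in pairs]},
        "recovery": {"name": name, "type": "RDF2", "srdf_groups": srdf_groups,
                     "devices": [p.r2 for p in pairs]},
    }


pairs = [
    Pair(3, "00A1", "00B1"), Pair(3, "00A2", "00B2"),    # SRDFG 3, 2 pairs
    Pair(17, "00A3", "00B3"), Pair(17, "00A4", "00B4"),  # SRDFG 17, 2 pairs
]
cgs = build_composite_groups("composite_async", pairs)
# A single CG spanning both SRDFGs on each side.
print(cgs["protected"]["srdf_groups"], len(cgs["recovery"]["devices"]))  # [3, 17] 4
```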
Let’s discover our devices and create our Protection Group and Recovery Plan. Here are the array pairs showing our composite group, composite_async:
And now the Protection Group and Recovery Plan. See how SRM has bundled all our datastores together.
Good to go, right? Let’s run a test.
Oh, well, that’s not right. What did we do wrong? Let’s see what SRM says:
Consistency was not enabled? But didn’t I specify -rdf_consistency? We’ll double-check the composite group by doing a show:
Uh oh, consistency is disabled. So -rdf_consistency is not enough on its own: after adding my devices to the group, I need to enable consistency across the two SRDFGs. But that’s easy to resolve. On the R1 side I issue the enable, and now the group shows as consistent.
Remember, too, that the composite group is a single entity: even though we have an R1 CG and an R2 CG, enabling consistency makes all the devices consistent. This is from our R2 CG:
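The lesson lends itself to a pre-flight check before running a test. Below is a toy Python model of the behaviour described above, with hypothetical names: requesting consistency when creating the group leaves it disabled, an explicit enable issued from the R1 side switches it on, and because the R1 and R2 CGs describe one entity, the R2 view reflects the change. It models the state only, not how Solutions Enabler stores or reports it.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyState:
    """Shared state: the R1 CG and R2 CG describe one entity, so they share it."""
    enabled: bool = False


@dataclass
class CompositeGroupView:
    name: str
    side: str                   # "R1" or "R2"
    rdf_consistency_flag: bool  # requested at creation time; does not enable anything
    state: ConsistencyState

    def enable(self) -> None:
        # In the walkthrough the enable is issued from the R1 side.
        self.state.enabled = True


def preflight(cg: CompositeGroupView) -> None:
    """Hypothetical check to run before an SRM test: fail fast if consistency is off."""
    if not cg.state.enabled:
        raise RuntimeError(f"{cg.name} ({cg.side}): consistency is disabled")


shared = ConsistencyState()
r1_cg = CompositeGroupView("composite_async", "R1", rdf_consistency_flag=True, state=shared)
r2_cg = CompositeGroupView("composite_async", "R2", rdf_consistency_flag=True, state=shared)

# The creation-time flag alone leaves consistency disabled.
try:
    preflight(r2_cg)
except RuntimeError as err:
    print(err)  # composite_async (R2): consistency is disabled

r1_cg.enable()    # enable on the R1 side...
preflight(r2_cg)  # ...and the R2 view now passes too
```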
And re-run the test (after a forced cleanup of course):
Summary
Consistency is everything in the world of SRDF/A, and most definitely with the SRDF SRA. Although I’ve rarely seen customers use a consistent composite group across multiple SRDF groups, it’s good to know you have the option if you need it.