PiTs with SRM and the SRDF SRA

I wanted to revisit this post on using point-in-time copies (PiTs) with SRM and the SRDF SRA, having just demonstrated SRM with virtual and physical infrastructure. A dual-purpose environment was the original impetus for adding the parameter IgnoreActivatedSnapshots many years ago, and as it has been four years since the post, I thought showing the original purpose on a PowerMax would be worthwhile.

I will once again start the post with the caveat that when you are using PiTs instead of a current snapshot of the SRM environment, the SRDF SRA is unable to validate that it matches the current environment. What I mean by that is if I use last week's snapshot, and yesterday I added a VM to my protection group and recovery plan, that VM is not going to be in the snapshot and SRM is going to produce errors, sometimes fatal.

Rather than just mention the caveat, I’ll also show you what this particular error looks like. It is probably the best-case scenario, since it doesn’t cause a complete test failure. Before I get to it, let me lay out the environment and demonstrate the correct use of a non-current snapshot.

SnapVX and Generations

A quick pit stop for one additional caveat about SnapVX and generations. The ability to re-use an existing snapshot has been around a long time, well before SnapVX and the concept of generations were introduced. As part of the cleanup the SRA does when you tell it to re-use the existing snapshot, it relinks the snapshot. It does this without providing a generation, assuming there is only one (which would be the case if not using this parameter). So if you have multiple generations of a snapshot and have linked a generation other than 0 for your test (or a new one is created during your test), the SRA will relink generation 0 when it cleans up, not the one you may want. There are a couple of ways to avoid or get around this if you don’t want to do another unlink/relink to return to the original snapshot (should you need to).

  • Use unique snapshot names. That’s what the SRA expects.
  • If you cannot use unique names and you absolutely don’t want to unlink/relink, set the global options parameter TerminateCopySessions to YES. Your cleanup will fail because of the generation problem, but then you can do a force cleanup which will leave the original link in place. Not pretty, but effective.

Environment

I’m using the previous environment covered in the physical/virtual blog post. These are my devices and associated storage group. The devices are also placed in a composite/consistency group called composite_async.


Here are the virtual pairs SRM discovers. Remember it can’t see my physical pairs:

Manual snapshot

Since we are not going to have the SRA automatically create our snapshot and target devices, we have to do that ourselves. But since I am using two SRDF/A groups instead of one, I can’t take the snapshot in Unisphere, nor can I take individual snapshots of the storage groups. Either of those options would leave me with an inconsistent snapshot. Instead, I want to use my consistency group, just like the SRA would. So to take a manual snapshot I would issue the following from my R2 Solutions Enabler server. I’ve named the snapshot “ignoreactivate”, though the name has no association with the SRA:

symsnapvx establish -name ignoreactivate -cg composite_async -sid 883 -nop

All future snapshots will need to be taken like this. While we have a nice feature called “Snapshot Policies”, which allows you to schedule snapshots in Unisphere, unfortunately it can’t be used in this type of configuration. Instead, this will have to be scripted and then scheduled via old-school cron or any of a variety of other scheduling software.
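A minimal sketch of what such a script might look like, assuming bash and Solutions Enabler on the PATH of the R2 host. It reuses the consistency group and array ID from the command above. The date-stamped name prefix `srm_pit`, the script path in the cron comment, and the 02:00 schedule are hypothetical choices of mine, not anything the SRA requires. Date-stamping keeps every snapshot name unique, which sidesteps the generation caveat mentioned earlier. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: take a consistent SnapVX snapshot of the composite group on a schedule.
# CG and SID are the values used in this post; PREFIX is a hypothetical convention.
# Example cron entry (hypothetical path and time):
#   0 2 * * * DRY_RUN=0 /usr/local/bin/srm_pit_snapshot.sh
set -euo pipefail

CG="composite_async"
SID="883"
PREFIX="srm_pit"

# Build a unique, date-stamped snapshot name, e.g. srm_pit_20240101_0200.
snapshot_name() {
  local stamp="${1:-$(date +%Y%m%d_%H%M)}"
  printf '%s_%s\n' "$PREFIX" "$stamp"
}

# Print the establish command in the same form as the manual command above.
establish_cmd() {
  printf 'symsnapvx establish -name %s -cg %s -sid %s -nop\n' "$1" "$CG" "$SID"
}

cmd="$(establish_cmd "$(snapshot_name)")"
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$cmd"      # default: only show what would run
else
  $cmd             # DRY_RUN=0 actually executes it
fi
```

Whatever name the script generates is the one you would later pass to the link command, so it is worth logging the output somewhere you can find it.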

Linking snapshot

As we are creating our own snapshots for use with SRM, we also need to create the target devices that we will link to. This environment has four devices, two 100 GB and two 200 GB, so that is what I will need for the targets. You can use Unisphere or CLI to create these. After I created them, I put them in each respective storage group (physical and virtual) so that my Recovery Site ESXi servers will see them when I run the testfailover. You can also put them in a different masking view if you wish to avoid issues with SRDF management by storage group, but your call. Here are mine. Note at this point the devices have not been linked and could just as well be used by the environment to which they are presented.

So let’s link them. I’m using a device file that has my source R2s matched to these newly created devices. This is the snap.txt file, R2 on the left, new device on the right. Make sure you match them correctly or the physical device will complain the target is the wrong size:

30 4D
31 4E
2A 2C
2B 2D
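Since a mistake in this file only shows up when the link fails, a quick sanity check beforehand can help. The following is a rough sketch, assuming bash and awk; the function name `check_pairs` is hypothetical. It only checks the shape of the file (two hex device IDs per line, no target used twice, no device paired with itself). It cannot check device sizes, so that still needs a look at the array.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a SnapVX device pair file (source R2 left, target right).

check_pairs() {   # usage: check_pairs PAIR_FILE
  awk '
    BEGIN   { bad = 0 }
    NF == 0 { next }                                   # ignore blank lines
    NF != 2 { printf "line %d: expected two columns\n", NR; bad = 1; next }
    $1 !~ /^[0-9A-Fa-f]+$/ || $2 !~ /^[0-9A-Fa-f]+$/ {
      printf "line %d: device IDs should be hex\n", NR; bad = 1; next
    }
    { src = toupper($1); tgt = toupper($2) }
    src == tgt       { printf "line %d: source and target are both %s\n", NR, src; bad = 1 }
    seen_tgt[tgt]++  { printf "line %d: target %s already used\n", NR, tgt; bad = 1 }
    END     { exit bad }
  ' "$1"
}

# Example: check_pairs snap.txt && echo "pair file looks sane"
```

The duplicate-target check mirrors the one-session-per-target rule: a single target device can only be linked to one snapshot at a time.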

And here is the symsnapvx command to link the targets. This is exactly what the SRA does for me if I use the automated functionality:

symsnapvx link -sid 883 -file snap.txt -snapshot_name ignoreactivate -nop

And my linked targets (the flag L is linked):

SRA XML files

Our manual work continues if we want to use the ignore snapshot functionality. There are two SRA configuration files we need to modify. In other posts I’ve covered how to modify these files by downloading them via the appliance management, so I’m only going to show the changes I am making. Be sure to make backups of the existing files before changing them.

EmcSrdfSraGlobalOptions.xml

Since we are not going to use auto creation, we need to be sure AutoTargetDevice is turned off. Also as we have created the snapshot, we don’t want the SRA removing it. This means changing two more parameters – TerminateCopySessions and AutoTargetDeviceReuse (though technically this one is not used, if it is not set opposite the other parameter the SRA will error). Finally, we enable the IgnoreActivatedSnapshots parameter.
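Before uploading the file back to the appliance, it can be handy to double-check those four settings. Here is a small sketch, assuming bash and sed, and assuming each option appears in the file as a simple one-line element of the form `<Name>Value</Name>`. The function names are hypothetical, and the expected values are my reading of the paragraph above: AutoTargetDevice off, TerminateCopySessions off so the SRA leaves the snapshot alone, AutoTargetDeviceReuse set to the opposite, and IgnoreActivatedSnapshots on.

```shell
#!/usr/bin/env bash
# Sketch: report the four global options this setup depends on.
# Assumes one-line elements such as <AutoTargetDevice>No</AutoTargetDevice>.

get_option() {   # usage: get_option FILE NAME
  sed -n "s:.*<$2>\([^<]*\)</$2>.*:\1:p" "$1" | head -n 1
}

check_options() {   # usage: check_options FILE
  local file="$1" rc=0 name want got
  while read -r name want; do
    # Normalise whitespace and case so "Yes", "yes" and "YES" all compare equal.
    got="$(get_option "$file" "$name" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')"
    if [ "$got" = "$want" ]; then
      echo "ok    $name=$got"
    else
      echo "CHECK $name=${got:-<missing>} (expected $want)"
      rc=1
    fi
  done <<'EOF'
AutoTargetDevice NO
AutoTargetDeviceReuse YES
TerminateCopySessions NO
IgnoreActivatedSnapshots YES
EOF
  return "$rc"
}

# Example: check_options EmcSrdfSraGlobalOptions.xml
```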

With auto off, we will need to tell the SRA what devices it is supposed to use for the test since it won’t be creating them. A couple comments here for clarity before proceeding. First, there is no ability to tell the SRA which snapshot to use by name or date or anything like that. The SRA works at the device level so that is how we tell it which linked snapshot to use. Second, once a snapshot is associated with a device (an active link), you can’t link that target device to another snapshot. You can link the same snapshot to some other new devices, but one session per target device. Therefore there is no way for the SRA to use the wrong snapshot as long as you give it the correct device ID. We good? So to tell the SRA which devices to use we go with the older SRA testfailover functionality which used an XML file. We tell it which devices are linked to the R2 snapshot.

EmcSrdfSraTestFailoverConfig.xml

And here is our file. Simply list each device pair (R2 is Source, snapshot target is Target) for both your virtual and physical environment. The SRA doesn’t have to care about consistency here. It trusts the snapshot is good and just needs to know the devices and what array they are on. There is a CopyType and CopyMode listed also, but they should be set as below 99% of the time. If you want more detail on how this file is used when not using the ignore snapshot parameter, and that 1%, please check the TechBook.
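If you already have a pair file like snap.txt from the link step, the same pairs can be turned into XML stanzas rather than typed by hand. This is only a sketch, assuming bash and awk. The Source and Target element names come from the description above, but the `DevicePair` wrapper element and the function name `pairs_to_xml` are assumptions on my part, so compare the output with the template in your downloaded copy of the file before pasting anything in. It does not emit the array ID, CopyType or CopyMode entries.

```shell
#!/usr/bin/env bash
# Sketch: convert "R2 target" lines into device pair stanzas.
# The <DevicePair> wrapper name is an assumption; verify against your own file.

pairs_to_xml() {   # usage: pairs_to_xml PAIR_FILE
  awk 'NF == 2 {
    printf "<DevicePair>\n  <Source>%s</Source>\n  <Target>%s</Target>\n</DevicePair>\n", $1, $2
  }' "$1"
}

# Example: pairs_to_xml snap.txt
```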

Testfailover

That’s it. Files saved, uploaded back to the SRM appliance, and now we are ready. Run your test. In my example (from the physical/virtual post) my two VMs come up and everything is good. When I’m done testing I can then run a cleanup. Remember that because of how we set the parameters, the SRA will neither unlink nor delete our snapshot targets. But also remember that if you changed anything during the test, those changes are still there if you run the test again. You would have to relink the snapshot if you wanted a fresh copy.

So that’s when everything goes according to plan. But let’s take a look at the risk of using an old snapshot.

New VM with old snapshot

I mentioned I’d cover what happens when your manual snapshot does not match the current environment. Well lots of bad things can happen, but here’s the least of the bad. Let’s say today I added a new VM, VIRTUAL_VM_3, to my SRM protection group. My old snapshot, however, only has the two original VMs. So how will SRM react to that? Not happy, yet not fatal. You see below it was unable to recover the missing VM. It did handle the other two, though, so semi-success.

As you might imagine, other SRM changes can cause fatal issues, which is why we always recommend using a snapshot that is close to the current date.
