Along with the new PowerMax we have, as expected, a new SRA release: version 10.0. The primary purpose of the release is to support Solutions Enabler 10.0 and the new platforms, PowerMax 2500/8500. There are three items to discuss concerning the release: a new requirement, a bug fix that changes how the install or upgrade proceeds with the appliance version, and support for a new but old feature, TimeFinder/Clone. There is also a change in the way the SRA treats the parameters dealing with reusing devices, but fortunately it doesn’t require the user to do anything.
I’ll first take this time to report that these two restrictions are still in place in 10.0:
- 3-site SRDF/Metro reprotect is not supported
- MetroDR is not supported
Both these features require the same code changes in PowerMaxOS/SE, which have not been added yet, and thus we cannot change the SRA to enable their use. They are coming, but not imminently. In addition, there is no longer a vApp for Solutions Enabler in 10; it has been deprecated, as has the Unisphere vApp. You have to use a regular OS, physical or virtual.
The requirement to create device or composite/consistency groups is both old and new. Creation of a device or composite/consistency group has always been presented as a prerequisite for running tests with the SRDF SRA, but there was a failsafe mechanism as part of the SRA which would create groups for the user if they were not present. This failsafe, however, did not always work, and therefore manual creation of the groups would be required anyway. In the SRDF SRA 10, this is now a hard-coded requirement.
If you fail to create the group(s), I’ve seen two issues. The first seems transient, but the second is permanent. The first issue revolves around device discovery. In a large environment with many SRDF groups, my device discovery returned partial results until I created the CGs. Technically I do not believe the behavior is expected, but I did see it. The second issue is coded so you will see it. If you run a test without the CGs you will receive the following error:
One or more devices are not present in a CG/DG.
The error is highlighted in the screenshot below.
Once you receive the error you will have to run the cleanup with force, which means twice since you can’t force a cleanup in the first run. Remember this only impacts testing. Groups are not required for running failovers.
Multiple SRA bug
I wrote about this problem in this previous post. In summary, when there is more than one SRA on the appliance, the enableAutoSSLCertGen.sh script creates the hostname and password files on the first docker volume. This is not always the SRDF SRA so putting these files in the wrong place means your array manager discovery fails with a security mismatch (we send the docker hostname rather than the appliance hostname). We’ve instituted the fix into the SRA 10 enableAutoSSLCertGen.sh script. When executing the script as root, you will now have to supply the repository and tag in the command. Here I execute the script as was done in previous releases and am provided the usage.
Determining this repository tag sounds more daunting than it is, bringing to mind CLI and searching for a container ID perhaps. That’s not the case. All the script needs is the repository tags displayed in the SRA management screen of the appliance. I’ve highlighted it below.
And this value is not unique to your environment. It is only unique to the SRA build. So every customer runs the same command, and you will too, on both the protection and recovery sites. Below is a proper execution of the script. Note we still output the syntax even though it is run correctly, which I found confusing (don’t know why but there it is).
Simple enough I hope.
We have supported TimeFinder/Clone with the SRA actually since inception of the product. Back then TF/Clone was its own software but after TF/SnapVX was introduced, TF/Clone was run as an emulation, meaning SnapVX was the technology driving it. With Solutions Enabler 10, however, TF/Clone is now back as its own technology and has no association with SnapVX. Funnily, our test failover example file has never changed from using CLONE as the default copy type:
<?xml version="1.0" encoding="ISO-8859-1"?>
<TestFailoverInfo>
  <Version>10.0</Version>
  <CopyInfo>
    <ArrayId></ArrayId>
    <CopyType>CLONE</CopyType>
    <CopyMode>NOCOPY</CopyMode>
    <DeviceList>
      <DevicePair>
        <Source></Source>
        <Target></Target>
      </DevicePair>
    </DeviceList>
  </CopyInfo>
</TestFailoverInfo>
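If you have more than a handful of pairs, filling in that template by hand gets old quickly. Here is a minimal Python sketch that generates a file with the same element names as the example above. The function name `build_test_failover_xml` is hypothetical (it is not part of the SRA), and the array ID and device numbers in the usage line are simply borrowed from my log snippet further down for illustration. Check the output against the example file shipped with your SRA before using it.

```python
import xml.etree.ElementTree as ET


def build_test_failover_xml(array_id, pairs, copy_type="CLONE",
                            copy_mode="NOCOPY", version="10.0"):
    """Return test failover XML text for one array and a list of
    (source, target) device pairs. Hypothetical helper, not SRA code."""
    root = ET.Element("TestFailoverInfo")
    ET.SubElement(root, "Version").text = version
    copy_info = ET.SubElement(root, "CopyInfo")
    ET.SubElement(copy_info, "ArrayId").text = array_id
    ET.SubElement(copy_info, "CopyType").text = copy_type
    ET.SubElement(copy_info, "CopyMode").text = copy_mode
    device_list = ET.SubElement(copy_info, "DeviceList")
    for source, target in pairs:
        # An empty device ID is almost certainly a mistake, so fail early.
        if not source or not target:
            raise ValueError("each pair needs a source and a target device")
        pair = ET.SubElement(device_list, "DevicePair")
        ET.SubElement(pair, "Source").text = source
        ET.SubElement(pair, "Target").text = target
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    body = ET.tostring(root, encoding="unicode")
    # Same declaration as the example file.
    return '<?xml version="1.0" encoding="ISO-8859-1"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_test_failover_xml("000120200341", [("0125", "014E")]))
```

Write the result to EmcSrdfSraTestFailoverConfig.xml on the recovery side as you would a hand-edited file.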
OK you say but why did we bring it back and does that mean I should be using it? As to why it’s back, mostly customer demand. A clone is unlike a snapshot because it is a point-in-time full copy of the source volume(s) with no ability to rewind the changes. With SnapVX you first must create the snapshot and then if you want to use it, create a target and mount the snapshot to that. With clone you create the target and issue one command to create the copy. You don’t have to unlink the target later or delete the snapshot. Some customers find it useful to keep a set of target volumes and keep reusing them with new clones on demand and either don’t use SnapVX at all or only for backup/logical corruption.
Should you use it with the SRA? For the vast majority of customers, no. Since the SRA automates all the SnapVX steps for you, there is little benefit in using clone. In addition, you could not use the automated target device creation of the SRA, rather you’d have to create target devices and maintain the testfailover XML file which can be a bear as those who have used it will attest.
Can I think of a use case for clone? Well, let’s say you wanted to do extended testing of your environment for some reason, like UAT or implementing some new software which will require days of changes. That might be a useful scenario to use a clone rather than keep a snapshot around which must keep track of changes while it exists. There could be other cases, but if you aren’t using it today, there really is no reason to switch.
In any case, TF/Clone support in the SRDF SRA 10.0 does not require any different steps than you follow today if you use clone with the SnapVX emulation (which is now defunct). You need to pre-create the target volumes, set the parameter AutoTargetDevice to No, and modify EmcSrdfSraTestFailoverConfig.xml with the pairs. One thing to keep in mind if you use the symclone command outside the SRA: unlike snapshots, clones are not crash consistent by default. When you create a clone you need to pass the consistent flag with it. We hope in a future version it becomes the default, but for now use the flag. The SRA uses the flag (among others). You can actually see it issued if you check the symapi log file on the remote SE manager. Don’t let the SNAP confuse you, though; this log snippet is from my run in the demo.
04/10/2022 20:49:23.245 1575 STARTING a Clone 'ACTIVATE' operation for 1 [SRC-TGT] Pair:
04/10/2022 20:49:23.245 1575 dg: clone, flags: (SNAP)(Consistent)(tgt_dev)
04/10/2022 20:49:23.268 1575 Symm 000120200341 Number of Pairs: 1 Operation Flags: InstantSplit, Multi
04/10/2022 20:49:23.268 1575 Source-Target Devices: [ 0125-014E ]
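If you run clones outside the SRA and want to confirm the consistent flag was actually used, you can scan the symapi log for it. The Python sketch below assumes the log entries look like the snippet above, with a "dg: <name>, flags: (...)(...)" entry per operation. I have only the one sample to go on, so treat the pattern as an assumption and adjust it to what your own log shows. The function names are hypothetical.

```python
import re

# Matches entries such as: dg: clone, flags: (SNAP)(Consistent)(tgt_dev)
# The layout is assumed from a single sample log snippet.
FLAGS_RE = re.compile(r"dg:\s*(\S+),\s*flags:\s*((?:\([^)]*\))+)")


def clone_flags(log_text):
    """Return a list of (group_name, [flags]) for each flags entry found."""
    entries = []
    for match in FLAGS_RE.finditer(log_text):
        flags = re.findall(r"\(([^)]*)\)", match.group(2))
        entries.append((match.group(1), flags))
    return entries


def all_consistent(log_text):
    """True only if at least one entry exists and every entry has Consistent."""
    entries = clone_flags(log_text)
    return bool(entries) and all("Consistent" in flags for _, flags in entries)


if __name__ == "__main__":
    sample = ("04/10/2022 20:49:23.245 1575 dg: clone, "
              "flags: (SNAP)(Consistent)(tgt_dev)")
    print(clone_flags(sample))
    print(all_consistent(sample))
```

Returning False when no entries are found is deliberate: a log with no clone operations should not be read as proof that your clones were consistent.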
I’m not sure this is necessary since there is not a lot new, but I did a demo of using TF/Clone. Though I do not show it in the demo, I set my options file to reuse my device, and therefore at the end you will see my clone session ends with “Recreated”, meaning it is prepped for the next activate. We only recommend reusing the device if you are doing a lot of testing. For your last test always terminate the session, because otherwise the recreated clone session will use resources as it hangs around for weeks.
There’s been a tweak to how the SRA treats two parameters in the Global options file: TerminateCopySessions and AutoTargetDeviceReuse.
In SRA versions prior to 10.0, these two parameters had to be opposite or the SRA would kick out an error. They were treated as a pair. Starting in SRA 10.0, these two now represent the two different types of testfailover: manually with pre-created targets using the testfailover XML, and allowing the SRA to create the devices. If you use the testfailover XML, you only need to use the TerminateCopySessions parameter. Set it to “Yes” to keep the snapshots, set it to “No” to remove them. When letting the SRA create the devices, you will use AutoTargetDeviceReuse. Setting it to “Yes” keeps the devices, setting it to “No” removes the devices.
The good thing is the SRA only cares about one of the parameters at a time; however, if you leave TerminateCopySessions set to “Yes” but are using AutoTargetDevice, then you’ll see this message in the log file. To be honest, I really don’t think they needed to put this in there, but anyway you can ignore it.
[WARNING]: Invalid global configuration options. TerminateCopySession operates with AutoTargetDevice flag disabled
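To make the new behavior concrete, here is a small Python model of it as I’ve described it above. It is only my illustration, not the SRA’s code: the function `governing_parameter` is hypothetical, the options are passed in as a plain dictionary, and treating a missing parameter as not “Yes” is an assumption of the sketch rather than a statement about the SRA’s defaults.

```python
WARNING = ("[WARNING]: Invalid global configuration options. "
           "TerminateCopySession operates with AutoTargetDevice flag disabled")


def is_yes(options, name):
    # Assumption of this sketch: a missing parameter counts as not "Yes".
    return str(options.get(name, "")).strip().lower() == "yes"


def governing_parameter(options):
    """Return (parameter the SRA looks at, list of log warnings)."""
    warnings = []
    if is_yes(options, "AutoTargetDevice"):
        # The SRA creates the devices, so only AutoTargetDeviceReuse matters.
        governing = "AutoTargetDeviceReuse"
        if is_yes(options, "TerminateCopySessions"):
            # Harmless: the message is logged and the test proceeds.
            warnings.append(WARNING)
    else:
        # Pre-created targets from the testfailover XML.
        governing = "TerminateCopySessions"
    return governing, warnings


if __name__ == "__main__":
    opts = {"AutoTargetDevice": "Yes", "TerminateCopySessions": "Yes",
            "AutoTargetDeviceReuse": "No"}
    print(governing_parameter(opts))
```

The point of the model is that the two parameters no longer have to be opposite; at most you get the warning.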