Not the prettiest title, but hey, why beat around the bush? One of our European customers hit a bug with our SRDF SRA when they went to add Solutions Enabler array pairs in Photon SRM (Windows SRM is not impacted). It wasn’t clear at first that this was a bug, as the error is common enough in SRM when you try to discover arrays:
There are a myriad of reasons for this error, but the most common one is a certificate issue between the Solutions Enabler client (the SRDF SRA container) on the SRM server, and the Solutions Enabler server. To diagnose, you can look at the storsrvd.log file on the SE server and you are likely to see something like:
<Error> [2062355 SESS 2432] Feb-16 11:20:36.106 : ANR0151E Common Name in client certificate not valid: expected "server.global.com", received "pac852c410ie"
In this case the SE client is sending the hostname of the Docker container rather than the hostname of the SRM server. If I saw this, I would likely conclude the customer either forgot to run the enableAutoSSLCertGen.sh script, or ran it before installing the SRDF SRA rather than after. The latter would be understandable, since the script used to be run before the SRA installation, not after. In this case, however, the customer was following my blog post on the hot fix, so they had done everything correctly. So what happened?
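If you are scanning a long storsrvd.log for this signature, a few lines of Python can pull out the two names for you. This is only a minimal sketch: the pattern is inferred from the single sample line above, and the function name `find_cn_mismatches` is my own invention, not part of Solutions Enabler.

```python
import re

# Pattern for the ANR0151E message; the layout is assumed from the one
# sample line shown in this post, so adjust it if your log differs.
CN_MISMATCH = re.compile(
    r'ANR0151E Common Name in client certificate not valid: '
    r'expected "(?P<expected>[^"]+)", received "(?P<received>[^"]+)"'
)

def find_cn_mismatches(log_text):
    """Return (expected, received) pairs for every ANR0151E entry in the text."""
    return [(m.group("expected"), m.group("received"))
            for m in CN_MISMATCH.finditer(log_text)]

# The sample line from this post.
sample = ('<Error> [2062355 SESS 2432] Feb-16 11:20:36.106 : ANR0151E Common Name '
          'in client certificate not valid: expected "server.global.com", '
          'received "pac852c410ie"')

for expected, received in find_cn_mismatches(sample):
    print(f"SE server expected {expected!r} but the client sent {received!r}")
```

If the received name looks like a random container ID rather than your SRM server, you are most likely looking at the problem described in this post.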
The big difference between this customer’s environment and the vast majority we see is that they used arrays from multiple vendors in the same SRM environment. This is perfectly fine of course, but the customer suspected that might be the heart of the issue because of the output of the enableAutoSSLCertGen.sh script.
As a reminder, the function of the script is to add one or two files to the docker volume of the SRDF SRA to ensure:
- The hostname of the SRM server is sent to the SE server and not the container hostname (hostname file) – mandatory and automatic
- If filtering is desired, we also store the vCenter credentials in another file (.emcpwddb file) – optional and manual entries
Using my environment, I’ll show you the problem.
I have three SRAs installed in my setup. VMware lists them alphabetically, but I installed the SRDF SRA last. And no I’m not blasphemous enough to actually be running HPE in my environment, but none of the SRAs check for storage during installation 🙂 Take care to see how each has a unique Docker image ID which I highlight below.
Now I run the enableAutoSSLCertGen.sh script. I do provide the vCenter credentials which is optional. I first list the containers so you can see there are three and that each one has an associated volume (/var/lib…). By comparing the Docker container image ID above, you can match it to the volume below. In order, therefore, they are Unity (red box above), SRDF SRA (blue box above), and finally HPE (green box above).
Everything completes successfully above. But look at the output closely, as our customer did: there are three volumes, so how did the script decide which one belongs to the SRDF SRA? Not very logically, unfortunately. It just took the first one. Was that what we wanted? Well no, that was the Unity one. But since I am running the script as root, nothing stops it from putting the files there:
The information in the files is correct but in the wrong place. If I proceed and try to add my array managers, I’m going to fail the discovery and end up with a storsrvd.log file that looks like this where the container hostname is sent:
Fortunately, the workaround is easy enough. There are two steps:
- Move the two files from the Unity docker volume (or whatever wrong volume you have) to the SRDF SRA one.
- Reload the SRDF SRA. This step is critical.
So first move the files:
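If you would rather script the move than do it by hand, here is a minimal Python sketch. It assumes the two files are literally named `hostname` and `.emcpwddb`, as the script description suggests, and it takes both volume directories as arguments because the real paths differ per environment. The function name `move_sra_files` is hypothetical, and the demo at the bottom only touches temporary directories with a dummy file.

```python
import shutil
import tempfile
from pathlib import Path

# File names assumed from the script description; verify them in your volume.
SRA_FILES = ("hostname", ".emcpwddb")

def move_sra_files(wrong_volume, srdf_volume, names=SRA_FILES):
    """Move each listed file that exists in wrong_volume into srdf_volume.

    Returns the names that were actually moved.
    """
    src_dir, dst_dir = Path(wrong_volume), Path(srdf_volume)
    moved = []
    for name in names:
        src = src_dir / name
        if src.is_file():  # .emcpwddb is optional, so it may not be there
            shutil.move(str(src), str(dst_dir / name))
            moved.append(name)
    return moved

# Harmless demo using temporary directories and dummy content.
with tempfile.TemporaryDirectory() as wrong, tempfile.TemporaryDirectory() as right:
    (Path(wrong) / "hostname").write_text("dummy\n")
    print(move_sra_files(wrong, right))
```

On the appliance you would call it with the wrong volume path and the SRDF SRA volume path, running with enough privileges to write to both.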
Next reload the SRDF SRA from within the SRM Appliance Management screen:
Return to add your management pairs and everything should be good. This fix does exactly what the script was supposed to do so it will persist through reboots.
I should add that it is possible, depending on the SRAs in the environment, that our script gets lucky and selects the SRDF SRA image, but if you are running multiple SRAs you need to check.
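To make the "first one wins" behavior concrete, here is a small Python sketch contrasting it with a selection that matches on the image name. This is not the script's actual code. The image names, volume paths, and function names are all hypothetical, chosen only to mirror the order in my environment.

```python
from typing import NamedTuple

class SraContainer(NamedTuple):
    image: str   # repository name of the SRA image (hypothetical values below)
    volume: str  # Docker volume path for that container (placeholder paths)

def pick_first(containers):
    """The behavior described in this post: take whatever is listed first."""
    return containers[0]

def pick_by_image(containers, wanted):
    """Alternative: return the single container whose image name contains
    `wanted`, or None if there is no match or more than one."""
    matches = [c for c in containers if wanted in c.image.lower()]
    return matches[0] if len(matches) == 1 else None

# Hypothetical listing, in the same order as my environment.
containers = [
    SraContainer("unity-sra", "/volumes/unity"),
    SraContainer("srdf-sra", "/volumes/srdf"),
    SraContainer("hpe-sra", "/volumes/hpe"),
]

print("first listed:", pick_first(containers).volume)
print("matched by name:", pick_by_image(containers, "srdf").volume)
```

Whether the first entry happens to be the SRDF SRA depends purely on listing order, which is why you need to check when more than one SRA is installed.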
One scenario I did not include here is the non-hot-fix SRA, which is version 9.2.0. In that version, the repository tag is set to sradocker:latest. Unfortunately this was lazy labeling, and other SRAs use the same tag. Since SRM relies on the uniqueness of this tag, if you install our SRA and then install another SRA with the same tag (RecoverPoint uses this tag), it will overwrite our SRA. For those who know Docker, it is possible to change the tags, but we do not support that for our SRA. The proper solution is to upgrade, which will allow both SRAs to co-exist.
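The tag collision is easy to picture with a toy model. The Python sketch below treats the image store as a dictionary keyed by tag. That is a deliberate simplification of how SRM and Docker actually track images, and the `install` helper is hypothetical.

```python
def install(images_by_tag, tag, sra_name):
    """Toy model: store an SRA under its tag and return whatever it replaced."""
    replaced = images_by_tag.get(tag)
    images_by_tag[tag] = sra_name
    return replaced

images = {}
install(images, "sradocker:latest", "SRDF SRA 9.2.0")
lost = install(images, "sradocker:latest", "RecoverPoint SRA")

# Both SRAs used the same tag, so only the second one is still reachable.
print(f"{lost} was replaced; the store now holds {images}")
```

With unique tags, the second install would simply add a new entry instead of replacing the first.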
*** Fixed in SRA 10.0 ***
We’ll have a hot fix for this in the coming weeks or months. It’s not a difficult fix, but between other priorities and the QA timeline, it won’t be next week. The team wants to be sure they have covered every scenario, and fortunately the workaround is not too difficult. Many thanks to our European customer for finding the issue.
I wrote a KB outlining this issue which you can find here.