Oracle RAC vVol test failover without SRM (import snapshot)

Our vVol 2.0/VASA 3.0 implementation, which includes SRDF replication, is best paired with VMware Site Recovery Manager (SRM). We do not recommend using PowerCLI for this because there are manual processes involved and there are known issues. There’s really little comparison between the two: SRM handles all the orchestration for you and is the superior solution for DR with vVols. But, and yes there is a but, SRM is a VM-based solution. It’s an all-or-nothing proposition; I can’t replicate part of a VM. Then again, you might ask why anyone would want to. Fair enough, the use case list is not long, yet one immediately sprang to mind while I was writing a paper on Oracle RAC with vVols.

So my use case for PowerCLI involves refreshing an Oracle RAC database without having to fail over the VM RAC nodes and deal with all the networking headaches in a lab. I say lab because customers generally have a network setup robust enough that they can use VLANs to get around IP conflicts, though not always. And Oracle RAC is just a nightmare when it comes to duplicating an environment the way SRM does, because you can’t simply rename Oracle RAC hosts. The clusterware doesn’t take kindly to that. You can change IPs (and in a future paper I’ll show you how), but hostname changes essentially require a grid software rebuild. No one wants that. Our best option, therefore, is to keep the grid software intact and refresh the database in situ.

Managed vs Unmanaged snapshots

Before I go into the CLI, I want to revisit managed versus unmanaged snapshots in the vVol world. The easiest distinction is this: a managed snapshot is one VMware executes via its software; an unmanaged snapshot is one I take with the array software. We don’t support unmanaged snapshots with PowerMax. But there is some subtlety here which comes into play with replication and SRM, so I like to think of a managed snapshot as one that VMware controls – VMware doesn’t have to be the one initiating it, it just has to know about it. That means me right-clicking on a VM in vCenter and requesting a snapshot is “managed”, but me requesting a test failover in SRM is also “managed”, even though on the PowerMax we are the ones creating the snapshots on replicated devices (see the whitepaper in the Documentation Library for detail). An unmanaged snapshot, on the other hand, is one that VMware knows nothing about, such as one taken in Unisphere or the CLI (which obviously we can’t do, though some array vendors can). So that’s why we don’t support unmanaged snapshots.

But where it gets gray is if I use PowerCLI instead of SRM to initiate a test failover, because I am going around SRM, which would normally handle the orchestration for me. SRM is fully integrated with vVols and the VASA Provider. It knows all about my VMs, my replicated devices, and my snapshots of those replicated devices. It knows what commands to issue, and when, to bring up my test or failed-over environment. PowerCLI, however, is more of a backdoor. Instead of SRM working with the VASA Provider, I make a call directly through VMware to the VASA Provider to initiate a test or failover of a VASA replication group. I still have no control over which snapshot will be used (we maintain five and use the latest for tests or failover), but since I am not using SRM, VMware is unaware of the test or failed-over vVols. The command I issue is functionally the same one SRM issues, but it lacks any of the other information that SRM has access to, or receives from the array. And at this point managed and unmanaged blur a bit – because VMware lacks that extra info, I’ve essentially created “unmanaged” snapshots with PowerCLI. As such, I have to tell VMware I now have test or failed-over vVols available. The way I do that is by importing an…wait for it…unmanaged snapshot.

I hope that was not too confusing, because understanding it is essential if you plan on using this functionality on the PowerMax. If you’re still wondering what really constitutes an unmanaged snapshot, given my blurring above, I’ll give a quick example. One of our competitors with an orange tinge allows the user to go into their array GUI (Unisphere-equivalent) and take a snapshot of any vVol device (replicated or not). Once you know the vVol WWN/ID of that snapshot device, you could then import it into VMware as a truly unmanaged snapshot, unrelated to SRM in any way.

Enough about competitors, though; here is the procedure for creating three test vVols which contain an Oracle ASM disk group and database.

Environment

As I noted from the start, I’m using an Oracle RAC environment as my test bed. The production database consists of two Oracle RAC nodes with ASM for storage (using the AFD library). The vvol database is located in the VVOL_REFRESH disk group which contains three disks:

  • /dev/sdc
  • /dev/sdd
  • /dev/sde

These three disks for this ASM disk group are assigned a storage policy that is different from the one on the disks that hold the software or the DATA ASM disk group. Note that my DATA disk group is not replicated, but the VVOL_REFRESH disks are, and the VASA replication group is ORAPCLI.

The three disks are in group 37 on the array. Just a reminder: if I were using SRM, I’d have to replicate all vmdks of the VM, not just these three. Since our replication is asynchronous, all three disks will be snapped together. And as ASM disk groups are independent, I will be able to mount only the VVOL_REFRESH group on the test VM.

Speaking of the test environment, I am using a single VM there, though I still have Oracle RAC configured. It is set up similarly to production in that I have two disks for the software and one disk for a DATA disk group (which contains my voting and cluster information). I have no disks for any other ASM group, as I am going to add the test ones from the production VM.

I have an init.ora file for the vvol database and I have configured srvctl, so once my ASM disk group is refreshed on this VM I can immediately mount it and start the database.

Enough with the prelude, let’s run the procedure.

Procedure

In order to run this, you’ll need PowerShell (which everyone has) and the PowerCLI modules, which you can download as a zip or install directly:

PS C:\WINDOWS\system32> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.18362.1593
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.18362.1593
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1


PS C:\WINDOWS\system32> Find-Module -Name VMware.PowerCLI

Version      Name              Repository   Description
-------      ----              ----------   -----------
12.3.0....   VMware.PowerCLI   PSGallery    This Windows PowerShell module contains VMware.PowerCLI
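
If you go the direct route, installation is a single cmdlet; the -Scope value here is just my preference to avoid needing an elevated prompt:

PS C:\WINDOWS\system32> Install-Module -Name VMware.PowerCLI -Scope CurrentUser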

Once you have these, connect to your target vCenter.
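
For example, using the target vCenter you’ll see later in this post (Get-Credential simply prompts for the login; substitute whatever authentication you normally use):

PS C:\WINDOWS\system32> Connect-VIServer -Server 10.228.246.224 -Credential (Get-Credential)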

Test failover

WWN identification

First, I need to execute a test failover of the VASA replication group. This command will tell the VASA Provider to put the target array replication group into “InTest” mode, create snap target devices, and link them to the latest snapshot. But before I run the command, we need to consider how to get the WWNs of the target snap devices, because we will be unable to use the GUI or the symsnapvx command to do so. I’ll preface this discussion by saying that it is possible to develop scripting around my methodology or another similar one. I may do so in the future, but for now I’m just explaining a couple of ways you might find these target devices.

The most basic way to get the information, assuming your vVol environment is not incredibly active, is to run the following command before and after the test failover. It will reveal the new vVol devices.

symdev list -vvol -sid xxx
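
A minimal sketch of that before/after comparison, assuming you capture the output to files and have PowerShell on the Solutions Enabler host (the file names are mine):

symdev list -vvol -sid xxx > before.txt
# ... run the test failover, then ...
symdev list -vvol -sid xxx > after.txt
# Lines present only in after.txt are the new snap target devices
Compare-Object (Get-Content before.txt) (Get-Content after.txt) | Where-Object SideIndicator -eq '=>'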

A similar way, and the one I used, is to determine which vVol devices are snapshot targets. If you have a more active environment, this will likely help. First, find out if there are any target devices currently. There is a row in the verbose output of symdev which identifies the device as a snapshot target, Snapvx Target, so I look for True.

symdev list -vvol -sid xxx -v|grep "Snapvx Target : True"

If any results are returned, you’ll need to output the verbose results to a file and then search for the row:

symdev list -vvol -sid xxx -v > verbose_output.txt

Then search the file. The device will be listed there.
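
If your Solutions Enabler host is Windows, Select-String can combine the search and the surrounding context in one step; the 20 preceding context lines are an arbitrary window I chose so the device name above the row is captured:

symdev list -vvol -sid xxx -v | Select-String -Pattern "Snapvx Target : True" -Context 20,0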

With the device in hand, you can list the WWN (or you could pull it from the verbose output above):

[root@dsib2017 ~]# symdev list -sid xxx -vvol -wwn | grep 176
00176  Not Visible  VVOL  600009700BC7246338010036000011C2

Test failover command

Two commands are required for this step: we need to get the VASA replication group (ORAPCLI from above), then issue the test failover. It takes less than a minute to execute. As I was connected to another vCenter at the time, I specified the target one here with -Server, but if you connect only to the target vCenter you don’t need this parameter.
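
Here is a sketch of those two commands, assuming the SPBM replication cmdlets that ship with PowerCLI (the group name and vCenter address are from my environment):

$rg = Get-SpbmReplicationGroup -Name "ORAPCLI" -Server 10.228.246.224
Start-SpbmReplicationTestFailover -ReplicationGroup $rg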

When the command completes, I go through the process above to find my three WWNs for the linked targets which are:

  • 600009700BC7246338010036000011C1
  • 600009700BC7246338010036000011C2
  • 600009700BC7246338010036000011C3

These are the devices we are going to import as snapshots since, again, VMware does not know about them and therefore you can’t see them in the target vVol datastore after the test failover.

Import the snapshots (linked targets)

The method we use to import the snapshots is called, appropriately enough, ImportUnmanagedSnapshot, which is found under virtualDiskManager. The method takes three parameters:

  • The vmdk path (which you assign)
  • The datacenter (target)
  • The vVol ID, or for us the WWN. Note that as with other disks, it requires a prefix, which is naa. in this case.

We’ll begin by setting a variable to our view of the virtualDiskManager on the target vCenter:

$virtualDiskManager = Get-View (Get-View ServiceInstance).Content.virtualDiskManager -Server 10.228.246.224

Now assign the datacenter to a variable so we can call it. We will actually only reference its managed object reference (MoRef) in the ultimate command.

$dc = get-datacenter "Boston"

We have three vVols, and thus we’ll need three variables, as we will make the import call three times. Again, note the naa. prefix.

$uuid1 = "naa.600009700BC7246338010036000011C1"
$uuid2 = "naa.600009700BC7246338010036000011C2"
$uuid3 = "naa.600009700BC7246338010036000011C3"

Finally, we need the vmdk path in the target vVol datastore. You can place these vmdks wherever you want in the vVol datastore, but the path (folders) must already exist. I am going to put mine in the same folder as my test VM. The datastore goes in brackets, followed by the folder(s), then the vmdk name. The vmdk name is your choice, too. I like to keep mine the same as in the production environment so I know where they came from, but whatever works for you.

$vvolpath1 = "[vVol_Oracle_PowerCLI] dsib1246.lss.emc.com/dsib1236.lss.emc.com_1.vmdk"
$vvolpath2 = "[vVol_Oracle_PowerCLI] dsib1246.lss.emc.com/dsib1236.lss.emc.com_2.vmdk"
$vvolpath3 = "[vVol_Oracle_PowerCLI] dsib1246.lss.emc.com/dsib1236.lss.emc.com_3.vmdk"

OK, now we can run the import command. When we run it, VMware will import the vVol into the VASA DB, bind it to a Protocol Endpoint, and create the pointer file in the datastore so that we can add it to a VM. The commands complete fairly quickly.
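
Based on the variables above, the three calls look like this (the parameter order is the vmdk path, the datacenter MoRef, then the vVol ID):

$virtualDiskManager.ImportUnmanagedSnapshot($vvolpath1, $dc.ExtensionData.MoRef, $uuid1)
$virtualDiskManager.ImportUnmanagedSnapshot($vvolpath2, $dc.ExtensionData.MoRef, $uuid2)
$virtualDiskManager.ImportUnmanagedSnapshot($vvolpath3, $dc.ExtensionData.MoRef, $uuid3)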

When the commands complete, we can check the vVol datastore and see our imported vVols.

Add to VM

With the vVols available, we want to add them to the test VM using the add “Existing Hard Disk” functionality. Normally you would be able to do this with either the vCenter GUI or the CLI, but the vCenter GUI will produce a VMware storage policy error that is unique, to say the least.

The reason the vCenter GUI does not work is the SPBM plugin. Basically, VMware looks for a replication group for this vVol because it is a mandatory field in the spec; but we don’t want to use a vVol policy with replication for this test copy, so we can’t use the vCenter GUI. Technically it may be a bug, but as there are workarounds which don’t use the spec (including, I think, govc), it’s not a big deal. BTW, if you don’t want to use the CLI, the GUI of the ESXi Host Client (attached directly to the ESXi host of the VM) also works.

Anyway I like the CLI so here are those commands:

$vm = Get-VM dsib1246.lss.emc.com
$vm | New-HardDisk -DiskPath $vvolpath1
$vm | New-HardDisk -DiskPath $vvolpath2
$vm | New-HardDisk -DiskPath $vvolpath3

Now our disks are present. They will come in with the Datastore Default policy.
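
If you want to verify the policy from PowerCLI rather than the GUI, Get-SpbmEntityConfiguration will report it per disk (just a quick check, nothing the procedure requires):

Get-HardDisk -VM $vm | Get-SpbmEntityConfiguration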

vVol RAC database refresh

Once the VM is assigned the vmdks, ASM will immediately recognize the devices and show the ASM disk group as DISMOUNTED. I can then mount the disk group, start the vvol database, and I’m ready to test.

All in all, a pretty slick way to deal with Oracle RAC vVol database testing without having to constantly rebuild the grid.

Ending test

When you are done testing, you can reverse the procedure (a PowerCLI sketch of the VMware-side steps follows the list):

  • Shut down the database
  • Dismount the ASM disk group
  • Remove the devices from the VM (both GUI and CLI work for this), checking the box to delete the vmdk pointer
  • Stop the test failover
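
Here is that sketch; the first two list items happen inside the guest, the filename filter is just my naming convention from above, and -DeletePermanently is what removes the vmdk pointer files:

# Remove the three imported disks from the test VM, deleting the pointer files
$vm = Get-VM dsib1246.lss.emc.com
Get-HardDisk -VM $vm | Where-Object {$_.Filename -like "*dsib1236*"} | Remove-HardDisk -DeletePermanently -Confirm:$false

# End the test on the VASA replication group
$rg = Get-SpbmReplicationGroup -Name "ORAPCLI"
Stop-SpbmReplicationTestFailover -ReplicationGroup $rg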

If you fail to do some of the steps above, when you run the command to stop the test you will get an error telling you the device is still in use.

But if things are done correctly, the stop test will run successfully, and again quite quickly.
