vSphere 7 is GA today and introduces support for NVMe over Fabrics (NVMe-oF). The PowerMax has supported NVMe over Fibre Channel (FC-NVMe) since our last major release, 5978.444.444, which made the PowerMax a complete NVMe solution from host to back-end. Until now, however, FC-NVMe could not be used in VMware environments.
For more detail, please see my earlier post on FC-NVMe on PowerMax.
So let’s start with a quick recap of NVMe and NVMeoF. I’m going to plagiarize myself (if there is such a thing) rather than linking you to a previous post. This way you don’t have to go back and forth and I don’t have to tell you what statements to ignore.
So what are NVM and NVMe, and why are there two of them? NVM stands for non-volatile memory: the media itself, such as NAND-based flash or storage class memory. Even if you run a VMAX with flash, you have NVM. NVMe, or non-volatile memory express, is a set of standards that defines a PCI Express (PCIe) interface used to efficiently access data storage volumes on NVM. The standards were developed by a consortium of companies, including Dell. NVMe is a moving target, as new specifications are always in flux. The first spec defined a simple set of about a dozen commands, but more and more features are added with each revision, e.g. multipathing and compare-and-write locking (ATS). NVMe is all about concurrency, parallelism, and scalability to push performance. It replaces the SCSI protocol, which was never designed to work at the speeds of flash. Our PowerMax is an all-NVMe box, so it benefits from this increase in speed; however, there is a second part of the NVMe story, and that is the network.
If we look at NVMe in its simplest form, it is a server with local NVMe storage. No network is required. VMware supports such servers with the NVMe interface and will recognize those disks as local NVMe. As you may recall, VMware even has an NVMe controller for vmdks placed on those disks (though you can use the controller with non-NVMe disks too). Now, with FC-NVMe support, VMware will recognize the NVMe drives on the PowerMax as NVMe devices, not simply as flash storage.
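Once an FC-NVMe target is zoned and presented, you can verify from the command line that ESXi sees the PowerMax namespaces as true NVMe devices. A rough sketch using the vSphere 7 esxcli nvme namespace (output and device names will vary by environment):

```shell
# List NVMe-capable adapters recognized by the host
esxcli nvme adapter list

# List the NVMe controllers discovered over the fabric
esxcli nvme controller list

# List the namespaces (devices) presented from the array
esxcli nvme namespace list
```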
NVMe over Fabrics defines a common storage architecture for accessing the NVMe block protocol over a storage network. This means going from server to SAN, including a front-end interface to NVMe storage. In this first vSphere 7 release, VMware supports both FC and RoCE transports. I suspect others, for example TCP/IP, will come in the future. As FC is the predominant technology for our customers, we are starting with FC-NVMe.
Fortunately, you can use your existing Fibre Channel network for FC-NVMe whether 8, 16 or 32 Gb (the network will auto-negotiate down as needed). Not only that, but FC-NVMe can co-exist with your traditional FC. This allows a seamless transition from one to the other. And what if you continue to run other non-VMware applications that need regular FC? No problem at all. In fact, for VMware, you can present storage on both traditional FC and FC-NVMe, then Storage vMotion VMs on FC to new datastores on FC-NVMe. Therefore moving to FC-NVMe can be accomplished with no downtime.
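Because both protocols can share the same fabric, a quick way to confirm the co-existence on a host is to list its storage adapters; with an FC-NVMe capable HBA you should see the NVMe-over-FC adapters alongside the traditional FC ones. Illustrative only, since adapter names depend on your HBA and driver:

```shell
# Show all storage adapters; FC-NVMe adapters appear
# alongside traditional Fibre Channel adapters
esxcli storage adapter list
```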
Obviously, to obtain the lowest latency you would want 32 Gb end-to-end, from host to array, with all connections on the same switch. Avoid things like ISLs.
Now how about the VM controllers? Does this mean I should use the newer NVMe controller for my VMs over VMware Paravirtual? This one is a bit confusing, because in some places VMware says the NVMe controller is better for NVMe storage; however, they clearly state that when using the HPP plug-in for pathing (which is the default, by the way, because NMP is not supported), the VMware Paravirtual controller is the best practice. I generally follow VMware's best practices and have had good luck with Paravirtual, so I'd go with that. If you do use the NVMe controller, know that there are requirements around the guest OS and VM hardware version, so be sure to review the documentation. When in doubt, stick with Paravirtual, but either is perfectly fine with PowerMax.
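Since the HPP owns pathing for NVMe-oF devices, you can inspect how it has claimed the PowerMax namespaces and which path selection scheme is in effect. A sketch of the relevant commands:

```shell
# Show NVMe devices claimed by the High Performance Plugin (HPP),
# including the active path selection scheme (PSS)
esxcli storage hpp device list

# Show the individual paths the HPP is managing
esxcli storage hpp path list
```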
Finally I should mention that VMware’s implementation of FC-NVMe is not a complete conversion to the new command set. For a number of reasons there is still some command-set translation that goes on between the new protocol and traditional SCSI. This is not the type of thing that will increase latency, however.
I’ve done some testing comparing FC to FC-NVMe using our standard performance profiles, in particular random read miss (RRM) and random read hit (RRH). I used IOMETER as the testing tool, running on a VM with 4 paths to a total of 8 datastores. Because my network is 16 Gb, but my HBAs and PowerMax cards are 32 Gb, everything was auto-negotiated down to 16 Gb, which I think is typical of a customer environment. The results showed a relative performance benefit using FC-NVMe over FC, depending on the type of test. Remember, of course, this is just a lab exercise. Your results may vary for better or worse.
So what are the requirements to use FC-NVMe?
- The 32 Gb SLIC on the PowerMax array at code level 5978.479.479 with fix 104661
- A host HBA (Gen6) that supports FC-NVMe on the ESXi host. Be sure you have the latest firmware; vSphere 7 includes the drivers. If you happen to use Emulex, you may have to set a module parameter to see NVMe devices (lpfc_enable_fc4_type=3)
- A Gen5 (8/16 Gb) or Gen6 (32 Gb) Fibre Channel switch
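For the Emulex parameter mentioned above, the module option can be set with esxcli. This is a sketch; check your HBA vendor's documentation before changing driver parameters, and note that a host reboot is required for the change to take effect:

```shell
# Enable both SCSI FCP and NVMe on the lpfc driver (3 = FCP + NVMe)
esxcli system module parameters set -m lpfc -p lpfc_enable_fc4_type=3

# Verify the setting (takes effect after a host reboot)
esxcli system module parameters list -m lpfc | grep lpfc_enable_fc4_type
```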
The following list covers the important VMware/Dell EMC/general restrictions around FC-NVMe. Note that this is not an exhaustive list, but these are the ones most likely to be of interest to the user. These restrictions will be lifted over time.
- vVols are not supported with FC-NVMe. The Protocol Endpoint can only be presented via FC or iSCSI. Support is on the road map, but it will require development on both sides, VMware and the storage vendors, so it will take some time.
- RDMs are not supported, only VMFS. This is true of any NVMeoF implementation, FC or IP.
- No Site Recovery Manager (SRM) support. This is true of any NVMeoF implementation, FC or IP.
- No shared VMDKs (multi-writer) unless using vSphere 7.0 U1, and Dell EMC requires an RPQ. This is not the same as clustered VMDK, which is not supported.
- No SCSI-2 reservations.
- Only 4 paths per namespace (device) and only 32 namespaces (devices) per host, so 128 total paths. (This is perhaps the most limiting of the restrictions.)
- Only 2 HBA ports.
- VMware only supports ALUA with NVMeoF. Our devices advertise ALUA so this is not an issue.
- You can only present a device for use with FC or FC-NVMe, not both.
- There is no support for SRDF/Metro. Future support will require work from both VMware and Dell EMC, so it is not a small undertaking.
- Both PowerPath/VE (here) and VMware's HPP plugin support FC-NVMe with VMware (the NMP plugin is not supported). We strongly recommend PP/VE.
- No VSI support.
- There are no adjustable queues with FC-NVMe because, well, the technology works differently. In other words, don’t worry about it.
- VAAI primitives:
  - ATS (Compare and Write): supported.
  - UNMAP (Deallocate in NVMe): supported.
  - XCOPY: not supported. This particular primitive has not been ported over to the NVMe command set yet.
  - Block zero, or WRITE SAME (Write Zeroes in NVMe): not supported. There appears to be some issue with translation at VMware's layer. The only operation of note this impacts is creating an eagerzeroedthick (EZT) disk. So if your application requires EZT, the array will write all the zeros for the disk size. If you are using data reduction, the zeros will generally be reclaimed for you. If you do not have data reduction enabled, or see the zeros not being reclaimed after a time, you can get the space back by running an array reclaim on the device in Unisphere. Note that because the array supports WRITE SAME, VMware will show it as supported if you query the device's VAAI status. Also of interest: the VAAI Plugin Name is blank. This is because the commands are translated directly to NVMe without the need of a plugin.
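To see what VAAI reports for a device, or to manually reclaim dead space on a VMFS datastore (a separate operation from the Unisphere array reclaim mentioned above), the following illustrates the relevant commands; the device identifier and datastore label are hypothetical placeholders:

```shell
# Query VAAI status for a device; Zero Status (WRITE SAME) may show
# supported even though VMware does not issue it over FC-NVMe
esxcli storage core device vaai status get -d eui.0000111122223333

# Manually reclaim free space on a VMFS datastore by label
esxcli storage vmfs unmap -l MyNVMeDatastore
```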
I created a demo below which shows both FC and FC-NVMe in the same environment and how you can move a VM between datastores accessed by different protocols.
Official certification is complete.
BTW, I am working on updates to my documentation. There isn't a huge amount of material to change or add from a storage perspective in this vSphere 7 release, but I do want to keep on top of it, so I will try to have it out soon.