I’ve been delayed in updating my documentation for vSphere 7 and as we have a new release for PowerMax coming out soon, I can’t publish what I have as it is intertwined with pre-GA material. Therefore I’m going to add some detail here about using FC-NVMe on PowerMax in vSphere 7 in the meantime. I’m not going for coherence per se, rather I just want to include the pertinent topics that will be in the published documentation next month.
FC-NVMe PowerMax configuration
Let’s start with the basics of FC-NVMe on PowerMax. There is nothing special about the FC-NVMe configuration on the PowerMax with vSphere 7. You want to follow the general best practices for the FN emulation. You want to use 4 ports if you can, and the best performance is always going to come from a full 32 Gb infrastructure, though of course it is not required. Only the HBAs in the servers need to be 32 Gb (and the PowerMax ports, obviously); the SAN switches might still be 16 Gb (or even 8 Gb) and the speed will be negotiated down.
Driver configuration in VMware ESXi
The drivers provided by VMware as part of the VMware ESXi distribution should be used when connecting VMware ESXi to Dell EMC VMAX and PowerMax storage. Dell EMC E-Lab™ does, however, perform extensive testing to ensure the BIOS, BootBIOS, and the VMware-supplied drivers work properly with Dell EMC storage arrays.
FC-NVMe
When using FC-NVMe, different drivers and/or firmware may be required on the Gen 6 (32 Gb) HBA to support the protocol. In addition, modification of HBA parameters might be necessary. For instance, when using Emulex HBAs the parameter lpfc_enable_fc4_type should be set to 3 and the host rebooted:
esxcli system module parameters set -p lpfc_enable_fc4_type=3 -m lpfc -f
Note that setting this parameter to 3 enables both SCSI and NVMe on the HBA, so it will not prevent you from masking traditional FC devices to these HBAs if they are so zoned. Review the HBA vendor and VMware documentation for the correct configuration.
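If you want to verify the parameter took effect after the reboot, you can list the lpfc module parameters and check the value (a quick check that assumes the Emulex lpfc driver is in use):
esxcli system module parameters list -m lpfc | grep lpfc_enable_fc4_type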
Device presentation
FC-NVMe devices use the EUI, or Extended Unique Identifier, prefix rather than the NAA, or Network Address Authority, prefix. Either naming, however, is unique to that LUN. The NAA or EUI number is generated by the storage array, and since it is unique to the LUN, that same naming is used across any ESXi host(s) to which the device is presented. This is what permits ESXi hosts to recognize the same datastore on that LUN, even if those hosts are in different vCenters (or standalone). Note that both the mobility ID and compatibility ID devices are shown.
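As a quick illustration from the CLI (the filtering here is just a sketch), FC-NVMe namespaces show up in the device list with the eui. prefix while traditional SCSI FC devices carry the naa. prefix:
# FC-NVMe namespaces use the eui. prefix
esxcli storage core device list | grep '^eui\.'
# Traditional FC (SCSI) devices use the naa. prefix
esxcli storage core device list | grep '^naa\.'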
FC-NVMe detail in vSphere
VMware offers specific esxcli commands for NVMe to list namespaces, controllers, adapters, etc. The options are shown below.
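For reference, here is a sketch of the type of commands I am referring to; the exact sub-commands and output vary with the ESXi build, so run esxcli nvme on your host to see what is available:
# List the NVMe adapters (the vmhbas running FC-NVMe)
esxcli nvme adapter list
# List the NVMe controllers discovered behind those adapters
esxcli nvme controller list
# List the namespaces (the NVMe equivalent of LUNs) presented to the host
esxcli nvme namespace list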
In addition, many of the command outputs are available in the vSphere Client. Simply highlight the appropriate HBA where the NVMe devices are presented. In the figure below, the details on devices, paths, namespaces, and controllers are available.
HBA queues in FC-NVMe
The PowerMax implementation of FC-NVMe supports multiple queues, which is a feature of NVMeoF. The array controls the queue setting: there are 8 IO queues per connection, or path, each with a queue depth of 128. This provides excellent scalability.
Pathing
Native Multipathing plug-ins
By default, the native multipathing plug-in (NMP) supplied by VMware is used to manage I/O for non-FC-NVMe devices. NMP can be configured to support fixed and round robin (RR) path selection policies (PSP). In addition, Dell EMC supports the use of ALUA (Asymmetric Logical Unit Access) only with the Mobility ID for non-FC-NVMe devices.
NMP is not supported for FC-NVMe. Instead, VMware uses a different plug-in called the High-Performance Plug-in, or HPP. This plug-in was developed specifically for NVMe devices, though it is the default only for NVMeoF devices; for local NVMe devices, NMP remains the default, though it can be changed to HPP through claim rules. HPP only supports ALUA with NVMeoF devices, but unlike NMP, it is unnecessary to create a different claim rule for these devices as HPP is designed for ALUA. To support multipathing, HPP uses Path Selection Schemes (PSS) when selecting physical paths for I/O requests. HPP supports the following PSS mechanisms (a quick CLI check of which devices HPP has claimed, and with which PSS, follows the list):
- Fixed
- LB-RR (Load Balance – Round Robin)
- LB-IOPS (Load Balance – IOPs)
- LB-BYTES (Load Balance – Bytes)
- LB-Latency (Load Balance – Latency)
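A quick way to see which devices HPP has claimed, and which PSS each device is currently using, is through the hpp namespace of esxcli:
# Devices claimed by HPP, including the PSS in use for each
esxcli storage hpp device list
# The physical paths HPP is managing for those devices
esxcli storage hpp path list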
HPP Path Selection Schemes (PSS)
The High-Performance Plug-in uses Path Selection Schemes (PSS) to manage multipathing just as NMP uses PSP. As noted above, HPP offers the following PSS options:
- Fixed – Use a specific preferred path
- LB-RR (Load Balance – Round Robin) – this is the default PSS. After 1000 I/Os or 10485760 bytes (whichever comes first), the path is switched in a round robin fashion. This is the equivalent of the NMP PSP RR.
- LB-IOPS (Load Balance – IOPs) – when 1000 I/Os are reached (or a user-set number), VMware switches to the path that has the least number of outstanding I/Os.
- LB-BYTES (Load Balance – Bytes) – when 10 MB are reached (or a user-set number), VMware switches to the path that has the least number of outstanding bytes.
- LB-Latency (Load Balance – Latency) – this is the same latency-based mechanism available with NMP; VMware evaluates the paths and selects the one with the lowest latency.
Because LB-IOPS, LB-BYTES, and LB-Latency base their path selection on actual load, they are superior to LB-RR or Fixed. As performance is paramount for FC-NVMe, Dell EMC recommends using the LB-Latency PSS, as it offers the best chance at uniform performance across the paths.
To set the PSS on an individual device, issue the following:
esxcli storage hpp device set -P LB-Latency -d eui.04505330303033380000976000019760
To add a claim rule so that this PSS is used for every FC-NVMe device after a reboot, issue the command below. Note that we cannot pass the usual “model” flag because that field is restricted to 16 characters and our Dell EMC models are 17 characters (EMC PowerMax_8000, EMC PowerMax_2000). For cases like these VMware offers the --nvme-controller-model flag.
esxcli storage core claimrule add -r 914 -t vendor --nvme-controller-model='EMC PowerMax_8000' -P HPP --config-string "pss=LB-Latency"
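You can confirm the rule was added with the command below. A newly added rule is listed with a class of “file” and becomes “runtime” after a reboot (or after the claim rules are loaded), at which point it will claim the FC-NVMe devices.
esxcli storage core claimrule list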
Latency threshold setting
By default, every I/O that passes through ESXi goes through the I/O scheduler. Because of the speed of NVMe, it is possible that using the scheduler creates internal queuing, thus slowing down the I/O. VMware offers the ability to set a latency threshold so that any I/O with a response time below the threshold will bypass the scheduler. When this mechanism is enabled, and the I/O is below the threshold, the I/O passes directly from PSA through the HPP to the device driver.
For the mechanism to work, the observed average I/O latency must be lower than the set latency threshold. If the I/O latency exceeds the latency threshold, the I/O temporarily returns to the I/O scheduler. The bypass resumes when the average I/O latency drops below the latency threshold again.
There are a couple of different ways to set the latency threshold. To list the existing thresholds, issue:
esxcli storage core device latencythreshold list
To set the latency threshold at the device level, issue:
esxcli storage core device latencythreshold set -d eui.36fe0068000009f1000097600bc724c2 -t 10
To set it for all Dell EMC NVMe devices, issue:
esxcli storage core device latencythreshold set -v 'NVMe' -m 'EMC PowerMax_8000' -t 10
These settings will persist across reboots, but any new devices would require the latency threshold to be set on them. Dell EMC makes no specific recommendation around latencythreshold as VMware does not. There has not been any scale testing to date that provides data on the value of this parameter; however, Dell EMC supports the use of it if desired.
Managing HPP in vSphere Client
Claim rules and claiming operations must all be done through the CLI, but the HPP multipathing policy (PSS) for FC-NVMe devices can be changed in the vSphere Client itself. By default, PowerMax FC-NVMe devices being managed by HPP will have the PSS set to LB-RR. As this is not the best practice, the PSS can be changed to LB-Latency either through the CLI or the vSphere Client. Through the CLI, for each device execute:
esxcli storage hpp device set -P LB-Latency -d <device_id>
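If there are a lot of devices, a small loop in the ESXi shell can save some typing. This is only a sketch and it assumes every eui. device HPP has claimed is a PowerMax FC-NVMe namespace that you want moved to LB-Latency:
# Set the PSS to LB-Latency on every HPP-claimed eui. device
for dev in $(esxcli storage hpp device list | grep '^eui\.'); do
   esxcli storage hpp device set -P LB-Latency -d $dev
done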
Alternatively, each device can be manually changed in the vSphere Client, as shown below.
PowerPath/VE
PowerPath fully supports vSphere 7 and FC-NVMe. Once installed, no further configuration is required. PP/VE will automatically recognize FC-NVMe devices as ALUA, whether the mobility ID or compatibility ID is used.
APD/PDL
Just a quick mention that VMCP, and thus APD and PDL, are supported with FC-NVMe. Here is an example.
Well, that’s the bulk of the material for FC-NVMe that I’ll publish next month. I have not seen any production implementations on vSphere 7 yet. It’s early for both, and as FC-NVMe still has many restrictions, the feature set will have to fill out before most customers begin looking at it.