vSphere 6.7 U1 & NMP latency

In vSphere 6.7 VMware released, albeit without fanfare, a new NMP Round Robin path selection type: latency. Most customers probably never even knew it was there – I was among them until a colleague told me about it. With Update 1 of 6.7, released today, VMware has supplied the documentation to go along with the feature, so we’ll assume VMware doesn’t want you to use it before U1. What the new capability offers is the ability for VMware to test the performance of the paths to your device so that IO is routed more efficiently. VMware is calling this feature Enhanced Round Robin Load Balancing. I’m going to talk about how this option works in this post and then, for the sake of length, follow it with a separate post on where to use it. So let’s start with what it is and how it works.

As a quick reminder, although this post is going to focus on NMP, it in no way changes our best practice of using PowerPath/VE with VMware. It is a superior pathing product to NMP even with the new latency feature; however, we do have many customers that use NMP so we want to be sure to provide the best recommendations around that pathing product.

So what is this enhancement all about? There are currently only two “types” of NMP path selection that we use with VMAX and PowerMax – Fixed and Round Robin. We no longer support the third type, Most Recently Used (MRU). Of the two supported types, Round Robin is really the only viable option. We used to require Fixed for Gatekeepers, but that was long ago. Fixed restricts IO to a single path and does not allow you to spread IO across multiple paths as Round Robin does. In fact, Round Robin (VMW_PSP_RR) is our default policy when you present VMAX or PowerMax devices to an ESXi host. You can see it here if you look at the SATPs (Storage Array Type Plug-Ins).
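If you prefer the CLI, you can list the SATPs and their default path selection policies with the command below; the output here is trimmed to the Symmetrix entry and is only illustrative of what you should see on your host:

esxcli storage nmp satp list
Name           Default PSP  Description
VMW_SATP_SYMM  VMW_PSP_RR   Supports EMC Symmetrix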

Note we still use “EMC Symmetrix” as the description and part of the name (SYMM) despite no longer calling our arrays that. Leaving the name as-is provides consistency across ESXi releases – think SRDF, which also uses the Symmetrix name. It is simply a moniker and has no bearing on functionality.

Round Robin works just as the name suggests: VMware sends IO down one path, then switches to the next, across however many paths there are. So when does VMware switch? Well, there are two ways to control that – by the number of IOs (type=iops) or by the amount of data (type=bytes) being sent. By default, VMware uses type=iops and will send 1000 IOs down a path before switching to the next one. Now VMware found many, many years ago that waiting for 1000 IOs before changing paths did not produce the best performance. They therefore recommend changing that number to 1. Dell EMC also recommends making this change. BTW, if this is news to you, please read the TechBook to learn how to make that change, as it is easy and does not require downtime 🙂
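As a minimal sketch, the per-device change looks like the following; the device ID is a placeholder, and the TechBook remains the reference for the full, supported procedure:

esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=<naa.device_id>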

Alright, with the history lesson out of the way, what did VMware change? Well, one thing you may have deduced from VMware’s model is that neither bytes nor IOs takes into account the response time, or latency, of the path in use. Certainly VMware is capable of recognizing a dead path, but if there is congestion at, say, the port, VMware keeps sending IO down that path since it doesn’t know any better. With U1, there is now a third “type” for Round Robin: latency. When type=latency is set on a device, both the latency and the number of pending IOs on a path are used to determine whether an IO should be sent down that path. Good, right? OK, how do you set it?

By default, the latency option of Round Robin is disabled in GA 6.7 but should be enabled in U1 with a new install or upgrade. If you want to double-check, you can run the following (this is GA 6.7):

[root@dsib0142:~] esxcfg-advcfg -g /Misc/EnablePSPLatencyPolicy
Value of EnablePSPLatencyPolicy is 0

If necessary, run the following to enable it:

[root@dsib0142:~] esxcfg-advcfg -s 1 /Misc/EnablePSPLatencyPolicy
Value of EnablePSPLatencyPolicy is 1

The same setting is also available in the GUI (this is U1).

Once enabled, you can then apply the new type to a device (or devices):

esxcli storage nmp psp roundrobin deviceconfig set -d <device> --type=latency

Here is an example of setting latency for a device, followed by the command that shows it is properly set to a Limit Type of Latency:
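As a sketch, that sequence looks like the following, using the same device that appears in the output further down:

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --device=naa.600009700bc724652e29001600000000
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.600009700bc724652e29001600000000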

So how does VMware decide which path to select? VMware monitors all the active paths and calculates an average latency for each one, based on the time the IOs take and the number of outstanding IOs on the path. Once type=latency is set, the first 16 IOs per active path are used to calculate the latency. Subsequent IOs are then directed to the path with the lowest latency. In environments with congestion challenges, this mechanism can provide better results than the current recommendation of switching paths every 1 IO.

I should point out that VMware doesn’t simply sample 16 IOs and then use the results forever. There is a sampling window, so VMware will re-test every 3 minutes by default. If we look at the output of the get command for that device:

[root@dsib1115:~] esxcli storage nmp psp roundrobin deviceconfig get --device=naa.600009700bc724652e29001600000000
   Byte Limit: 0
   Device: naa.600009700bc724652e29001600000000
   IOOperation Limit: 0
   Latency Evaluation Interval: 180000 milliseconds
   Limit Type: Latency
   Number Of Sampling IOs Per Path: 16
   Use Active Unoptimized Paths: false

You can see there is a Latency Evaluation Interval and a Number of Sampling IOs Per Path. These set the time between re-assessing the paths (default 3 minutes) and the number of IOs used for the sample (default 16), respectively. If you wish to change these values, you can do so when setting the type to latency: the evaluation time is changed with the -T switch, while the number of sampling IOs is changed with -S. The full option list is in the help output below, followed by an example.

[root@dsib0142:~] esxcli storage nmp psp roundrobin deviceconfig set -t latency -d xxxxx

Usage: esxcli storage nmp psp roundrobin deviceconfig set [cmd options]

Description:
  set                        Allow setting of the Round Robin path options on a given device controlled by the Round Robin Selection Policy.

Cmd options:
  -B|--bytes=                When the --type option is set to 'bytes' this is the value that will be assigned to the byte limit value for this device.
  -g|--cfgfile               Update the config file and runtime with the new setting. In case device is claimed by another PSP, ignore any errors when applying to runtime configuration.
  -d|--device=               The device you wish to set the Round Robin settings for. This device must be controlled by the Round Robin Path Selection Policy (except when -g is specified). (required)
  -I|--iops=                 When the --type option is set to 'iops' this is the value that will be assigned to the I/O operation limit value for this device.
  -T|--latency-eval-time=    When the --type option is set to 'latency' this value can control at what interval (in ms) the latency of paths should be evaluated.
  -S|--num-sampling-cycles=  When the --type option is set to 'latency' this value will control how many sample IOs should be issued on each path to calculate latency of the path.
  -t|--type=                 Set the type of the Round Robin path switching that should be enabled for this device. Valid values for type are:
                               bytes: Set the trigger for path switching based on the number of bytes sent down a path.
                               default: Set the trigger for path switching back to default values.
                               iops: Set the trigger for path switching based on the number of I/O operations on a path.
                               latency: Set the trigger for path switching based on latency and pending IOs on path.
  -U|--useano=               Set useano to true, to also include non-optimized paths in the set of active paths used to issue I/Os on this device, otherwise set it to false.
[root@dsib0142:~]
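For example, here is a sketch of applying the latency type while setting both values explicitly; the numbers shown are simply the defaults from the get output above, and the device ID is a placeholder:

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --latency-eval-time=180000 --num-sampling-cycles=16 --device=<naa.device_id>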

Although these parameters can be adjusted, VMware does not recommend changing them, as the default values are not arbitrary; they were chosen based on performance testing. I did some testing of my own strictly out of curiosity and did not find any significant differences in performance, but I have no reason to doubt VMware’s conclusion, so I also advise against adjusting the values.
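If you did experiment and want to back the change out, the help output above includes a 'default' type that sets the path-switching trigger back to the default values, e.g.:

esxcli storage nmp psp roundrobin deviceconfig set --type=default --device=<naa.device_id>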

VMware did a good write-up here on how the testing was done. It’s definitely worth a look, and because it also covers how to set up host profiles along with some other CLI information, I’m not going to repeat that here. However, if you want to know where to use type=latency with VMAX and PowerMax, continue on to my next post here.

BTW, even if you aren’t going to use this feature, it’s a good idea to upgrade to U1. There are lots of bug fixes, among them one for SEsparse snapshots, which can corrupt your VM and anything running on it. Yes, I said corrupt, as in an Oracle database. Here is the KB if you want more detail and the current workaround if you can’t upgrade right away: https://kb.vmware.com/s/article/59216

5 thoughts on “vSphere 6.7 U1 & NMP latency”

  1. Hi Drew, this sounds like a huge enhancement for VPLEX uniform constructs. Even though it’s strongly recommended to use PowerPath, almost every customer I know uses Round Robin. (The Metros are next to each other.)
    But I only know of XtremIO documents saying to change the default value from 1000 IOs to 1 (it brings 25-30% more performance); the VPLEX documents still don’t recommend it. So maybe someone should talk with the VPLEX team?

    1. Hi Marc,

      Yes, though I have not tested VPLEX with this, I suspect that type=latency would benefit VPLEX uniform vMSC much as it does VMAX/PowerMax. As far as changing IOPS from 1000 to 1, those XIO numbers look awfully high to me, but I assume they’ve published their results from testing and as I have no special XIO knowledge I can’t comment further on their platform; however, on all arrays I have tested I have never achieved higher than 10% benefit, and then only in particular workloads. On VMAX/PowerMax our documented recommendation is specifically for regular VMware environments and non-uniform vMSC. The reason we recommend IOPS of 1 is equally or more about quicker detection of hardware/path issues and to spread IO evenly as much as it is about performance. I think any performance gain makes the change worthwhile, but the other benefits are really where the value is. Conversely, in uniform vMSC we recommend 1000 over 1 (as I note in the recent and older posts) to avoid contention. When the arrays are co-located, customers can use 1 though I still recommend 1000 having seen occasional contention (depending on workload). On VPLEX if they are silent on the issue, then feel free to use 1 as that is VMware’s general recommendation. The nice thing about the setting is it can be changed online if an issue arises. If you would like a more official response for VPLEX, feel free to open an SR with the question and it should get back to their team.
