Modifying VMware parameters with PowerMax – queues, et al

I’ll provide the warning up front that there is nothing new or exciting in this post. It falls into the category of repetitive posts (here’s another good example) that I write from time to time in the hope of summarizing information for customers so they don’t have to wade through my tome of best practices. That being said, if you’ve read the tome, you can stop reading this post now, since you already know the answer to the question of whether to modify VMware parameters. If you haven’t needed to read it yet, know that the information I am going to write about here is available in those documents if you want additional detail, though the advice is the same.

The questions

I’ve been receiving questions about how to set certain VMware parameters for many years, really ever since we first had the documentation. And though this is all documented, as I said, I continue to get the questions, sometimes referencing the very documents that explain how to handle the parameters :-) The questions are certainly asked in earnest, though, so I thought I’d cover the main reasons I still get asked about it and why the recommendations are what they are.

What are the recommendations?

Let’s start with what those best practices around VMware parameters actually are. In general, the recommendation is to take the defaults. There are some exceptions (XCOPY comes to mind), but they are few. To see why, it is useful to understand our overarching principle: the goal of a best practice recommendation is not to cover all environments, but the majority of them. There will always be environments with particular applications or workloads that tax the array in unique ways and may require changes to these parameters, but most do not. I’ve said this a number of ways in these docs, and even on this blog, where a few years ago I wrote the following about best practices: “In general, do not change any of VMware’s default parameter values. All VMAX/PowerMax arrays work well without adjusting them and in fact changing them can result in performance issues.” For those environments that do require changes, there is absolutely no issue in making them, though you definitely want to test thoroughly since, as I’ve pointed out, changes can have negative performance implications.

Queues are among the most common parameters I get asked about, but again in the TechBook I write: “For the majority of environments, Dell EMC recommends using the default queue values for both the HBAs and the VMware specific queues; however, due to performance [issues], some customers may require adjustments.” One special note about queues: for some parameters the default is an actual value, but for others, like QFullThreshold, which is an adaptive queue setting, the default actually means the parameter is disabled. This is intended. By the way, I don’t recommend you do tons of testing with different values for queues. We’ve already done that, which is how we came to recommend the defaults. Customers who need to make changes do so because a performance issue arises that they are able to trace back to an area in the stack that a parameter can affect. They do this analysis with esxtop and perhaps Unisphere for PowerMax, among other statistics-gathering tools. We suggest involving VMware when making changes to the parameters as a result of such analysis.
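As a rough sanity check before touching a queue value, Little’s Law is handy: the average number of I/Os in flight equals IOPS multiplied by latency. The Python sketch below applies it to made-up numbers. The workload figures and the queue depth of 64 are illustrative assumptions, not documented defaults or recommendations, and the function names are hypothetical.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * latency_ms / 1000.0


def queue_is_saturated(iops: float, latency_ms: float, queue_depth: int) -> bool:
    """True when the workload needs more concurrent I/Os than the queue allows."""
    return outstanding_ios(iops, latency_ms) > queue_depth


# Hypothetical device: 20,000 IOPS at 1 ms against an assumed queue depth of 64.
in_flight = outstanding_ios(20_000, 1.0)             # 20.0 I/Os in flight
saturated = queue_is_saturated(20_000, 1.0, 64)      # False: plenty of headroom

# Same IOPS at 5 ms needs 100 I/Os in flight, more than the assumed depth.
saturated_slow = queue_is_saturated(20_000, 5.0, 64)  # True
```

If the arithmetic shows plenty of headroom, the queue is unlikely to be the culprit and the analysis should move elsewhere in the stack. If it does not, that is a starting point for a conversation with VMware and Dell EMC support rather than a new setting in itself.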

Here are a few of the general reasons I get questions about these parameters with some explanation as to why they do not change the recommendations.

Secret settings

The secret setting! This is actually the most frequent type of query I get. It usually goes something like this: “You write in the best practices that this ‘x’ parameter should be left at the default value. But what do you really think we should set it to?” I promise you I have no secret settings, values, held-back opinions, etc. concerning VMware parameters that I’ve stashed away awaiting the chance to pull them out at the very last minute. Documentation is a laborious endeavor, not simply because it takes so much time to create and edit, but also because of all the testing that backs those words and images. I would not go through all of that and then leave out the actual best practice. I understand that sometimes what is being asked is whether there is some nuance to the setting, or perhaps an advanced way to look at it that I did not want to document. But there isn’t. If there were, I would include it in the doc, or at the very least on this blog for starters. And to prove my point, there actually have been cases like that – for example when we had to adjust the IOPS recommendation for NMP for SRDF/Metro from 1 to something higher to avoid collisions and thus performance degradation. Through testing and customer incidents, we came to a new recommendation. It started as a post, and then was incorporated in the documentation. That is always my methodology: share as soon as I know.

Other arrays don’t use the default settings

This could be competitor arrays, but mostly I get this question because of other arrays in our family. Some of our arrays rely on very specific settings for VMware parameters; XtremIO in particular comes to mind. When our customers own both arrays, it can be confusing that one array’s best practices require changes and the other’s do not. But the individual arrays in our family are unique, each with a different architecture and a different operating system. XtremIO has found that changing the default values of some VMware parameters improves performance for most environments, hence they recommend those changes. For the PowerMax, we have not, hence we recommend the defaults. We used the same testing methodology in determining the recommendations, but came to different results based on the underlying hardware. The only time there is a conflict is when customers run both arrays off the same host. For that use case we have a KB article that will guide you through the scenario: https://www.dell.com/support/kbdoc/en-us/303782. You’ll find, however, that we still take the defaults for the most part.

Customize me

This question is of the type: “We run ‘x’, and every other Saturday at midnight our latency is high. Should we change VMware parameter ‘y’ even though the docs say to take the default?” Now this is a perfectly legitimate question, and I have the answer in the recommendations, where I write that sometimes changes are needed. So my answer would be maybe, but it would be irresponsible of me to make a random recommendation based on this type of information. The key in this case is the analysis. Find out where the latency is coming from. Does the array indicate great performance while esxtop reports otherwise? Then it is probably safe to assume you have to examine everything from the point the I/O leaves the array (the network) up through the ESXi stack. From there you keep narrowing it down until you find the culprit. My friend Cody Hosterman wrote a great blog post, which you may find useful, that discusses how to figure out where your problem is. Yes, he works for Pure, but the methodology he uses is sound for our arrays too. When in doubt, involve VMware and Dell EMC support. In any case, I always suggest involving VMware when changing their parameters.
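The narrowing-down can be sketched as a small decision function. This is a minimal Python illustration, not a support procedure: it compares the response time the array reports with two esxtop counters, DAVG/cmd (latency seen at the device driver, which includes the fabric and the array) and KAVG/cmd (time spent in the ESXi kernel, including queuing). The thresholds, class name, and return labels are all hypothetical values chosen for the example, not Dell EMC or VMware guidance.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions for this sketch, not vendor guidance).
KAVG_LIMIT_MS = 2.0    # kernel latency above this suggests queuing inside ESXi
DAVG_LIMIT_MS = 20.0   # device latency above this suggests a problem below the kernel
ARRAY_LIMIT_MS = 5.0   # array-reported response time considered healthy below this


@dataclass
class LatencySample:
    array_ms: float  # response time reported by the array (e.g. via Unisphere)
    davg_ms: float   # esxtop DAVG/cmd
    kavg_ms: float   # esxtop KAVG/cmd


def locate_latency(s: LatencySample) -> str:
    """Return the layer to investigate first for one latency sample."""
    if s.kavg_ms > KAVG_LIMIT_MS:
        # I/O is waiting in the hypervisor before it ever reaches the HBA.
        return "esxi-kernel"
    if s.davg_ms > DAVG_LIMIT_MS:
        if s.array_ms > ARRAY_LIMIT_MS:
            # The array itself reports slow service times.
            return "array"
        # Array looks healthy but the host sees slow devices: HBA, fabric, paths.
        return "path"
    return "none"


# Array reports 0.5 ms but the host sees 30 ms at the device level.
where = locate_latency(LatencySample(array_ms=0.5, davg_ms=30.0, kavg_ms=0.1))  # "path"
```

Only a result like "esxi-kernel" points toward the part of the stack where a VMware parameter might matter, and even then the change should be made with VMware involved.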

I will surely continue to receive these questions, and that’s OK, since as I’ve said I write these posts for myself as much as you. Hopefully this more detailed explanation will aid in customers’ understanding.
