vSphere Automatic UNMAP/XCOPY issue

Patch fix is available here:

The relevant fix in the release notes is:

PR 2411494: VMFS6 automatic asynchronous reclamation of free space might work at a higher space reclamation priority than configured

******** Update 10-14 ********

VMware has formalized the fix I tested, and it is set to be included in the next patch for vSphere 6.7 (ESXi 6.7 Patch 01 (P01)), which I believe should arrive by the end of the year (December). If the issue is causing major problems in your environment, however, you could open an SR and request the fix at your current patch level.

****************************

******** Update 9-6 ********

Dell EMC has now issued an advisory on this. VMware is close to a fix, and I’ve already done some testing.

The link is here:  DTA 536473: VMAX3, VMAX AFA, PowerMax Series: Storage VMotion from VMWare ESXi 6.7 with Automatic Space Reclamation (UNMAP) enabled may lead to performance issues on the array.

******** Update 9-6 ********

******** Update 8-5 ********

VMware has acknowledged that UNMAP is not being properly throttled to the value specified on a datastore. For instance, if you leave the datastore at the default UNMAP rate of 25 MB/s, VMware will not honor that. My recent testing shows consistent rates of 1 GB/s for thick vmdks (thin rates vary, though all are higher than 25 MB/s). These higher rates make it very difficult for VMAX/PowerMax to keep up and avoid the XCOPY interaction. VMware engineering is working on a solution/workaround (other than that noted below). Here is the KB explaining this: https://kb.vmware.com/s/article/74514
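To put rough numbers on the gap between the configured and observed rates, here is a small back-of-the-envelope calculation. The 1 TB of freed space is a hypothetical figure chosen for illustration, and decimal units (1 MB = 10^6 bytes) are an assumption; only the 25 MB/s and 1 GB/s rates come from the text above.

```python
# Back-of-the-envelope: how long UNMAP takes to work through freed space
# at the configured rate versus the rate observed in testing.
MB = 10**6   # decimal units are an assumption
GB = 10**9
TB = 10**12


def reclaim_seconds(freed_bytes: int, rate_bytes_per_s: int) -> float:
    """Time needed to issue UNMAP for freed_bytes at a steady rate."""
    return freed_bytes / rate_bytes_per_s


freed = 1 * TB              # hypothetical space vacated by a Storage vMotion
configured_rate = 25 * MB   # default datastore setting
observed_rate = 1 * GB      # rate seen for thick vmdks

t_configured = reclaim_seconds(freed, configured_rate)
t_observed = reclaim_seconds(freed, observed_rate)

print(f"At 25 MB/s: {t_configured / 3600:.1f} hours")
print(f"At 1 GB/s:  {t_observed / 60:.1f} minutes")
print(f"Speed-up:   {t_configured / t_observed:.0f}x")
```

Under these assumptions the UNMAP work arrives over minutes rather than hours, which leaves the array far less time to finish its background copy first.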

******** Update 8-5 ********

Well, VAAI seems to be an unwelcome topic these days. I recently posted about how to address performance concerns in relation to SRDF here, and now I’m back today with something else. This new issue is specific to VMFS 6 with automated UNMAP and Storage vMotion.

Before beginning, I'm going to assume the reader knows how our VAAI implementations work. I have multiple posts on the site as well as documents you can read linked here.

Issue

We’ve had this new problem crop up a few times recently when our customers have the following setup:

  • A heavily utilized array
  • No SRDF or only SRDF/A on devices involved
  • A VMFS 6 datastore as source device with automated UNMAP enabled (default)

The customer then runs one or more Storage vMotions from one datastore to another. Assuming no conditions prevent XCOPY from being issued (see paper for details), the VM is moved using XCOPY. After the array receives all the commands from VMware, we report the move complete, while in the background we copy the data using SnapVX (TimeFinder). All of that is working as normal. The problem comes when VMware issues UNMAP against the recently vacated datastore immediately after we report the move complete. The move, in fact, is not complete, as we are still copying data. We now have to deal with UNMAP commands against tracks which cannot be reclaimed until the copy is done. The copy, therefore, becomes a top priority. This invariably impacts performance, since the copy is no longer a background process, and it can lead to IO timeouts or aborts on the device. I should note nothing breaks here. Everything works, but there are performance implications.
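The interaction can be illustrated with a deliberately simple toy model. This is not how the array is implemented; the track counts, the per-tick rates, and the `simulate` function are all hypothetical, and the only idea taken from the description above is that an UNMAP against a track still waiting on its copy forces that copy to happen immediately.

```python
def simulate(total_tracks: int, copy_per_tick: int, unmap_per_tick: int) -> int:
    """Toy model: count copies forced into the foreground by UNMAP.

    Each tick, the background copy handles up to copy_per_tick uncopied
    tracks in order, then UNMAP arrives for the next unmap_per_tick tracks.
    An UNMAP on a track that is not yet copied forces an immediate copy.
    """
    copied = [False] * total_tracks
    next_bg = 0      # next track the background copy will look at
    next_unmap = 0   # next track UNMAP will target
    forced = 0

    while next_unmap < total_tracks:
        # Background copy step: skip tracks already copied in the foreground.
        done = 0
        while done < copy_per_tick and next_bg < total_tracks:
            if not copied[next_bg]:
                copied[next_bg] = True
                done += 1
            next_bg += 1

        # UNMAP step: any uncopied track must be copied right now.
        end = min(next_unmap + unmap_per_tick, total_tracks)
        for track in range(next_unmap, end):
            if not copied[track]:
                copied[track] = True
                forced += 1
        next_unmap = end

    return forced


# UNMAP slower than the background copy: the copy stays ahead.
print(simulate(total_tracks=1000, copy_per_tick=10, unmap_per_tick=1))
# UNMAP much faster than the background copy: most copies are forced.
print(simulate(total_tracks=1000, copy_per_tick=10, unmap_per_tick=40))
```

In this sketch, a slow UNMAP stream never catches up with the background copy, while a fast one turns most of the copy into foreground work.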

Workarounds

Though most environments will not encounter this issue, if you do, the workaround for VAAI issues is very consistent: disable the feature. This allows host-based mechanisms to be used instead. There are two options here. You can either disable automated UNMAP on the datastore or disable XCOPY on the host or array. I would recommend disabling automated UNMAP, since this still leaves the possibility of using manual UNMAP if necessary. Manual UNMAP has its own concerns but can be run during maintenance windows or periods of low activity when no SvMotions are planned. Note that if you are using Storage DRS, you would want to set it to manual mode before running manual UNMAP and then reset it to automatic after the command is complete.
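As a rough sketch of what these options look like on an ESXi host, the helper below only builds the `esxcli` command lines and prints them; it does not run anything. The datastore label `DS_SOURCE` and the function names are hypothetical, and the commands are written from memory of the standard `esxcli` namespaces, so verify them against the documentation for your ESXi build before use.

```python
import shlex


def disable_auto_unmap(label: str) -> list[str]:
    """Turn off automated space reclamation on one VMFS 6 datastore."""
    return ["esxcli", "storage", "vmfs", "reclaim", "config", "set",
            f"--volume-label={label}", "--reclaim-priority=none"]


def show_reclaim_config(label: str) -> list[str]:
    """Display the current reclamation settings for the datastore."""
    return ["esxcli", "storage", "vmfs", "reclaim", "config", "get",
            f"--volume-label={label}"]


def manual_unmap(label: str) -> list[str]:
    """Run a manual UNMAP, e.g. during a maintenance window."""
    return ["esxcli", "storage", "vmfs", "unmap", f"--volume-label={label}"]


def disable_xcopy_on_host() -> list[str]:
    """Alternative option: disable XCOPY (hardware accelerated move) host-wide."""
    return ["esxcli", "system", "settings", "advanced", "set",
            "--int-value", "0", "--option", "/DataMover/HardwareAcceleratedMove"]


datastore = "DS_SOURCE"  # hypothetical datastore label
for command in (disable_auto_unmap(datastore),
                show_reclaim_config(datastore),
                manual_unmap(datastore)):
    print(shlex.join(command))
```

Building the commands as argument lists keeps the datastore label safely quoted if it contains spaces, and printing them first gives you a chance to review before running anything on a host.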

There is a Dell EMC KB article here:  https://support.emc.com/kb/534895

 


5 thoughts on “vSphere Automatic UNMAP/XCOPY issue”


  1. Very good write up. Make sure you give my customer (Jason Mooney) a shout out/credit. He was the one that found this and had to prove (for a few months) to Dell/EMC and VMware there was this Pref issue. :O)
