A patch fix is available here:
The relevant fix, which you can find in the release notes, is:
PR 2411494: VMFS6 automatic asynchronous reclamation of free space might work at a higher space reclamation priority than configured
******** Update 10-14 ********
VMware has formalized the fix I tested, and it is set to be included in the next patch for vSphere 6.7 (ESXi 6.7 Patch 01 (P01)), which I believe should arrive by the end of the year (December). If the issue is causing major problems in your environment, however, you can open an SR and request the fix at your current patch level.
******** Update 10-14 ********
******** Update 9-6 ********
Dell EMC has now issued an advisory on this. VMware is close to a fix; I have already done some testing.
******** Update 9-6 ********
******** Update 8-5 ********
VMware has acknowledged that UNMAP is not being properly throttled to the value specified on a datastore. For instance, if you leave the datastore at the default UNMAP rate of 25 MB/s, VMware will not honor it. I've done recent testing showing consistent 1 GB/s rates for thick vmdks (thin rates vary, though all are higher than 25 MB/s). These higher rates make it very difficult for VMAX/PowerMax to keep up and avoid the XCOPY interaction. VMware engineering is working on a solution/workaround (other than the one noted below). Here is the KB explaining this: https://kb.vmware.com/s/article/74514
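For reference, here is a minimal sketch of how to read back the reclaim settings a VMFS 6 datastore is actually configured with. It assumes it is run directly in the ESXi shell (where both Python and esxcli are available), and the datastore label MyVMFS6_DS is a placeholder.

```python
import subprocess


def get_reclaim_config(datastore_label):
    """Return the configured space-reclamation settings for a VMFS 6 datastore."""
    result = subprocess.run(
        ["esxcli", "storage", "vmfs", "reclaim", "config", "get",
         "--volume-label", datastore_label],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # With the defaults this reports a "low" reclaim priority (nominally
    # ~25 MB/s), which is the value the KB above says is not being honored.
    print(get_reclaim_config("MyVMFS6_DS"))  # placeholder datastore label
```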
******** Update 8-5 ********
Well, VAAI seems to be an unwelcome topic these days. I recently posted about how to address performance concerns in relation to SRDF here, and now I'm back today with something else. This new issue is specific to VMFS 6 with automated UNMAP and Storage vMotion.
Before beginning, I'm going to assume the reader knows how our VAAI implementations work. I have multiple posts on the site, as well as documents linked here that you can read.
Issue
We’ve had this new problem crop up a few times recently when our customers have the following setup:
- A heavily utilized array
- No SRDF or only SRDF/A on devices involved
- A VMFS 6 datastore as source device with automated UNMAP enabled (default)
The customer then runs one, or multiple, Storage vMotions from one datastore to another. Assuming no conditions prevent XCOPY from being issued (see the paper for details), the VM will be moved between datastores using XCOPY. After the array receives all the commands from VMware, we report the move complete, while in the background we copy the data using SnapVX (TimeFinder). All of that works as normal; however, the problem comes when VMware issues UNMAP against the recently vacated datastore immediately after we report the move complete. The move, in fact, is not complete, as we are still copying data. We now have to deal with UNMAP commands against tracks which cannot be reclaimed until the copy is done. The copy, therefore, becomes a top priority. This will invariably impact performance, since the copy is no longer a background process, and it can lead to IO timeouts or aborts on the device. I should note that nothing breaks here. Everything works, but there are performance implications.
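If you want to confirm whether the host would even attempt XCOPY (and UNMAP) for such a Storage vMotion, the sketch below, again assuming the ESXi shell, checks the host-wide XCOPY setting and the per-device VAAI status; the naa identifier is a placeholder for your VMAX/PowerMax device.

```python
import subprocess


def esxcli(*args):
    """Run an esxcli command and return its text output."""
    result = subprocess.run(["esxcli"] + list(args), stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Host-wide XCOPY switch (Int Value 1 = enabled, 0 = disabled).
    print(esxcli("system", "settings", "advanced", "list",
                 "-o", "/DataMover/HardwareAcceleratedMove"))
    # Per-device VAAI primitives (Clone Status = XCOPY, Delete Status = UNMAP).
    print(esxcli("storage", "core", "device", "vaai", "status", "get",
                 "-d", "naa.60000970000197900000000000000001"))  # placeholder device
```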
Workarounds
Though most environments will not encounter this issue, if you do, the workaround for VAAI issues is very consistent: disabling. This allows host-based mechanisms to be used instead. There are two options here: you can either disable automated UNMAP on the datastore or disable XCOPY on the host or array. I would recommend disabling automated UNMAP, since this still leaves the possibility of using manual UNMAP if necessary. Manual UNMAP has its own concerns, but it can be run during maintenance windows or periods of low activity when no Storage vMotions are planned. Note that if you are using Storage DRS, you should set it to manual mode before running manual UNMAP and then reset it to automatic after the command completes.
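As a sketch of what those two options look like from the ESXi shell (assuming, as above, Python and esxcli on the host, with a placeholder datastore label), disabling automated UNMAP is a per-datastore reclaim setting, while disabling XCOPY is a host-wide advanced option; manual UNMAP can then still be run during a maintenance window.

```python
import subprocess


def esxcli(*args):
    """Run an esxcli command, raising if it fails."""
    subprocess.run(["esxcli"] + list(args), check=True)


def disable_auto_unmap(datastore_label):
    """Option 1 (recommended): turn off automated UNMAP on the VMFS 6 datastore."""
    esxcli("storage", "vmfs", "reclaim", "config", "set",
           "--volume-label", datastore_label, "--reclaim-priority", "none")


def disable_xcopy_on_host():
    """Option 2: disable XCOPY (hardware-accelerated move) on this host."""
    esxcli("system", "settings", "advanced", "set",
           "-o", "/DataMover/HardwareAcceleratedMove", "-i", "0")


def manual_unmap(datastore_label):
    """With Option 1, space can still be reclaimed manually during a
    maintenance window (set Storage DRS to manual mode first)."""
    esxcli("storage", "vmfs", "unmap", "--volume-label", datastore_label)


if __name__ == "__main__":
    disable_auto_unmap("MyVMFS6_DS")  # placeholder datastore label
```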
There is a Dell EMC KB article here: https://support.emc.com/kb/534895
PSA_DTW
https://support.emc.com/kb/534895 – Sorry, we cannot find that page. It is no longer available or has moved
Sorry, apparently they are adding information, so it is currently unavailable.
They have republished it now.
Very good write-up. Make sure you give my customer (Jason Mooney) a shout out/credit. He was the one who found this and had to prove (for a few months) to Dell EMC and VMware that there was this perf issue. :O)
Yes, been involved in that case for a while.