Update – datastore corruption issue with XCOPY

*** Update 2-27-2020 ***

We have now updated the DTA for this issue, and it lists the relevant patch number for the array. Please see article 000537000 for details.

DTA 537000: VMAX AFA, PowerMax: After Storage vMotion, VM Clone or any activity which invokes XCOPY in VMware ESXi 6.x, the target datastore VMFS metadata may be completely overwritten https://support.emc.com/kb/537000

*********

Rather than update the previous post, I wanted to create a new one to avoid confusion, since the DTA for this issue has not been updated. The engineering teams at VMware and Dell EMC have devised a solution that mitigates the corruption bug and permits customers to re-enable XCOPY on their ESXi hosts or array. This solution, as we’ll call it, is detailed in an internal KB article (538246) that your Dell EMC teams will have access to.

The article is internal because the fix is still restricted, meaning it is not yet available for everyone to download. These types of fixes are always done this way. Customers with an urgent need will work with their Dell EMC teams to apply the fix, and if no issues arise at those select customers over a few weeks, the fix will be made readily available to all. At that point, the DTA noted in the other post will be updated for customer consumption. My advice is to wait the additional time for the general release unless you have a compelling need. I can, however, explain what the fix does and how you can take some preparatory steps on the VMware side.

Solution

The fix in the PowerMaxOS code defends against a VMware request that would overwrite the header information of the datastore and so render the datastore corrupt. When such a request is made, we record error codes on the array (the array will dial home). What happens next depends on the version of VMware, and thus on VMware’s ability to understand the return code: the VMware task using XCOPY will either fail or revert to software copy. Either result ensures that you cannot corrupt a datastore.

VMware version update

In conjunction with our code change, VMware added a bug fix in vSphere 6.5 and 6.7 to recognize the return code we send when we encounter the corruption. If you have the VMware fix applied, VMware will fail the task; otherwise it will revert the task from XCOPY to software copy. If VMware fails the task because you have the vSphere version that understands the return code, simply retry the operation and it will use XCOPY to succeed. As you are probably aware, the bug has evaded all reproduction, so a re-attempt will almost certainly not hit it again (if it does, however, please open an SR, as a reproduction would be very welcome!). There are fixes for both 6.5 and 6.7 in the following, respective patches:

Hot patch on 6.5EP17 – PR 2460660
Hot patch on 6.7EP13 – PR 2460388

Be aware that the vSphere fix alone has no ability to prevent the bug. It is simply able to interpret the return code we send from PowerMaxOS when the fix is on the array and the bug is encountered. Conversely, the array fix prevents the corruption by itself, so the vSphere update is not necessary to prevent the bug, though it is recommended.
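To make the combinations easier to see, here is a minimal sketch in Python of what happens when the bug is hit, depending on which fixes are in place. It only models the behavior described above; the names `Outcome` and `outcome_when_bug_is_hit` are made up for illustration and are not part of any VMware or Dell EMC tool.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for the three behaviors described in this post.
    UNPROTECTED = "no array fix: nothing blocks the request, corruption is possible"
    TASK_FAILS = "array blocks the request, vSphere fails the task: retry it"
    SOFTWARE_COPY = "array blocks the request, vSphere reverts to software copy"


def outcome_when_bug_is_hit(array_fix: bool, vsphere_patch: bool) -> Outcome:
    """Model what happens when ESXi sends an XCOPY that would overwrite
    the datastore header, given which fixes are applied."""
    if not array_fix:
        # The vSphere patch alone cannot prevent the bug; it only
        # interprets a return code that the fixed array sends.
        return Outcome.UNPROTECTED
    if vsphere_patch:
        # vSphere understands the return code and fails the task.
        return Outcome.TASK_FAILS
    # vSphere does not understand the return code and falls back.
    return Outcome.SOFTWARE_COPY


# Print all four combinations of array fix and vSphere patch.
for array_fix in (False, True):
    for vsphere_patch in (False, True):
        result = outcome_when_bug_is_hit(array_fix, vsphere_patch)
        print(f"array fix={array_fix!s:<5} vSphere patch={vsphere_patch!s:<5} -> {result.value}")
```

The two rows with the array fix applied are the safe ones. The vSphere patch only changes whether you see a failed task that you retry or a silent fallback to software copy.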
