Stranded space on VMAX All Flash datastores – can I get it back?

In a word, yes. Let me start with a basic explanation of VMware storage versus array storage.

A VMAX All Flash (AFA) array is a completely thin-provisioned box. A hallmark of using a thin array with VMware is that what VMware reports as used storage in a datastore may not reflect what the array reports as used. This is because a VMAX AFA will only allocate storage when data is written to a vmdk. For thin vmdks (thin on thin), since VMware does not preallocate space, what VMware reports generally matches what is reflected on the array. For thick vmdks (zeroedthick, the default, and eagerzeroedthick), however, VMware will report the entire size of the vmdk as used, even if no data has actually been written to it.
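These disk types are chosen when the vmdk is created. As a quick point of reference, here is how each would be created with vmkfstools from the ESXi shell; the datastore name matches the one used later in this post, while the size and file names are just examples:

vmkfstools -c 10GB /vmfs/volumes/VMAX_AFA_4A/thin_test.vmdk -d thin

vmkfstools -c 10GB /vmfs/volumes/VMAX_AFA_4A/zt_test.vmdk -d zeroedthick

VMware immediately counts the full 10 GB of the zeroedthick vmdk against the datastore, while the thin vmdk consumes almost nothing; on the array side, neither allocates anything until data is actually written.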

Let’s illustrate this with an example. Say I create a 191 GB zeroedthick VM on a 500 GB VMAX AFA datastore. With the little bit of metadata VMware requires, VMware reports that it is using 192 GB of storage on the datastore.
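If you want to check what VMware reports yourself, either of these standard commands from the ESXi shell will show datastore capacity and usage (nothing VMAX-specific here):

df -h

esxcli storage filesystem list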

If we look at the array, however, we see that only 72 GB is actually allocated (before compression gets a hold of it).  Here is the Unisphere for VMAX view:

Using CLI we can get more detail, including the various compression pools where the data is stored, but again only a portion of the 192 GB VMware reports is used.
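If you want to pull the array-side numbers yourself, they come from Solutions Enabler. A minimal sketch, assuming a configured symcli environment; the SID is an example, and the exact flags and output columns vary a little between releases:

symcfg -sid 1234 list -tdev

Compare the allocated tracks reported for the device backing the datastore against what VMware claims is used.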

This is good, of course, because it means we don’t waste space on the array – regardless of what VMware preallocates, we only allocate based on actual data written.

Fast forward to a future time when either we don’t need this VM any longer, or we want to move it around (Storage vMotion). In my case I’m simply going to delete the VM which leaves me with an empty datastore (just the metadata) at 1.4 GB.

So now that I have an empty datastore, I should also have no storage allocated on the array, right? Much to your disappointment, that’s not the case. You might think: but I’m using automated UNMAP in vSphere 6.5, so won’t VMware reclaim the storage for me? Unfortunately, because of the way VMware implemented automated UNMAP, you will never get that storage back. The catch is that you must have an active VM on the datastore or the storage will never be reclaimed; an empty datastore, or one with only inactive VMs, leaves the allocated storage stranded. Therefore, the only VMware option available is manual UNMAP, executed via CLI: esxcli storage vmfs unmap -l <datastore_name>.
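Using the datastore from this example, that looks like the following; the optional -n flag sets how many VMFS blocks are reclaimed per iteration (it defaults to 200):

esxcli storage vmfs unmap -l VMAX_AFA_4A -n 200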

We’re good then, right? VMware will take care of it. Some of the time, yes, but not all the time. There are situations where VMware will not unmap the storage. In particular, there is a VMware bug that impacts both zeroedthick and eagerzeroedthick disks: if your stranded space was made up of these vmdks, manual UNMAP will not work; it only works for thin vmdks. VMware is fixing this in the next version of vSphere and a future 6.5 patch, but for now, if you have these types of vmdks (one of which is the default disk type), I will explain how to get back the space using VMAX technology.

From the VMAX side, the only way we can truly free up a device’s storage is to unmap/unmask it from the host and then run a “free -all” on it, which returns the storage to the SRP where it becomes available for any device. If there is a datastore on the device, however, that’s not an option, since it would obviously destroy the datastore. What we can do instead is a bit of a workaround, described after the sketch below.
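For reference, that destructive path would look something like this with Solutions Enabler. It is only a sketch with example names (SID 1234, storage group ESXi_SG, device 01A2), strictly for a device whose contents you no longer need; verify the exact free syntax against your Solutions Enabler release:

symaccess -sid 1234 -name ESXi_SG -type storage remove devs 01A2

symdev -sid 1234 free -all -devs 01A2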

The procedure we want to undertake is to overwrite the allocated extents with zeroes via an eagerzeroedthick (EZT) vmdk and then reclaim them. Or rather, the VMAX AFA is going to do the reclaiming for us automatically, thanks to compression. For this process we actually want to write zeroes, not simply tell the array we want to write zeroes. By that I mean we don’t want to involve VAAI, or write same/block zero. If VAAI is used, the VMAX array is not going to write zeroes; instead we do what amounts to flipping a flag. That is incredibly efficient for normal eagerzeroedthick vmdk creation, but in this case it won’t help us. We want the compression engine to see the zeroes being written because it will grab them on the fly, discard them, and then reclaim the intended tracks. So the first thing to do is turn off write same. This can be done in the GUI or CLI. I used the CLI, so on my ESXi host I ran:

esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit

(If you need more detail on how to do this in the GUI, I have a VAAI paper under Important Docs at the top that you can check out.)
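You can confirm the change took effect by reading the setting back; a value of 0 means write same is now off:

esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit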

Now create an EZT disk with vmkfstools (or the GUI) that uses all the space in the datastore (or as close as you can get). You might be thinking: why does it have to be the whole datastore? After all, won’t VMware reuse blocks? Yes it will, but depending on the circumstances there might be extents that VMware doesn’t know about or doesn’t think it owns. In particular, this is the case if you delete and recreate the datastore. So the safest thing is to fill the entire datastore with zeroes; however, if your datastore is very large, it may be worth trying a smaller vmdk that is just large enough to cover the allocated extents on the array. If that fails to reclaim all the storage, you can then go with the full size. Without the benefit of VAAI, EZT creation will take some time if the vmdk is very large. As my datastore is 500 GB, I go with the full amount (accounting for the metadata):

vmkfstools -c 498GB /vmfs/volumes/VMAX_AFA_4A/test.vmdk -d eagerzeroedthick

As the vmdk is created, you will see the allocation on the backend decreasing in real time. After it completes, be sure to remove the vmdk:

vmkfstools -U /vmfs/volumes/VMAX_AFA_4A/test.vmdk

Here is a short video of the process, just about in real time:

So what if you don’t have compression? Well, obviously you’re out of luck! Nah, fear not. Within Unisphere for VMAX (or the CLI) a reclaim can be executed which does the same thing compression does, just after the fact. Follow the exact same process as above; after deleting the vmdk, run a reclaim on the device as depicted in the steps below.
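For CLI users, the reclaim is run against the device backing the datastore. Treat the following strictly as a hypothetical sketch: the SID and device number are examples, and the reclaim verb and flags differ between Solutions Enabler releases, so verify the syntax in your documentation before running anything:

symdev -sid 1234 reclaim -devs 01A2   # hypothetical syntax, confirm for your release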

The nice thing about compression is that a single step does both jobs, but you’ll get the space back with either process.

And once you are finished getting back that storage, don’t forget to re-enable write same through the CLI or GUI:

esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedInit
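Putting it all together, here is the whole sequence from this post, end to end, using my example names:

esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit        # turn off write same
vmkfstools -c 498GB /vmfs/volumes/VMAX_AFA_4A/test.vmdk -d eagerzeroedthick   # write real zeroes
vmkfstools -U /vmfs/volumes/VMAX_AFA_4A/test.vmdk            # delete the temporary vmdk
esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedInit        # turn write same back on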
