Kyle Ross ran into an issue with VMware Data Recovery this week that needs to be mentioned to the wider VMware community. Below is a write-up of the issue he encountered and the workaround he went through with VMware support.
I was made aware of a serious (in my opinion) bug with VDR during a call with VMware support that I haven't seen discussed anywhere. This is an internally known issue that causes snapshots to build up on VMs that are members of VDR backup jobs.
During the backup process a new snapshot is created and VDR updates the snapshot descriptor file (vm_name-000001.vmdk) to mark the snapshot as un-removable. The bug occurs when the backup process completes: VDR fails to mark the snapshot as removable again, so the snapshots remain.
The tricky part of the problem is that the snapshots are not visible through the vSphere Client, nor are they listed in apps like 'RVTools' that use the VMware CLI to gather data. They could potentially be listed in the new datastore views but I didn’t think to look there before I resolved it in my environment. I ran across them by logging into the service console and running the following command to list all the delta files on the datastores attached to the server.
find /vmfs/volumes/ -name \*delta\*
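If there are many datastores, it can help to see the sizes alongside the file names. The following is only a sketch of one way to do that; the function name list_deltas is my own, and it assumes find, du and sort are available in your service console.

```shell
# List every delta file under a root directory with its size in KB,
# largest first. The root defaults to /vmfs/volumes.
# list_deltas is a hypothetical helper name, not a VMware tool.
list_deltas() {
    root="${1:-/vmfs/volumes}"
    find "$root" -name '*delta*' -type f -exec du -k {} \; 2>/dev/null | sort -rn
}
```

Running list_deltas with no argument scans /vmfs/volumes; pass a single datastore path to narrow the search.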
In my environment I noticed numerous VMs with multi-gigabyte delta files that I couldn’t account for via snapshots listed in the GUI. Here is the solution I was given by VMware. Via the service console, browse to the location of the VMDK files for the affected VM. Run this command to identify the descriptors that need to be corrected, replacing ‘virtual_machine_name’ with the actual name of the VM.
grep -i ddb.dele virtual_machine_name-000???.vmdk
This command will quickly identify the delta files that are marked as non-deletable. The workaround is to edit the affected VMDK descriptor files and change “ddb.deletable” from “false” to “true”. You will probably also need to edit the root VMDK file and change this field as well, otherwise you may be left with one open snapshot. Note that due to a change in how ESX 4 performs file locking, you will probably need to SSH into the host that is currently running the VM to edit these files. Once you have edited all the files, create a new snapshot for the VM either via the GUI or command line. Then issue the “Delete All” snapshots command to force ESX to combine all the files and close all the visible and hidden snapshots.
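For VMs with a long chain of descriptors, the edit can be scripted rather than done by hand. This is a minimal sketch, not something VMware support gave me: the function name mark_deletable is hypothetical, and it assumes the entry appears in the descriptor as a line of the form ddb.deletable = "false". It keeps a .bak copy of each file it touches. Only point it at the small text descriptor files, never at the -delta.vmdk files themselves.

```shell
# Flip ddb.deletable from "false" to "true" in one descriptor file,
# keeping a .bak copy of the original next to it.
# Assumes the entry looks like: ddb.deletable = "false"
mark_deletable() {
    descriptor="$1"
    cp "$descriptor" "$descriptor.bak" || return 1
    sed 's/^\(ddb\.deletable[[:space:]]*=[[:space:]]*\)"false"/\1"true"/' \
        "$descriptor.bak" > "$descriptor"
}
```

From the VM's directory, something like `for f in virtual_machine_name-000???.vmdk; do mark_deletable "$f"; done` would cover the snapshot descriptors; the root VMDK descriptor would still need the same treatment. Check the result with the grep command above before creating the new snapshot.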
Since ESXi doesn't normally provide a service console, you can simply clone the VMs, since the clones will have no open snapshots. If you are trying to avoid downtime as I was, you can use the unsupported service console on ESXi to edit the descriptor files as described above.
Update: VMware Data Recovery 1.0.1 has been released, via Duncan Epping at Yellow Bricks.
Data Recovery modifies virtual machines’ vmdk files’ settings so a snapshot can be created for backup purposes. In the past, after the backup had been created, the vmdk file’s settings were sometimes left configured for snapshots even after the backup was complete. This led to these virtual machines being left in snapshot mode while accumulating snapshots that were undetected by vSphere Client. This process has been redesigned so that these temporary files are no longer left behind. In previous versions of Data Recovery, this issue can be resolved by following the process described in the knowledge base article titled “Delete ddb.delete entries and snapshots left behind by VMware Data Recovery”.