We recently had a series of AVD session hosts that did not power on properly because of a disk provisioning error in Azure. Working with support, we determined that Azure had successfully shut down the VMs the previous night (via Nerdio Auto-scale), but Nerdio kept trying to resize the disk. This resize task failed repeatedly over the weekend, and as a result the VMs could not start on Monday morning.
The immediate cause was Microsoft's limitation that a disk cannot be downgraded more than once in a 12-hour period, which prevented Nerdio from completing its auto-scale task. For reasons that are not clear, this left the VMs unable to provision/allocate in the morning.
I believe the underlying problem is that Nerdio shuts down the VM and downgrades the disk in the same action, and that action failed and was retried multiple times. Although the VM was shut down, I do not believe Nerdio left it in a state where it could be started again, since the last action had failed. This is unlike the start-VM auto-scale action, which resizes the disk and starts the VM as separate actions.
It would be nice if, on shutdown, Nerdio triggered a separate action to resize the disk for auto-scale. If that action failed because of this specific Microsoft limitation, or for any other reason, the session host would still be ready to power on/provision/allocate with the enterprise disk. This would mean a higher cost for that day, but the VM would be available first thing in the morning instead of requiring an Azure VM repair ticket.
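To make the proposed ordering concrete, here is a minimal Python sketch. It does not use Nerdio's or Azure's APIs: `SessionHost`, `stop_vm`, `downgrade_disk`, and `scale_in` are hypothetical names, the "premium"/"standard" tier labels are placeholders, and the 12-hour cooldown simply mirrors the limitation described above. The point is that the shutdown and the disk downgrade are separate steps, so a failed downgrade leaves a stopped VM on the higher-tier disk rather than a VM that cannot start.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: one disk downgrade allowed per 12-hour window (the limitation described above).
DOWNGRADE_COOLDOWN = timedelta(hours=12)


@dataclass
class SessionHost:
    """Hypothetical, simplified model of an AVD session host and its OS disk."""
    name: str
    running: bool = True
    disk_tier: str = "premium"  # placeholder label for the higher (more expensive) tier
    last_downgrade: Optional[datetime] = None


def stop_vm(host: SessionHost) -> None:
    """Action 1: stop/deallocate the VM only; the disk is not touched here."""
    host.running = False


def downgrade_disk(host: SessionHost, now: datetime) -> bool:
    """Action 2: try to downgrade the disk; report failure instead of retrying."""
    if host.last_downgrade is not None and now - host.last_downgrade < DOWNGRADE_COOLDOWN:
        return False  # blocked by the 12-hour limit, so the higher-tier disk is kept as-is
    host.disk_tier = "standard"  # placeholder label for the cheaper tier
    host.last_downgrade = now
    return True


def scale_in(host: SessionHost, now: datetime) -> str:
    """Proposed scale-in: shut down first, then resize as a separate, non-blocking step."""
    stop_vm(host)
    if downgrade_disk(host, now):
        return "stopped, disk downgraded"
    # Failure path: the VM is stopped cleanly and can still start on the higher-tier disk.
    return "stopped, disk left at higher tier"
```

For example, a host whose disk was last downgraded at 20:00 and is scaled in again at 22:00 is still stopped, but keeps its higher-tier disk and can be started normally the next morning, at the cost of paying for that tier for the day.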