Separate Tasks for Shutting Down the Session Host and Resizing the Disk

We recently had a series of AVD session hosts that did not power on properly because of a disk provisioning error in Azure. Through support, we determined that Azure had successfully shut down the VM the previous night (Nerdio auto-scale), but Nerdio kept trying to resize the disk. The resize task failed repeatedly through the weekend and left the VM unable to start on Monday morning.

The root cause was Microsoft's limitation on downgrading a disk more than once in a 12-hour period, which prevented Nerdio from completing its auto-scale task. For whatever reason, this left the VM unable to provision/allocate in the morning.

I believe the underlying problem is that Nerdio shuts down the VM and downgrades the disk in the same action, which failed and was retried multiple times. Although the VM was shut down, I do not believe Nerdio left it in a state where it could be started again, since the last action failed. This is unlike the start-VM auto-scale action, which resizes the disk and starts the VM in separate actions.

It would be nice if, upon shutdown, Nerdio triggered a separate action to resize the disk as part of the auto-scale process. If that action failed because of this specific Microsoft limitation, or for any other reason, the session host would still be ready to power on/provision/allocate with the enterprise disk. This would mean a higher cost for the day, but the VM would be available first thing in the morning instead of requiring an Azure VM repair ticket.
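
To illustrate the request, here is a minimal Python sketch of the "two separate tasks" idea. It is not Nerdio's implementation and it calls no real Nerdio or Azure API: `run_scale_in`, `TaskResult`, and the `stop_vm` / `downgrade_disk` callables are hypothetical names made up for this example. The point is only that a failed resize is recorded as its own result and does not change the outcome of the shutdown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    name: str
    succeeded: bool
    error: str = ""


def run_scale_in(stop_vm: Callable[[], None],
                 downgrade_disk: Callable[[], None]) -> List[TaskResult]:
    """Run shutdown and disk resize as two independent tasks (hypothetical)."""
    results: List[TaskResult] = []

    # Task 1: stop the VM. If this fails, do not attempt the resize at all.
    try:
        stop_vm()
        results.append(TaskResult("stop_vm", True))
    except Exception as exc:
        results.append(TaskResult("stop_vm", False, str(exc)))
        return results

    # Task 2: downgrade the disk. A failure here is recorded separately and
    # does not undo or fail task 1; the disk simply stays on its current tier.
    try:
        downgrade_disk()
        results.append(TaskResult("downgrade_disk", True))
    except Exception as exc:
        results.append(TaskResult("downgrade_disk", False, str(exc)))

    return results
```

With a `downgrade_disk` callable that raises (for example, because of the 12-hour limit), the function would return one successful `stop_vm` result and one failed `downgrade_disk` result, which matches the "shutdown complete, resize failed" outcome described above.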

Comments (3)

Dave Stephenson

Welcome to the community, Ryan Stephenson 🙂!

Great find and idea.
I know we face a similar limitation with Storage Account shrinking where we can only do it once every 24 hours.

Having it as a separate task (e.g. Shut Down VM = Complete, Resize Disk = Failure) would allow the shutdown to complete even if the resize errors out, and the VM could still start the next day.

Out of curiosity, would you want a notification when the resize fails, or would you want it to be part of the “Auto Heal” functionality, where it retries the resize after a few hours?

Ryan Stephenson

Thanks, Dave! 

I think both of those options are great for this idea. The ability to configure a min/max number of retries, as well as a notification to either a Nerdio admin or a ticketing system email address, would be great.
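
As a rough sketch of that behavior in Python (again, all names are hypothetical and nothing here is an actual Nerdio setting or API): `resize_with_retries` tries the resize up to `max_attempts` times, waits `delay_seconds` between attempts, and calls a `notify` callback only once the attempts are exhausted. The `notify` callback stands in for whatever would email an admin or a ticketing system address.

```python
import time
from typing import Callable


def resize_with_retries(resize: Callable[[], None],
                        notify: Callable[[str], None],
                        max_attempts: int = 3,
                        delay_seconds: float = 0.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a disk resize a bounded number of times, then notify (hypothetical)."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            resize()
            return True  # resize succeeded, no notification needed
        except Exception as exc:
            last_error = str(exc)
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                sleep(delay_seconds)

    # All attempts failed: send a single notification and give up.
    notify(f"Disk resize failed after {max_attempts} attempts: {last_error}")
    return False
```

In practice the delay would presumably be hours rather than seconds, given the 12-hour limit mentioned in the original post.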

Dave Stephenson

Excellent! Our product team will get this on the backlog and work to get it implemented into the product.
If you have any more thoughts or ideas around this, please keep them coming. 😎
