For larger host pool environments, it would be nice to set a percentage rule for Spot VMs.
For example, if you have a host pool of 100 VMs and set a 20% Spot VM setting, then the 81st VM to power on is created as a Spot VM, and so is every VM after that.
Typically, the last VMs to spin up for the day serve few users and see light usage. If one of those VMs had to power down, it would affect fewer users, and it would happen later in the day.
This would also save on costs for these lower-usage VMs.
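To make the 100-VM example concrete, here is a minimal sketch of how the percentage rule could be evaluated. The function names `spot_threshold` and `should_be_spot` are hypothetical and only illustrate the request; they are not an existing product setting or API.

```python
def spot_threshold(pool_size: int, spot_percent: int) -> int:
    """Number of hosts that power on as regular VMs before Spot begins.

    Hypothetical helper: the Spot share is rounded down so the pool never
    exceeds the configured percentage.
    """
    if pool_size < 0 or not 0 <= spot_percent <= 100:
        raise ValueError("pool_size must be >= 0 and spot_percent within 0..100")
    spot_slots = pool_size * spot_percent // 100
    return pool_size - spot_slots


def should_be_spot(power_on_order: int, pool_size: int, spot_percent: int) -> bool:
    """True if the Nth host to power on (1-based) should be a Spot VM."""
    return power_on_order > spot_threshold(pool_size, spot_percent)


# Pool of 100 hosts with a 20% rule: hosts 1-80 are regular, 81+ are Spot.
print(should_be_spot(80, 100, 20))
print(should_be_spot(81, 100, 20))
```

Rounding the Spot share down is a design choice here: in a 5-host pool at 20%, only the fifth host would be Spot.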
Spot VMs in Host Pools
For larger deployments, this could translate into significant savings, as long as the customer is willing to accept the risks of Spot VMs.
Playing “devil's advocate” again, what if the Spot VMs are evicted, the FSLogix profile becomes corrupted, and the Nerdio Azure Capacity Extender is left scrambling to find a different size for 1-3 hours during tax season?
Haha. Yes, I know it's not likely to happen, but it could.
Honestly, it comes down to cost vs. risk assessment planning. Maybe we could combine this with our “Burst beyond capacity” feature: trigger a burst when the first Spot VM is evicted, then switch over to the burst VMs (non-Spot) if more than one Spot VM is evicted. With that in place, the risk might be worth it.
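One possible reading of that burst idea, sketched as a small decision function. Everything here is an assumption for illustration: the `Action` names and `eviction_response` are hypothetical and do not describe how “Burst beyond capacity” actually behaves.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    START_BURST = "start one regular (non-Spot) burst host as a standby"
    SWITCH_TO_BURST = "move remaining Spot capacity onto regular burst hosts"


def eviction_response(evictions_so_far: int) -> Action:
    """Map the number of Spot evictions seen so far to a scaling action.

    Assumed policy: the first eviction warms up a burst host; any further
    eviction abandons Spot for the rest of the day.
    """
    if evictions_so_far <= 0:
        return Action.NONE
    if evictions_so_far == 1:
        return Action.START_BURST
    return Action.SWITCH_TO_BURST


for count in range(4):
    print(count, eviction_response(count).value)
```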
Out of curiosity, does your team use a lot of Spot VMs in production workloads, or are you just looking to compare apples to apples and have the option available to you?
We don't use Spot VMs today, as they can be hard to manage. But if you had VMs up for 4 hours a day, they would be less likely to be evicted. I know this request might be a long shot. It was just another one of those features that a competitor is offering, so I am not sure how they are handling FSLogix profiles that could be corrupted by sudden hard disconnects. I don't have much experience with Spot. Maybe Microsoft does a soft shutdown of the OS when doing the eviction, which would allow the FSLogix service to safely disconnect the user profiles.
I'm in the same boat.
I've used Spot VMs in the past, but never had an eviction take place, so I don't know what it would do to an FSLogix profile. I'm guessing it wouldn't be a soft shutdown, just going off what I've seen described in the FAQ, but I could be wrong.
Again, it may come down to the cost vs risk argument and could be worthwhile to some but not make sense for others.
We'll have to dig into the feasibility of this and see if we can make it a bit more predictable/safe.
I like the premise and would be interested (with select customers where the tolerance would be higher), but with smaller pools: say a pool with 5 or 10 hosts, committing to Spot on 20% of the running hosts. If auto-scale brings down a host that isn't Spot because the capacity isn't needed (i.e. a violation of the rule), work to get those users shifted over and get the excess Spot hosts offline. The logic could be turned the other way: ensuring that no more than 80% of the pool is running at full price.
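A rough sketch of that rebalancing check, assuming the rule is applied to the hosts currently running rather than to the total pool size. The helper names `max_spot_allowed` and `excess_spot` are hypothetical.

```python
def max_spot_allowed(running_regular: int, spot_percent: int) -> int:
    """Largest Spot count s such that s is at most spot_percent of (regular + s).

    Derived from s * 100 <= spot_percent * (regular + s), rounded down.
    """
    if running_regular < 0 or not 0 <= spot_percent < 100:
        raise ValueError("running_regular must be >= 0 and spot_percent within 0..99")
    return running_regular * spot_percent // (100 - spot_percent)


def excess_spot(running_regular: int, running_spot: int, spot_percent: int) -> int:
    """How many Spot hosts should be drained to get back within the rule."""
    return max(0, running_spot - max_spot_allowed(running_regular, spot_percent))


# 8 regular + 2 Spot satisfies a 20% rule; after auto-scale removes one
# regular host, one Spot host is in excess and should be drained.
print(excess_spot(8, 2, 20))
print(excess_spot(7, 2, 20))
```

Flipping the rule to "no more than 80% full price" would use the same arithmetic with the roles of regular and Spot hosts swapped.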
Most of our pools don't use an aggressiveness above low or medium - but I could see this being useful in a scenario where High aggressiveness was in use and it would be reasonable to expect the users to have to log out and log back in. I could also see this being valuable on the enterprise side in driving costs down (with some modest risk).
Though I do agree the savings can be notable, is it worth the risk? If we could automate a process on eviction (e.g. forcing a logout of users, triggering a proper FSLogix dismount), I'd be keen on seeing how that would work. But by design, Spot VMs are intended for disposable workloads, and while I like to think of session hosts in a shared pool as ‘disposable’, it isn't quite the same.
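For the automation idea, one hook that might help: as far as I know, Azure announces Spot evictions to the VM through the Scheduled Events metadata feed as an event of type `Preempt`, with only a short notice (roughly 30 seconds), which may or may not be enough for a clean logoff. The sketch below only shows how such an event document could be filtered; the sample payload is made up for illustration, and `pending_preemptions` is a hypothetical helper. Actually fetching the document and logging users off are left out.

```python
def pending_preemptions(document: dict, host_name: str) -> list[str]:
    """Return the IDs of Preempt events that name this host.

    `document` is assumed to be the parsed JSON of a Scheduled Events
    response, with an "Events" list whose items carry "EventId",
    "EventType" and "Resources".
    """
    return [
        event["EventId"]
        for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
        and host_name in event.get("Resources", [])
    ]


# Illustrative payload only; not captured from a real VM.
sample = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "evt-001", "EventType": "Preempt", "Resources": ["host-3"]},
        {"EventId": "evt-002", "EventType": "Reboot", "Resources": ["host-3"]},
        {"EventId": "evt-003", "EventType": "Preempt", "Resources": ["host-7"]},
    ],
}

for event_id in pending_preemptions(sample, "host-3"):
    # A real agent would start the user logoff here so FSLogix can dismount.
    print("eviction pending:", event_id)
```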