We are setting up CPU and RAM notifications after an incident where an AVD host didn't reboot and was stuck at 100% CPU overnight.
With the current capabilities we get notified if RAM or CPU hits the threshold.
What this doesn't cover is sustained load: CPU at 100% for the last x minutes (or, more reasonably, averaging 80%+ over the last x minutes).
We would like the notification to support averaging over the last x minutes so that a single spike can be disregarded.
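As a rough illustration of the requested behavior (not how the product works today), here is a minimal Python sketch of a rolling-average check. The class name `SustainedLoadAlert`, the 80% threshold, and the 300-second window are all hypothetical choices made for the example.

```python
from collections import deque
from typing import Deque, Tuple


class SustainedLoadAlert:
    """Fires only when the average CPU over a full time window exceeds a threshold.

    Hypothetical sketch: names and defaults are assumptions, not a product API.
    """

    def __init__(self, threshold_pct: float = 80.0, window_seconds: float = 300.0):
        self.threshold_pct = threshold_pct
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp, cpu_pct)
        self._first_ts = None  # timestamp of the very first sample seen

    def add_sample(self, timestamp: float, cpu_pct: float) -> bool:
        """Record a sample and return True if a notification should be sent."""
        if self._first_ts is None:
            self._first_ts = timestamp
        self._samples.append((timestamp, cpu_pct))

        # Drop samples that are strictly older than the window.
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

        # Do not alert until a full window of data has been observed,
        # otherwise the very first high reading would trigger immediately.
        if timestamp - self._first_ts < self.window_seconds:
            return False

        average = sum(pct for _, pct in self._samples) / len(self._samples)
        return average >= self.threshold_pct
```

With one sample per minute, a host pinned at 100% would trigger after five minutes of data, while a single 100% reading surrounded by 10% readings would average well below the threshold and stay silent.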
I also see potential for issues when running maintenance. We would like to be able to mute notifications for a server for x amount of time, similar to exclude from auto-scale.
Adding this to run script would also make sense: Run script -> Mute notifications while the script is running.
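To make the two mute ideas concrete, here is a small Python sketch. It assumes a hypothetical in-memory `MuteRegistry` that a notifier would consult before sending; none of these names exist in the product, and the one-hour safety cap is an arbitrary choice for the example.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class MuteRegistry:
    """Tracks which hosts have notifications muted (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._muted_until: Dict[str, float] = {}

    def mute(self, host: str, seconds: float) -> None:
        """Mute a host for a fixed duration, e.g. a maintenance window."""
        self._muted_until[host] = self._clock() + seconds

    def unmute(self, host: str) -> None:
        self._muted_until.pop(host, None)

    def is_muted(self, host: str) -> bool:
        until = self._muted_until.get(host)
        return until is not None and self._clock() < until

    @contextmanager
    def muted_while_running(self, host: str, max_seconds: float = 3600.0) -> Iterator[None]:
        """Mute a host for the duration of a block, e.g. while a script runs.

        max_seconds is a safety cap so a hung script cannot mute forever.
        Note: leaving the block clears any mute on the host, including
        an earlier timed one.
        """
        self.mute(host, max_seconds)
        try:
            yield
        finally:
            self.unmute(host)
```

A notifier would call `is_muted(host)` before sending, and a script runner would wrap the script in `with registry.muted_while_running(host):` so the mute lifts as soon as the script finishes or fails.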