I know we can enable alerts for certain conditions in NMM. However, those alerts apply to the whole environment and not to specific customers inside of NMM.
I am asking for an email alert for a host down or unavailable (agent not communicating) in both NMM and NME. I know there is an email option if auto-scale doesn't start a VM, but this doesn't account for the fact that a host may be up and go down or go unavailable.
Email alert on Host down or Unavailable
I would like to add to this, email notification upon bursting as well.
This is a very cool idea!
We do have some notifications that can get as granular as an Account and/or Host Pool, but nothing really to the level that you're describing.
I especially like Johnny's idea of an email notification for bursting, so you know "hey, look at this host pool because we likely need to adjust the autoscale settings".
Playing "devil's advocate" for a minute, how would you avoid "false positives" when a host is coming online/offline and the status changes from available to unavailable? Or, if it's online but not available, and autoheal gets triggered, does that notification win or does an autoscale notification win?
I hope I don't sound dumb here on this...
Could it be based off of the events "Add Host" and "Delete Host", like for the host pool, specifically those events? I know that may send a bunch of emails when you re-image all of the hosts so I'm guessing on the part for only bursting. Maybe it takes the Base Pool Capacity and if the number created >x, then it sends an email? I don't know the feasibility of that but it is just a guess.
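To make that guess a bit more concrete, here is a rough sketch of the check in Python. The function name, its parameters, and the threshold are all made up for illustration and are not anything Nerdio exposes; it only shows the "hosts created beyond Base Pool Capacity > x" comparison.

```python
def should_send_burst_email(current_host_count: int,
                            base_pool_capacity: int,
                            threshold: int = 0) -> bool:
    """Hypothetical check: email only when burst hosts exceed a threshold.

    current_host_count: hosts currently in the pool (assumed input).
    base_pool_capacity: the pool's configured base capacity.
    threshold: the "x" from the comment above; 0 means any burst host triggers.
    """
    # Hosts beyond the base capacity are treated as burst hosts.
    burst_hosts = max(0, current_host_count - base_pool_capacity)
    return burst_hosts > threshold
```

A re-image that recreates the same number of hosts would not push the count above the base capacity, so in this sketch it would not send an email.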
Dave, RE: False positives, there could be a delay of 10 min (adjustable by the user, as Autoheal is), or it could be Autoheal + 2 or 5, or something like that.
RE: Johnny's comment on re-image, hopefully Nerdio could determine that a re-image is being performed, so it wouldn't alert on that. Nerdio should also detect when auto-scale is being performed, so it wouldn't alert then either. Yes, when a host is re-imaged it goes down, but that's understandable and isn't "during operation".
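Roughly what I have in mind, sketched in Python. None of these names come from Nerdio: `HostState`, `planned_operation`, and `should_alert` are hypothetical, and the 10-minute grace period is just the example value from above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HostState:
    name: str
    # When the host was first seen down/unavailable; None while it is healthy.
    unavailable_since: Optional[datetime] = None
    # Hypothetical flag: True while a re-image or auto-scale action is running.
    planned_operation: bool = False


def should_alert(host: HostState, now: datetime,
                 grace: timedelta = timedelta(minutes=10)) -> bool:
    """Alert only if the host has been unavailable longer than the grace period
    and the outage is not explained by a planned operation."""
    if host.unavailable_since is None:
        return False  # host is healthy
    if host.planned_operation:
        return False  # re-image / auto-scale in progress, expected downtime
    return now - host.unavailable_since >= grace
```

The grace period is a parameter so it could be tied to whatever the Autoheal delay is set to.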
Thanks, Randy and Johnny.
Someone once told me that "The only dumb question is the one that's not asked" so don't ever think your question is dumb. 🙂
Asking questions is one of the best ways to learn.
On the burst thing, that may take some additional logic to get it to work, but I think you're right that we log it in the Auto-Scale logs that we need to burst so we could trigger off of that.
For the false positives, we could do a delay on it, similar to how we're doing auto-heal, but it might take a redesign of the notification to get that to work.
Although, I know there are a few other requests for additional notification options, so we might be able to combine all of those improvements in a release.
Maybe we can build out the auto-heal section to allow notifications for Unrecoverable Hosts? 🤔 (see screenshot below)
Or is there a use-case where you'd want to have Auto-Heal disabled and just be notified of the Unavailable hosts?
My thinking is, if we put this as part of the Auto-Scale settings, then it will only notify during the AutoScale hours and cut down on the false positives.
And, having it all as part of the autoscale, it would help capture Johnny's use-case on the burst.
(Please excuse my rough visualization of the feature; I'm a visual learner 🤓)
I am aware of the Notify on Failure when Auto-Scale doesn't start a host when it should.
I would not have the email alert on Host down / unavailable as part of the Auto-Scale setting, as some orgs / host pools may not use Auto-scale.
Here is my use case: I work for an MSP, on the Project side. When I'm done with a project (e.g. an AVD build), I turn it over to Managed Services. Managed Services uses tools to notify of Up/Down. However, not all customers subscribe to the tools, OR they are in MAG, where the tools won't work. I want to be able to send an email to support to create a case when a host goes down (e.g. powers off unexpectedly) or becomes unavailable (e.g. the agent stops).
Hmm. I see your point, Randy.
There's definitely a need for monitoring/alerting.
We'll have to see what our Product team can come up with when they research the feasibility of this idea.
It may be something as simple as utilizing Azure AD Groups and notify based on the membership and the current status in NMM.
Or, something more complicated.
We'll see what they come up with and hopefully other partners can continue to add their insight/use-cases in the meantime 🙂
This is a much-needed request. A good example was this morning, when a bunch of hosts failed to boot.
I know that the underlying issue may or may not be Nerdio, but having a notification sent to us would at least let us know that there is a problem so we can deal with it, instead of waiting for the customer to let us know that they are down.