Apply Policy Baselines in Parallel

Today, my main Policy Baseline is applied to ~100 tenants and has ~100 policies. It currently takes over an hour to run, and I believe it runs every hour, so it is essentially always running. The "Republish assignments in baseline" job also seems to have no parallelization and to process everything sequentially, which means that one job has roughly 10,000 tasks (~100 tenants × ~100 policies). Furthermore, the entire job fails if one task fails for any reason. Yesterday, that job ran 19 times and failed 7 times. Typically, the run following a failed execution completes successfully, but each failure affects every policy that had not yet been reached.

My feature request is to revamp the logic by which these jobs are executed. For example, have the "Republish assignments in baseline" job kick off an independent child job for each tenant and allow those child jobs to run in parallel. Even with a limit on the number of tenants processed in parallel, this should directly reduce the time to completion, and it would also allow policies to apply to other tenants when a specific tenant has errors. Additionally, a task that fails within the job should not end the entire job. If there is no retry logic, I would love to see that implemented. If a task still fails after retrying, the job status should show an error, but failing out of the whole job because of one policy seems counterintuitive. If one policy has an issue, the other policies should still be applied/checked/managed. This would make the solution more robust and allow policy changes to reach tenants much more quickly.
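To make the proposal concrete, here is a rough Python sketch of what I have in mind. It is not how the product works internally; apply_policy, republish_tenant, republish_baseline, and the max_parallel limit are all hypothetical names, and the one failing tenant/policy pair is simulated.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_policy(tenant, policy):
    """Hypothetical stand-in for publishing one policy to one tenant."""
    if (tenant, policy) == ("tenant-2", "policy-b"):
        raise RuntimeError("simulated publish failure")


def republish_tenant(tenant, policies):
    """Child job: apply every policy to one tenant and record any failures."""
    failures = []
    for policy in policies:
        try:
            apply_policy(tenant, policy)
        except Exception as exc:  # one failed task must not stop the rest
            failures.append((policy, str(exc)))
    return tenant, failures


def republish_baseline(tenants, policies, max_parallel=4):
    """Parent job: one child job per tenant, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda t: republish_tenant(t, policies), tenants)
        failed = {tenant: errs for tenant, errs in results if errs}
    # The job still reports an error, but every other task was attempted.
    status = "Error" if failed else "Completed"
    return status, failed


status, failed = republish_baseline(
    ["tenant-1", "tenant-2", "tenant-3"],
    ["policy-a", "policy-b", "policy-c"],
)
print(status, failed)
```

In this sketch the failure on tenant-2 is reported in the job status, while tenant-1, tenant-3, and the remaining policies on tenant-2 are still processed.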

This same logic could be applied to other jobs at the MSP level, like Solutions Baseline, but in my experience, the Policy Baselines are the most impacted. 


Comments (2 comments)

Dave Stephenson

Great idea, Chris!

Many of our partners aren't at that level (i.e., 100x100), but I can easily see how the current functionality can lead to some bottlenecks.

Your suggestion to attack this problem from two sides (error handling and parallel processing) could be what's ultimately needed.

For the error handling part, do you think you'd want it more like the Desktop Image error handler (where you can specify the number of retries) or more like the Host Pool task retry where you have to manually retry on a failure?

 

[Screenshots: Desktop Image, Host Pool]

Chris Brannon

Dave Stephenson, more like Desktop Image, but I don't know if I even need control over the logic.

If a task fails, sleep for a few seconds and try again. If the retry fails, move on to the next task (policy), and then maybe, if X tasks fail consecutively, fail out of the job. (Always show the status as “Error” if a task fails, unless the retry was successful.)
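Roughly, the behavior described above could look like the Python sketch below. Everything here is an assumption for illustration: run_with_retry, run_job, make_flaky, the single retry, the 3-second default delay, and the default of 3 for "X" are placeholders, not existing product settings.

```python
import time


def run_with_retry(task, retry_delay):
    """Run a task; on failure, sleep and retry once. Return True on success."""
    for attempt in range(2):
        try:
            task()
            return True
        except Exception:
            if attempt == 0:
                time.sleep(retry_delay)
    return False


def run_job(tasks, max_consecutive_failures=3, retry_delay=3.0):
    """Run all tasks, skipping failed ones; abort only after X in a row fail."""
    status = "Completed"
    consecutive = 0
    attempted = 0
    for task in tasks:
        attempted += 1
        if run_with_retry(task, retry_delay):
            consecutive = 0
            continue
        status = "Error"  # a task failed even after its retry
        consecutive += 1
        if consecutive >= max_consecutive_failures:
            return "Failed", attempted  # fail out of the job
    return status, attempted


def make_flaky(failures_before_success):
    """Hypothetical task that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def task():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("simulated failure")

    return task


tasks = [make_flaky(0), make_flaky(1), make_flaky(5), make_flaky(0)]
print(run_job(tasks, retry_delay=0))
```

Here the second task succeeds on its retry and does not affect the status, while the third fails twice, so the job finishes all four tasks and reports "Error".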

If you're going to give us the control, I would say it should be more granular:
 
