Allow more configuration options for the 'Job duration unusual' alert
The 'Job duration unusual' alert is based on the last ten runs of a job. I think a configurable number of baseline runs or, better yet, a baseline built from the last x runs at the same time of day would be ideal.
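To make the first half of that request concrete, here is a minimal sketch of a baseline with a configurable number of runs. It is not the product's implementation: the function name `is_unusual`, the `window` and `tolerance` parameters, and the choice of a median as the baseline are all assumptions made for illustration.

```python
from statistics import median

def is_unusual(durations, new_duration, window=10, tolerance=2.0):
    """Return True if new_duration deviates from the baseline.

    durations    -- past run durations in seconds, oldest first
    window       -- hypothetical setting: how many recent runs form the baseline
    tolerance    -- hypothetical setting: allowed ratio above or below the baseline
    """
    recent = durations[-window:]
    if not recent:
        return False  # no history yet, so nothing to compare against
    baseline = median(recent)
    # Flag runs that are much longer or much shorter than the baseline.
    return new_duration > baseline * tolerance or new_duration < baseline / tolerance

history = [10, 10, 10, 100, 100, 100]
print(is_unusual(history, 100, window=3, tolerance=1.5))  # baseline uses only the last 3 runs
print(is_unusual(history, 100, window=6, tolerance=1.5))  # baseline uses all 6 runs
```

With a short window the job's recent change in behaviour becomes the new normal quickly; with a long window the older, faster runs keep the baseline low for longer.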
Brent Templeton commented
I would like per-job configuration that lets you exclude low values. We have many jobs that run frequently and often take 0 to 1 second when they have no work to do; I would like those runs left out of the average.
Create an average duration from the last XX executions that took over YY seconds.
It would be nice if this alert only considered successful runs when determining the baseline.
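The three filtering requests above (ignore near-zero runs, average the last XX executions over YY seconds, count only successful runs) could be combined along these lines. This is only a sketch: the `JobRun` type and the `last_n`, `min_seconds` and `successful_only` settings are hypothetical names, not existing options.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class JobRun:
    duration_seconds: float
    succeeded: bool

def baseline_average(runs, last_n=10, min_seconds=0.0, successful_only=True):
    """Average the last `last_n` eligible runs, or return None if there are none.

    A run is eligible if it took longer than `min_seconds` and, when
    `successful_only` is set, if it succeeded. `runs` is ordered oldest first.
    """
    eligible = [
        r.duration_seconds
        for r in runs
        if r.duration_seconds > min_seconds and (r.succeeded or not successful_only)
    ]
    recent = eligible[-last_n:]
    return mean(recent) if recent else None

runs = [
    JobRun(0, True),     # nothing to do
    JobRun(120, True),
    JobRun(1, True),     # nothing to do
    JobRun(300, False),  # failed run
    JobRun(100, True),
]
print(baseline_average(runs, last_n=10, min_seconds=1))  # averages only the 120 s and 100 s runs
```

Filtering before taking the last `last_n` runs means the baseline always uses XX real executions, rather than XX executions of which most were no-ops.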
Matt Girolami commented
Agree with the other comments that more configuration options on this alert would be nice. Being able to filter out jobs would help, but it would be even better if the alert were smarter about comparing a run against the average of previous runs at that same time of day, instead of against the last 10 runs.
As an example, we have various archive jobs that run every few hours, but the time they take to complete depends on user load: runs outside our normal business hours take seconds, while runs during high-use periods may take an hour or more. This difference in expected times makes the alert of low value to us.
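A time-of-day baseline like the one described here might look roughly as follows. This is a sketch under assumptions: the function `same_hour_baseline`, the `hour_tolerance` setting, and the use of a median are illustrative choices, not how the alert works today.

```python
from datetime import datetime
from statistics import median

def same_hour_baseline(history, start_time, last_n=10, hour_tolerance=1):
    """Median duration of the last `last_n` runs that started near the same hour.

    history        -- list of (start datetime, duration in seconds), oldest first
    hour_tolerance -- hypothetical setting: how many hours either side still count
    """
    def hour_gap(a, b):
        # Distance between two hours on a 24-hour clock (23:00 and 00:00 are 1 apart).
        diff = abs(a - b) % 24
        return min(diff, 24 - diff)

    matching = [
        duration
        for started, duration in history
        if hour_gap(started.hour, start_time.hour) <= hour_tolerance
    ]
    recent = matching[-last_n:]
    return median(recent) if recent else None

# Archive job: seconds overnight, about an hour during business hours.
history = []
for day in range(1, 6):
    history.append((datetime(2024, 1, day, 2, 0), 5))
    history.append((datetime(2024, 1, day, 14, 0), 3600))

print(same_hour_baseline(history, datetime(2024, 1, 6, 14, 0)))  # compares against afternoon runs only
print(same_hour_baseline(history, datetime(2024, 1, 6, 2, 0)))   # compares against overnight runs only
```

Each run is then judged against runs with a similar expected load, so an hour-long afternoon run is no longer compared with overnight runs that finish in seconds.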
Douglas Johnston commented
I have an example too that doesn't really fit into this ... we have a SQL Agent job that backs up the various databases. However, the backups performed vary based on an internal scheduling mechanism that decides whether each database should get a full, diff or tlog backup. For example, if we do something like truncate a log file, it may decide to do a FULL to renew the log chain, or it may decide that every 30 mins it will do a TLOG but every 4 hours it will do a DIFF. The durations are wildly different, which triggers the alert every time the backup type changes.
I'd prefer to filter the job name out in a similar way to the long running query alert, or maybe define different windows of variability per job name. Difficult one to design, this one ... I don't envy you :-)
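One possible shape for the two ideas in this comment, excluding jobs by name and giving each job its own window of variability, is a list of per-job rules. Everything here is hypothetical: the `JobRule` type, its fields, and the use of regular expressions for matching job names are assumptions for the sake of the sketch.

```python
import re
from dataclasses import dataclass

@dataclass
class JobRule:
    pattern: str            # regular expression matched against the job name
    exclude: bool = False   # never raise the alert for matching jobs
    tolerance: float = 2.0  # allowed ratio above or below the baseline

DEFAULT_RULE = JobRule(pattern=".*")

def rule_for(job_name, rules):
    """Return the first rule whose pattern matches, else the default rule."""
    for rule in rules:
        if re.search(rule.pattern, job_name):
            return rule
    return DEFAULT_RULE

def should_alert(job_name, duration, baseline, rules):
    rule = rule_for(job_name, rules)
    if rule.exclude or baseline is None:
        return False
    return duration > baseline * rule.tolerance or duration < baseline / rule.tolerance

rules = [
    JobRule(r"^Backup", exclude=True),      # FULL/DIFF/TLOG durations vary too much
    JobRule(r"^Archive", tolerance=100.0),  # wide window of variability
]
print(should_alert("Backup - all databases", 3600, 30, rules))
print(should_alert("Archive orders", 3600, 60, rules))
print(should_alert("Index maintenance", 3600, 60, rules))
```

The first matching rule wins, so more specific patterns should be listed before broader ones.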
Neil Russell commented
The last x runs at the same time of day is pretty much a must-have to get rid of false positives. Here is the most common one we run into: pretty much every SQL Server instance gets some variation of an hourly log backup job and a weekly index maintenance job. Of course, the log backup after the index maintenance job always throws an alert, since it takes longer than the median of the last 10 runs.
We can work around this specific example with a maintenance window, but this is just a simple case. It would be really helpful to be able to alert and report on job duration by time of day.
Thank you for your feedback Rusty.