Job Duration Unusual Alert should be 'AG aware' to avoid false positives
We are using an availability group (AG) for one of our instances.
As you would expect, we have the same SQL Server Agent jobs created on both instances in this AG, but we only want each job to execute on the primary.
The jobs on both nodes of the AG have a step at the beginning which checks whether this is the primary instance.
If it is the primary instance it will proceed to step 2 and execute the rest of the job.
If it is not the primary instance the job will terminate.
This all works fine. However, the Job Duration Unusual alert does not take AG failovers into consideration.
For example, we have a job that normally takes around 13 minutes to complete on the primary instance.
All the executions of the job on the secondary instance take less than 1 minute, as they simply check the AG role and then terminate.
After a failover, when the secondary becomes the primary instance, its executions start taking around 13 minutes.
But the alert still includes the sub-1-minute executions in its baseline calculation, so it continually alerts us that the job is taking considerably longer than the baseline.
Is it possible for the Job Duration Unusual alert to be aware of the AG and calculate its baseline from the last 10 executions of the job that ran on the primary instance, so that it uses an accurate baseline when calculating deviation from the norm?
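To illustrate what we are asking for, here is a rough Python sketch of the two baseline calculations. It is not the monitoring tool's actual algorithm, which we do not know. The `JobRun` type, the `ran_as_primary` flag, the use of a simple mean, and the sample durations are all assumptions made up for this example; only the window of 10 executions and the rough 1 minute versus 13 minute figures come from our situation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class JobRun:
    duration_min: float   # how long the execution took, in minutes
    ran_as_primary: bool  # hypothetical flag: was this node the AG primary at run time?


def baseline(runs: List[JobRun], primary_only: bool, window: int = 10) -> Optional[float]:
    """Mean duration of the most recent `window` runs (assumed baseline method).

    With primary_only=True, runs that only performed the AG check and
    terminated on a secondary are ignored before the window is taken.
    """
    if primary_only:
        runs = [r for r in runs if r.ran_as_primary]
    recent = runs[-window:]
    if not recent:
        return None  # no usable history yet
    return mean(r.duration_min for r in recent)


# Ten short runs while this node was secondary, then three full runs after failover.
history = [JobRun(0.5, False)] * 10 + [JobRun(13.0, True)] * 3

naive = baseline(history, primary_only=False)    # 4.25: skewed by the short runs
ag_aware = baseline(history, primary_only=True)  # 13.0: only real executions counted

print(f"naive baseline:    {naive} min")
print(f"AG-aware baseline: {ag_aware} min")
```

With the unfiltered baseline, a normal 13 minute run looks roughly three times slower than expected, which is the false positive we keep seeing. With the filtered baseline it looks normal. Immediately after a failover the filtered history on the new primary would be empty, so a real implementation would probably need to fall back to the runs recorded on the previous primary.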
An alternative to the above could be to make the Job Duration Unusual alert more configurable, as mentioned in a different suggestion.