In SQL Monitor, I want to...

Reduce the number of false-positive status alerts (e.g. Machine unreachable)

CONTEXT: Many status alerts (e.g. Machine unreachable) have a simple configuration model: the alert triggers as soon as a ping to the machine fails.

PROBLEM: VPN connection resets, scheduled restarts, and similar events can all be acceptable causes of a failed ping in our system, provided they are brief and the system recovers in time. Currently, these events lead to a lot of false positives.

EXAMPLE SOLUTIONS:
• Add configuration to the machine unreachable alerts to set a time threshold
• Require two consecutive pings to fail before raising the alert
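
The two example solutions above can be combined in one small check. The following is only a sketch under stated assumptions, not SQL Monitor's implementation: `UnreachableDebouncer`, its fields, and the `record` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnreachableDebouncer:
    """Hypothetical helper: decides when failed pings should become an alert."""
    min_consecutive_failures: int = 2   # "two consecutive pings" solution
    min_outage_seconds: float = 0.0     # "time threshold" solution
    _failures: int = field(default=0, init=False)
    _first_failure_at: Optional[float] = field(default=None, init=False)

    def record(self, ping_ok: bool, now: float) -> bool:
        """Record one ping result; return True while the alert condition holds."""
        if ping_ok:
            # Any successful ping ends the outage and resets the counters.
            self._failures = 0
            self._first_failure_at = None
            return False
        if self._first_failure_at is None:
            self._first_failure_at = now
        self._failures += 1
        outage = now - self._first_failure_at
        # Both thresholds must be met before an alert is warranted.
        return (self._failures >= self.min_consecutive_failures
                and outage >= self.min_outage_seconds)
```

With `min_consecutive_failures=2` and `min_outage_seconds=30`, a brief VPN reset that recovers within 30 seconds would never reach the alert condition.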

88 votes
    Daniel Rothig (Admin; Product Manager for SQL Monitor, Red Gate Software) shared this idea
    Daniel Jackson shared a merged idea: Erroneous Monitoring Error for Databases During Restore
    Don Ferguson shared a merged idea: Configure instance unreachable alert
    Brent Burbidge shared a merged idea: Configure Machine unreachable alert
    Support shared a merged idea: Machine Unreachable Alerts - Repeated alerts would be useful

    13 comments

      • Robert commented

        We also get a lot of noise from this alert, due to intermittent network issues, especially on VMs. The VM hosts seem to get overwhelmed with network traffic at certain times. It's usually during overnight/maintenance hours (probably from backups), so it doesn't affect the user experience, but it results in lots of alerts during off-hours. If we could configure the alert threshold, then we could determine how long before the alert fires.

      • Jeff commented

        I get a machine unreachable alert with 1 second between the "Machine unreachable from" and "Machine became reachable again", so there is no waiting of 5 seconds to re-ping. If it is going to alert for a 1 second "outage", then it is noise and I have to either disable the alert or disable the email, which defeats the purpose. If it was configurable via the UI, I could tweak settings for individual servers that are more ping "lossy" than others. Thank you.

      • Anonymous commented

        Oracle Enterprise Manager allows us to set these thresholds. The same goes for MySQL Enterprise Monitor.
        Right now this feature is useless.

      • Brent Burbidge commented

        Hi Priya

        No, I still believe this should be configurable. We get false alarms from this alert consistently. We actually had to disable it because it was noise.

        I believe this should be exposed in the UI and configurable (i.e. X pings, wait X secs, X pings)

        Thanks,
        Brent

      • Daniel Jackson commented

        Erroneous Monitoring Error for Databases During Restore
        We log ship several hundred databases and are constantly getting "Monitoring error (SQL Server data collection)" notifications. The connection log shows the following error:
        "Database 'Some_Database' cannot be opened. It is in the middle of a restore."
        I feel this is not really an "error" as much as it is a "state". At the very least, it would be nice to allow this particular notification to be turned off. We are currently getting 50-100 emails a day for "errors" that are not really errors.
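
The "state, not error" distinction above could be expressed as a simple classification step applied before a notification is sent. This is a minimal sketch that matches only the message text quoted in the comment; `classify_collection_error` and `RESTORING_MARKER` are hypothetical names, not part of SQL Monitor.

```python
# Text taken from the connection log message quoted in the comment above.
RESTORING_MARKER = "It is in the middle of a restore."


def classify_collection_error(message: str) -> str:
    """Return "state" for restore-in-progress messages, "error" otherwise."""
    return "state" if RESTORING_MARKER in message else "error"
```

A notifier could then skip the email for anything classified as "state".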

      • Priya Sinha (Admin; Project Manager, Red Gate Software) commented

        Hi Don,

        The way SQL Monitor works is that it pings the machine or SQL Server, and if they don't respond, it raises an alert. If I remember correctly, SQL Monitor pings 5 times at 1-second intervals, waits a few seconds, then pings 5 times again before it raises an alert.

        The reason for not exposing this configuration to users is that we believe it is important to be notified as soon as possible if your server is down or not responding to pings. On the other hand, I appreciate that if you are getting lots of false positives, we need to understand the reasons for them.

        We have seen this error in other users' environments where, for some bizarre reason, the network is flaky and pings fail randomly. I would suggest increasing logging in your environment to capture the errors that cause this alert to be raised in the first place, so that we can investigate further. If you would like to do this, please let us know and I can email you the details.

        Thanks,
        Priya

      • Don Ferguson commented

        I am getting lots of false positives on this. I need to be able to make it less sensitive.

      • Don Ferguson commented

        Add ability to set a custom unreachable threshold measured in seconds before the instance unreachable alert is raised.

      • Priya Sinha (Admin; Project Manager, Red Gate Software) commented

        Hi Brent,

        It actually works exactly the way you have described.

        I think it is ten pings in a row, each with a 1s timeout. So by default it is:

        • five pings
        • wait five secs
        • five pings

        If Monitor doesn't get a response by then, it raises the alert.

        This configuration is not exposed via the UI, though.

        Are you happy for us to close this feature request?

        Thanks,
        Priya
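
The default schedule described in the comment above (five pings, wait five seconds, five pings) can be sketched as follows. This is an illustration based only on that recollection, not the product's actual code; `is_unreachable` and its parameters are hypothetical, and treating any single response as "reachable" is an assumption.

```python
import time
from typing import Callable


def is_unreachable(ping: Callable[[], bool],
                   pings_per_burst: int = 5,
                   wait_seconds: float = 5.0,
                   bursts: int = 2,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if every ping in every burst fails.

    `ping` is expected to apply its own timeout (1s in the description above).
    """
    for burst in range(bursts):
        if burst > 0:
            sleep(wait_seconds)  # pause between bursts
        for _ in range(pings_per_burst):
            if ping():
                return False  # assumption: one response means reachable
    return True
```

Exposing `pings_per_burst`, `wait_seconds`, and `bursts` as settings is the kind of configuration several commenters are asking for.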

      • Brent Burbidge commented

        Configure the alert/ping threshold to only alert if there are multiple ping failures over a time period or so many failed responses in a row.
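
The "multiple ping failures over a time period" variant of this request can be sketched with a sliding window. The class below is hypothetical and only illustrates the idea; `FailureWindow`, `max_failures`, and `window_seconds` are assumed names.

```python
from collections import deque


class FailureWindow:
    """Hypothetical sliding window: alert on N failures within a time period."""

    def __init__(self, max_failures: int, window_seconds: float):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failure_times = deque()

    def record(self, ping_ok: bool, now: float) -> bool:
        """Record one ping result; return True if the alert condition holds."""
        if not ping_ok:
            self._failure_times.append(now)
        # Forget failures that have fallen out of the window.
        while (self._failure_times
               and now - self._failure_times[0] > self.window_seconds):
            self._failure_times.popleft()
        return len(self._failure_times) >= self.max_failures
```

Unlike a consecutive-failure counter, this still alerts on a machine that fails most pings but occasionally responds.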

      • Support commented

        On the “Machine Unreachable” alerts, is it possible to create an additional alert (e.g. after 10 minutes) to advise that a machine is still unreachable?
        This would be useful for overnight automated reboots when applying Windows updates. The current unreachable alerts are extremely useful for that, but it would be reassuring to know that we would continue to receive further alerts if a server failed to come back up (and if we also failed to notice the absence of a “Machine Unreachable – Ended” alert).
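
A repeated "still unreachable" alert of the kind requested here could be scheduled as in the sketch below. It is an assumption-laden illustration, not an existing SQL Monitor feature; `ReminderSchedule` and `should_alert` are hypothetical names, and the 600-second default mirrors the 10-minute example in the comment.

```python
from typing import Optional


class ReminderSchedule:
    """Hypothetical scheduler: alert at outage start, then repeat periodically."""

    def __init__(self, repeat_after_seconds: float = 600.0):
        self.repeat_after_seconds = repeat_after_seconds
        self._last_alert_at: Optional[float] = None

    def should_alert(self, unreachable: bool, now: float) -> bool:
        """Return True when an initial or reminder alert is due."""
        if not unreachable:
            # The machine is back; the next outage starts a fresh cycle.
            self._last_alert_at = None
            return False
        if (self._last_alert_at is None
                or now - self._last_alert_at >= self.repeat_after_seconds):
            self._last_alert_at = now
            return True
        return False
```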
