Operational Intelligence: How is the anomaly score calculated in anomaly alerts?
The score is calculated by the scheduled job "Operational Intelligence - Process Stale Anomaly Score - Daily", which calls the backend script processStaleAnomalyScore().
- Our models predict a range of normal operating values for a metric. When a metric deviates from this range, we examine both the size and the duration of that deviation and use these to compute a score that lies between 0 and 10.
- A very large deviation from the normal range will result in a critical score (near 10) in very few observations. A period of 30-40 minutes of observations that are barely outside the normal range will also lead to a critical anomaly score.
How the exact value is calculated:
- Data points that are outside the bounds contribute to an "accumulated anomalousness" measure. Think of it as the area between the metric values and the control bounds when the metric values are outside the control bounds.
- However, we want to forget past anomalous data, so an exponential weighting causes the effect of past anomalous observations to decrease with time.
- Since this is an area, it can potentially grow very large, so we feed this area into an S-shaped function that maps all values to the range of 0-10.
- Technically, we fit a sigmoid function to give us the time-to-critical behavior, and then we normalize the output of this sigmoid function to be strictly between 0 and 10 (needed since the area is unbounded only above).
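The steps above can be illustrated with a minimal Python sketch. This is not the code behind processStaleAnomalyScore(); it only mirrors the idea described here (excess outside the bounds, exponential forgetting, sigmoid, rescaling to 0-10). The function name anomaly_scores and the parameters decay, steepness, and midpoint are hypothetical, and their default values are arbitrary assumptions rather than the fitted values the product uses.

```python
import math

def anomaly_scores(values, lower, upper, decay=0.9, steepness=1.0, midpoint=5.0):
    """Illustrative only: return one score in [0, 10] per observation.

    decay, steepness and midpoint are assumed parameters, not product settings.
    """
    area = 0.0  # accumulated anomalousness (exponentially weighted area)
    # Sigmoid output when nothing has accumulated; subtracting it makes
    # an area of zero map to a score of zero.
    floor = 1.0 / (1.0 + math.exp(steepness * midpoint))
    scores = []
    for v, lo, hi in zip(values, lower, upper):
        # Distance outside the control bounds (0 when inside them).
        excess = max(v - hi, lo - v, 0.0)
        # Older anomalous observations fade; the new excess is added.
        area = decay * area + excess
        # S-shaped squashing of the unbounded area.
        s = 1.0 / (1.0 + math.exp(-steepness * (area - midpoint)))
        # Rescale from [floor, 1) to the 0-10 range.
        scores.append(10.0 * (s - floor) / (1.0 - floor))
    return scores
```

With these assumed defaults, a single observation far outside the bounds pushes the score close to 10 immediately, a run of observations barely outside the bounds raises the score gradually, and the score falls again once the metric returns inside the bounds.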