Knowing proper adaptive threshold configurations
Adaptive thresholding is powerful and effective when configured properly, but confusing, noisy, and ineffective if not. You need to learn to use adaptive thresholding correctly to take advantage of it.
This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI administrators and end users will benefit from adopting this practice as they work on Service Insights.
Solution
First, let's review the components of adaptive thresholds:
- Training Window. How much historical data to use when training
- Time Policies. Windows of time where the KPI is expected to behave differently
- Outlier Exclusion. Outliers to be removed in order to clean up the training dataset
- Algorithm. The machine learning math used to compute threshold values
- Algorithm Parameters. User-specified sensitivity of the algorithm, used to determine normal, high, and critical values
Now that you understand the basics, you can move on to best practices for applying adaptive thresholds.
Be cautious when using pre-configured threshold templates
There are many out-of-the-box templates available, but you cannot pick one at random and expect it to work. Spend some time examining how your data behaves so that you can select appropriate templates. For more information, see Building your own custom threshold templates.
Resist the urge to use a different time policy for every hour of the day
You likely do not need 168 different policies for a week. Configure as few as possible, but as many as necessary to encapsulate expected behavior changes in a KPI. In the following sample chart, the same policy has been applied on weekdays from 8 AM to 10 AM because the KPI behaves similarly across each of these windows of time.
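As a rough illustration of "as few as possible, but as many as necessary", the following Python sketch maps a timestamp to one of a handful of time policies. The policy names, the hour windows other than the 8 AM to 10 AM weekday window, and the `policy_for` helper are hypothetical and are not part of ITSI. They only show how a small set of policies can cover a whole week.

```python
from datetime import datetime

# Hypothetical policy table: (name, weekdays, start_hour, end_hour).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
# The end hour is exclusive.
POLICIES = [
    ("weekday_morning_peak", range(0, 5), 8, 10),
    ("weekday_business_hours", range(0, 5), 10, 18),
    ("weekend", range(5, 7), 0, 24),
]
DEFAULT_POLICY = "default"


def policy_for(ts: datetime) -> str:
    """Return the first policy whose weekday and hour window contain ts."""
    for name, weekdays, start_hour, end_hour in POLICIES:
        if ts.weekday() in weekdays and start_hour <= ts.hour < end_hour:
            return name
    # Anything not covered above (for example, weekday nights) shares one policy.
    return DEFAULT_POLICY
```

Here four policies stand in for 168 hourly ones. Every weekday between 8 AM and 10 AM resolves to the same policy, which matches the idea that similar windows should share a single configuration.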
If your KPIs contain outlier data, use outlier exclusion
Most KPIs have some outlier data; response time, for example, commonly includes outliers. Excluding the outliers tightens your threshold ranges and makes them more effective. If you aren't sure which algorithm to use to identify outliers, a good starting point is standard deviation with a sensitivity of three sigmas (3σ).
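To make the three-sigma idea concrete, here is a minimal Python sketch that drops training values lying more than three standard deviations from the mean. It is an assumption about how such a filter could work, not ITSI's actual implementation, and the `exclude_outliers` name is hypothetical.

```python
from statistics import mean, pstdev


def exclude_outliers(values, sigmas=3.0):
    """Keep only values within `sigmas` standard deviations of the mean."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation of the training data
    if sd == 0:
        # All values are identical, so there is nothing to exclude.
        return list(values)
    return [v for v in values if abs(v - mu) <= sigmas * sd]


# Twenty typical response times plus one extreme spike.
response_times = [100] * 20 + [1000]
cleaned = exclude_outliers(response_times)
```

With the spike removed, thresholds trained on `cleaned` are no longer stretched by a single extreme value. Note that a very small training set can hide an outlier, because one extreme value inflates the standard deviation it is measured against.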
Percent of baseline is the preferred adaptive threshold algorithm
Among the options, percent of baseline is the best. Standard deviation is generally the second best option. The configuration for these looks like the following:
Algorithm | Critical Severity | High Severity | Medium Severity (Optional) | Base Severity |
---|---|---|---|---|
Percent of baseline | ~200% | ~150% | ~125% | Normal |
Standard Deviation | ~3.0σ | ~2.5σ | ~2.0σ | Normal |
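The starting values in the table can be read as simple arithmetic on the training data. The sketch below assumes that the baseline is the mean of the training window, that a percent-of-baseline threshold is the baseline multiplied by the percentage, and that a standard deviation threshold is the mean plus a number of sigmas. These are illustrative assumptions rather than a description of ITSI internals, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Starting values taken from the table above.
PERCENT_LEVELS = {"critical": 200, "high": 150, "medium": 125}
SIGMA_LEVELS = {"critical": 3.0, "high": 2.5, "medium": 2.0}


def percent_of_baseline_thresholds(training, levels=PERCENT_LEVELS):
    """Threshold = baseline * percent / 100, with the mean as the assumed baseline."""
    baseline = mean(training)
    return {severity: baseline * pct / 100 for severity, pct in levels.items()}


def standard_deviation_thresholds(training, levels=SIGMA_LEVELS):
    """Threshold = mean + k * standard deviation."""
    mu = mean(training)
    sd = pstdev(training)
    return {severity: mu + k * sd for severity, k in levels.items()}


training = [98, 102, 98, 102]  # mean 100, standard deviation 2
by_percent = percent_of_baseline_thresholds(training)
by_sigma = standard_deviation_thresholds(training)
```

For this very steady KPI, the percent-of-baseline thresholds sit far from the data (critical at twice the baseline), while the standard deviation thresholds hug it (critical only six units above the mean). Comparing the two on your own data is one way to judge which algorithm gives sensible ranges.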
New UI improvements for tuning adaptive thresholds
- Time range window. You can customize this window to see more of the historical behavior than only the last week.
- Full granularity. Zoom in and out of the KPI results. The timechart granularity has been increased: bins can now be as small as one minute, instead of the previous 30-minute buckets.
- Advanced display options. While zoomed out of the results, you can see maximum and minimum values in the data set, as well as the 75th and 90th percentiles.
You can also specify the y-axis boundaries. Use this to zoom into specific ranges that might be difficult to see when the default y-axis minimum and maximum values are too large.
- Compare current with historical threshold configurations. As you tune the thresholds, the panel outlined in the screenshot below is where you can see how effective your changes will be and whether your new configurations make sense. Use the percent of critical, high, and normal results compared to the historical data to decide.
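The comparison of critical, high, and normal percentages can also be approximated offline. The following Python sketch replays historical KPI values against a candidate set of thresholds and reports the share of results in each severity. The `severity_breakdown` helper and the threshold dictionary layout are hypothetical, and the sketch assumes that higher values are worse.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "normal")


def severity_of(value, thresholds):
    """Return the most severe level whose lower bound the value reaches."""
    for severity in ("critical", "high", "medium"):
        if severity in thresholds and value >= thresholds[severity]:
            return severity
    return "normal"


def severity_breakdown(values, thresholds):
    """Return the percentage of historical values that fall in each severity."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(severity_of(v, thresholds) for v in values)
    return {s: 100 * counts[s] / len(values) for s in SEVERITIES}


history = [100] * 8 + [160, 210]
candidate = {"critical": 200, "high": 150, "medium": 125}
breakdown = severity_breakdown(history, candidate)
```

If a candidate configuration would have flagged a large share of historical results as critical or high, it is probably too sensitive. If it would have flagged nothing during a known incident, it is probably too loose.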
Assisted adaptive threshold configurations
This is a new feature, powered by Splunk AI, that you can experiment with to decide whether it is helpful for you.
Pros
- Provides automated adaptive threshold configuration recommendations
- Drives the upcoming “bulk” threshold tuning workflow
Cons
- Will not always produce correct results
- Might produce more complex configurations which are harder to tune
Next steps
You might also be interested in the following Splunk resources:
- Splunk Docs: Service insights manual
- Splunk Docs: Create adaptive KPI thresholds in ITSI
- .Conf Talk: Adaptive thresholding...demystified