Building your own custom threshold templates
Splunk ITSI ships with 33 out-of-the-box thresholding templates with various permutations of the following configurations:
- Time Policy Configurations. 8 different ones, based on concepts like AM/PM and weekend/weekday
- Thresholding Algorithms. 4 different ones
- Sensitivity Configurations. Most templates set both high and low thresholds, which might not be what you want
It isn't always easy to determine exactly what these templates do or which one best fits your needs, and the default policies are often not appropriate for your KPIs. For this reason, it often makes more sense for an administrator to create custom templates that follow your organization's naming conventions. Doing so takes the burden of creating complex configurations off service owners, who can then rapidly deploy improvements across their services on their own.
This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI administrators and end users will benefit from adopting this practice as they work on Service Insights.
Solution
As you begin to threshold more and more KPIs, you will find commonality among many disparate KPIs. For example, many of your KPIs might:
- Be based only on work days, and not weekends.
- Be based only on work hours, and not evenings.
- Have common ebbs and flows throughout a day or week.
- Be expressed as a percentage where ~100% is bad. (Example: CPU utilization)
- Be expressed as a percentage where ~0% is bad. (Example: Disk space remaining)
- Be bad when they are either too high or too low.
- Be expected to show static behavior. (Example: Response time)
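The "high is bad", "low is bad", and "high or low is bad" patterns in the list above can be illustrated with a minimal Python sketch. This is not how ITSI evaluates thresholds internally. The function name `crosses_critical` and the numeric levels are assumptions chosen only to show how the direction of a threshold changes the comparison.

```python
from typing import Optional


def crosses_critical(value: float,
                     high: Optional[float] = None,
                     low: Optional[float] = None) -> bool:
    """Return True when value reaches whichever critical bound is set.

    Hypothetical helper for illustration only; not part of ITSI.
    """
    too_high = high is not None and value >= high
    too_low = low is not None and value <= low
    return too_high or too_low


# High is bad: CPU utilization with an assumed critical level of 90%.
cpu_critical = crosses_critical(95.0, high=90.0)

# Low is bad: disk space remaining with an assumed critical level of 10%.
disk_critical = crosses_critical(8.0, low=10.0)

# High or low is bad: login volume with assumed bounds of 100 and 1000.
logins_critical = crosses_critical(500.0, high=1000.0, low=100.0)
```

Each of these three shapes is a candidate for its own template, because a service owner only needs to know which direction is bad for the KPI in order to pick the right one.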
You can build custom thresholding templates that match these commonalities. You can also name them in ways that are easy for your service owners to interpret and then apply on their own. For example, here are some hypothetical custom threshold template names that include the type of KPI, the nature of the threshold, and the algorithm or critical percentages used in the template:
- Percentage Based KPIs - High is Bad - Static (90) (Example KPI = CPU utilization)
- Percentage Based KPIs - Low is Bad - Static (10) (Example KPI = disk space free)
- Percentage Based KPIs - High is Bad - Pct Baseline (120%) (Example KPI = error rate)
- Volume Based KPIs - High or Low is Bad - Stddev (3std) - Business Hours, Off Hours, Weekends (Example KPI = logins)
- Volume Based KPIs - High is Bad - Stddev (3std) - Mon-Thurs, Friday, Weekends (Example KPI = logins)
- Response Time KPIs - High is Bad - Pct Baseline (200%) (Example KPI = API response time)
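The naming pattern used in the examples above (KPI type, threshold direction, algorithm with its parameter, and optional time policies) can be kept consistent by generating names from their parts. The following Python sketch is one possible way to do that. The `TemplateName` class and its field names are hypothetical and are not part of ITSI, which accepts any free-text template title.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TemplateName:
    """Hypothetical helper that assembles a template title from its parts."""
    kpi_type: str                         # e.g. "Percentage Based"
    direction: str                        # e.g. "High is Bad"
    algorithm: str                        # e.g. "Static", "Stddev"
    parameter: str                        # e.g. "90", "3std", "120%"
    time_policies: Tuple[str, ...] = ()   # e.g. ("Business Hours", "Weekends")

    def render(self) -> str:
        parts = [
            f"{self.kpi_type} KPIs",
            self.direction,
            f"{self.algorithm} ({self.parameter})",
        ]
        # Time policies are optional and are appended as one comma-separated part.
        if self.time_policies:
            parts.append(", ".join(self.time_policies))
        return " - ".join(parts)


cpu_name = TemplateName("Percentage Based", "High is Bad", "Static", "90").render()
logins_name = TemplateName(
    "Volume Based", "High or Low is Bad", "Stddev", "3std",
    ("Business Hours", "Off Hours", "Weekends"),
).render()
```

Generating titles this way is optional. The point is that every template name carries the same parts in the same order, so service owners can scan the list and compare templates quickly.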
With these types of template names, if you are a service owner building a KPI, you can more easily map the template to a KPI because the template names are intuitive.
Even when a service owner can't use a template exactly as the administrator created it, it can be easier for them to start with one of these custom templates and then customize it further.
Next steps
You might also be interested in the following Splunk resources:
- Splunk Docs: Service insights manual