Splunk Lantern

Building your own custom threshold templates

 

Splunk ITSI ships with 33 out-of-the-box thresholding templates that combine the following configurations in various permutations:

  • Time policy configurations. There are 8, based on concepts like AM/PM and weekend/weekday.
  • Thresholding algorithms. There are 4.
  • Sensitivity configurations. Most templates are thresholded both high and low, which might not be what you want.


It isn't always easy to determine exactly what these templates do or which one best fits your needs, and the default policies are often not appropriate for your KPIs. It can therefore make more sense for an administrator to create custom templates that use your organization's naming conventions. Doing so takes the burden of creating complex configurations away from service owners, who can then rapidly deploy improvements across their services on their own.

This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI administrators and end users will benefit from adopting this practice as they work on Service Insights.

Solution

As you begin to threshold more and more KPIs, you will find commonality among many disparate KPIs. For example, many of your KPIs might:

  • Be based only on work days, and not weekends.
  • Be based only on work hours, and not evenings.
  • Have common ebbs and flows throughout a day or week.
  • Be expressed as a percentage where ~100% is bad. (Example: CPU utilization)
  • Be expressed as a percentage where ~0% is bad. (Example: Disk space remaining)
  • Be bad when they are either too high or too low.
  • Be expected to show static behavior. (Example: Response time)
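The "high is bad", "low is bad", and "either is bad" patterns above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ITSI evaluates thresholds internally; the names `BadDirection` and `is_breaching` and the limit values are assumptions made for this example.

```python
from enum import Enum
from typing import Optional


class BadDirection(Enum):
    """Hypothetical labels for which side of a KPI's range is unhealthy."""
    HIGH = "High is Bad"          # e.g. CPU utilization near 100%
    LOW = "Low is Bad"            # e.g. disk space remaining near 0%
    BOTH = "High or Low is Bad"   # e.g. login volume


def is_breaching(value: float,
                 direction: BadDirection,
                 high: Optional[float] = None,
                 low: Optional[float] = None) -> bool:
    """Return True when value crosses the static limit for the given direction."""
    too_high = high is not None and value >= high
    too_low = low is not None and value <= low
    if direction is BadDirection.HIGH:
        return too_high
    if direction is BadDirection.LOW:
        return too_low
    return too_high or too_low


# Illustrative limits only (assumed, not ITSI defaults).
print(is_breaching(95, BadDirection.HIGH, high=90))          # CPU utilization
print(is_breaching(5, BadDirection.LOW, low=10))             # disk space remaining
print(is_breaching(50, BadDirection.BOTH, high=90, low=10))  # within range
```

Grouping KPIs by direction like this is what makes a small set of shared templates practical: each template only needs to encode one direction and one set of limits.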

You can build custom thresholding templates that match these commonalities, and you can name them so that service owners can easily interpret them and then apply them on their own. For example, here are some hypothetical custom threshold template names that include the type of KPI, the nature of the threshold, and the algorithm or critical percentages used in the template:

  • Percentage Based KPIs - High is Bad - Static (90) (Example KPI = CPU utilization)
  • Percentage Based KPIs - Low is Bad - Static (10) (Example KPI = disk space free)
  • Percentage Based KPIs - High is Bad - Pct Baseline (120%) (Example KPI = error rate)
  • Volume Based KPIs - High or Low is Bad - Stddev (3std) - Business Hours, Off Hours, Weekends (Example KPI = logins)
  • Volume Based KPIs - High is Bad - Stddev (3std) - Mon-Thurs, Friday, Weekends (Example KPI = logins)
  • Response Time KPIs - High is Bad - Pct Baseline (200%) (Example KPI = API response time)
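If you maintain many templates, one way to keep names like these consistent is to generate them from their parts. The following is a minimal Python sketch of that idea; `TemplateSpec` and its fields are hypothetical names invented for this example and are not part of ITSI or any Splunk API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TemplateSpec:
    """Hypothetical description of a custom threshold template's name parts."""
    kpi_type: str                        # e.g. "Percentage Based KPIs"
    direction: str                       # e.g. "High is Bad"
    algorithm: str                       # e.g. "Static", "Pct Baseline", "Stddev"
    setting: str                         # e.g. "90", "120%", "3std"
    time_policies: Tuple[str, ...] = ()  # e.g. ("Business Hours", "Weekends")

    def name(self) -> str:
        """Join the parts in the same order used by the example names above."""
        parts = [self.kpi_type, self.direction,
                 f"{self.algorithm} ({self.setting})"]
        if self.time_policies:
            parts.append(", ".join(self.time_policies))
        return " - ".join(parts)


cpu = TemplateSpec("Percentage Based KPIs", "High is Bad", "Static", "90")
logins = TemplateSpec("Volume Based KPIs", "High or Low is Bad", "Stddev", "3std",
                      ("Business Hours", "Off Hours", "Weekends"))

print(cpu.name())
print(logins.name())
```

Building names from fixed fields means every template states its KPI type, direction, and algorithm in the same order, which is what lets a service owner scan the list and pick one quickly.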

With these types of template names, if you are a service owner building a KPI, you can more easily map the template to a KPI because the template names are intuitive.

Even when a service owner can't use a template exactly as the administrator created it, it can be easier for them to start with one of these custom templates and then customize it further.


Next steps

This content comes from the Splunk .Conf presentation, The Definitive List of Best Practices for Splunk® IT Service Intelligence: How to Configure, Administer, and Use ITSI for Optimal Results. Part one was presented at .Conf23 and part two at .Conf24. In the session replays, you can watch Jason Riley and Jeff Wiedemann share the many best practices they've amassed for designing key performance indicators (KPIs), services, episodes, and machine learning to maximize end-user experience and insights. Whether you're new or experienced, you'll come away with tactical guidance you can use right away.

You might also be interested in the following Splunk resources: