Splunk Lantern

Using SRE golden signals for KPIs

 

Splunk ITSI administrators often struggle to extract meaningful KPIs from service owners when building new Business Services. Just as often, the administrators themselves don't understand the service well enough to propose meaningful KPIs. In both situations, a framework that helps identify meaningful KPIs is needed.

This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI end users will benefit from adopting this practice as they work on Service Insights.

Solution 

In Adopting monitoring frameworks - LETS, we discuss the SRE Golden Signals, which establish benchmarks for each metric that show when the system is healthy, helping to ensure positive customer experiences and uptime. However, these Golden Signals aren't just valuable to SREs. You can apply a business lens to them to create meaningful, business-centric KPIs that are well suited to ITSI. The following table shows how each golden signal translates into a business service context, and the list after it shows where you can pull the data needed to measure each one.

Golden Signal | Business Service Context
Response time. Is the service running slower than usual? | Login response time
Volume. Are we experiencing much higher or lower traffic than usual? | Login volume
Error rate. Is the service producing more errors than usual? | Login error rate
Saturation. Will the service slow down or break if we get more volume? | Concurrent users
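
To make the mapping concrete, here is a minimal Python sketch of how the four login KPIs could be derived from one time window of login events. It is only an illustration, not how ITSI computes KPIs. The LoginEvent record, its field names, the capacity_users parameter, and the choice to count any status of 400 or above as an error are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LoginEvent:
    """One login attempt. Every field name here is hypothetical."""
    user: str
    duration_ms: float  # time taken to serve the login request
    status: int         # HTTP-style status code


def golden_signal_kpis(events, capacity_users):
    """Summarize one time window of login events into golden-signal KPIs."""
    if not events:
        raise ValueError("need at least one event in the window")
    # Assumption: any 4xx or 5xx status counts as a login error.
    errors = sum(1 for e in events if e.status >= 400)
    # Assumption: distinct users in the window approximate concurrent users.
    active_users = {e.user for e in events}
    return {
        "login_response_time_ms": mean(e.duration_ms for e in events),
        "login_volume": len(events),
        "login_error_rate": errors / len(events),
        "concurrent_users": len(active_users),
        # Share of the assumed user capacity that is currently in use.
        "saturation": len(active_users) / capacity_users,
    }


window = [
    LoginEvent("alice", 120.0, 200),
    LoginEvent("bob", 180.0, 200),
    LoginEvent("alice", 300.0, 500),
    LoginEvent("carol", 200.0, 200),
]
kpis = golden_signal_kpis(window, capacity_users=10)
print(kpis)
```

Each question in the table asks whether a value differs from what is usual, so in practice each of these numbers would be compared against a baseline for the same time of day or week rather than read in isolation.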

While a team could always monitor more metrics or logs across the system, the four golden signals are the essential building blocks for any effective monitoring strategy. Common data sources for these golden signals are as follows:

  • Access logs (Apache/IIS access, AWS CloudTrail, Linux secure)
  • Database records (from custom tables using Splunk DB Connect)
  • Custom application logs
  • APM tools
  • Synthetic monitoring
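
As an illustration of the first data source, the following Python sketch extracts the fields needed for the login KPIs from a web access log line. It assumes an Apache Common Log Format line with the request duration in microseconds (the %D directive) appended at the end, which is not part of the default format. The sample line, the /login path, and the parse_access_line helper are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format followed by %D (duration in microseconds).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<duration_us>\d+)'
)


def parse_access_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "user": match["user"],
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
        "status": int(match["status"]),
        # Convert microseconds to milliseconds for a response time KPI.
        "duration_ms": int(match["duration_us"]) / 1000,
    }


sample = ('203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] '
          '"POST /login HTTP/1.1" 200 2326 184000')
record = parse_access_line(sample)
print(record)
```

The status field feeds the error rate, duration_ms feeds the response time, the count of matching lines gives the volume, and the distinct users seen in a window give a rough view of concurrent users.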

Next steps

This content comes from the Splunk .conf presentation, The Definitive List of Best Practices for Splunk® IT Service Intelligence: How to Configure, Administer, and Use ITSI for Optimal Results. Part one was presented at .conf23 and part two at .conf24. In the session replays, you can watch Jason Riley and Jeff Wiedemann share the many best practices they've amassed for designing key performance indicators (KPIs), services, episodes, and machine learning to maximize end-user experience and insights. Whether you're new or experienced, you'll come away with tactical guidance you can use right away.

You might also be interested in the following Splunk resources: