Splunk Lantern

Improving Smart Mode usage in ITSI

 

Smart Mode is a notable event aggregation policy option that attempts to group notable events based on similarities.

Why should you use Smart Mode?

  • Grouping similar events together makes work easier
  • Grouping disparate events by time also makes work easier
  • You want to reduce the number of incidents created when there's a datacenter outage
  • It makes sense to keep alerts for the same entities, and for the same services on those entities, grouped together

However, there are risks. Smart Mode relies on machine learning techniques that analyze the similarity between events. This can lead to two kinds of grouping errors:

  • Events that do not belong together do get grouped together. This can lead to:
    • Missed alerts
    • Wrong teams being notified
    • No notification at all for new issues
    • Increased time to resolve issues
  • Events that do belong together don’t get grouped. This can lead to:
    • Alert fatigue
    • Loss of confidence in alerting
    • Lack of response from support staff

This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI administrators will benefit from adopting this practice as they work on Event Analytics.

Solution

Instead of using Smart Mode, use the Content Pack for ITSI Monitoring and Alerting.

The content pack provides multiple notable event aggregation policies (NEAPs) that you can use depending on the situation. However, turning them all on can cause problems rather than solve them. For the policies that overlap with Smart Mode functionality, the following are recommended:

  • Episodes by Src (source), especially for third-party alerts
  • Episodes by ITSI Service
  • Episodes by Alarm
  • ITSI Alert and Episode Monitoring
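To make the contrast with similarity-based grouping concrete, here is a minimal Python sketch of grouping on a single explicit field, which is the general idea behind policies such as Episodes by Src or Episodes by ITSI Service. It is an illustration only and not how ITSI implements NEAPs: the `group_by_field` helper, the sample events, and the field names `src`, `service`, and `title` are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_field(events, field):
    """Group event titles into buckets keyed on one explicit field value.

    Hypothetical helper for illustration; not part of ITSI or the content pack.
    """
    episodes = defaultdict(list)
    for event in events:
        # Events that lack the field share a single "unknown" bucket.
        episodes[event.get(field, "unknown")].append(event["title"])
    return dict(episodes)


# Made-up notable events; the field names are assumptions for this sketch.
events = [
    {"src": "db01", "service": "Checkout", "title": "High CPU"},
    {"src": "db01", "service": "Checkout", "title": "Disk latency"},
    {"src": "web07", "service": "Storefront", "title": "High CPU"},
]

by_src = group_by_field(events, "src")
by_service = group_by_field(events, "service")
```

Because the grouping key is explicit, the result is predictable: the two "High CPU" events stay in separate groups since they come from different sources, even though their titles are identical and a similarity-based approach might merge them.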

Next, the Content Pack for ITSI Monitoring and Alerting can help prevent alert storms caused by large numbers of episodes for similar events. It includes tools to help you analyze the episodes and alerts that you have. Its services and associated KPIs track the performance of alert and episode analytics, and alert on issues when they occur. Several dashboards also help you triage what might be going wrong with the NEAPs that are in place.

Finally, the content pack includes useful dashboards for analysis and to help you make adjustments as needed. These dashboards include:

  • ITSI Alert and Episode Storm Activity - Episode Review Dashboard
  • ITSI Alert and Episode Volume Trend Analysis
  • ITSI Episode Analysis

Next steps

This content comes from the Splunk .Conf presentation, The Definitive List of Best Practices for Splunk® IT Service Intelligence: How to Configure, Administer, and Use ITSI for Optimal Results. Part one was presented at .Conf23 and part two at .Conf24. In the session replays, you can watch Jason Riley and Jeff Wiedemann share the many best practices they've amassed for designing key performance indicators (KPIs), services, episodes, and machine learning to maximize end-user experience and insights. Whether you're new or experienced, you'll come away with tactical guidance you can use right away.

You might also be interested in the following Splunk resources: