Improving Smart Mode usage in ITSI
Smart Mode is a notable event aggregation policy option that attempts to group notable events based on similarities.
Why should you use Smart Mode?
- Grouping similar events together makes work easier
- Having disparate events grouped based on time also makes work easier
- You want to reduce the number of incidents created when there's a datacenter outage
- Keeping alerts for the same entities, and for the same services on those entities, grouped together makes sense
However, there are risks. Smart Mode relies on machine learning techniques that analyze similarity of events. This can lead to errors in grouping:
- Events that do not belong together do get grouped together. This can lead to:
  - Missed alerts
  - Wrong teams being notified
  - No notification at all for new issues
  - Increased time to resolve issues
- Events that do belong together don’t get grouped. This can lead to:
  - Alert fatigue
  - Loss of confidence in alerting
  - Lack of response from support staff
This article is part of the Definitive Guide to Best Practices for IT Service Intelligence. ITSI administrators will benefit from adopting this practice as they work on Event Analytics.
Solution
Instead of using Smart Mode, use the Content Pack for ITSI Monitoring and Alerting.
The content pack provides multiple notable event aggregation policies (NEAPs) that suit different situations. However, turning them all on can create problems rather than solve them. For the policies that overlap with Smart Mode functionality, the following are recommended:
- Episodes by Src (source), especially for third-party alerts
- Episodes by ITSI Service
- Episodes by Alarm
- ITSI Alert and Episode Monitoring
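To illustrate why field-based policies such as the ones above behave more predictably than similarity-based grouping, here is a minimal Python sketch of grouping by a single field. It is not how ITSI implements its policies; the `group_by_field` helper, the sample events, and the field names `src` and `service` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_field(events, field):
    """Group event dicts into buckets keyed by the value of one field.

    Hypothetical helper for illustration only; not part of ITSI.
    """
    episodes = defaultdict(list)
    for event in events:
        # Events that lack the field all land in one "unknown" bucket.
        episodes[event.get(field, "unknown")].append(event)
    return dict(episodes)


# Made-up sample events; the field names are assumptions.
events = [
    {"src": "db01", "service": "Billing", "title": "High CPU"},
    {"src": "db01", "service": "Billing", "title": "Disk latency"},
    {"src": "web07", "service": "Storefront", "title": "HTTP 500 spike"},
]

by_src = group_by_field(events, "src")
by_service = group_by_field(events, "service")
```

Because the grouping key is explicit, you can always tell why two events landed in the same bucket, which is the property that makes these policies easier to reason about than a similarity score.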
Next, the Content Pack for ITSI Monitoring and Alerting can help prevent alert storms caused by large numbers of episodes for similar events. It includes tools to help you analyze the episodes and alerts that you have. Its services and associated KPIs track the performance of alert and episode analytics and alert on issues when they occur, and several dashboards help you triage what might be going wrong with the NEAPs that are in place.
Finally, the content pack includes dashboards that support analysis and help you make adjustments as needed. These dashboards include:
- ITSI Alert and Episode Storm Activity - Episode Review Dashboard
- ITSI Alert and Episode Volume Trend Analysis
- ITSI Episode Analysis
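As a rough idea of what volume-trend analysis for episode storms involves, the following Python sketch counts new episodes per fixed time window and flags windows that reach a threshold. This is an assumption-laden illustration, not the content pack's logic: the `storm_windows` function, the five-minute window, the threshold of three, and the sample timestamps are all hypothetical.

```python
from collections import Counter


def storm_windows(episode_start_times, window_seconds=300, threshold=3):
    """Return the start times of windows whose new-episode count meets the threshold.

    Hypothetical helper; times are integer seconds. The window size and
    threshold are illustrative defaults, not ITSI settings.
    """
    counts = Counter(
        (t // window_seconds) * window_seconds for t in episode_start_times
    )
    return sorted(start for start, n in counts.items() if n >= threshold)


# Made-up episode start times, in seconds.
starts = [10, 40, 120, 290, 900, 1500, 1510]
flagged = storm_windows(starts)
```

A window that is flagged repeatedly suggests that a policy is splitting related events into too many episodes and might need adjusting.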
Next steps
You might also be interested in the following Splunk resources:
- Splunk Docs: Event analytics manual
- Splunk Docs: Smart Mode
- Splunk Docs: Content Pack for ITSI Monitoring and Alerting
- Use Case: Configuring the ITSI Notable Event Aggregation Policy