Improving KPI, entity, and advanced configurations with the ITSI Configuration Assistant

The ITSI Configuration Assistant is a tool introduced in Splunk IT Service Intelligence (ITSI) version 4.19 that helps administrators identify and resolve misconfigured KPIs, manage entities, and optimize system performance. This article walks you through using the Configuration Assistant to improve your ITSI deployment.

The Configuration Assistant provides three main capabilities:

KPI optimization: Identifies KPIs that might have misconfigured thresholds
Entity configuration: Detects duplicate entities, unstable aliases, and flapping entities
Advanced configuration: Provides access to system-level settings that improve performance

This article is part of The definitive guide to best practices for IT Service Intelligence, which provides essential guidelines to ensure optimal operations and an excellent end-user experience, helping you to unlock the full potential of ITSI.

How to use Splunk software for this use case

KPI optimization

The ITSI Configuration Assistant analyzes your KPIs to identify those that might have misconfigured thresholds by examining how much time they spend outside of a normal state. For example, if a KPI is constantly yellow, the Assistant flags it and suggests that your threshold might be set too low and should be adjusted to reflect actual normal operating conditions.
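The idea of flagging a KPI by how much time it spends outside a normal state can be sketched in a few lines of Python. This is only an illustration of the concept, not the Assistant's actual logic: the function names, the 50% cutoff, and the sample severity history are all assumptions made up for the example.

```python
from collections import Counter


def non_normal_fraction(severities):
    """Return the share of samples whose severity is not 'normal'."""
    if not severities:
        return 0.0
    counts = Counter(severities)
    return 1.0 - counts["normal"] / len(severities)


def flag_kpis(history, max_fraction=0.5):
    """Return names of KPIs that spend more than max_fraction of samples outside normal.

    max_fraction is a hypothetical cutoff chosen for this sketch.
    """
    return sorted(
        name
        for name, severities in history.items()
        if non_normal_fraction(severities) > max_fraction
    )


# Hypothetical severity samples, one per KPI evaluation.
history = {
    "cpu_utilization": ["low"] * 9 + ["normal"],   # almost never normal
    "disk_latency": ["normal"] * 9 + ["high"],     # rarely outside normal
}

print(flag_kpis(history))
```

A KPI that is flagged this way is a candidate for threshold review rather than proof of a problem, because a service that is genuinely degraded produces the same pattern.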

The Assistant also provides AI-assisted thresholding recommendations, which use historical KPI data to suggest appropriate threshold values. Implementing these recommendations helps you avoid problematic KPIs that would generate noise rather than actionable signals.

The screenshot below shows an example of a full list of KPIs within the Assistant. Select the box next to the KPIs you want the Assistant to analyze, and select Run AI analysis to generate suggestions.

Entity configuration

The ITSI Configuration Assistant helps you identify and resolve common entity issues that can cause confusion or inaccurate health scores. The Assistant scans for duplicate entities, problematic aliases, and unstable entities that appear to flap between up and down states. Flapping typically occurs when an entity sends data less often than ITSI scans for it. For example, if data arrives every five minutes but ITSI scans every minute, the entity shows as up on the scan that sees new data and as down on the scans in between. The screenshot below shows entity configuration issues identified within the Assistant.

When the Assistant identifies these issues, you can resolve them directly from the interface by selecting the problematic entity and selecting Retire Entities to remove it from active use. Doing so keeps your entity list clean and your health scores accurate.
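The flapping behavior described above, where data arrives less often than ITSI scans, can be reproduced with a small simulation. This is a simplified model under stated assumptions: an entity counts as "up" on a scan only if data arrived since the previous scan, and the function names are hypothetical. ITSI's real entity status logic is not shown here.

```python
def scan_states(data_interval_min, scan_interval_min, duration_min):
    """Return 'up' or 'down' for each scan in the simulated period.

    Assumes data arrives at exact multiples of data_interval_min and that a
    scan reports 'up' only if data arrived since the previous scan.
    """
    states = []
    for scan_time in range(scan_interval_min, duration_min + 1, scan_interval_min):
        window_start = scan_time - scan_interval_min
        # True if a multiple of data_interval_min falls in (window_start, scan_time].
        arrived = (scan_time // data_interval_min) > (window_start // data_interval_min)
        states.append("up" if arrived else "down")
    return states


def count_flaps(states):
    """Count transitions between consecutive differing states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Data every five minutes, scans every minute: the entity flaps.
mismatched = scan_states(data_interval_min=5, scan_interval_min=1, duration_min=10)
print(mismatched, count_flaps(mismatched))

# Scan interval matched to the data interval: the entity stays up.
matched = scan_states(data_interval_min=5, scan_interval_min=5, duration_min=10)
print(matched, count_flaps(matched))
```

In this model, aligning the scan interval with the data interval removes the flapping entirely.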

Configure advanced settings

You can use the advanced ITSI configuration settings within the Configuration Assistant to set up global settings for your environment. You must have the admin role to make these changes.

Enabling the rules engine queue mode

Rules engine queue mode, also known as Notable Event Actions Queue Technology (NATS), is a performance optimization feature that fundamentally changes how ITSI processes notable events and aggregates them into episodes.

In traditional ITSI event processing, the system follows a two-step process:

Correlation searches run: These searches identify conditions that should generate notable events.
Notable Event Aggregation Policies (NEAPs) run: These policies aggregate the notable events into episodes and execute associated actions.

This sequential process means you must wait for both steps to complete before alerts appear in your episodes. Rules engine queue mode changes this behavior by optimizing the aggregation portion of the pipeline, resulting in faster alert delivery into episodes without relying on real-time searches for detection.

The screenshot below shows the rules engine queue mode toggle within the Assistant.

NATS is enabled by default from ITSI 4.20 for new installations. If you upgraded from a previous version, you must manually enable NATS.
Do not write events directly to the tracked alerts index if NATS is enabled. Doing so slows down processing and causes events to only be caught during cleanup cycles (every 20-30 minutes).

Configure service auto-refresh

Service auto-refresh is a global setting that controls whether ITSI automatically refreshes service views at a defined interval. This is particularly useful for NOC (Network Operations Center) environments where service trees and dashboards are displayed on monitors and need to stay current without manual intervention. Because the setting is global, it affects all users and cannot be configured on a per-user basis.

Set up KPI time adjustment

KPI time adjustment allows you to bulk-adjust time-based thresholds across multiple KPIs simultaneously. This feature is essential in two common scenarios: daylight saving time changes and business operational hour changes.

Daylight saving time changes: When daylight saving time begins or ends, your time-based thresholds will be offset by an hour, potentially causing false alerts or missed detections until you correct them.
Changes in operational hours: If your business changes its operational hours - such as opening at 8 AM instead of 10 AM during the holiday season - you need to shift all your time-based thresholds accordingly.

Rather than manually editing each KPI individually, the KPI time adjustment feature lets you apply these time offsets in bulk, saving significant administrative effort. The screenshot below shows this bulk adjustment within the Assistant.
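Conceptually, a bulk time adjustment applies the same hour offset to every time block of every selected KPI. The sketch below illustrates that idea only. The `TimeBlock` structure, field names, and sample KPIs are hypothetical and do not reflect how ITSI stores time policies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TimeBlock:
    """Hypothetical time-based threshold block for this sketch."""
    label: str
    start_hour: int  # 0-23
    end_hour: int    # 0-23, exclusive; the block may wrap past midnight


def shift_block(block, offset_hours):
    """Return a copy of the block moved by offset_hours, wrapping at 24."""
    return replace(
        block,
        start_hour=(block.start_hour + offset_hours) % 24,
        end_hour=(block.end_hour + offset_hours) % 24,
    )


def shift_all(kpi_blocks, offset_hours):
    """Apply the same offset to every block of every KPI."""
    return {
        kpi: [shift_block(block, offset_hours) for block in blocks]
        for kpi, blocks in kpi_blocks.items()
    }


kpi_blocks = {
    "checkout_latency": [TimeBlock("business_hours", 10, 18)],
    "login_errors": [TimeBlock("business_hours", 10, 18), TimeBlock("overnight", 22, 6)],
}

# Opening at 8 AM instead of 10 AM: move every block two hours earlier.
shifted = shift_all(kpi_blocks, offset_hours=-2)
print(shifted["checkout_latency"][0])
```

The same function covers a daylight saving time change by passing an offset of `1` or `-1`.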

View queue status with the Refresh Queue Explorer

ITSI relies on internal queues stored in KV stores to manage configuration changes and notable event actions. Understanding these queues helps you troubleshoot situations where changes don't appear to take effect.

When you make a configuration change in ITSI - such as modifying a NEAP or deleting KPIs - the change is added to the queue, but existing queued actions must process first before your new configuration takes effect. This means if you have a large backlog of actions (for example, a NEAP configured to send emails, open ServiceNow tickets, and add comments simultaneously), your recent changes won't apply until those queued actions complete.
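The ordering effect described above is ordinary first-in, first-out queue behavior. The following sketch uses a plain Python `deque` to show why a new change waits behind the backlog. The job names are made up for the example, and the code does not model ITSI's real queue implementation.

```python
from collections import deque


def process_until(queue, target):
    """Process jobs in FIFO order and return how many jobs ran before target."""
    handled = 0
    while queue:
        job = queue.popleft()
        if job == target:
            return handled
        handled += 1
    raise ValueError(f"{target!r} was never queued")


# Hypothetical backlog of episode actions already waiting in the queue.
queue = deque(f"episode_action_{i}" for i in range(3))

# A new configuration change joins the back of the queue.
queue.append("update_neap")

print(process_until(queue, "update_neap"))
```

The larger the backlog, the longer the delay before the new change is applied, which is why a deep queue can look like a change that failed.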

You can monitor queue status using the built-in Refresh Queue Explorer dashboard, shown in the example screenshot below, which displays successful and failed refresh queue jobs.

You can also use the Splunk App for Lookup File Editing to inspect KV store contents directly. The most important queues to monitor are the itsi_refresh_queue and the itsi_notable_event_actions_queue.
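If you prefer to inspect a queue programmatically, the Splunk KV store REST API exposes collection contents under the `storage/collections/data` endpoint. The sketch below only builds the URL and summarizes records that have already been fetched; it makes no network call. Several details are assumptions: that the ITSI collections live in the `SA-ITOA` app context, that the management port is 8089, and that records carry a `change_type` field. The host name and sample records are invented, so verify all of these against your own deployment.

```python
from collections import Counter
from urllib.parse import quote


def kvstore_collection_url(host, collection, app="SA-ITOA", owner="nobody", port=8089):
    """Build the KV store data URL for a collection.

    The app, owner, and port defaults are assumptions for this sketch.
    """
    return (
        f"https://{host}:{port}/servicesNS/{quote(owner)}/{quote(app)}"
        f"/storage/collections/data/{quote(collection)}"
    )


def summarize_queue(records, group_by="change_type"):
    """Return the queue depth and a count of records per group_by value.

    group_by defaults to a hypothetical field name; pass whichever field
    your queue records actually contain.
    """
    return len(records), Counter(r.get(group_by, "unknown") for r in records)


print(kvstore_collection_url("splunk.example.com", "itsi_refresh_queue"))

# Invented records standing in for a parsed JSON response.
sample = [{"change_type": "delete_kpi"}, {"change_type": "delete_kpi"}, {}]
depth, by_type = summarize_queue(sample)
print(depth, dict(by_type))
```

A depth of zero indicates that nothing is waiting in that queue.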

The screenshot below shows an example of the itsi_notable_event_actions_queue containing actions. If this queue is empty, your change has been applied. In this example, actions are still waiting, and they must be processed before your new changes can take effect.

The screenshot below shows an example of the detailed contents of the queue, displaying the GUIDs of items waiting to be processed. In this example, you can see entries related to KPI deletions, indicating that a large number of KPIs are queued for removal. If this queue is thousands of events deep, it's easy to assume ITSI is broken and attempt the same change again, which only queues it a second time and causes errors when the original change finally processes.

Additional resources

For more information on using the ITSI Configuration Assistant, see Splunk Help.

These resources might also help you understand and implement this guidance:

Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their Success Plan. Engage the ODS team at ondemand@cisco.com if you would like assistance.