Nagios alert management with ITSI
Nagios can be configured to communicate alerts using GET or POST functions through HTTP or HTTPS. As an example, a URL might be used as an interface into a trouble ticket system, and, by correctly formatting the GET function, new trouble tickets can be created automatically.
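As a rough illustration of the GET-based approach described above, the following Python sketch builds a ticket-creation URL from alert fields. The endpoint `https://tickets.example.com/new` and the parameter names (`host`, `service`, `state`, `summary`) are hypothetical placeholders, not part of Nagios or any particular ticketing product; substitute whatever your own system expects.

```python
from urllib.parse import urlencode


def build_ticket_url(base_url, host, service, state, output):
    """Return a GET URL carrying the alert details as query parameters.

    The parameter names are illustrative assumptions; a real ticket
    system defines its own.
    """
    params = {
        "host": host,
        "service": service,
        "state": state,
        "summary": output,
    }
    # urlencode percent-encodes values and joins them with '&'.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_ticket_url(
        "https://tickets.example.com/new",  # hypothetical endpoint
        "web01", "HTTP", "CRITICAL", "Connection refused",
    ))
```

The sketch only constructs the URL; issuing the request and handling authentication are left to whatever HTTP client and ticket system you use.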
This article is part of the Splunk ITSI event management accelerator for customers who want to integrate ITSI in Splunk Cloud Platform or Splunk Enterprise with their event management supported data sources.
Install and configure the Nagios Add-on
- Download the Splunk Add-on for Nagios Core from Splunkbase.
- On a forwarder installed on the Nagios Core instances, create or edit the $SPLUNK_HOME/etc/apps/Splunk_TA_nagios-core/local/inputs.conf file to include the following stanza, replacing $NAGIOS_HOME with your Nagios folder (for example, /usr/local/nagios).
  [monitor://$NAGIOS_HOME/var/nagios.log]
  sourcetype = nagios:core
There might be multiple Nagios servers in the environment. Define the alert log (nagios.log) input on every Nagios server.
More comprehensive installation instructions can be found in Installation overview for the Splunk Add-on for Nagios Core.
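When several Nagios servers use different install folders, it can help to generate the monitor stanza rather than edit it by hand. The following Python sketch renders the stanza shown in the steps above for a given Nagios folder. The function name `render_monitor_stanza` is a hypothetical helper, and the sketch assumes the log lives at var/nagios.log under that folder, as in the example.

```python
def render_monitor_stanza(nagios_home):
    """Return an inputs.conf monitor stanza for one Nagios install folder.

    Assumes the alert log is at <nagios_home>/var/nagios.log.
    """
    # Drop any trailing slash so the path does not end up with "//var".
    log_path = nagios_home.rstrip("/") + "/var/nagios.log"
    return f"[monitor://{log_path}]\nsourcetype = nagios:core\n"


if __name__ == "__main__":
    # Prints the stanza to paste into local/inputs.conf on the forwarder.
    print(render_monitor_stanza("/usr/local/nagios"))
```

An absolute path produces three slashes after `monitor:`, which is expected: two belong to the `monitor://` prefix and one starts the path.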
Validate the data
At this stage, if everything is working correctly, you will see data flowing into the system. As you can see in the payload below, the line breaking isn't happening exactly where we want it, but all of the event data is making it into a single event (rather than multi-line).
In order to make the line breaking and field indexing work correctly, the appropriate source type needs to be used. This step is only needed if the source type is not already configured and installed as a part of the Content Pack for Monitoring and Alerting (CPMA) installation.
[nagios:core]
EXTRACT-nagios:core : GLOBAL SERVICE EVENT HANDLER = ^.{13}GLOBAL SERVICE EVENT HANDLER: (?<host_name>[^;]*);(?<service_name>[^;]*);(?<status_code>[^;]*);(?<state>[^;]*);(?<attempt>[^;]*);(?<info>.*)
EVAL-signature = coalesce(service_name, eventname, "unknown")
FIELDALIAS-NAGIOS_itsi_alias = info ASNEW description status_code ASNEW vendor_severity
EVAL-severity_id = case(status_code="CRITICAL", 6, status_code="WARNING", 3, status_code="OK", 2, status_code="UP", 2, status_code="DOWN", 6, severity="DOWN", 6, hoststate="down", 6, hoststate="up", 2, 1=1, 1)
EVAL-itsiInclude = case(state="SOFT", "false", searchmatch("_raw=*wproc:*"), "false")
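To check what the extraction and eval settings above should produce before deploying them, the logic can be approximated outside Splunk. The Python sketch below ports the regular expression (Python writes named groups as `(?P<name>...)`) and mirrors a simplified version of the severity and include rules. The sample log lines and the `normalize` function are illustrative assumptions. The `severity` and `hoststate` branches and the `eventname` fallback are left out because this regex does not extract those fields.

```python
import re

# Python port of the EXTRACT regex; the first 13 characters cover the
# "[<10-digit epoch>] " prefix of a nagios.log line.
HANDLER_RE = re.compile(
    r"^.{13}GLOBAL SERVICE EVENT HANDLER: "
    r"(?P<host_name>[^;]*);(?P<service_name>[^;]*);(?P<status_code>[^;]*);"
    r"(?P<state>[^;]*);(?P<attempt>[^;]*);(?P<info>.*)"
)

# Mirrors the status_code branches of EVAL-severity_id.
SEVERITY_BY_STATUS = {"CRITICAL": 6, "WARNING": 3, "OK": 2, "UP": 2, "DOWN": 6}


def normalize(raw):
    """Return the normalized fields for one log line, or None if no match."""
    match = HANDLER_RE.match(raw)
    if match is None:
        return None
    fields = match.groupdict()
    # Simplified coalesce(): an empty service name falls back to "unknown".
    fields["signature"] = fields["service_name"] or "unknown"
    # Field aliases: info -> description, status_code -> vendor_severity.
    fields["description"] = fields["info"]
    fields["vendor_severity"] = fields["status_code"]
    # The final 1=1 branch of case() becomes the default value 1.
    fields["severity_id"] = SEVERITY_BY_STATUS.get(fields["status_code"], 1)
    # case() yields no value when neither condition matches, hence None.
    excluded = fields["state"] == "SOFT" or "wproc:" in raw
    fields["itsiInclude"] = "false" if excluded else None
    return fields


if __name__ == "__main__":
    # Hypothetical sample line in the format the regex expects.
    line = "[1700000000] GLOBAL SERVICE EVENT HANDLER: web01;HTTP;CRITICAL;HARD;3;notify-service"
    print(normalize(line))
```

Running the sketch against a few lines copied from your own nagios.log is a quick way to confirm that the field boundaries fall where you expect.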
Additional troubleshooting steps for proper data ingestion might include installation of the universal forwarder, configuration of DB Connect, installing TAs, or extracting KPI fields from log data.
Additionally, because this will be a fairly new index, you might need to run an all-time search to confirm that events are being timestamped as expected.
Configure event analytics
In this stage, you will leverage the correlation searches and Notable Event Aggregation Policies (NEAPs) provided by the Content Pack for Monitoring and Alerting to enable notable event and episode creation from Nagios alerts. You need to normalize the data according to the Alerts data model and configure the itsi_kpi_attributes and/or itsi_episode_contact_map lookups as appropriate to achieve the correct grouping of notable events and optionally configure which group to email when episodes are created.
Next, enable the Universal Correlation Search (UCS) according to the instructions found here and verify that it is working as expected. You can find more info on the capabilities of the UCS in Configuring the Universal Correlation Search to create notable events.
The Universal Correlation Search already includes noise-reduction methods (deduplication), but the performance of the correlation search can be improved by modifying the macro get_itsi_universal_index. The Universal Correlation Search (and certain drilldown searches) requires a very broad ad-hoc search in order to find all normalized alerts. By default, this is index=*, which can be very expensive in some environments.
The index list can be modified via the macro get_itsi_universal_index. To improve search performance, provide a list of indexes, rather than index=*. To change the macro, perform the following steps:
- Click Settings > Advanced Search > Search Macros.
- Edit the macro get_itsi_universal_index. The default definition is index=* (index!=itsi_tracked_alerts AND index!=itsi_grouped_alerts).
- Change the definition to include the list of indexes that contain normalized alerts. For example:
  (index=nagios* OR index=solarwinds OR index=SplunkInfraMon)
  or
  ((index=alerts AND (sourcetype=nagios* OR sourcetype=solarwinds)) OR index=SplunkInfraMon)
- Update the macro whenever new alert sources are added.
Never use the Universal Correlation Search with an index=* parameter. Limit to only relevant indexes.
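If the list of alert indexes changes often, the macro definition can be generated rather than typed by hand. The Python sketch below assembles a definition in the same form as the first example above. The function name `build_index_macro` is a hypothetical helper, and the sketch only produces the text; you still paste the result into the macro under Settings > Advanced Search > Search Macros.

```python
def build_index_macro(indexes):
    """Return an SPL filter such as (index=a OR index=b) for the macro."""
    cleaned = [name.strip() for name in indexes if name.strip()]
    if not cleaned:
        raise ValueError("at least one index is required")
    if "*" in cleaned:
        # A bare wildcard would reintroduce the expensive index=* search.
        raise ValueError("use explicit index names instead of *")
    return "(" + " OR ".join(f"index={name}" for name in cleaned) + ")"


if __name__ == "__main__":
    print(build_index_macro(["nagios*", "solarwinds", "SplunkInfraMon"]))
```

Rejecting a bare `*` keeps the generated definition in line with the guidance to search only the relevant indexes.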
Next steps
If you have any trouble during this process, you should consider an engagement with Splunk Professional Services for further assistance. Click here to learn more about working with Professional Services.
Additionally, you can return to the main ITSI event management accelerator article for instructions on other third-party alerts you can integrate into your deployment.