Splunk Lantern

Expected *Nix process not running

 

Many critical IT applications and services running on *nix operating systems run as a process. You want to detect when an expected process is not found in the process list on the host so you can proactively manage potential stability issues.

Procedure

  1. Ensure that you have installed the Splunk Add-on for Unix and Linux on your Splunk search head, indexer, and the universal forwarders on the monitored systems. Click here for an example inputs.conf file that can be deployed to the universal forwarder on the *nix host to collect process data and store the results in a metrics index.
  2. In Splunk Enterprise or Splunk Cloud Platform, run the following search, replacing each placeholder in angle brackets with a value for your environment. You can optimize it by specifying an index and adjusting the time range.
| mstats count WHERE index="<name of *nix metrics index>" AND metric_name=ps_metric* host="<name of host to check>" BY host, COMMAND span=15m
| rename COMMAND AS process
| search process!=\[*
| eval expected_process_list=mvappend("<name of process to check>", "<name of process to check>") 
| eval expected_process_count="<total number of processes expected per host>"
| eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
| eval expected_process_found=if(match(process,expected_process_regex),1,0)
| stats values(expected_process_list) AS expected_processes values(expected_process_count) AS expected_process_count 
  values(eval(if(expected_process_found>0,process,null()))) AS processes_found sum(expected_process_found) AS processes_found_count BY _time host
| eval count_of_missing_processes=expected_process_count - processes_found_count
| dedup host
| rename expected_processes AS "Expected Processes", expected_process_count AS "# of Expected Processes per Host",
         processes_found AS "Expected Processes Found on Host", processes_found_count AS "# of Expected Processes Found on Host", count_of_missing_processes AS "Expected Processes Missing"
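
To try the matching and counting logic before pointing the search at real data, the following sketch generates a few sample rows with makeresults instead of reading a metrics index. The host name web01, the sample process names, and the expected list are all hypothetical values chosen for illustration, and the stats clause is trimmed to the fields needed for the count.

```
| makeresults
| eval host="web01", process=split("sshd,crond,nginx,bash", ",")
| mvexpand process
| eval expected_process_list=mvappend("sshd", "crond", "httpd")
| eval expected_process_count=3
| eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
| eval expected_process_found=if(match(process,expected_process_regex),1,0)
| stats values(eval(if(expected_process_found>0,process,null()))) AS processes_found
    sum(expected_process_found) AS processes_found_count
    max(expected_process_count) AS expected_process_count BY host
| eval count_of_missing_processes=expected_process_count - processes_found_count
```

In this sample, sshd and crond match the expected list and httpd is absent, so two of the three expected processes are found and count_of_missing_processes should evaluate to 1.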

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search:
| mstats count WHERE index="<name of *nix metrics index>" AND metric_name=ps_metric* host="<name of host to check>" BY host, COMMAND span=15m
Explanation: Search the metrics index where process data is being collected, filter down to the desired hosts to check, and count the data points for each host and command in 15-minute spans.

Splunk Search:
| rename COMMAND AS process
| search process!=\[*
Explanation: Rename the field as shown for better readability, then exclude process names that begin with an opening square bracket (ps typically shows kernel threads this way).

Splunk Search:
| eval expected_process_list=mvappend("<name of process to check>", "<name of process to check>")
| eval expected_process_count="<total number of processes expected per host>"
Explanation: Capture the list of expected processes to check and the total expected process count per host. Add as many processes as you need. You can use regex syntax here.

Splunk Search:
| eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
| eval expected_process_found=if(match(process,expected_process_regex),1,0)
Explanation: Convert the expected process list into a case-insensitive regular expression, then flag each process on each host that matches it.

Splunk Search:
| stats values(expected_process_list) AS expected_processes values(expected_process_count) AS expected_process_count
  values(eval(if(expected_process_found>0,process,null()))) AS processes_found sum(expected_process_found) AS processes_found_count BY _time host
Explanation: Compute the number of matching processes per host over time.

Splunk Search:
| eval count_of_missing_processes=expected_process_count - processes_found_count
Explanation: Return the total number of expected processes that are not currently running on the host.

Splunk Search:
| dedup host
Explanation: Remove duplicate hosts so that one result remains per host.

Splunk Search:
| rename expected_processes AS "Expected Processes", expected_process_count AS "# of Expected Processes per Host",
         processes_found AS "Expected Processes Found on Host", processes_found_count AS "# of Expected Processes Found on Host", count_of_missing_processes AS "Expected Processes Missing"
Explanation: Rename the fields as shown for better readability.
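
Because the expected process list is treated as a regular expression, and match() succeeds when the pattern matches any substring of the value, a short name can also match longer command names. The sketch below compares the pattern built in the search with an anchored variant. The anchored pattern is an assumption about what you might want, not part of the original procedure, and the sample process names are hypothetical.

```
| makeresults
| eval process=split("sshd,sshd-keygen,crond", ",")
| mvexpand process
| eval expected_process_list=mvappend("sshd", "crond")
| eval unanchored_regex="(?i)".mvjoin(expected_process_list, "|")
| eval anchored_regex="(?i)^(".mvjoin(expected_process_list, "|").")$"
| eval found_unanchored=if(match(process, unanchored_regex), 1, 0)
| eval found_anchored=if(match(process, anchored_regex), 1, 0)
| table process found_unanchored found_anchored
```

With the unanchored pattern, sshd-keygen counts as a match for sshd. With the anchored pattern, only exact names match. Which behavior fits depends on how command names appear in your process data.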

Next steps

The Expected Processes Missing field indicates how many expected processes are missing from the most recent host process data. A positive number means that one or more expected processes are missing. Zero means that the number of running processes matches what is expected. A negative number means that more matching processes were found than expected.
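
As a rough illustration of these three cases, the following sketch generates the sample values 1, 0, and -1 and labels each one with case(). The status field and its labels are hypothetical names introduced here. In the full search, an eval like this would need to run before the final rename, while the field is still called count_of_missing_processes.

```
| makeresults count=3
| streamstats count AS row
| eval count_of_missing_processes=2-row
| eval status=case(count_of_missing_processes > 0, "expected process missing",
    count_of_missing_processes == 0, "all expected processes found",
    count_of_missing_processes < 0, "more matches than expected")
| table count_of_missing_processes status
```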

To alert when a host is not running one or more critical processes, use one of the following two approaches:

  • Use the SPL from this procedure to configure a Splunk platform alert.
  • Build a new vital metric for the Unix/Linux entity type in IT Essentials Work and configure vital metric alerting. Click here for an example SPL search that can be used for the vital metric search. After the vital metric has been created, configure it to alert when the number of expected processes not running is greater than zero.
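
For the first option, one possible approach is to end the search with a filter that keeps only hosts with missing processes, and then trigger the alert when the number of results is greater than zero. The sketch below shows the filter on generated sample rows. The host names are hypothetical, and in practice the where clause would be appended to the search from the procedure.

```
| makeresults count=3
| streamstats count AS row
| eval host="host".row, count_of_missing_processes=2-row
| rename count_of_missing_processes AS "Expected Processes Missing"
| where 'Expected Processes Missing' > 0
| table host "Expected Processes Missing"
```

Field names that contain spaces are wrapped in single quotes inside where and eval expressions, which is why the renamed field is referenced as 'Expected Processes Missing'.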

Finally, you might be interested in other processes associated with the Maintaining *nix systems use case.