Splunk Lantern

Expected *nix process not running

You might want to know when an expected process is not found in the process list when doing the following:

Prerequisites 

To run this procedure in your environment, you need the following data, services, or apps:

Example

Many critical IT applications and services on *nix operating systems run as processes. You want to detect when an expected process is missing from the process list on a host so you can proactively manage potential stability issues.

Option 1

To optimize the search shown below, you should specify an index and a time range.

  1. In Splunk Enterprise or Splunk Cloud Platform, run the following search:
| mstats count WHERE index="<name of *nix metrics index>" AND metric_name=ps_metric* host="<name of host to check>" BY host, COMMAND span=15m
| rename COMMAND AS process
| search process!=\[*
| eval expected_process_list=mvappend("<name of process to check>", "<name of process to check>")
| eval expected_process_count="<total number of processes expected per host>"
| eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
| eval expected_process_found=if(match(process,expected_process_regex),1,0)
| stats values(expected_process_list) AS expected_processes values(expected_process_count) AS expected_process_count
  values(eval(if(expected_process_found>0,process,null()))) AS processes_found sum(expected_process_found) AS processes_found_count BY _time host
| eval count_of_missing_processes=expected_process_count - processes_found_count
| sort 0 -_time
| dedup host
| rename expected_processes AS "Expected Processes", expected_process_count AS "# of Expected Processes per Host",
         processes_found AS "Expected Processes Found on Host", processes_found_count AS "# of Expected Processes Found on Host", count_of_missing_processes AS "Expected Processes Missing"

Search explanation

The following table explains what each part of this search does. You can adjust the query to fit the specifics of your environment.

| mstats count WHERE index="<name of *nix metrics index>" AND metric_name=ps_metric* host="<name of host to check>" BY host, COMMAND span=15m
Search the metrics index(es) where process data is collected, and filter down to the host(s) you want to check. Results are grouped by host and process name into 15-minute buckets.

| rename COMMAND AS process
Rename the field as shown for better readability.

| search process!=\[*
Exclude process names that begin with a square bracket, which are typically kernel threads such as [kworker/0:1].

| eval expected_process_list=mvappend("<name of process to check>", "<name of process to check>")
| eval expected_process_count="<total number of processes expected per host>"
Capture the list of expected processes to check and the total number of expected processes per host. Add as many processes as you need; you can use regex syntax here. Because the search groups results by process name, set the expected count to the number of distinct process names you expect to match on each host.

| eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
| eval expected_process_found=if(match(process,expected_process_regex),1,0)
Combine the expected process list into a single case-insensitive regular expression, then flag each process on each host that matches it.

| stats values(expected_process_list) AS expected_processes values(expected_process_count) AS expected_process_count
  values(eval(if(expected_process_found>0,process,null()))) AS processes_found sum(expected_process_found) AS processes_found_count BY _time host
Compute the number of matching processes per host for each 15-minute bucket.

| eval count_of_missing_processes=expected_process_count - processes_found_count
Return the number of expected processes that are not currently running on the host.

| sort 0 -_time
| dedup host
Sort the results so the most recent bucket comes first, then keep only that result for each host.

| rename expected_processes AS "Expected Processes", expected_process_count AS "# of Expected Processes per Host",
         processes_found AS "Expected Processes Found on Host", processes_found_count AS "# of Expected Processes Found on Host", count_of_missing_processes AS "Expected Processes Missing"
Rename the fields as shown for better readability.
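To see how the eval and stats steps combine for a single host and 15-minute bucket, the following Python sketch reproduces that logic outside Splunk. It is an illustration only, not how Splunk evaluates the search: the function name `expected_process_summary` and the sample process names are hypothetical, and Python's `re.search` stands in for the SPL `match()` function, which behaves the same way for simple patterns like these.

```python
import re


def expected_process_summary(processes, expected_list, expected_count):
    """Reproduce the search's eval/stats logic for one host and one time bucket.

    processes: process names reported for the host (hypothetical sample data)
    expected_list: patterns, like expected_process_list in the search
    expected_count: like expected_process_count in the search
    """
    # Like: | search process!=\[*  (drop names that start with "[")
    # A set mirrors the mstats grouping BY COMMAND: one row per distinct name.
    candidates = sorted({p for p in processes if not p.startswith("[")})

    # Like: | eval expected_process_regex="(?i)".mvjoin(expected_process_list, "|")
    expected_regex = re.compile("(?i)" + "|".join(expected_list))

    # Like: | eval expected_process_found=if(match(...),1,0), then stats values()/sum()
    found = [p for p in candidates if expected_regex.search(p)]

    return {
        "processes_found": found,
        "processes_found_count": len(found),
        # Like: | eval count_of_missing_processes=expected_process_count - processes_found_count
        "count_of_missing_processes": expected_count - len(found),
    }


summary = expected_process_summary(
    ["sshd", "crond", "[kworker/0:1]", "bash"], ["sshd", "httpd"], 2
)
print(summary)
# {'processes_found': ['sshd'], 'processes_found_count': 1, 'count_of_missing_processes': 1}
```

Because `match()` finds a pattern anywhere in the string, a pattern such as `sshd` also matches a name like `sshd-session`, which is one way the missing count can go negative. Anchor a pattern, for example `^sshd$`, if you need an exact name match.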

Result

The Expected Processes Missing field shows the number of processes that are expected but missing from the most recent host process data. A positive number means one or more expected processes are missing. Zero means the number of running processes matches what is expected. A negative number means more matching processes were found than expected.
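If you export the search results for further processing, a small script can apply the same interpretation. The following Python sketch assumes a CSV export that contains `host` and `Expected Processes Missing` columns; the export format, the sample rows, and the function names are hypothetical.

```python
import csv
import io


def classify_missing(value):
    """Interpret an 'Expected Processes Missing' value from the search results."""
    missing = int(value)
    if missing > 0:
        return "missing"  # one or more expected processes were not found
    if missing == 0:
        return "ok"  # the found count matches the expected count
    return "extra"  # more matching processes were found than expected


def hosts_with_missing_processes(csv_text):
    """List hosts whose row shows at least one missing process.

    Assumes a hypothetical CSV export with 'host' and
    'Expected Processes Missing' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["host"]
        for row in reader
        if classify_missing(row["Expected Processes Missing"]) == "missing"
    ]


sample = "host,Expected Processes Missing\nweb01,1\nweb02,0\ndb01,-1\n"
print(hosts_with_missing_processes(sample))  # ['web01']
```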

Option 2

  1. Ensure that you have the Splunk OTEL Collector installed on the host you want to monitor.
  2. Update the receivers section of the OTEL agent config file on the host to collect procstat metrics for each process.
    …
    receivers:
    …
      # The following config collects process metrics for all processes. Adjust the pattern parameter to filter down to a subset of processes.
      smartagent/procstat:
        type: telegraf/procstat
        pattern: ".*"
  3. Update the service.pipelines.metrics.receivers section of the OTEL agent config file to include the procstat receiver.
    …
    service:
      extensions: …
      pipelines:
        traces: 
          …
        metrics:
          receivers: [..., smartagent/procstat]
          …
  4. In Splunk Infrastructure Monitoring, use the following SignalFlow to search the procstat.cpu_usage streaming metric, filter down to the desired host(s) and process(es), and summarize results by counting the total number of processes found per host.
    A = data('procstat.cpu_usage', filter=filter('host.name', '<name of host to check>') and filter('process_name', '<name of process to check>')).count(by=['host.name']).publish(label='A')
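Before narrowing the `pattern` parameter from step 2, it can help to preview which command lines a candidate pattern would select. The Telegraf procstat input describes `pattern` as an argument for `pgrep -f`, that is, a regular expression matched anywhere in the full command line. The following Python sketch approximates that behavior with `re.search`; the function name and sample command lines are hypothetical, and Python's regex syntax can differ from `pgrep`'s in edge cases, so treat the result as a preview only.

```python
import re


def preview_procstat_pattern(pattern, command_lines):
    """Approximate which processes a procstat `pattern` would select.

    Assumption: the pattern behaves like `pgrep -f <pattern>`, i.e. a regular
    expression searched anywhere in each full command line.
    """
    compiled = re.compile(pattern)
    return [cmd for cmd in command_lines if compiled.search(cmd)]


# Hypothetical command lines on a host.
cmds = ["/usr/sbin/sshd -D", "/usr/sbin/crond -n", "/usr/bin/python3 app.py"]
print(preview_procstat_pattern("sshd|crond", cmds))
# ['/usr/sbin/sshd -D', '/usr/sbin/crond -n']
```

The default `".*"` from step 2 matches every command line, which is why it collects metrics for all processes.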

Results

To alert when no process data is flowing in for the selected host(s) and process(es), use the SignalFlow from this procedure to configure a detector with the "Heartbeat Check" alert condition and set its duration to 15 minutes, so the detector triggers when no data has arrived for 15 minutes.
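A heartbeat condition fires when a signal stops reporting for longer than the configured window. As a rough illustration of that rule only (the detector itself evaluates it inside Splunk Infrastructure Monitoring), the following Python sketch flags hosts whose most recent procstat data point is older than 15 minutes. The function name, the `expected_hosts` list, and the `last_seen` mapping are hypothetical inputs.

```python
from datetime import datetime, timedelta, timezone

# The 15-minute window from the detector settings described above.
HEARTBEAT_WINDOW = timedelta(minutes=15)


def silent_hosts(expected_hosts, last_seen, now, window=HEARTBEAT_WINDOW):
    """Return hosts with no data point newer than `window`.

    expected_hosts: hosts you expect to report (hypothetical list)
    last_seen: host name -> datetime of its latest procstat data point
    A host that has never reported is treated as silent.
    """
    return sorted(
        host
        for host in expected_hosts
        if host not in last_seen or now - last_seen[host] > window
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "web01": now - timedelta(minutes=5),
    "web02": now - timedelta(minutes=15),
    "db01": now - timedelta(minutes=20),
}
print(silent_hosts(["web01", "web02", "db01", "cache01"], last_seen, now))
# ['cache01', 'db01']
```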
