Splunk Lantern

Processes running after *nix patching event

 

System patching is a risky process in a production environment. Depending on how a critical process was originally started or configured, it might not be running again after the patching and reboot event. For example, a process that was started manually, rather than configured to start at boot, does not restart automatically after a reboot. You want a search that shows which processes were running before a patching event and which are running after it.

Procedure

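Run the following search, replacing <index name> with the index that contains your process (ps) data. You can optimize the search further by adjusting the time ranges. The search compares a 15-minute window from about an hour ago with the last 15 minutes, so run it after the patch and reboot have completed, with the patching window falling between those two windows.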

index=<index name> source=ps earliest=-15m@m latest=now
|eval dataset="last 15m" 
|append 
    [|search index=<index name> source=ps earliest=-75m@m latest=-60m@m 
    |eval dataset="1h ago"]
|search user!=root 
|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host 
|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null()) 
|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null()) 
|eval still_running_process = if(dc_dataset=2, COMMAND, null()) 
|stats values(*_running_process) AS *_running_process BY host 
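The search can only detect a change if the patch and reboot finish between its two comparison windows. The following Python sketch, which is not part of the Splunk procedure, computes those windows for a hypothetical run time. It assumes Splunk's documented behavior for modifiers such as -15m@m: apply the offset first, then snap down to the start of the minute. The function names snap_offset and search_windows are illustrative only.

```python
from datetime import datetime, timedelta


def snap_offset(now: datetime, minutes_ago: int) -> datetime:
    """Model a modifier like -15m@m: subtract the offset, then snap down to the minute."""
    shifted = now - timedelta(minutes=minutes_ago)
    return shifted.replace(second=0, microsecond=0)


def search_windows(now: datetime) -> dict:
    """Return the (earliest, latest) bounds of the two datasets the search compares."""
    return {
        # earliest=-15m@m latest=now
        "last 15m": (snap_offset(now, 15), now),
        # earliest=-75m@m latest=-60m@m
        "1h ago": (snap_offset(now, 75), snap_offset(now, 60)),
    }


if __name__ == "__main__":
    run_time = datetime(2024, 1, 1, 12, 0, 30)  # hypothetical time the search runs
    for name, (earliest, latest) in search_windows(run_time).items():
        print(f"{name}: {earliest:%H:%M:%S} to {latest:%H:%M:%S}")
```

For a run time of 12:00:30, the windows are 11:45:00 to 12:00:30 and 10:45:00 to 11:00:00. A reboot at 11:50 would fall inside the "last 15m" window, so pre-reboot events could make a stopped process look as if it were still running; in that case, shift the windows so that the patching period falls between them.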

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

source=ps 

Search process information.

earliest=-15m@m latest=now 

Search from 15 minutes ago, snapped to the start of that minute, up to now.

|eval dataset="last 15m" 

Label each event from this time range by setting the dataset field to "last 15m".

|append 

[|search index=<index name> source=ps earliest=-75m@m latest=-60m@m 

|eval dataset="1h ago"]

Search the 15-minute window from 75 minutes ago to 60 minutes ago, label those events by setting the dataset field to "1h ago", and append them to the results of the primary search.

|search user!=root 

Exclude processes run by the root user.

|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host 

For each combination of process (COMMAND) and host, count the distinct datasets the process appears in (dc_dataset) and list those datasets (values_dataset).

|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null()) 

If a process appears in only one dataset and that dataset is "1h ago", set no_longer_running_process to the process name. These processes were running before the patching event but not after it.

|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null()) 

If a process appears in only one dataset and that dataset is "last 15m", set newly_running_process to the process name. These processes were not running before the patching event but are running now.

|stats values(*_running_process) AS *_running_process BY host

For each host, list the values of every field ending in _running_process.
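To make the stats and eval logic easier to follow, the following Python sketch reproduces it outside Splunk. It assumes each ps event has been reduced to a hypothetical (host, COMMAND, user, dataset) tuple; the classify function and the sample events are illustrative and are not part of any Splunk API. It also fills in still_running_process for processes found in both datasets (dc_dataset=2), the column that appears in the sample results.

```python
from collections import defaultdict

FIELDS = ("newly_running_process", "no_longer_running_process", "still_running_process")


def classify(events):
    """Group hypothetical (host, command, user, dataset) tuples the way the search does."""
    # |search user!=root, then |stats values(dataset) BY COMMAND host
    datasets_seen = defaultdict(set)
    for host, command, user, dataset in events:
        if user == "root":
            continue
        datasets_seen[(host, command)].add(dataset)

    # |eval ... then |stats values(*_running_process) BY host
    by_host = {}
    for (host, command), datasets in datasets_seen.items():
        fields = by_host.setdefault(host, {name: set() for name in FIELDS})
        if datasets == {"1h ago"}:  # dc_dataset=1, seen only before the patch
            fields["no_longer_running_process"].add(command)
        elif datasets == {"last 15m"}:  # dc_dataset=1, seen only after the patch
            fields["newly_running_process"].add(command)
        else:  # dc_dataset=2, seen in both windows
            fields["still_running_process"].add(command)

    # Like stats values(), return each list de-duplicated and sorted
    return {
        host: {name: sorted(cmds) for name, cmds in fields.items()}
        for host, fields in by_host.items()
    }


if __name__ == "__main__":
    sample_events = [  # hypothetical ps events for one host
        ("host-a", "sshd:", "ec2-user", "1h ago"),
        ("host-a", "chronyd", "chrony", "1h ago"),
        ("host-a", "chronyd", "chrony", "last 15m"),
        ("host-a", "httpd", "apache", "last 15m"),
        ("host-a", "systemd", "root", "1h ago"),
    ]
    print(classify(sample_events))
```

Running it reports httpd as newly running, sshd: as no longer running, and chronyd as still running for host-a, while the root-owned systemd event is dropped before the comparison, as the user!=root filter does in the search.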

Next steps

Sample results for this search are shown in the table below. A process listed in the no_longer_running_process column might indicate that the patch caused a problem. In the sample, sshd is no longer running on ip-172-31-71-164.ec2.internal, so the patch may have broken the sshd configuration on that host, and it should be investigated.

Used alongside a search that monitors package installations and upgrades on a *nix server, this search can help you see what was upgraded on a host where a process is no longer running. Putting both sets of events side by side on a dashboard can save time. You can also correlate the two searches and configure an alert that triggers when the condition is met.

host | newly_running_process | no_longer_running_process | still_running_process
ip-172-31-64-114.ec2.internal | httpd | | chronyd, dbus-daemon, lsmd, pickup, qmgr, rpcbind, sshd:
ip-172-31-71-164.ec2.internal | | sshd: | chronyd, dbus-daemon, lsmd, pickup, qmgr, rpcbind
ip-172-31-79-80.ec2.internal | | | chronyd, dbus-daemon, lsmd, pickup, qmgr, rpcbind, sshd:
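The correlation described under Next steps joins this search's output with package-upgrade data by host and alerts when a process has stopped. The Python sketch below illustrates only that join-and-filter logic; in practice you would build it from Splunk searches and an alert. The no_longer_running_process values come from the sample results, while the upgraded_packages input, the hosts_to_investigate function, and the package name are hypothetical stand-ins for the output of a package-monitoring search.

```python
def hosts_to_investigate(no_longer_running, upgraded_packages):
    """Return hosts that meet the alert condition, with any packages upgraded on them.

    no_longer_running: host -> processes in the no_longer_running_process column
    upgraded_packages: host -> packages (hypothetical package-monitoring output)
    """
    alerts = {}
    for host, processes in no_longer_running.items():
        # Alert condition: at least one process stopped running after the patch.
        if processes:
            alerts[host] = {
                "no_longer_running_process": sorted(processes),
                # Hosts with no upgrade records get an empty list.
                "upgraded_packages": sorted(upgraded_packages.get(host, [])),
            }
    return alerts


if __name__ == "__main__":
    # no_longer_running_process values from the sample results
    no_longer_running = {
        "ip-172-31-64-114.ec2.internal": [],
        "ip-172-31-71-164.ec2.internal": ["sshd:"],
        "ip-172-31-79-80.ec2.internal": [],
    }
    # Hypothetical upgrade records; "example-package" is a placeholder name
    upgraded_packages = {"ip-172-31-71-164.ec2.internal": ["example-package"]}
    print(hosts_to_investigate(no_longer_running, upgraded_packages))
```

Only ip-172-31-71-164.ec2.internal meets the condition here, so it is the only host that would be paired with its upgrade records and flagged for investigation.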

Finally, you might be interested in other processes associated with the Maintaining *nix systems use case.