Processes running after *nix patching event

System patching is risky in a production environment. Depending on how a critical process was originally started or configured, it might not survive the patching and reboot event. You want a search that shows which processes were running before and after a patching event.

Data required

*nix: Operating system logs

Procedure

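Run the following search. Replace `<index name>` in the subsearch with the index that holds your process data. You can optimize the search by specifying the same index in the primary search and adjusting the time ranges.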

source=ps earliest=-15m@m latest=now
|eval dataset="last 15m" 
|append 
    [|search index=<index name> source=ps earliest=-75m@m latest=-60m@m 
    |eval dataset="1h ago"]
|search user!=root 
|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host 
|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null()) 
|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null()) 
|stats values(*_running_process) AS *_running_process BY host

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search	Explanation
`source=ps`	Search process information.
`earliest=-15m@m latest=now`	Search the last 15 minutes, snapped to the start of the minute.
`|eval dataset="last 15m"`	Label these events by setting the `dataset` field to "last 15m".
`|append` `[|search index=<index name> source=ps earliest=-75m@m latest=-60m@m` `|eval dataset="1h ago"]`	Search the 15-minute window from 75 to 60 minutes ago, label those events "1h ago", and append them to the results of the primary search.
`|search user!=root`	Keep only processes that were not run by root.
`|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host`	For each command on each host, count the distinct datasets the command appears in and list those datasets.
`|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null())`	Set `no_longer_running_process` to the command name when the command appears in only one dataset and that dataset is "1h ago".
`|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null())`	Set `newly_running_process` to the command name when the command appears in only one dataset and that dataset is "last 15m".
`|stats values(*_running_process) AS *_running_process BY host`	List the values of every field ending in `_running_process` for each host.
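
The sample results under Next steps include a `still_running_process` column, which the search as written does not create. The following is a minimal sketch of one way to produce it, assuming that a command seen in both time windows (`dc_dataset=2`) should count as still running. The extra `eval` line before the final `stats` is an addition for illustration and is not part of the original search; everything else is unchanged.

```spl
source=ps earliest=-15m@m latest=now
|eval dataset="last 15m"
|append
    [|search index=<index name> source=ps earliest=-75m@m latest=-60m@m
    |eval dataset="1h ago"]
|search user!=root
|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host
|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null())
|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null())
|eval still_running_process = if(dc_dataset=2, COMMAND, null())
|stats values(*_running_process) AS *_running_process BY host
```

Because the final `stats` uses the `*_running_process` wildcard, the new field is picked up without any further change.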

Next steps

Sample results for this search are shown in the table below. A process listed in the no_longer_running_process column might indicate that something went wrong during patching. In the sample, sshd is no longer running on ip-172-31-71-164.ec2.internal, so the patch may have broken the sshd configuration on that host and should be investigated.

Combined with a search that monitors package installations and upgrades on a *nix server, this search can show what was upgraded on a host where a process is no longer running. Putting these events side by side on a dashboard can save time. You can also correlate the two searches and trigger an alert when the condition is met.

`host`	`newly_running_process`	`no_longer_running_process`	`still_running_process`
`ip-172-31-64-114.ec2.internal`	`httpd`		`chronyd` `dbus-daemon` `lsmd` `pickup` `qmgr` `rpcbind` `sshd:`
`ip-172-31-71-164.ec2.internal`		`sshd:`	`chronyd` `dbus-daemon` `lsmd` `pickup` `qmgr` `rpcbind`
`ip-172-31-79-80.ec2.internal`			`chronyd` `dbus-daemon` `lsmd` `pickup` `qmgr` `rpcbind` `sshd:`
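
To alert only when a process has disappeared, one option is to keep just the hosts that have a value in `no_longer_running_process`. This is a minimal sketch, assuming you append the line below to the end of the search in the Procedure section and save the result as an alert that triggers when the number of results is greater than zero. That alert setup is an assumption for illustration.

```spl
|where isnotnull(no_longer_running_process)
```

Hosts where every process survived the patch have no value in that field and drop out of the results, so every remaining row is a host to investigate.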

Finally, you might be interested in other processes associated with the Maintaining *nix systems use case.