Processes running after *nix patching event
System patching is a risky process in a production environment. Depending on how a critical process was originally started or configured, it might not survive the patching and reboot event. You want a search that lets you compare the processes running before and after a patching event.
Data required
Procedure
Run the following search. You can optimize it by specifying an index and adjusting the time range.
source=ps earliest=-15m@m latest=now |eval dataset="last 15m" |append [|search index=<index name> source=ps earliest=-75m@m latest=-60m@m |eval dataset="1h ago"] |search user!=root |stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host |eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null()) |eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null()) |stats values(*_running_process) AS *_running_process BY host
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
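Splunk Search | Explanation |
---|---|
`source=ps` | Search process information. |
`earliest=-15m@m latest=now` | Search back for 15 minutes from now. |
`\|eval dataset="last 15m"` | Set the field named dataset to the quoted string. |
`\|append [\|search index=<index name> source=ps earliest=-75m@m latest=-60m@m \|eval dataset="1h ago"]` | Search for events that occurred between 75 and 60 minutes ago and label them with the dataset value "1h ago". Append the results to the primary search. |
`\|search user!=root` | Include only results not run by root. |
`\|stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host` | Get a distinct count of the dataset values for each command on each host, and put the dataset values into values_dataset. |
`\|eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null())` | Create the no_longer_running_process field, set to the command when it appears only in the "1h ago" dataset. |
`\|eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null())` | Create the newly_running_process field, set to the command when it appears only in the "last 15m" dataset. |
`\|stats values(*_running_process) AS *_running_process BY host` | Create a list of process statuses by host. |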
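If you also want a simple count of running processes per host in each window, the following is a minimal sketch rather than a definitive search. It reuses the same two time windows and dataset labels as the search above, and assumes that counting distinct COMMAND values is an acceptable proxy for the number of running processes. The `index=<index name>` placeholder is left for you to fill in.

```spl
index=<index name> source=ps earliest=-15m@m latest=now
| eval dataset="last 15m"
| append [| search index=<index name> source=ps earliest=-75m@m latest=-60m@m | eval dataset="1h ago"]
| search user!=root
| chart dc(COMMAND) AS process_count OVER host BY dataset
```

Each row is a host, with one column per dataset value, so a drop between the "1h ago" and "last 15m" columns is easy to spot.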
Next steps
Sample results for this search are shown in the table below. A process shown in the no_longer_running_process column might indicate that something went wrong due to the patch. In the sample below, sshd is not running, so the patch may have broken the sshd configuration and should be investigated.
Used alongside monitoring of package installations and upgrades on a *nix server, this search can help you see what was upgraded on a host where a process is no longer running. Putting these events side-by-side on a dashboard can save time. You can also correlate the two searches and trigger an alert when the condition is met.
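One way to turn the search into an alert condition is to keep only hosts that have at least one process that stopped running. The sketch below appends a `where` filter to the original search. It assumes you would save it as an alert that triggers when the number of results is greater than zero, and the alert settings themselves are not shown.

```spl
source=ps earliest=-15m@m latest=now
| eval dataset="last 15m"
| append [| search index=<index name> source=ps earliest=-75m@m latest=-60m@m | eval dataset="1h ago"]
| search user!=root
| stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host
| eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null())
| eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null())
| stats values(*_running_process) AS *_running_process BY host
| where isnotnull(no_longer_running_process)
```

When no host has lost a process, the filter returns no rows, so the alert stays quiet.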
host | newly_running_process | no_longer_running_process | still_running_process |
---|---|---|---|
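The sample table includes a still_running_process column, which the search in the Procedure section does not populate. A minimal sketch of one way to add it follows. It assumes that a command seen in both windows (dc_dataset=2) counts as still running, which is an interpretation and not something the original search defines.

```spl
source=ps earliest=-15m@m latest=now
| eval dataset="last 15m"
| append [| search index=<index name> source=ps earliest=-75m@m latest=-60m@m | eval dataset="1h ago"]
| search user!=root
| stats dc(dataset) AS dc_dataset values(dataset) AS values_dataset BY COMMAND host
| eval no_longer_running_process = if(dc_dataset=1 AND values_dataset="1h ago", COMMAND, null())
| eval newly_running_process = if(dc_dataset=1 AND values_dataset="last 15m", COMMAND, null())
| eval still_running_process = if(dc_dataset=2, COMMAND, null())
| stats values(*_running_process) AS *_running_process BY host
```

Because the final `stats` uses the `*_running_process` wildcard, the new field is picked up without further changes.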
Finally, you might be interested in other processes associated with the Maintaining *nix systems use case.