You might want to detect when CPU utilization is nearing capacity when doing the following:
In order to execute this procedure in your environment, the following data, services, or apps are required:
Excessive CPU utilization on a host, particularly when it is abnormal or prolonged, is a sign of potential issues with the critical applications running on that host. You want to detect when an application is starved for CPU resources so you can prevent performance degradation or application instability.
- Run the following search:
| mstats min(cpu_metric.pctIdle) AS Idle WHERE index="<name of *nix metrics index>" AND host="<name of host to check>" span=1m BY host | eval cpu_utilization=(100 - Idle) | timechart max(cpu_utilization) AS cpu_utilization BY host
The following table explains what each part of this search does. You can adjust the query to fit the specifics of your environment.
| Splunk search | Explanation |
| --- | --- |
| \| mstats min(cpu_metric.pctIdle) AS Idle WHERE index="<name of *nix metrics index>" AND host="<name of host to check>" span=1m BY host | Search the metrics index or indexes where CPU utilization data is being collected and filter down to the desired host(s). |
| \| eval cpu_utilization=(100 - Idle) | Convert percent idle to percent utilized for readability. |
| \| timechart max(cpu_utilization) AS cpu_utilization BY host | Plot CPU utilization over time. |
Set up an alert based on this search so you can proactively manage potential stability issues.
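To make the arithmetic behind the search concrete, here is a minimal Python sketch of the same steps: take the minimum idle percentage per host in each one-minute span, convert it to utilization, and flag values above a threshold. The function name, the sample data, the host name `web01`, and the threshold of 95 are all assumptions for illustration. None of this is part of the Splunk platform, and it is not how the platform evaluates an alert.

```python
def cpu_utilization_by_minute(samples):
    """Mirror the search: min(pctIdle) per 1-minute span per host, then 100 - Idle.

    samples: iterable of (epoch_seconds, host, pct_idle) tuples.
    Returns a dict keyed by (minute_start, host) with percent utilization.
    """
    min_idle = {}
    for ts, host, pct_idle in samples:
        key = (ts - ts % 60, host)  # start of the 1-minute bucket
        min_idle[key] = min(pct_idle, min_idle.get(key, pct_idle))
    # Percent utilized is the complement of percent idle.
    return {key: 100 - idle for key, idle in min_idle.items()}


# Hypothetical sample data: (epoch_seconds, host, pct_idle)
samples = [
    (0, "web01", 40.0),
    (30, "web01", 12.5),
    (60, "web01", 3.0),
]

THRESHOLD = 95  # assumed alert threshold, in percent utilized

utilization = cpu_utilization_by_minute(samples)
breaches = {key: value for key, value in utilization.items() if value > THRESHOLD}
print(breaches)  # only the minute starting at 60 s exceeds the threshold
```

Taking the minimum idle value in each span means the result reflects the busiest moment in that minute, which is the conservative choice when the goal is to catch a host that is running out of headroom.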
- Ensure that the Splunk Distribution of the OpenTelemetry Collector is installed on the host you want to monitor.
- In Splunk Infrastructure Monitoring, use the following SignalFlow to search the cpu.utilization streaming metric and filter down to the desired host(s).
A = data('cpu.utilization', filter=filter('host.name', '<name of host to check>')).publish(label='A')
To alert when CPU utilization is nearing maximum capacity for the selected host(s), use the SignalFlow from this procedure to configure a detector with an alert condition of "Static Threshold" and the following alert settings:
- Alert when: Above
- Threshold: 95
- Trigger sensitivity: Duration
- Duration: 5m
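The settings above combine a static threshold with a duration requirement: the value must stay above 95 for a full five minutes before the alert triggers, so short spikes are ignored. The following Python sketch illustrates that idea under simplifying assumptions. The function name and sample data are hypothetical, and the real detector in Splunk Infrastructure Monitoring may behave differently, for example in how it treats missing data points.

```python
def sustained_above(points, threshold=95.0, duration_s=300):
    """Return the timestamps at which the value has been continuously
    above `threshold` for at least `duration_s` seconds.

    points: list of (epoch_seconds, value) tuples sorted by time.
    """
    fired = []
    breach_start = None  # time the current run above the threshold began
    for ts, value in points:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration_s:
                fired.append(ts)
        else:
            breach_start = None  # any dip at or below the threshold resets the run
    return fired


# Hypothetical one-minute cpu.utilization readings for a single host.
values = [96, 97, 98, 99, 96, 97, 90, 99]
points = [(i * 60, v) for i, v in enumerate(values)]

fired = sustained_above(points)
print(fired)  # [300]
```

In this sample, the run that starts at 0 seconds reaches the five-minute mark at 300 seconds and fires once. The drop to 90 resets the run, so the later reading of 99 does not trigger on its own.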