High CPU utilization alert
High CPU utilization can indicate that a host is having problems. An overutilized system does not have enough CPU capacity to meet demand. You want to use metrics to detect heavy CPU usage before it impacts system performance, and to alert when CPU usage exceeds a specified threshold.
- Run the following search. Replace <metrics index name> with the name of your metrics index, and adjust the time range to limit how much data the search scans.
| mstats avg(_value) prestats=true WHERE metric_name="Processor.%_Processor_Time" AND index=<metrics index name> AND instance="_Total" span=1m BY host | stats avg(_value) AS cpu_usage BY host | eval Critical_Usage = if(cpu_usage > 95, "Yes", "No") | table host Critical_Usage cpu_usage | where Critical_Usage="Yes"
The following table explains what each part of this search does. You can adjust the query to suit your environment.
||Splunk search||Explanation||
|| mstats avg(_value) prestats=true WHERE metric_name="Processor.%_Processor_Time" AND index=<metrics index name> AND instance="_Total" span=1m BY host||Use the %_Processor_Time values in a metrics index to calculate average CPU utilization. Setting prestats to true formats the results so they can be passed to the stats command. If you don't know the names of your available metrics, first run this search separately: | mcatalog values(metric_name) WHERE index=*|
|| stats avg(_value) AS cpu_usage BY host||Calculate the average CPU utilization for each host and store it in a field named cpu_usage.|
|| eval Critical_Usage = if(cpu_usage > 95, "Yes", "No")||Set the Critical_Usage field to "Yes" if CPU usage exceeds 95 percent, and to "No" otherwise.|
|| table host Critical_Usage cpu_usage||Display the results in a table with the columns in the order shown.|
|| where Critical_Usage="Yes"||Filter the results to show only hosts that have exceeded the critical usage threshold.|
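The Critical_Usage field is used only to filter rows, so a shorter pipeline should return the same hosts. The following is a sketch of that variant, not a replacement for the search above. The round and sort steps are additions for readability, 95 is still the example threshold, and <metrics index name> is the same placeholder as before.

```spl
| mstats avg(_value) prestats=true WHERE metric_name="Processor.%_Processor_Time" AND index=<metrics index name> AND instance="_Total" span=1m BY host
| stats avg(_value) AS cpu_usage BY host
| where cpu_usage > 95
| eval cpu_usage = round(cpu_usage, 2)
| sort - cpu_usage
```

Filtering directly on cpu_usage removes the intermediate Yes/No field. Rounding to two decimal places and sorting in descending order lists the busiest hosts first, which can make alert results easier to scan.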
You can modify this search to alert on disk or memory utilization by changing the metric_name value. You can also adjust the 95 percent threshold to meet your organization's needs. After you configure the search, save it as an alert and customize its trigger actions.
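The following search sketches the memory-utilization modification. <memory metric name> is a hypothetical placeholder. Look up the real name in your environment with the mcatalog search described in the table, and add an instance filter only if that metric has instances. The search assumes the metric reports a percentage, and the 90 percent threshold is an arbitrary example.

```spl
| mstats avg(_value) prestats=true WHERE metric_name="<memory metric name>" AND index=<metrics index name> span=1m BY host
| stats avg(_value) AS memory_usage BY host
| where memory_usage > 90
| table host memory_usage
```

If the metric you find reports free memory rather than used memory, reverse the comparison (for example, alert when the value drops below a threshold).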
You might be interested in other processes associated with the Recovering lost visibility of IT infrastructure use case.