ESXi hosts with high CPU Ready summation value
CPU Sum Ready (the summation form of the CPU Ready metric) indicates that a virtual machine needs CPU time to continue processing, but the underlying host has no CPU resources left to allocate to it. The metric can be expressed either as a summation, in milliseconds, or as a percentage of the sampling interval.
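As a rough illustration of how the two forms relate, the Python sketch below converts a summation value into a percentage using the commonly cited formula: ready time divided by the length of the sampling interval, times 100. The function name `cpu_ready_percent` is hypothetical. The 20-second default interval (vCenter real-time statistics) and the optional per-vCPU normalization are assumptions for illustration, so check which interval your collected data actually represents before relying on the result.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    Hypothetical helper. Assumptions: interval_s is the length of the sampling
    interval the summation covers (20 s for vCenter real-time data), and vcpus
    normalizes a VM-level value that is summed across all of the VM's vCPUs.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    interval_ms = interval_s * 1000
    # Multiply first so that simple inputs stay exact in floating point.
    return ready_ms * 100 / (interval_ms * vcpus)


if __name__ == "__main__":
    # 500 ms of ready time in a 20 s sample is 2.5% of that interval.
    print(cpu_ready_percent(500))
    # The same 500 ms summed across 2 vCPUs is 1.25% per vCPU.
    print(cpu_ready_percent(500, vcpus=2))
```

Under the 20-second assumption, the 500 ms threshold used in the search in this procedure corresponds to 2.5% of the interval.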
When many virtual machines on an ESXi host have high CPU Ready values, the host might be experiencing CPU pressure. Monitor your environment for this type of problem so that you can take mitigating action.
Data required
- VMware. This procedure depends primarily on data obtained from the Splunk Add-on for VMware Metrics. However, log and event data from the VMware environment can provide additional insight into the general health of the environment. For best results, you should also download and install the Splunk Add-on for VMware ESXi Logs and the Splunk Add-on for vCenter Logs.
Procedure
- Ensure that you have installed the IT Essentials Work app to onboard VMware data and provide the various VMware entity type configurations and dashboards.
- Ensure that you are collecting VMware data through one or more Data Collection Nodes, which are essentially Splunk heavy forwarders with specific VMware collection configurations.
- Run the following search. You can optimize it by specifying an index and adjusting the time range.
| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid | dedup moid | eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0) | table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready | append [| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine | dedup moid | eval esxi_moid = 'changeSet.runtime.host.moid' | table moid esxi_moid] | eventstats values(esxi_moid) AS esxi_moid BY moid | search is_high_sumready=* | table name, moid,esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready | stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid | eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2) | sort - perc_vms_high_sumready_by_host
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
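Splunk Search | Explanation |
---|---|
`\| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid` | Calculate the average CPU Ready value, in milliseconds, for each virtual machine (identified by its MOID) in 1-minute intervals. |
`\| dedup moid` | Keep one result for each unique virtual machine. |
`\| eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0)` | Create the `is_high_sumready` field. Set it to 1 if the virtual machine's average CPU Ready value is above 500 ms, and to 0 otherwise. |
`\| table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready` | Display the results in a table with columns in the order shown. |
`\| append [\| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine \| dedup moid \| eval esxi_moid = 'changeSet.runtime.host.moid' \| table moid esxi_moid]` | Obtain the MOID of the ESXi host on which each virtual machine is running. Append the returned fields to the primary search. |
`\| eventstats values(esxi_moid) AS esxi_moid BY moid` | Add the `esxi_moid` field to every result that has the same `moid`. This gives each virtual machine's performance result the MOID of its ESXi host. |
`\| search is_high_sumready=*` | Keep only the virtual machine performance results, meaning those with an `is_high_sumready` value of either 0 or 1. This discards the appended inventory rows. |
`\| table name, moid,esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready` | Display the results in a table with columns in the order shown. |
`\| stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid` | For each ESXi host, count its virtual machines and sum the number of virtual machines with a high CPU Ready summation value. Also calculate the average Sum Ready value across the host's virtual machines. |
`\| eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2)` | Determine the percentage of virtual machines on the host with a high CPU Ready summation value, rounded to two decimal places. |
`\| sort - perc_vms_high_sumready_by_host` | Sort the results with the largest percentage first. |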
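To make the per-host rollup concrete, here is a minimal Python sketch that mirrors the search's logic outside Splunk. It assumes you already have two plain dictionaries: each virtual machine's average CPU Ready value in milliseconds, and a VM-to-host MOID mapping. The function name `summarize_by_host` and the sample MOIDs are hypothetical. The output field names match the search.

```python
from collections import defaultdict

HIGH_READY_MS = 500  # same threshold as the eval step in the search


def summarize_by_host(vm_ready_ms: dict, vm_to_host: dict) -> list:
    """Hypothetical mirror of the search's per-host rollup.

    vm_ready_ms: VM moid -> average CPU Ready in ms (like the mstats step).
    vm_to_host:  VM moid -> ESXi host moid (like the appended inventory).
    """
    per_host = defaultdict(list)
    for moid, ready in vm_ready_ms.items():
        host = vm_to_host.get(moid)
        if host is None:
            # stats ... BY esxi_moid also skips rows that have no esxi_moid.
            continue
        per_host[host].append(ready)

    rows = []
    for host, readings in per_host.items():
        count_vms = len(readings)
        count_high = sum(1 for r in readings if r > HIGH_READY_MS)
        rows.append({
            "esxi_moid": host,
            "count_vms_by_host": count_vms,
            "count_high_sumready_by_host": count_high,
            "avg_sumready_by_host": sum(readings) / count_vms,
            "perc_vms_high_sumready_by_host": round(count_high / count_vms * 100, 2),
        })
    # Largest percentage first, like "sort - perc_vms_high_sumready_by_host".
    rows.sort(key=lambda row: row["perc_vms_high_sumready_by_host"], reverse=True)
    return rows


if __name__ == "__main__":
    ready = {"vm-1": 800, "vm-2": 100, "vm-3": 600, "vm-4": 700, "vm-5": 50}
    hosts = {"vm-1": "host-10", "vm-2": "host-10", "vm-3": "host-20", "vm-4": "host-20"}
    for row in summarize_by_host(ready, hosts):
        print(row)
```

In this made-up data, `host-20` sorts first because both of its virtual machines exceed the threshold. `vm-5` is dropped because it has no host mapping.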
Next steps
The table below shows sample results for the search. A high value of perc_vms_high_sumready_by_host means that a large share of the virtual machines on that ESXi host have work waiting to be scheduled. Factors to investigate include:
- Oversubscription, where too many vCPUs have been allocated relative to the host's physical CPUs (pCPUs).
- Many small VMs running alongside a larger one. The larger VM often waits for the scheduler to deschedule enough of the smaller VMs' vCPUs before it can run.
- CPU limits configured on a virtual machine or its resource pool, which can cap the CPU time a VM receives even when the host has spare capacity.
If you find a host with high values, you can balance the workload across the cluster or check the host for oversubscription.
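One simple way to check a host for oversubscription is to compare the vCPUs allocated to its virtual machines with its physical core count. The Python sketch below shows that arithmetic. The helper name is hypothetical. It assumes you count only powered-on VMs and only physical cores (not hyperthreads). It does not apply a pass/fail threshold, because an acceptable ratio depends on the workload.

```python
def vcpu_to_pcpu_ratio(vm_vcpus: list, physical_cores: int) -> float:
    """Hypothetical helper: total vCPUs allocated to VMs per physical core.

    Assumes vm_vcpus lists the vCPU count of each powered-on VM on the host
    and physical_cores excludes hyperthreads.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus) / physical_cores


if __name__ == "__main__":
    # 32 vCPUs allocated on a 16-core host gives a 2:1 ratio.
    print(vcpu_to_pcpu_ratio([4, 4, 8, 2, 2, 4, 8], 16))
```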
esxi_moid | count_vms_by_host | count_high_sumready_by_host | avg_sumready_by_host | perc_vms_high_sumready_by_host |
---|---|---|---|---|
Finally, you might be interested in other processes associated with the Monitoring VMware virtual machine performance use case.