Splunk Lantern

ESXi hosts with high CPU Ready summation value

 

CPU Sum Ready indicates that a virtual machine is ready to run and needs CPU resources to continue processing, but the underlying host has no remaining CPU resources to allocate to it. This metric can be expressed as a summation (in milliseconds) or as a percentage.
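The relationship between the summation and percentage forms is simple arithmetic. The following is a minimal Python sketch of it, run outside of Splunk. It assumes the commonly documented VMware conversion, which divides the summation in milliseconds by the length of the sampling interval in milliseconds. The function name `ready_percent` and the 20-second default interval (the vSphere real-time sampling interval) are assumptions for illustration; substitute the interval at which your data is actually collected.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    interval_s is an assumed sampling interval in seconds, not a value
    taken from the search in this article.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Multiply first so that simple inputs such as 500 ms give an exact result.
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # 500 ms of ready time within a 20-second sample
    print(ready_percent(500))
```

Under these assumptions, a 500 ms summation corresponds to 2.5 percent ready time for the sampled interval.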

When many virtual machines on an ESXi host have high sum ready metrics, the host might be experiencing CPU pressure. You want to monitor your environment for this type of problem so you can take mitigating action.

Data required

Procedure

  1. Ensure that you have installed the IT Essentials Work app to onboard VMware data and provide the various VMware entity type configurations and dashboards.
  2. Ensure that you are collecting VMware data through one or more Data Collection Nodes, which are essentially Splunk heavy forwarders with specific VMware collection configurations.
  3. Run the following search. You can optimize it by specifying an index and adjusting the time range.
| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid 
| dedup moid
| eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0) 
| table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready 
| append 
    [| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine
    | dedup moid 
    | eval esxi_moid = 'changeSet.runtime.host.moid' 
    | table moid esxi_moid] 
| eventstats values(esxi_moid) AS esxi_moid BY moid
| search is_high_sumready=* 
| table name, moid,esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready 
| stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid
| eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2) 
| sort - perc_vms_high_sumready_by_host

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation
| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid
| dedup moid
Get the most recent results for the performance of all unique virtual machines.

| eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0)

Create the is_high_sumready field, set to 1 for results where the average is above 500 ms and to 0 otherwise.

| table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready

Display the results in a table with columns in the order shown.

| append
[| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine
| dedup moid
| eval esxi_moid = 'changeSet.runtime.host.moid'
| table moid esxi_moid]

Obtain the MOIDs of the ESXi hosts on which each virtual machine is running. Append the returned fields to the primary search.

|eventstats values(esxi_moid) AS esxi_moid BY moid

Add the esxi_moid field to all virtual machine Sum Ready results.

|search is_high_sumready=*

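Keep only the results that have an is_high_sumready value, that is, the virtual machine performance results. This drops the appended inventory rows, which are no longer needed after esxi_moid has been copied to the performance results.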

| table name, moid,esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready

Display the results in a table with columns in the order shown.

| stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid

For each ESXi host, count the virtual machines, total the number of virtual machines with a high CPU Ready summation value, and obtain the average Sum Ready across all of the host's virtual machines.

|eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2)

Determine the percentage of virtual machines on the host with high Sum Ready.

|sort - perc_vms_high_sumready_by_host

Sort results with the largest percent first.
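The aggregation steps in the table above can also be traced outside of Splunk. The following Python sketch mirrors the flagging, per-host statistics, percentage, and sort steps on a few made-up rows. The function name `summarize_hosts`, the tuple layout, and the sample values are hypothetical and exist only for illustration; the output keys reuse the field names from the search.

```python
from collections import defaultdict

THRESHOLD_MS = 500  # same cutoff the search uses for is_high_sumready


def summarize_hosts(vms, threshold_ms=THRESHOLD_MS):
    """Summarize CPU Ready per ESXi host.

    vms: iterable of (vm_moid, esxi_moid, avg_ready_ms) tuples (hypothetical layout).
    Returns one dict per host, sorted by percentage of high-ready VMs, largest first.
    """
    ready_by_host = defaultdict(list)
    for _vm_moid, esxi_moid, avg_ready_ms in vms:
        ready_by_host[esxi_moid].append(avg_ready_ms)

    rows = []
    for esxi_moid, values in ready_by_host.items():
        count_vms = len(values)
        count_high = sum(1 for v in values if v > threshold_ms)
        rows.append({
            "esxi_moid": esxi_moid,
            "count_vms_by_host": count_vms,
            "count_high_sumready_by_host": count_high,
            "avg_sumready_by_host": sum(values) / count_vms,
            "perc_vms_high_sumready_by_host": round(count_high / count_vms * 100, 2),
        })

    rows.sort(key=lambda r: r["perc_vms_high_sumready_by_host"], reverse=True)
    return rows


# Made-up sample data, not taken from a real environment.
sample = [
    ("vm-1", "host-20", 620.0),
    ("vm-2", "host-20", 540.0),
    ("vm-3", "host-20", 410.0),
    ("vm-4", "host-11", 300.0),
]

if __name__ == "__main__":
    for row in summarize_hosts(sample):
        print(row)
```

In this made-up sample, host-20 has two of three virtual machines above the threshold, so it sorts ahead of host-11.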

Next steps

The table below shows sample results for the search. A high value of perc_vms_high_sumready_by_host means that a large share of the virtual machines on that host have work to do that is waiting to be scheduled. One factor to investigate is oversubscription, where too many vCPUs have been allocated relative to the host's physical CPUs (pCPUs). Another is a mix of many smaller VMs alongside a larger one: the larger VM often has to wait for the scheduler to preempt enough vCPUs from the small VMs before it can run. A third factor can be CPU limits.

If you find a host that is experiencing the high values mentioned above, you can balance the workload across the cluster or look for oversubscription.

esxi_moid  count_vms_by_host  count_high_sumready_by_host  avg_sumready_by_host  perc_vms_high_sumready_by_host
host-20    18                 14                           510.4236111           77.78
host-10    17                 11                           466.8455882           64.71
host-11    15                 0                            357.2583333           0
host-26    3                  0                            6                     0
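The percentage column in the sample results follows directly from the two count columns. As a quick check, this short Python sketch recomputes it from the counts shown in the sample results; the helper name `percent_high` is hypothetical.

```python
def percent_high(count_high: int, count_vms: int) -> float:
    """Percentage of VMs on a host flagged as high Sum Ready, rounded to 2 places."""
    if count_vms == 0:
        return 0.0  # no VMs on the host, so nothing can be flagged
    return round(count_high / count_vms * 100, 2)


# (esxi_moid, count_vms_by_host, count_high_sumready_by_host) from the sample results
sample_rows = [
    ("host-20", 18, 14),
    ("host-10", 17, 11),
    ("host-11", 15, 0),
    ("host-26", 3, 0),
]

if __name__ == "__main__":
    for esxi_moid, count_vms, count_high in sample_rows:
        print(esxi_moid, percent_high(count_high, count_vms))
```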

Finally, you might be interested in other processes associated with the Monitoring VMware virtual machine performance use case.