
Splunk Lantern

ESXi hosts with high CPU Ready summation value

CPU Sum Ready is the time a virtual machine was ready to run but could not be scheduled, because the underlying host had no remaining CPU resources to allocate to it. The metric can be expressed as a summation (milliseconds of ready time accumulated during a sampling interval) or as a percentage of that interval.
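To convert between the two forms, a commonly used VMware formula divides the summation by the length of the sampling interval: ready % = summation (ms) / (interval (s) * 1000) * 100. The following Python sketch applies it. The 20-second default matches the vCenter real-time statistics interval and should be adjusted to your collection interval; the `vcpus` divisor is an assumption for expressing the value per vCPU when the summation covers all of a VM's vCPUs. The function name is hypothetical.

```python
def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms accumulated in one sample) to a percentage.

    interval_s: sampling interval in seconds (assumed 20 s, the vCenter real-time interval).
    vcpus: optional divisor for a per-vCPU value (assumption: the summation covers all vCPUs).
    """
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus must be at least 1")
    interval_ms = interval_s * 1000.0
    # Share of the interval spent waiting for a physical CPU, as a percentage.
    return ready_ms * 100.0 / (interval_ms * vcpus)


print(ready_ms_to_percent(500))            # 2.5
print(ready_ms_to_percent(1000, vcpus=2))  # 2.5
```

Under these assumptions, 500 ms of ready time in a 20-second sample means the VM spent 2.5% of that interval waiting for CPU.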

You might need to see which ESXi hosts on your network have a high CPU Ready summation value when doing the following:

Prerequisites 

In order to execute this procedure in your environment, the following data, services, or apps are required:

Example

When many virtual machines on an ESXi host have high CPU Sum Ready values, the host might be experiencing CPU pressure. You want to monitor your network for this type of problem so you can take mitigating action.

Option 1

To optimize the search shown below, you should specify an index and a time range.

  1. In Splunk Enterprise or Splunk Cloud Platform, ensure that you have installed the IT Essentials Work app to onboard VMware data and provide the various VMware entity type configurations and dashboards.
  2. Ensure that you are collecting VMware data through one or more Data Collection Nodes, which are essentially Splunk heavy forwarders with specific VMware collection configurations.
  3. Run the following search: 
| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid 
| dedup moid
| eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0) 
| table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready 
| append 
    [| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine
    | dedup moid 
    | eval esxi_moid = 'changeSet.runtime.host.moid' 
    | table moid esxi_moid] 
| eventstats values(esxi_moid) AS esxi_moid BY moid
| search is_high_sumready=* 
| table name, moid, esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready 
| stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid
| eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2) 
| sort - perc_vms_high_sumready_by_host

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation
| mstats avg(vsphere.vm.cpu.ready) AS avg.milliseconds.vsphere.vm.cpu.ready WHERE (index=vmware-perf-metrics) AND name="*" AND moid="*" AND host="*" span=1m BY moid 
| dedup moid

Calculate the average CPU Ready time, in milliseconds, for each virtual machine (moid) in 1-minute spans, and keep the most recent result for each unique virtual machine.

| eval is_high_sumready = if('avg.milliseconds.vsphere.vm.cpu.ready' > 500, 1, 0) 

Create the is_high_sumready field, which is 1 for results where the average is above 500 ms and 0 otherwise.

| table name moid avg.milliseconds.vsphere.vm.cpu.ready is_high_sumready 

Display the results in a table with columns in the order shown.

| append 
    [| search index="vmware-inv" source="VMInv:Hierarchy" type=VirtualMachine
    | dedup moid 
    | eval esxi_moid = 'changeSet.runtime.host.moid' 
    | table moid esxi_moid] 

Obtain the MOIDs of the ESXi hosts on which each virtual machine is running. Append the returned fields to the primary search.

| eventstats values(esxi_moid) AS esxi_moid BY moid

Copy the esxi_moid value from the appended inventory results onto the Sum Ready results for the same virtual machine (moid).

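| search is_high_sumready=*

Keep only the Sum Ready results, which are the results that have an is_high_sumready value (either 0 or 1). This removes the appended inventory results.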

| table name, moid, esxi_moid, avg.milliseconds.vsphere.vm.cpu.ready, is_high_sumready

Display the results in a table with columns in the order shown.

| stats count AS count_vms_by_host sum(is_high_sumready) AS count_high_sumready_by_host avg(avg.milliseconds.vsphere.vm.cpu.ready) AS avg_sumready_by_host BY esxi_moid

For each ESXi host, count the virtual machines, count the virtual machines with a high CPU Ready summation value, and calculate the average Sum Ready across all of the host's virtual machines.

| eval perc_vms_high_sumready_by_host = round(count_high_sumready_by_host/count_vms_by_host*100, 2)

Determine the percentage of virtual machines on the host with high Sum Ready.

| sort - perc_vms_high_sumready_by_host

Sort the results with the highest percentage first.
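The logic of the search can be sketched in Python to make the per-host aggregation explicit. This is a minimal illustration, assuming the per-VM averages (the mstats and dedup steps) and the VM-to-host mapping (the appended inventory subsearch) have already been retrieved as dictionaries. The function name summarize_by_host, the input shapes, and the sample MOIDs are hypothetical and not part of any Splunk API.

```python
from collections import defaultdict

HIGH_READY_MS = 500  # same threshold as the search's is_high_sumready eval


def summarize_by_host(vm_ready_ms, vm_to_host):
    """Aggregate per-VM CPU Ready averages into per-host statistics.

    vm_ready_ms: {vm_moid: average CPU Ready in ms}
    vm_to_host:  {vm_moid: esxi_moid}
    Returns rows sorted by perc_vms_high_sumready_by_host, highest first.
    """
    per_host = defaultdict(list)
    for moid, ready_ms in vm_ready_ms.items():
        host = vm_to_host.get(moid)
        if host is None:
            continue  # no inventory match, so no esxi_moid to group by
        per_host[host].append(ready_ms)

    rows = []
    for host, values in per_host.items():
        count_vms = len(values)
        count_high = sum(1 for v in values if v > HIGH_READY_MS)
        rows.append({
            "esxi_moid": host,
            "count_vms_by_host": count_vms,
            "count_high_sumready_by_host": count_high,
            "avg_sumready_by_host": sum(values) / count_vms,
            "perc_vms_high_sumready_by_host": round(count_high / count_vms * 100, 2),
        })
    # Equivalent of: | sort - perc_vms_high_sumready_by_host
    rows.sort(key=lambda r: r["perc_vms_high_sumready_by_host"], reverse=True)
    return rows


# Hypothetical sample data
vm_ready_ms = {"vm-1": 620.0, "vm-2": 140.0, "vm-3": 800.0, "vm-4": 90.0}
vm_to_host = {"vm-1": "host-a", "vm-2": "host-a", "vm-3": "host-b", "vm-4": "host-c"}
for row in summarize_by_host(vm_ready_ms, vm_to_host):
    print(row["esxi_moid"], row["perc_vms_high_sumready_by_host"])
# host-b 100.0
# host-a 50.0
# host-c 0.0
```

As in the search, a VM without a matching inventory record has no esxi_moid, so it does not contribute to any host's counts.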

Result

The table below shows sample results for the search. A high perc_vms_high_sumready_by_host value means that many of the VMs on that host have work waiting to be scheduled. Generally, one factor to investigate is oversubscription, where too many vCPUs have been allocated relative to the host's physical CPUs (pCPUs). Another is a host running many smaller VMs alongside a larger one: the larger VM often waits for the scheduler to deschedule enough vCPUs from the smaller VMs to free the physical CPUs it needs to run. A third factor is CPU limits, such as limits set on a VM or its resource pool.

If you find a host that is experiencing the high values mentioned above, you can balance the workload across the cluster or look for oversubscription.

esxi_moid | count_vms_by_host | count_high_sumready_by_host | avg_sumready_by_host | perc_vms_high_sumready_by_host
host-20 | 18 | 14 | 510.4236111 | 77.78
host-10 | 17 | 11 | 466.8455882 | 64.71
host-11 | 15 | 0 | 357.2583333 | 0
host-26 | 3 | 0 | 6 | 0
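The perc_vms_high_sumready_by_host column follows directly from the two count columns. This short Python check uses the counts from the sample table and reproduces the rounding done by the search's round(..., 2); the helper name perc_high is hypothetical.

```python
def perc_high(count_vms: int, count_high: int) -> float:
    """Percentage of a host's VMs with high Sum Ready, rounded to 2 decimals."""
    return round(count_high / count_vms * 100, 2)


# (count_vms_by_host, count_high_sumready_by_host) from the sample results
sample = {
    "host-20": (18, 14),
    "host-10": (17, 11),
    "host-11": (15, 0),
    "host-26": (3, 0),
}

for host, (count_vms, count_high) in sample.items():
    print(host, perc_high(count_vms, count_high))
# host-20 77.78
# host-10 64.71
# host-11 0.0
# host-26 0.0
```

For host-20, 14 of 18 VMs exceed the 500 ms threshold, so 14 / 18 * 100 rounds to 77.78.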

Option 2

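  1. Ensure that the Splunk Distribution of the OpenTelemetry Collector (Splunk OTel Collector) is installed on a host that can reach your vCenter Server. After installation, complete the following additional steps:
    1. Configure the vsphere receiver in agent_config.yaml. The extraGroups setting enables the cpu and mem metric groups in addition to the monitor's default metrics.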
      receivers:
        smartagent/vsphere:
          type: vsphere
          host: "vcenterexample.local"
          username: "administrator"
          password: "mypassword"
          insecureSkipVerify: true
          extraGroups:
               - cpu
               - mem
      
    2. Add smartagent/vsphere to the receivers of the metrics pipeline, which is under service.pipelines in agent_config.yaml.
      service:
        ...
        pipelines:
          metrics:
            receivers: [hostmetrics, otlp, signalfx, smartagent/signalfx-forwarder, smartagent/vsphere]
      
    3. Restart the agent after configuration changes:
       systemctl restart splunk-otel-collector
  2. In Splunk Infrastructure Monitoring, use the following SignalFlow to query the vsphere.cpu_ready_ms streaming metric and calculate its mean for each ESXi host (the esx_ip dimension).
     A = data('vsphere.cpu_ready_ms').mean(by=['esx_ip']).publish(label='A')
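The mean(by=['esx_ip']) stage averages, at each point in time, the values of all vsphere.cpu_ready_ms time series that share the same esx_ip dimension value, producing one output series per ESXi host. The Python sketch below illustrates that grouping for a single timestamp. It is not SignalFlow, and the function name, input shape, and sample IP addresses are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_esx_ip(datapoints):
    """Average one timestamp's values per esx_ip.

    datapoints: list of (dimensions, value) pairs, one per metric time series.
    Series without an esx_ip dimension are skipped in this sketch.
    """
    groups = defaultdict(list)
    for dims, value in datapoints:
        esx_ip = dims.get("esx_ip")
        if esx_ip is None:
            continue
        groups[esx_ip].append(value)
    return {esx_ip: mean(values) for esx_ip, values in groups.items()}


# Hypothetical values of vsphere.cpu_ready_ms at one timestamp
points = [
    ({"esx_ip": "192.0.2.11"}, 300),
    ({"esx_ip": "192.0.2.11"}, 500),
    ({"esx_ip": "192.0.2.12"}, 50),
]
print(mean_by_esx_ip(points))  # {'192.0.2.11': 400, '192.0.2.12': 50}
```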

Result

To alert when CPU Ready Time suddenly changes on an ESXi host, you can use the SignalFlow from this procedure to configure a detector with the following configurations:

  • Alert condition: Sudden change
  • Alert when: Too high
  • Trigger sensitivity: Medium
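Conceptually, a sudden-change condition compares a signal's recent values with the values that immediately preceded them. The Python sketch below is a simplified illustration of that idea for the "Too high" direction only: it compares the mean of a current window with the mean plus a number of standard deviations of the preceding window. It is not the detector's actual algorithm; the function name, window sizes, and standard-deviation multiplier are assumptions, and the detector's sensitivity setting chooses its own parameters.

```python
from statistics import mean, stdev


def sudden_increase(values, current_window, historical_window, num_stdevs):
    """Return True if the latest window's mean is unusually high vs. the window before it.

    values: time-ordered CPU Ready readings for one ESXi host (oldest first).
    """
    if historical_window < 2:
        raise ValueError("historical_window needs at least 2 points for stdev")
    if len(values) < current_window + historical_window:
        return False  # not enough history to compare yet
    current = values[-current_window:]
    historical = values[-(current_window + historical_window):-current_window]
    threshold = mean(historical) + num_stdevs * stdev(historical)
    return mean(current) > threshold


# Hypothetical readings: steady around 100 ms, then a jump to around 410 ms
readings = [100, 110, 90, 105, 95, 100, 400, 420, 410]
print(sudden_increase(readings, current_window=3, historical_window=6, num_stdevs=3.0))  # True
```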