Splunk Lantern

ESXi hosts with sustained high ballooning

When an ESXi host is running low on available physical memory, it attempts to reclaim memory from one or more virtual machines through a process called ballooning. You might need to monitor ESXi hosts for sustained high ballooning when doing the following:

Prerequisites 

In order to execute this procedure in your environment, the following data, services, or apps are required:

Example

While some ballooning on your ESXi hosts is normal, frequent and sustained ballooning is a sign that the host is experiencing memory pressure. This situation degrades the performance of the virtual machines assigned to the host. You want to monitor your environment for this situation so you can take corrective action as needed.

NOTE: To optimize the search shown below, you should specify an index and a time range.

  1. Run the following search: 
sourcetype="vmware:perf:mem" source="VMPerf:HostSystem" 
|stats max(p_average_mem_vmmemctl_kiloBytes) AS max_p_average_mem_vmmemctl_kiloBytes BY moid 
|eval is_high_ballooning = if(max_p_average_mem_vmmemctl_kiloBytes > 10, "Yes", "No") 
|eventstats mean(max_p_average_mem_vmmemctl_kiloBytes) AS mean_host_population stdev(max_p_average_mem_vmmemctl_kiloBytes) AS stdev_host_population 
|eval stdev_from_host_population = (max_p_average_mem_vmmemctl_kiloBytes - mean_host_population) / stdev_host_population 
|sort - stdev_from_host_population 
|table moid max_p_average_mem_vmmemctl_kiloBytes is_high_ballooning mean_host_population stdev_host_population stdev_from_host_population   
|rename max_p_average_mem_vmmemctl_kiloBytes AS "Avg Memctl(KB)" is_high_ballooning AS "High Ballooning"  mean_host_population AS "Mean of Host Pop." stdev_host_population AS "Stdev of Host Pop." stdev_from_host_population AS "Stdev Across Host Pop"

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

sourcetype="vmware:perf:mem" source="VMPerf:HostSystem" 

Search only VMware memory performance data and limit the results to host systems.

|stats max(p_average_mem_vmmemctl_kiloBytes) AS max_p_average_mem_vmmemctl_kiloBytes BY moid

Calculate the maximum of the average amount of memory reclaimed by the vmmemctl memory balloon driver, in kilobytes, for each host managed object ID (MOID).

|eval is_high_ballooning = if(max_p_average_mem_vmmemctl_kiloBytes > 10, "Yes", "No")

Create the is_high_ballooning field, set to "Yes" for hosts where the driver has reclaimed more than 10 kilobytes and to "No" otherwise.

|eventstats mean(max_p_average_mem_vmmemctl_kiloBytes) AS mean_host_population stdev(max_p_average_mem_vmmemctl_kiloBytes) AS stdev_host_population

Calculate the mean and the standard deviation of the reclaimed memory across all hosts, and add both values to every result row.

|eval stdev_from_host_population = (max_p_average_mem_vmmemctl_kiloBytes - mean_host_population) / stdev_host_population

Calculate how many standard deviations away each MOID is from the average amount of memory reclaimed on all hosts and put the result in a field called stdev_from_host_population.

|sort - stdev_from_host_population

Sort results in descending order, so that the hosts furthest above the population mean (in standard deviations) appear first.

|table moid max_p_average_mem_vmmemctl_kiloBytes is_high_ballooning mean_host_population stdev_host_population stdev_from_host_population

Display the results in a table with columns in the order shown.

|rename max_p_average_mem_vmmemctl_kiloBytes AS "Avg Memctl(KB)" is_high_ballooning AS "High Ballooning"  mean_host_population AS "Mean of Host Pop." stdev_host_population AS "Stdev of Host Pop." stdev_from_host_population AS "Stdev Across Host Pop"

Rename the fields as shown for better readability. 
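The first two steps in the table above (the per-host maximum and the threshold flag) can be sketched outside Splunk to make the logic concrete. This is a minimal Python illustration, not part of the search itself: the sample readings, the host names host-a and host-b, and the helper names max_by_host and flag_high_ballooning are all hypothetical.

```python
# Hypothetical (moid, p_average_mem_vmmemctl_kiloBytes) readings; not real data.
samples = [
    ("host-a", 4.0),
    ("host-a", 12.0),
    ("host-b", 10.0),
    ("host-b", 7.0),
]

THRESHOLD_KB = 10  # same cutoff as the eval step in the search


def max_by_host(events):
    """Mimic `stats max(...) BY moid`: keep the largest value seen per host."""
    maxima = {}
    for moid, value in events:
        if moid not in maxima or value > maxima[moid]:
            maxima[moid] = value
    return maxima


def flag_high_ballooning(maxima, threshold=THRESHOLD_KB):
    """Mimic the eval step: "Yes" only when the maximum is strictly above the threshold."""
    return {moid: ("Yes" if value > threshold else "No") for moid, value in maxima.items()}


maxima = max_by_host(samples)
flags = flag_high_ballooning(maxima)
print(maxima)
print(flags)
```

Because the comparison is strictly greater than, a host whose maximum is exactly 10 KB is flagged "No", which is why host-b is not marked as high ballooning in this sketch.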

Result

Sample results for this search are shown in the table below. Ballooning is one of the techniques ESXi uses to reclaim memory: the balloon driver prompts the guest OS to release memory so that the host can reclaim it. The High Ballooning value is Yes or No, based on the 10 KB threshold set in the search. The statistical values show how each host's memory pressure compares with the rest of the host population. These results help you determine which hosts have sustained high ballooning and which hosts do not. You can also select a candidate to balance load based on the mean and standard deviation of the hosts that are not ballooning. For example, based on the sample data, you might move load from host-26 to host-11 or host-20.

moid      Avg Memctl(KB)   High Ballooning   Mean of Host Pop.   Stdev of Host Pop.   Stdev Across Host Pop
host-26   22               Yes               15.25               6.185                1.091
host-10   19               Yes               15.25               6.185                0.606
host-11   10               No                15.25               6.185                -0.849
host-20   10               No                15.25               6.185                -0.849
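The statistical columns in the sample results can be checked by hand. The sketch below recomputes them in Python from the four Avg Memctl(KB) values in the sample table; the function name population_scores is hypothetical. It assumes the sample standard deviation (n - 1 denominator), which is what the sample figures are consistent with.

```python
import statistics

# Per-host maxima taken from the sample results table.
sample_maxima = {"host-26": 22, "host-10": 19, "host-11": 10, "host-20": 10}


def population_scores(maxima):
    """Return the mean, the sample stdev, and each host's distance from the mean in stdevs."""
    values = list(maxima.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    scores = {moid: (value - mean) / stdev for moid, value in maxima.items()}
    return mean, stdev, scores


mean, stdev, scores = population_scores(sample_maxima)
print(f"mean={mean:.2f} stdev={stdev:.3f}")

# Largest positive deviation first, matching the sort step in the search.
for moid, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{moid} {score:.3f}")
```

Worked through: the mean is (22 + 19 + 10 + 10) / 4 = 15.25, the squared deviations sum to 114.75, and dividing by 3 and taking the square root gives roughly 6.185. Host-26 is therefore (22 - 15.25) / 6.185, or about 1.091 standard deviations above the mean.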
