When an ESXi host is running low on available physical memory, it attempts to reclaim memory from one or more virtual machines through a process called ballooning. You might need to monitor ESXi hosts for sustained high ballooning when doing the following:
Prerequisites
In order to execute this procedure in your environment, the following data, services, or apps are required:
Example
While some ballooning on your ESXi hosts is normal, frequent and sustained ballooning is a sign that the host is experiencing memory pressure. This situation degrades the performance of the virtual machines assigned to the host. You want to monitor your environment for this situation so you can take corrective action as needed.
NOTE: To optimize the search shown below, you should specify an index and a time range.
- Run the following search:
sourcetype="vmware:perf:mem" source="VMPerf:HostSystem"
|stats max(p_average_mem_vmmemctl_kiloBytes) AS max_p_average_mem_vmmemctl_kiloBytes BY moid
|eval is_high_ballooning = if(max_p_average_mem_vmmemctl_kiloBytes > 10, "Yes", "No")
|eventstats mean(max_p_average_mem_vmmemctl_kiloBytes) AS mean_host_population stdev(max_p_average_mem_vmmemctl_kiloBytes) AS stdev_host_population
|eval stdev_from_host_population = (max_p_average_mem_vmmemctl_kiloBytes - mean_host_population) / stdev_host_population
|sort - stdev_from_host_population
|table moid max_p_average_mem_vmmemctl_kiloBytes is_high_ballooning mean_host_population stdev_host_population stdev_from_host_population
|rename max_p_average_mem_vmmemctl_kiloBytes AS "Avg Memctl(KB)" is_high_ballooning AS "High Ballooning" mean_host_population AS "Mean of Host Pop." stdev_host_population AS "Stdev of Host Pop." stdev_from_host_population AS "Stdev Across Host Pop"
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
| Splunk Search | Explanation |
| --- | --- |
| `sourcetype="vmware:perf:mem" source="VMPerf:HostSystem"` | Search only VMware performance memory data and limit the search to the host system. |
| `\|stats max(p_average_mem_vmmemctl_kiloBytes) AS max_p_average_mem_vmmemctl_kiloBytes BY moid` | Calculate the maximum of the average amount of memory reclaimed by the vmmemctl memory balloon driver, in kilobytes, for each host managed object ID (MOID). |
| `\|eval is_high_ballooning = if(max_p_average_mem_vmmemctl_kiloBytes > 10, "Yes", "No")` | Create the is_high_ballooning field, set to Yes for results where the driver has reclaimed more than 10 kilobytes and to No otherwise. |
| `\|eventstats mean(max_p_average_mem_vmmemctl_kiloBytes) AS mean_host_population stdev(max_p_average_mem_vmmemctl_kiloBytes) AS stdev_host_population` | Calculate the mean and the standard deviation of the memory reclaimed across all hosts. |
| `\|eval stdev_from_host_population = (max_p_average_mem_vmmemctl_kiloBytes - mean_host_population) / stdev_host_population` | Calculate how many standard deviations away each MOID is from the average amount of memory reclaimed on all hosts and put the result in a field called stdev_from_host_population. |
| `\|sort - stdev_from_host_population` | Sort results so that the hosts furthest above the population mean, measured in standard deviations, come first. |
| `\|table moid max_p_average_mem_vmmemctl_kiloBytes is_high_ballooning mean_host_population stdev_host_population stdev_from_host_population` | Display the results in a table with columns in the order shown. |
| `\|rename max_p_average_mem_vmmemctl_kiloBytes AS "Avg Memctl(KB)" is_high_ballooning AS "High Ballooning" mean_host_population AS "Mean of Host Pop." stdev_host_population AS "Stdev of Host Pop." stdev_from_host_population AS "Stdev Across Host Pop"` | Rename the fields as shown for better readability. |
Result
Sample results for this search are shown in the table below. Ballooning is one of the techniques used to reclaim memory, and it helps the guest OS release memory so that the host can reclaim it. The High Ballooning value is Yes or No based on the 10 KB threshold set in the search. The statistical values show how each host's memory pressure compares with the rest of the host population. These results help you determine which hosts have sustained high ballooning and which do not. You can also select a candidate to balance load based on the mean and standard deviation of the hosts that are not ballooning. For example, based on the sample data, you might move load from host-26 to host-11 or host-20.
| moid | Avg Memctl(KB) | High Ballooning | Mean of Host Pop. | Stdev of Host Pop. | Stdev Across Host Pop |
| --- | --- | --- | --- | --- | --- |
| host-26 | 22 | Yes | 15.25 | 6.185 | 1.091 |
| host-10 | 19 | Yes | 15.25 | 6.185 | 0.606 |
| host-11 | 10 | No | 15.25 | 6.185 | -0.849 |
| host-20 | 10 | No | 15.25 | 6.185 | -0.849 |
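To make the statistical columns easier to check, here is a minimal Python sketch that recomputes them from the four sample hosts above. It is an illustration only, not part of the Splunk search. The names `max_memctl_kb`, `THRESHOLD_KB`, and `summarize` are hypothetical, and the sketch assumes the sample standard deviation (n - 1 in the denominator), which is consistent with the sample values shown.

```python
from statistics import mean, stdev

# Peak ballooning per host in KB, copied from the sample results table.
# The dictionary name and structure are illustrative assumptions.
max_memctl_kb = {"host-26": 22, "host-10": 19, "host-11": 10, "host-20": 10}

THRESHOLD_KB = 10  # same threshold as the eval step in the search


def summarize(values):
    """Return one row per host, sorted by standard deviations from the mean."""
    readings = list(values.values())
    population_mean = mean(readings)
    population_stdev = stdev(readings)  # sample standard deviation (n - 1)
    rows = []
    for moid, kb in values.items():
        rows.append({
            "moid": moid,
            "max_kb": kb,
            "is_high_ballooning": "Yes" if kb > THRESHOLD_KB else "No",
            "mean": population_mean,
            "stdev": round(population_stdev, 3),
            # Distance from the population mean in standard deviations.
            "z": round((kb - population_mean) / population_stdev, 3),
        })
    # Largest positive deviation first, like "sort - stdev_from_host_population".
    return sorted(rows, key=lambda r: r["z"], reverse=True)


rows = summarize(max_memctl_kb)
for row in rows:
    print(row)
```

Note that host-11 and host-20 sit exactly at 10 KB, so the strict greater-than comparison marks them as No. With only four hosts the standard deviation is sensitive to a single outlier, so treat the deviation column as a ranking aid rather than a hard threshold.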