ESXi hosts with sustained high ballooning
When an ESXi host is running low on available physical memory, it attempts to reclaim memory from one or more virtual machines through a process called ballooning.
Some ballooning on your ESXi hosts is normal. Frequent and sustained ballooning, however, is a sign that a host is under memory pressure, and it degrades the performance of the virtual machines running on that host. You want to monitor your environment for this condition so that you can take corrective action when needed.
Data required
- VMware. This procedure depends primarily on data from the Splunk Add-on for VMware Metrics. Log and event data from the VMware environment can also give you additional insight into its overall health, so for the most complete picture you should also download and install the Splunk Add-on for VMware ESXi Logs and the Splunk Add-on for vCenter Logs.
Procedure
- Ensure that you have installed the IT Essentials Work app, which onboards VMware data and provides the VMware entity type configurations and dashboards.
- Ensure that you are collecting VMware data through one or more Data Collection Nodes, which are essentially Splunk heavy forwarders with specific VMware collection configurations.
- Run the following search. You can optimize it by narrowing the time range and, if your environment stores VMware metrics in a different index, by replacing vmware-perf-metrics with that index name.
| mstats max(vsphere.esxihost.mem.vmmemctl) AS max.vsphere.esxihost.mem.vmmemctl WHERE (index=vmware-perf-metrics) AND name=* AND cluster_name=* AND vcenter=* AND sourcetype=vmware_inframon:perf:mem BY name moid | eval is_high_ballooning = if('max.vsphere.esxihost.mem.vmmemctl' > 10, 1, 0) | eventstats mean(max.vsphere.esxihost.mem.vmmemctl) AS mean_host_population stdev(max.vsphere.esxihost.mem.vmmemctl) AS stdev_host_population | eval stdev_from_host_population = ('max.vsphere.esxihost.mem.vmmemctl'-'mean_host_population') / stdev_host_population | sort - stdev_from_host_population
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
Splunk Search | Explanation |
---|---|
| mstats max(vsphere.esxihost.mem.vmmemctl) AS max.vsphere.esxihost.mem.vmmemctl WHERE (index=vmware-perf-metrics) AND name=* AND cluster_name=* AND vcenter=* AND sourcetype=vmware_inframon:perf:mem BY name moid |
Calculate the maximum of the average amount of memory, in kilobytes, reclaimed by the vmmemctl memory balloon driver for each host, grouped by host name and managed object ID (MOID). |
| eval is_high_ballooning = if('max.vsphere.esxihost.mem.vmmemctl' > 10, 1, 0) |
Create the is_high_ballooning field and set it to 1 for hosts where the driver has reclaimed more than 10 kilobytes, or to 0 otherwise. |
| eventstats mean(max.vsphere.esxihost.mem.vmmemctl) AS mean_host_population stdev(max.vsphere.esxihost.mem.vmmemctl) AS stdev_host_population |
Calculate the mean and standard deviation of the maximum memory reclaimed across all hosts, and add both values to every result. |
| eval stdev_from_host_population = ('max.vsphere.esxihost.mem.vmmemctl'-'mean_host_population') / stdev_host_population |
Calculate how many standard deviations each host's maximum is from the mean amount of memory reclaimed across all hosts, and put the result in a field called stdev_from_host_population. |
| sort - stdev_from_host_population |
Sort results in descending order of stdev_from_host_population, so that the hosts furthest above the population mean appear first. |
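To make the arithmetic concrete, here is a minimal Python sketch of what the search computes, run against a few in-memory samples instead of Splunk data. The host names, MOIDs, sample values, and the summarize() helper are all hypothetical and are not part of any Splunk add-on or app. It uses statistics.stdev, which, like the SPL stdev() function, computes the sample standard deviation.

```python
from statistics import mean, stdev

# Same threshold as the is_high_ballooning eval in the search (kilobytes).
THRESHOLD_KB = 10

# Hypothetical vmmemctl samples in KB, keyed by (host name, MOID).
# In Splunk, these values come from the vsphere.esxihost.mem.vmmemctl metric.
SAMPLES = {
    ("esx-a", "host-101"): [0, 0, 0],
    ("esx-b", "host-102"): [2, 4, 3],
    ("esx-c", "host-103"): [500, 800, 650],
    ("esx-d", "host-104"): [0, 5, 1],
}


def summarize(samples):
    """Return (rows, population mean, population stdev) for per-host samples.

    Needs at least two hosts, because statistics.stdev() raises
    StatisticsError for fewer than two values.
    """
    # mstats max(...) BY name moid: keep the maximum sample per host.
    per_host = {key: max(values) for key, values in samples.items()}

    # eventstats mean(...) stdev(...): statistics across all hosts.
    values = list(per_host.values())
    pop_mean = mean(values)
    pop_stdev = stdev(values)

    rows = []
    for (name, moid), max_kb in per_host.items():
        # In SPL, dividing by a zero standard deviation leaves the field empty;
        # this sketch uses 0.0 instead so that sorting still works.
        z_score = (max_kb - pop_mean) / pop_stdev if pop_stdev else 0.0
        rows.append({
            "name": name,
            "moid": moid,
            "max_vmmemctl_kb": max_kb,
            "is_high_ballooning": 1 if max_kb > THRESHOLD_KB else 0,
            "stdev_from_host_population": z_score,
        })

    # sort - stdev_from_host_population: most anomalous hosts first.
    rows.sort(key=lambda row: row["stdev_from_host_population"], reverse=True)
    return rows, pop_mean, pop_stdev


if __name__ == "__main__":
    rows, pop_mean, pop_stdev = summarize(SAMPLES)
    print(f"mean={pop_mean:.2f} KB, stdev={pop_stdev:.2f} KB")
    for row in rows:
        print(row["name"], row["moid"], row["max_vmmemctl_kb"],
              row["is_high_ballooning"], round(row["stdev_from_host_population"], 2))
```

With these made-up values, only esx-c crosses the 10 KB threshold, and it also sorts first because its maximum sits furthest above the population mean.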
Next steps
Sample results for this search are shown in the table below. Ballooning is one of the techniques ESXi uses to reclaim memory: it prompts the guest OS to release memory that the host can then reclaim. The High Ballooning value is Yes or No, based on the 10 KB threshold set in the search. The mean and standard deviation values show how each host's memory pressure compares with the rest of the host population. Together, these results help you determine which hosts have sustained high ballooning and which do not. You can also use the mean and standard deviation to select hosts that are not ballooning as candidates for rebalancing load. For example, based on the sample data, you might move load from host-26 to host-11 or host-20.
moid | Avg Memctl (KB) | High Ballooning | Mean of Host Pop. | Stdev of Host Pop. | Stdev Across Host Pop. |
---|---|---|---|---|---|
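The host selection described above can be sketched as a simple ranking rule. This rule is an assumption for illustration, not a method provided by the add-on or the app. Hosts flagged as high ballooning become sources, ranked by highest stdev_from_host_population, and hosts that are not flagged become candidate targets, ranked by lowest value. The suggest_moves() helper and the rows it consumes are hypothetical. In practice, you would also check CPU headroom, capacity, and placement rules before you move any virtual machines.

```python
def suggest_moves(rows, max_targets=2):
    """Return (source MOIDs, target MOIDs) from rows shaped like the search output.

    Hypothetical ranking rule, for illustration only: it looks at ballooning
    alone and ignores CPU, capacity, and affinity rules.
    """
    # Sources: hosts over the ballooning threshold, most anomalous first.
    sources = sorted(
        (r for r in rows if r["is_high_ballooning"] == 1),
        key=lambda r: r["stdev_from_host_population"],
        reverse=True,
    )
    # Targets: hosts under the threshold, least ballooning relative to the mean first.
    targets = sorted(
        (r for r in rows if r["is_high_ballooning"] == 0),
        key=lambda r: r["stdev_from_host_population"],
    )
    return [r["moid"] for r in sources], [r["moid"] for r in targets[:max_targets]]


# Hypothetical rows; the values are illustrative and not taken from real results.
ROWS = [
    {"moid": "host-101", "is_high_ballooning": 0, "stdev_from_host_population": -0.6},
    {"moid": "host-102", "is_high_ballooning": 0, "stdev_from_host_population": -0.5},
    {"moid": "host-103", "is_high_ballooning": 1, "stdev_from_host_population": 1.5},
    {"moid": "host-104", "is_high_ballooning": 0, "stdev_from_host_population": -0.4},
]

if __name__ == "__main__":
    sources, targets = suggest_moves(ROWS)
    print("move load from:", sources, "to:", targets)
```

Capping the targets with max_targets mirrors the example of naming two candidate hosts. A fuller version might weigh free capacity rather than rank on ballooning alone.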
Finally, you might be interested in other processes associated with the Monitoring VMware virtual machine performance use case.