Monitoring VMware virtualization infrastructure
You work in the IT department for a large software development company that makes heavy use of virtual machines for testing during development.
The virtual machines rely on the VMware ESXi hypervisor and vSphere, and they must remain available so that the software release cycle isn't disrupted. Part of your job is to monitor all VMware infrastructure and respond to any issues that arise.
You can use the Splunk platform to monitor virtual machine and virtual machine host resource usage, watch for key events that might require troubleshooting, and obtain useful inventories of your VMware environment.
Data required
How to use Splunk software for this use case
You can run many searches in the Splunk platform to monitor VMware virtual machine performance. Depending on what information you have available, you might find it useful to identify some or all of the following.
ESXi hosts
- ESXi hosts with sustained high ballooning
- ESXi hosts with sustained high swapping
- ESXi hosts with a high CPU Ready summation value
- ESXi host version identification
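To make "sustained" and "CPU Ready summation" more concrete, the following Python sketch works on values that have already been exported from your monitoring data. It is not a Splunk search. The function names, sample values, and thresholds are hypothetical and chosen only for illustration. The percentage conversion follows the commonly cited VMware formula (summation in milliseconds divided by the sampling interval in milliseconds) and assumes the 20-second real-time sampling interval unless you pass a different one.

```python
from typing import Sequence


def cpu_ready_percent(summation_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation value (ms) to a percentage of the interval.

    interval_s defaults to 20 seconds, the real-time sampling interval;
    adjust it if your data uses a different rollup interval.
    """
    return summation_ms / (interval_s * 1000.0) * 100.0


def is_sustained(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical ballooned-memory samples (KB) for a single ESXi host.
balloon_kb = [0, 0, 512, 2048, 4096, 4096, 0]

# Hypothetical rule: flag the host when three consecutive samples exceed 1024 KB.
print(is_sustained(balloon_kb, threshold=1024, min_consecutive=3))

# A summation of 1000 ms over a 20-second interval, expressed as a percentage.
print(cpu_ready_percent(1000))
```

Requiring several consecutive breaches, rather than a single one, is one simple way to separate a sustained condition from a short spike. The right threshold and run length depend on your environment.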
vSphere
Other
Next steps
To maximize their benefit, the how-to articles linked in the previous section likely need to tie into existing processes at your organization or become new standard processes. These processes commonly impact success with this use case:
- Capacity planning, compute hardware, storage, and network
- Native monitoring tools integrated with the Splunk platform for cross-domain visibility
- Tooling for software provisioning and configuration management
- Backups, security, and compliance
Measuring impact and benefit is critical to assessing the value of IT operations. The following are example metrics that can be useful to monitor when implementing this use case:
- Mean time to problem resolution
- Mean time to root cause analysis
- Reduction in system degradation, such as underperformance or unplanned downtime
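As a minimal sketch of how a metric such as mean time to problem resolution could be calculated, the Python example below averages the time between when each incident was opened and when it was resolved. The incident records, timestamps, and function name are hypothetical. In practice the data would come from your own ticketing or incident-tracking system.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_resolution(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average the (opened, resolved) durations across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [resolved - opened for opened, resolved in incidents]
    # Start the sum from an empty timedelta so the durations add up correctly.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents as (opened, resolved) pairs.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 12, 0)),
]

print(mean_time_to_resolution(incidents))
```

Comparing this value before and after you adopt the monitoring searches gives one rough indication of whether resolution times are improving.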
The content in this article comes from the free IT Essentials Work (ITE) application. ITE helps you correlate logs and metrics for each entity, and then use that information to observe and understand the performance of your infrastructure. The app helps you get started monitoring and analyzing essential IT infrastructures with out-of-the-box dashboards and pre-configured performance metrics. In addition, these Splunk resources might help you understand and implement this use case: