Scenario: You work in the IT department for a large software development company that makes heavy use of virtual machines for testing during development. It is crucial that the virtual machines that rely on the VMware ESXi hypervisor and vSphere remain available so that the software release cycle isn't disrupted. Part of your job is to monitor all VMware infrastructure and respond to any issues that might arise.
How Splunk software can help
You can use Splunk software to monitor virtual machine and virtual machine host resource usage, watch for key events that might require troubleshooting, and obtain useful inventories of your VMware environment.
What you need
To succeed in implementing this use case, you need the following dependencies, resources, and information.
The best person to implement this use case is a system administrator who is familiar with virtualization and VMware in particular. This person might come from your team, a Splunk partner, or Splunk OnDemand Services.
Onboarding the data needed to monitor VMware virtualization infrastructure with Splunk software can take a day or more.
The following technologies, data, and integrations are useful in successfully implementing this use case:
- Splunk Enterprise or Splunk Cloud
- Data sources onboarded
  - vCenter logs
  - ESXi host logs
  - API data per host
  - API data per virtual machine
- Virtualization data
- Splunk Add-on for VMware
How to use Splunk software for this use case
You can run many searches with Splunk software to monitor VMware virtual machine performance. Depending on what information you have available, you might find it useful to identify some or all of the following:
- ESXi hosts with sustained high ballooning
- ESXi hosts with sustained high swapping
- Recently triggered vSphere alarms
- Virtual machines currently running on an ESXi host
- vMotion events for a specific virtual machine
- Virtual machines with high CPU Ready summation value
- ESXi hosts with high CPU Ready summation value
- VMware datastores with highest utilization
- Virtual machines with large file size utilization
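Several of the items above depend on the idea of a "sustained" condition, meaning a metric that stays above a threshold for several consecutive samples rather than spiking once. The following Python sketch illustrates that logic on already-exported sample data. It is not a Splunk search and does not use any Splunk or VMware API. The tuple layout `(host, timestamp, value)`, the host names, the threshold of 500, and the run length of 3 are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def hosts_with_sustained_high(samples, threshold, min_consecutive):
    """Return hosts whose metric exceeded `threshold` in at least
    `min_consecutive` back-to-back samples.

    `samples` is an iterable of (host, timestamp, value) tuples.
    This layout is an assumption made for this sketch.
    """
    # Group the samples by host so each host is evaluated independently.
    by_host = defaultdict(list)
    for host, timestamp, value in samples:
        by_host[host].append((timestamp, value))

    flagged = []
    for host, points in by_host.items():
        points.sort()  # order each host's samples by timestamp
        run = 0
        for _, value in points:
            # Extend the current run, or reset it when the value drops.
            run = run + 1 if value > threshold else 0
            if run >= min_consecutive:
                flagged.append(host)
                break
    return sorted(flagged)


# Hypothetical ballooning samples (for example, ballooned memory in KB).
samples = [
    ("esxi-01", 1, 0), ("esxi-01", 2, 600), ("esxi-01", 3, 700), ("esxi-01", 4, 650),
    ("esxi-02", 1, 900), ("esxi-02", 2, 0), ("esxi-02", 3, 800), ("esxi-02", 4, 0),
]

result = hosts_with_sustained_high(samples, threshold=500, min_consecutive=3)
print(result)  # ['esxi-01']
```

In this sample data, esxi-02 exceeds the threshold twice but never in consecutive samples, so only esxi-01 is reported. The same pattern applies to swapping or CPU Ready values. Suitable thresholds and window lengths depend on your environment.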
Other steps you can take
To maximize their benefit, the how-to articles linked in the previous section likely need to tie into existing processes at your organization or become new standard processes. These processes commonly impact success with this use case:
- Capacity planning, compute hardware, storage, and network
- Native monitoring tools integrated with Splunk for cross-domain visibility
- Tooling for software provisioning and configuration management
- Backups, security, and compliance
These additional Splunk resources might help you understand and implement this use case:
- Blog: New VMware vSphere and multi-cloud Monitoring (beta) with Splunk App for Infrastructure 2.0
- Blog: Splunk Stream in VMware environments
- Blog: VM Monitoring and capacity management
- Blog: What can you do with Splunk, VMware and AWS?
- Conf Talk: Monitoring your VMware vSphere environment with Splunk
How to assess your results
Measuring impact and benefit is critical to assessing the value of IT operations. The following are example metrics that can be useful to monitor when implementing this use case:
- Mean time to problem resolution
- Mean time to root cause analysis
- Reduction in system degradation, such as underperformance or unplanned downtime
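As one way to make the first metric concrete, mean time to problem resolution can be computed as the average of resolved time minus opened time across incidents. The Python sketch below assumes incident records are available as `(opened, resolved)` timestamp pairs. That record layout and the sample incidents are hypothetical and are not taken from any Splunk data model.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average the elapsed time between opening and resolving incidents.

    `incidents` is a list of (opened, resolved) datetime pairs.
    Returns a timedelta, or None when there are no incidents.
    """
    durations = [resolved - opened for opened, resolved in incidents]
    if not durations:
        return None
    # Start the sum from a zero timedelta so the result stays a timedelta.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents lasting 2 hours, 1 hour, and 3 hours.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 0)),
    (datetime(2024, 1, 3, 8, 0), datetime(2024, 1, 3, 11, 0)),
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 2:00:00
```

Comparing this value before and after implementing the use case gives a simple indication of impact, provided incidents are recorded consistently in both periods.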