Observability Type: Availability
You've got your Kubernetes data into Splunk Observability Cloud, and now you're not sure what to do next. Splunk supports a wide range of use cases, and you need to start learning how to get value from the platform.
Some questions you might have about using Splunk Infrastructure Monitoring to monitor your Kubernetes environment include:
- How can I identify which pods are failing or stuck in a pending state?
- How can I ensure that the number of running instances matches what I expect?
- How do I know if appropriate resource limits have been applied, and if any pods are exceeding those limits?
How to use Splunk software for this use case
Splunk experts have recorded short, 5-minute videos on some high-value foundational use cases. Each video is self-contained, so you can pick the capabilities most relevant to you and your organization.
We recommend that new users work through at least two or three of these videos to get comfortable with the Splunk Observability Cloud platform and its basic functionality.
By following along, you will learn by doing and acquire the following skills:
- Interpret the built-in dashboards / Infrastructure Navigator to understand your Kubernetes environment
- Build custom detectors / visualizations
- Set up custom alerts and notifications
- Interpret detectors, visualizations, and dashboards to complete basic troubleshooting
Video 1 - Detect Kubernetes nodes running out of resources or pods that are in a pending phase
In organizations that use the Kubernetes container orchestration platform, it is common for nodes to run out of resources, which leaves applications unable to scale. Kubernetes nodes need to be monitored carefully so that you can take action quickly when this happens. In this video, you'll learn how to create a detector in Splunk Infrastructure Monitoring that monitors for this situation.
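The condition such a detector watches for can be sketched in plain Python. This is only an illustration of the logic, not the detector built in the video and not a Splunk API: the pod records, the field names (`phase`, `phase_since`), and the five-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def pending_too_long(pods, now, threshold=timedelta(minutes=5)):
    """Return names of pods that have been in the Pending phase longer than threshold."""
    return [
        pod["name"]
        for pod in pods
        if pod["phase"] == "Pending" and now - pod["phase_since"] > threshold
    ]


# Hypothetical sample data; real values would come from your monitoring backend.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_pods = [
    {"name": "checkout-7d9f", "phase": "Pending", "phase_since": now - timedelta(minutes=12)},
    {"name": "cart-5c2a", "phase": "Pending", "phase_since": now - timedelta(minutes=1)},
    {"name": "web-1b3e", "phase": "Running", "phase_since": now - timedelta(hours=3)},
]

stuck = pending_too_long(sample_pods, now)
print(stuck)  # ['checkout-7d9f']
```

Requiring the pending state to persist for a minimum duration avoids flagging pods that are only briefly pending while they are being scheduled.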
Video 2 - How to monitor CPU utilization when pods have no limits configured
In the first video, you configured your Splunk Infrastructure Monitoring detector for pods that are in a pending state because nodes have run out of resources. When limits are not set on a pod, it can consume more CPU than intended. In this video, you'll learn how to monitor node CPU usage in that situation so you can prevent impact to your customers.
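As a rough illustration of the two checks involved, the sketch below finds pods that have a container with no CPU limit and nodes whose CPU utilization is above a threshold. It is plain Python over made-up data, not the approach shown in the video: the field name `cpu_limit_millicores`, the node names, and the 80% threshold are hypothetical.

```python
def pods_without_cpu_limit(pods):
    """Return names of pods with at least one container that has no CPU limit."""
    return [
        pod["name"]
        for pod in pods
        if any(c.get("cpu_limit_millicores") is None for c in pod["containers"])
    ]


def nodes_over_cpu(node_cpu_pct, threshold_pct=80.0):
    """Return the sorted names of nodes whose CPU utilization exceeds threshold_pct."""
    return sorted(name for name, pct in node_cpu_pct.items() if pct > threshold_pct)


# Hypothetical sample data for illustration only.
workload_pods = [
    {"name": "api-6f4c", "containers": [{"name": "api", "cpu_limit_millicores": 500}]},
    {"name": "batch-9a1d", "containers": [{"name": "worker"}]},  # no limit set
]
node_cpu_pct = {"node-a": 92.5, "node-b": 41.0, "node-c": 80.0}

unlimited = pods_without_cpu_limit(workload_pods)
hot_nodes = nodes_over_cpu(node_cpu_pct)
print(unlimited)  # ['batch-9a1d']
print(hot_nodes)  # ['node-a']
```

Looking at both lists together helps connect a busy node to the unbounded pods that may be causing the load.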
Video 3 - How to create alerts: Splunk On-Call, email, Slack
When a pending state is detected, every second counts. You need to quickly alert the Kubernetes support staff who can resolve the incident. Organizations differ in how they alert support team members, whether over email, Slack, or Splunk On-Call. In this video, you'll learn how to configure alerts for each of these channels so your support team can take action right away.
Video 4 - How to review pod status in the Kubernetes navigator: running vs. desired number of pods, pods in pending status, failed pods
In this video for Kubernetes administrators, you'll learn how to view pod status in the Kubernetes navigator so that you can troubleshoot quickly and reduce your mean time to repair (MTTR).
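The running-versus-desired comparison amounts to a simple difference per workload. The sketch below shows that arithmetic in plain Python, assuming hypothetical workload records with `desired` and `running` counts. It does not reflect how the navigator computes or displays these values.

```python
def replica_gaps(workloads):
    """Map each workload name to (desired - running) where the two counts differ."""
    return {
        w["name"]: w["desired"] - w["running"]
        for w in workloads
        if w["desired"] != w["running"]
    }


# Hypothetical sample data for illustration only.
sample_workloads = [
    {"name": "checkout", "desired": 5, "running": 3},
    {"name": "cart", "desired": 2, "running": 2},
    {"name": "search", "desired": 4, "running": 0},
]

gaps = replica_gaps(sample_workloads)
print(gaps)  # {'checkout': 2, 'search': 4}
```

A positive gap means fewer pods are running than expected, which is a cue to check for pending or failed pods in that workload.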
To fully unlock the power of Splunk, we strongly recommend our comprehensive Splunk training. At this stage in your journey, we recommend the following courses:
Need technical help? Explore our customer success resources to find education and training, engage experts through OnDemand services, view support options, and more.