Splunk Lantern

Detect Kubernetes nodes running out of resources or pods that are in a pending phase

 

You work for a large organization that runs its applications on Kubernetes. Your nodes often run out of resources, which leaves pods stuck in the pending phase and prevents your applications from scaling. You need to monitor your Kubernetes nodes closely so that you can act quickly when this happens.

Data required

Kubernetes data

How to use Splunk software for this use case

You can use Splunk Infrastructure Monitoring to create a detector that monitors for this critical situation.

  1. In Splunk Observability Cloud, click Alerts & Detectors in the left navigation pane.
  2. Click New Detector.
  3. Enter an appropriate name for the detector. This example uses Pods in Pending Phase. Click Create Alert Rule to proceed.
  4. Click Infrastructure or Custom Metrics Alert Rule, then click Proceed to Alert Signal.
  5. Enter the metric name. This example uses the metric k8s.pod.phase, and the page shows a preview of the metric values. Click Proceed to Alert Condition.
  6. Click Static Threshold so that the alert triggers on a metric value of 1, which represents the Pending phase, then click Proceed to Alert Settings.
  7. Enter the alert settings that define when the alert triggers:
    1. Alert when: Select Within Range.
    2. Lower threshold: Enter 0.99, so that the condition is true when the metric value is 1.
    3. Upper threshold: Enter 1.01.
    4. Trigger sensitivity: Select Duration.
    5. Duration: Set to 5m. Pods take time to leave the pending phase, so a 5-minute duration avoids false positives caused by system slowness. The condition must be true for 5 minutes before the alert triggers.
    6. Auto-Clear alerts: Check the box and set to 5m. If the pod goes offline and the metric is no longer being recorded, the alert clears.
  8. Click Proceed to Alert Message.
  9. Set the alert Severity. You can choose Critical, Major, Minor, Warning, or Info, depending on how serious you consider the condition.
  10. (Optional) Enter a Runbook or dashboard URL, and enter a short tip for end users who might be troubleshooting the alert.
  11. (Optional) Click Customize to further customize the alert message.
  12. Click Proceed to Alert Recipients.
  13. Click Add Recipient to choose who receives the alert message. You can add your own email address, any other email address, a team, or a webhook.
  14. Click Proceed to Alert Activation.
  15. Click Activate Alert Rule.
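
To make the alert settings from step 7 concrete, here is a minimal Python sketch of how a Within Range condition combined with a 5-minute duration could be evaluated against a series of k8s.pod.phase samples. It is an illustration only, not how Splunk Observability Cloud evaluates detectors internally. The function names, the one-minute sample interval, and the choice to treat the range as exclusive at both ends are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

# Thresholds and duration taken from the alert settings in step 7.
LOWER, UPPER = 0.99, 1.01
DURATION_S = 5 * 60  # 5m


def in_range(value: float) -> bool:
    """True when the value lies between the two thresholds.

    Assumption: the range is treated as exclusive at both ends.
    """
    return LOWER < value < UPPER


def first_trigger(samples: Iterable[Tuple[int, float]]) -> Optional[int]:
    """Return the timestamp (in seconds) at which the alert would fire.

    `samples` is a time-ordered iterable of (timestamp_seconds, metric_value).
    The alert fires once the condition has held without interruption for
    DURATION_S seconds. Returns None if that never happens.
    """
    held_since: Optional[int] = None
    for ts, value in samples:
        if in_range(value):
            if held_since is None:
                held_since = ts  # condition just became true
            if ts - held_since >= DURATION_S:
                return ts
        else:
            held_since = None  # condition broken, restart the clock
    return None


# Hypothetical data: one sample per minute.
# A pod that is pending (1) for three minutes and then reports 2.
slow_start = [(t, 1.0) for t in range(0, 180, 60)] + \
             [(t, 2.0) for t in range(180, 600, 60)]
# A pod that stays pending (1) for the whole window.
stuck = [(t, 1.0) for t in range(0, 600, 60)]

print(first_trigger(slow_start))  # None
print(first_trigger(stuck))       # 300
```

The slow-starting pod never triggers the alert because it leaves the pending phase before 5 minutes have passed, which is the false positive the duration setting is meant to avoid. The stuck pod triggers the alert at the 5-minute mark.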

The recipients you selected are now notified whenever the alert rule conditions are met.
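
If you prefer to keep detectors in version control, the same rule can be described as data. The following Python sketch builds a detector definition as a dictionary. It assumes that the Splunk Observability Cloud detector API accepts a JSON body with `name`, `programText`, and `rules` fields, and that the SignalFlow program shown is equivalent to the settings chosen in the UI. Verify both assumptions against the current API and SignalFlow references before relying on them. The helper name `build_detector` and the email address are hypothetical, and the sketch does not represent the Auto-Clear setting or send any request.

```python
import json

DETECT_LABEL = "Pods in Pending Phase"

# Assumed SignalFlow equivalent of the UI settings: the metric value must
# stay between 0.99 and 1.01 for 5 minutes before the detector fires.
PROGRAM_TEXT = (
    "A = data('k8s.pod.phase').publish(label='A')\n"
    "detect(when(A > 0.99 and A < 1.01, lasting='5m'))"
    f".publish('{DETECT_LABEL}')"
)

# Severity levels offered when you set the alert severity.
SEVERITIES = {"Critical", "Major", "Minor", "Warning", "Info"}


def build_detector(severity: str, email: str) -> dict:
    """Build a detector definition (hypothetical helper, no network call).

    The field names below are assumptions about the detector API payload.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return {
        "name": DETECT_LABEL,
        "programText": PROGRAM_TEXT,
        "rules": [
            {
                # Must match the label published by the detect() call.
                "detectLabel": DETECT_LABEL,
                "severity": severity,
                "notifications": [{"type": "Email", "email": email}],
            }
        ],
    }


detector = build_detector("Critical", "oncall@example.com")
print(json.dumps(detector, indent=2))
```

Keeping the label in one constant ensures that the rule's label and the label published by the program cannot drift apart.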


Next steps

These additional Splunk resources might help you understand and implement these recommendations: