Splunk Lantern

Infrastructure monitoring overview

 

Splunk Infrastructure Monitoring (IM) is a product in the Splunk Observability Cloud platform that monitors and observes system metrics for physical and virtual components across enterprise hybrid and multi-cloud environments. Splunk Infrastructure Monitoring offers support for a broad range of integrations for collecting full-fidelity data, from system metrics for infrastructure components to custom data from your applications.
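As an illustration of what "custom data from your applications" can look like, here is a minimal sketch that builds a single gauge datapoint as JSON. It assumes the JSON datapoint layout accepted by the Splunk Observability Cloud ingest API (a top-level metric-type key such as `gauge` holding a list of datapoints). The metric name, dimension values, and the helper `build_gauge_payload` are hypothetical. Actually sending the payload requires your organization's realm and an access token, which are not shown here.

```python
import json


def build_gauge_payload(metric, value, dimensions):
    """Return a JSON-serializable body holding one gauge datapoint.

    Hypothetical helper for illustration; it only builds the payload
    and does not send anything over the network.
    """
    return {
        "gauge": [
            {
                "metric": metric,
                "value": value,
                # Dimensions are key/value pairs that describe the source
                # of the datapoint (service, cluster, and so on).
                "dimensions": dict(dimensions),
            }
        ]
    }


# Hypothetical metric name and dimension values.
payload = build_gauge_payload(
    "checkout.queue_depth",
    17,
    {"service": "paymentservice", "k8s.cluster.name": "demo-cluster"},
)

# Serialize to the JSON string that would form the request body.
body = json.dumps(payload)
print(body)
```

In practice, most infrastructure metrics arrive through integrations rather than hand-built payloads. A sketch like this is mainly useful for understanding how a custom metric and its dimensions are structured.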

The infrastructure tier serves as the foundation of the Observability Full Stack. Rather than being monitored in a classic operational silo, the infrastructure tier is automatically connected to the other tiers in the stack through metrics, logs, and traces, so that you have full observability and context.

Splunk Observability Cloud does this through Related Content: auto-discovered, connected content that links the infrastructure tier to the other tiers of the stack, such as the application tier, using metrics, traces, and logs. This helps reduce MTTD and MTTR and improve service availability, which in turn delivers incremental business and customer value.

Example

In the Infrastructure Monitoring diagrams below, a Kubernetes (k8s) cluster is being monitored and a specific Kubernetes pod is selected. The pod runs the workload service, in this case a payment service. The tabs along the bottom automatically reference the APM service map that supports the service, along with the logs generated by the service or workload running in the pod, shown in context.

When you select the paymentservice tab in the map, the Splunk APM navigator appears in the interface with the paymentservice context.

In addition, you can quickly reference specific logs for a Kubernetes pod in the context of a workload (here, payment service) by clicking the Logs for K8s pod paymentservice tab. You can configure this functionality using Splunk Log Observer or Splunk Log Observer Connect, where Splunk indexes can be searched directly in context.

In summary, this connected experience gives SRE, DevOps, and ITOps teams the tools they need to accelerate root cause analysis and quickly return services to normal, resulting in high service quality for the customer.

The guides in the next section use a fictitious company called CSCorp that hosts a cloud-native application called Online Boutique, deployed in a Kubernetes cluster. The guides show how the different focal areas in the Full Observability Stack can be consolidated. The pink arrows in the solution flow diagram below point to the areas these guides focus on. Also note the pink dotted lines, which represent an AIOps operational flow that harvests insights from the Observe stage (metrics, traces, logs, alerts), correlates them and sends notifications (the Engage stage), and acts on them (the Act stage) in the context of a service or application.

[Figure: solution flow diagram (customer-uce-observe.jpg)]

Infrastructure monitoring prescriptive outcome guides

Now use the following guides to build out the monitoring of your infrastructure tier in the Full Observability Stack. This will support the observability of your infrastructure, applications, and workloads. This section is updated regularly, so check back often.

Container orchestration platform monitoring

Other use cases to consider

Use cases are specific to each organization, so also consider the following as starting points for new use cases of your own. Be sure to monitor availability, performance, capacity, and error conditions for any new components.

  • Hosted services monitoring
  • Network monitoring
  • Serverless function monitoring
  • Middleware monitoring
  • Virtualization platform monitoring
  • Storage monitoring
  • Identity and access management (IAM) monitoring

What to do if you get stuck 

Still having trouble? Splunk has many resources available to help get you back on track.

Next steps  

Now that you're doing more with your data, find out how to get data in from additional data sources or try some of these use cases:

Server and operating system monitoring

Event streaming platform monitoring

Cloud platform monitoring