Splunk Lantern

Application monitoring overview

 

Splunk APM is a NoSample™, full-fidelity application performance monitoring and troubleshooting solution for cloud-native, microservices-based applications. Splunk APM insights and observations help you quickly identify root causes and drive improvements in release frequency, MTTD, MTTR, and service availability, resulting in incremental business and customer value.

Let’s talk about how to monitor and observe your cloud native applications. We recommend the following process for your first Splunk APM implementation.

  1. Identify a good cloud native, microservices-based application candidate within your company. Use this to validate your implementation and to set appropriate enterprise-wide standards for future application onboarding.
    1. Select an application of low to medium complexity with clear problem, challenge, or situation statements. This helps ensure value realization as you move forward on your APM journey with Splunk.
    2. Make sure you agree on a measurable objective in support of the goal. This lets you measure success and evangelize within your company. Some examples are:
      • Improve deployment frequency from every 3 days to every 2 days.
      • Improve application availability from 96% to 98.5%.
      • Reduce customer order drop-out rates from 15% to 5%.
  2. Design and instrument your application according to how you plan to monitor, investigate and diagnose issues to drive the goals and objectives in step 1. Well thought-out metrics, dimensions, tags, properties, and business workflows associated with your traces are critical to an optimized solution. Also, defining and adhering to enterprise standards, such as tag definitions and naming conventions with appropriate governance, is critical to success. 
  3. Operate. Make sure you have good operational alerts (detectors) that use proactive monitoring and data analytics to notify and engage operations according to priority and severity. This should be tightly coupled with your company's existing DevOps processes and its event, incident, change, and problem management processes. These interlocks provide improved visibility and continuity.
  4. Improve. Your solution always requires attention to sustain and provide incremental value back to the business. Formalizing the process in step 3, specifically problem management, assures that the APM instrumentation remains relevant and drives operational excellence. Some examples are:
    • Add additional detectors or improve existing ones.
    • Create additional tags to segment data from spans and traces.
    • Create additional business workflows to interconnect business KPIs with critical business transactions.
  5. Stakeholders. Always make sure you have designated stakeholders that cross development, operations, and shared infrastructure process boundaries. They help keep the project on course and assist you in addressing any bottlenecks in achieving initiative goals and objectives.
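
To make the availability objective in step 1 concrete, it can help to convert the percentages into hours of downtime. The following is a minimal sketch, not part of Splunk APM: the function name `downtime_hours` and the 30-day reporting period are assumptions chosen for illustration.

```python
# Hypothetical helper (not a Splunk API): convert an availability
# percentage into the downtime it implies over a reporting period.
HOURS_PER_30_DAY_MONTH = 30 * 24  # 720 hours; assumes a 30-day period


def downtime_hours(availability_pct: float,
                   period_hours: float = HOURS_PER_30_DAY_MONTH) -> float:
    """Return the hours of downtime implied by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


baseline = downtime_hours(96.0)  # current state in the example objective
target = downtime_hours(98.5)    # target state in the example objective

print(f"Baseline downtime: {baseline:.1f} h per 30 days")
print(f"Target downtime:   {target:.1f} h per 30 days")
print(f"Downtime removed:  {baseline - target:.1f} h per 30 days")
```

Under these assumptions, moving from 96% to 98.5% availability means going from about 28.8 hours to about 10.8 hours of downtime in a 30-day period, which gives stakeholders a tangible number to track.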

The guides in the next section use a fictitious company called CS Corp that hosts a cloud native application called Online Boutique, deployed in a Kubernetes cluster. The pink arrows in the solution flow diagram below point to the areas these guides focus on. Also note the pink dotted lines, which represent an AIOps operational flow that harvests insights from the Observe stage (metrics, traces, logs, and alerts), correlates them and notifies the right people (the Engage stage), and acts upon them (the Act stage) in the context of a service or application.

Figure: Solution flow diagram (customer-uce-observe.jpg)

Application monitoring prescriptive outcome guides

Now that you have selected, instrumented, and deployed your application, and metrics on spans and traces are flowing into Splunk Observability Cloud, it's time to get to value-add outcomes through prescriptive solutions. As part of your DevOps lifecycle, you use a canary deployment methodology and introduce new versions of microservice code releases on a daily basis. Some questions you might want to answer are:

  • As an SRE, how can I accelerate the ability to identify application degradations caused by microservice releases introduced downstream when we have hundreds of microservices being updated by multiple development teams?
  • As a service developer, how can I accelerate my ability to quickly identify degradations (MTTD) for my service release deployments; understand what business applications and workflows are being impacted; and quickly restore (MTTR) to minimize risks to the business?
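
Both questions above come down to comparing a new release against the version it replaces. The following is a minimal sketch of that comparison, assuming request and error counts have already been grouped by a version tag on your spans. The `VersionStats` class, the `canary_degraded` function, and the threshold values are all hypothetical and are not part of any Splunk API.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    """Request and error counts for one service version (hypothetical shape)."""
    version: str
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Guard against division by zero when a version has no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_degraded(baseline: VersionStats, canary: VersionStats,
                    max_ratio: float = 2.0, min_requests: int = 100) -> bool:
    """Flag the canary when its error rate exceeds max_ratio times the baseline.

    Returns False when the canary has too little traffic to judge.
    """
    if canary.requests < min_requests:
        return False
    # Assumed floor so a zero-error baseline does not make every
    # single canary error look like a degradation.
    floor = 0.001
    return canary.error_rate > max(baseline.error_rate, floor) * max_ratio


stable = VersionStats("v1.4.0", requests=10_000, errors=50)  # 0.5% errors
canary = VersionStats("v1.5.0", requests=500, errors=15)     # 3.0% errors
print(canary_degraded(stable, canary))
```

In practice you would let a detector perform this kind of comparison continuously. The sketch only illustrates why a consistent version tag on every span matters for answering these questions.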

Guide: Optimizing APM operations using custom MetricSets

Guide: Optimizing application, service and memory usage with AlwaysOn Profiling for Splunk APM

Guide: Monitoring AWS Lambda functions

Other use cases to consider

Use cases are specific to each organization, so use these thought starters to help you come up with new use cases of your own.

  • Reduce application release cycle time

What to do if you get stuck 

Still having trouble? Splunk has many resources available to help get you back on track.

Next steps 

Now that you're doing more with your data, get even more value by implementing additional use cases.