Splunk Lantern

Optimizing application, service and memory usage with AlwaysOn Profiling for Splunk APM

 

AlwaysOn Profiling for Splunk APM can help you optimize application and service performance (CPU profiling) and memory usage (memory profiling). AlwaysOn Profiling works by periodically capturing call stacks from a runtime environment for analysis. Splunk APM provides the workflow and visualizations so you can quickly isolate and remediate service performance bottlenecks or high memory utilization. This drives service quality improvements back to the business and increases developer productivity.
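The idea of periodically capturing call stacks can be illustrated with a minimal sketch. This is not how the Splunk agent is implemented; it only shows the general sampling technique, using Python's standard library. The names `sample_stacks` and `busy_loop`, and the interval and duration values, are assumptions made for illustration.

```python
import sys
import threading
import time
from collections import Counter


def sample_stacks(target_thread_id, interval_s, duration_s):
    """Periodically capture one thread's call stack and count each distinct stack."""
    counts = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # sys._current_frames() maps thread ids to their current innermost frame.
        frame = sys._current_frames().get(target_thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            # Frames were collected innermost-first; store them root-first.
            counts[tuple(reversed(stack))] += 1
        time.sleep(interval_s)
    return counts


def busy_loop(stop_event):
    """Hypothetical CPU-heavy workload standing in for service code."""
    while not stop_event.is_set():
        sum(i * i for i in range(1000))


stop = threading.Event()
worker = threading.Thread(target=busy_loop, args=(stop,))
worker.start()
samples = sample_stacks(worker.ident, interval_s=0.01, duration_s=0.3)
stop.set()
worker.join()

# Print the most frequently observed stacks, root frame first.
for stack, count in samples.most_common(3):
    print(count, " > ".join(stack))
```

Stacks that appear in many samples are where the thread spent most of its time, which is the signal a profiler surfaces for analysis.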

The primary user persona for profiling is the service developer responsible for writing new code, troubleshooting latency, and optimizing performance and resource consumption of their code. However, other personas, such as SREs, can use the Splunk APM profiling visualizations to identify operational degradations that they can share with developers. Developers can then explore opportunities to optimize their code to improve application performance and availability.

AlwaysOn Profiling for Splunk APM can be used in two ways in your enterprise to improve service performance and resource usage efficiency:

  • Scenario 1: Application performance degradation
  • Scenario 2: Infrastructure resource usage

This article is part of the Splunk Use Case Explorer for Observability, which is designed to help you identify and implement prescriptive use cases that drive incremental business value. It explains the solution using a fictitious example company, called CSCorp, that hosts a cloud native application called Online Boutique. In the AIOps lifecycle described in the Use Case Explorer, this article is part of Application monitoring.


Scenario 1: Application performance degradation

This scenario uses AlwaysOn Profiling for Splunk APM, initiated from a service or application performance detector alert in Splunk APM. It walks through a situation that fictional company CSCorp is experiencing.

CSCorp wants to improve the overall quality of services delivered by optimizing service code performance and resource consumption. They want to be more proactive and introduce process and governance into their DevOps SDLC (software development lifecycle) to support this initiative.

The Online Boutique application consumes many microservices to run the online sales portal. CSCorp’s DevOps team has introduced a process where SREs can identify opportunities for code optimization. CSCorp employees have started noticing that a critical service called ‘adservice’ has experienced some performance degradations that are adversely impacting customer experience on the Online Boutique portal. They also see some trending performance degradations over time that are not impacting customers currently and are not breaching their SLA, but the SRE believes proactive code optimization might be beneficial. 

Overall, CSCorp's goals are:

  • For SREs to easily identify possible code-based performance bottlenecks, and submit a code optimization ticket with an attached flame graph call stack export that a developer can use to home in on bottlenecks in code.
  • For developers to effectively and efficiently use outputs from flame graph analysis to navigate and identify code inefficiencies, consequently leading to code optimization, improved performance, and developer productivity.
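To make the flame graph analysis in these goals more concrete, the sketch below aggregates sampled call stacks into the semicolon-separated "folded" text form commonly used as flame graph input, and computes how often each function was the innermost frame. It is an assumption-laden illustration: the sample data and function names (`serve`, `get_ads`, `serialize`, `health_check`) are hypothetical, and the format of the Splunk APM call stack export is not specified here.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse root-first stacks into 'frame;frame;frame count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def self_time_share(samples):
    """Fraction of samples in which each function was the innermost (leaf) frame."""
    leaves = Counter(stack[-1] for stack in samples if stack)
    total = sum(leaves.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in leaves.items()}


# Hypothetical samples: each tuple is one captured call stack, root frame first.
samples = [
    ("main", "serve", "get_ads", "serialize"),
    ("main", "serve", "get_ads", "serialize"),
    ("main", "serve", "get_ads", "serialize"),
    ("main", "serve", "get_ads"),
    ("main", "serve", "health_check"),
]

folded = fold_stacks(samples)
shares = self_time_share(samples)

for line in folded:
    print(line)

# The function with the largest leaf share is the first candidate to inspect.
hottest = max(shares, key=shares.get)
print(hottest, shares[hottest])
```

In this made-up data, `serialize` is the leaf frame in three of five samples, so a developer would start by looking at that function.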

Solution

Watch how to use AlwaysOn Profiling for Splunk APM to improve application performance initiated from an APM alert in this video. You can also view the slide deck presented in the video.

The ‘adservice’ microservice is a critical service at CSCorp. The ‘frontend’ service is the main entry point for the Online Boutique application.

Scenario 2: Infrastructure resource usage

This scenario uses APM AlwaysOn Profiling initiated from an infrastructure detector alert in Splunk Infrastructure Monitoring to identify a probable cause for the alert firing, such as potential code inefficiencies or bottlenecks in code. The scenario walks through a situation that fictional company CSCorp is experiencing.

CSCorp wants to improve the overall quality of services delivered by optimizing service code performance and resource consumption. They want to be more proactive in how they monitor, observe, and alert on their mission-critical infrastructure, and they want to understand quickly when code inefficiencies are occurring.

The Online Boutique application consumes many microservices to run the online sales portal for CSCorp. The operations team has introduced a new process to identify opportunities for code optimization directly from infrastructure alerts. CSCorp has started noticing that a Kubernetes pod hosting a microservice workload called ‘adservice’ is breaching a historical anomaly detector for CPU and memory. They want to investigate and determine whether the probable cause of the alert firing is inefficient code in ‘adservice’, or whether a scale-out is necessary.

Overall, CSCorp's goals are:

  • For the CloudOps support team to quickly determine if an infrastructure computational resource shortage might be caused by a code bottleneck.
  • For developers to effectively and efficiently use outputs from flame graph analysis to navigate and identify code inefficiencies, consequently leading to code optimization, improved performance, and developer productivity.

Solution

Watch how to use AlwaysOn Profiling for Splunk APM to improve application performance initiated from a Splunk Infrastructure Monitoring alert in this video. You can also view the slide deck presented in the video.

This video shows how Splunk Observability Cloud provides auto discovery and connected content called Related Content that interrelates the infrastructure with the connected application stack. This connection provides bi-directional navigation and visibility in the context of time and service. The video explores how a CloudOps engineer, SRE, or developer can easily navigate from an infrastructure degradation detector alert (such as high CPU or memory on a Kubernetes container) to a Splunk APM service map in the context of the application service impacted. Finally, the video examines Splunk APM traces and AlwaysOn Profiling to determine if there is a code bottleneck.  


Next steps

These resources might help you understand and implement this guidance:

Still need help with this use case? Most customers have OnDemand Services per their license support plan. Engage the ODS team at OnDemand-Inquires@splunk.com if you require assistance.