Scenario: Applications and services that run or support the business can experience errors during runtime. These errors are called exceptions and are often handled by the application, but sometimes they are not. If they are serious enough, the application will stop running and, generally, dump a stack trace into the logs as it exits. A stack trace reveals many details about the error and shows the path the error took as it propagated through the various functions in the code stack. This is a rich source of application troubleshooting information. You want to develop some useful searches based on stack traces that you can use to investigate errors as needed.
How Splunk software can help
You can use Splunk software to identify the presence of stack traces in logs. These basic searches will find stack traces and give helpful information about the traces, such as the programs they came from, the time they occurred, and any trends with respect to time and frequency.
What you need
To succeed in implementing this use case, you need the following dependencies, resources, and information.
People
The best person to implement this use case is a site reliability engineer. This person might come from your team, a Splunk partner, or Splunk OnDemand Services.
Time
Implementing this use case can take up to a couple of hours. This includes data onboarding and installation of the necessary add-ons.
Technologies
The following technologies, data, and integrations are useful in successfully implementing this use case:
- Splunk Enterprise or Splunk Cloud
- Onboarded data sources, such as the application logs that contain the stack traces
How to use Splunk software for this use case
You can run many searches with Splunk software to detect application errors. Depending on what information you have available, you might find it useful to identify some or all of the following:
- Trends in exceptions and stack traces
- First time seen stack trace
- Trends in application errors over time
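To make the logic behind these searches concrete, the following is a minimal Python sketch of the same ideas outside of Splunk software: it finds events that carry a stack trace, counts them per hour, and records when each application and exception pair was first seen. It is an illustration only, not the Splunk search itself. The log layout (ISO timestamp, application name, level, message), the Java-style continuation lines, and the names `extract_traces` and `summarize` are all assumptions made for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed (hypothetical) log layout: "<ISO timestamp> <app> <LEVEL> <message>",
# followed by Java-style continuation lines ("\tat ..." or "Caused by: ...").
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (\S+) (\w+) (.*)$")
FRAME = re.compile(r"^\s+at \S+|^Caused by: ")
EXCEPTION = re.compile(r"\b([\w.]+(?:Exception|Error))\b")


def extract_traces(lines):
    """Yield (timestamp, app, exception_class) for each event that has stack frames."""
    current = None       # parsed header of the event currently being read
    has_frames = False   # whether that event was followed by stack frame lines
    for line in lines:
        m = HEADER.match(line)
        if m:
            # A new event starts; emit the previous one if it carried a trace.
            if current and has_frames:
                yield current
            ts, app, _level, msg = m.groups()
            exc = EXCEPTION.search(msg)
            current = (datetime.fromisoformat(ts), app,
                       exc.group(1) if exc else "unknown")
            has_frames = False
        elif current and FRAME.match(line):
            has_frames = True
    if current and has_frames:
        yield current


def summarize(traces):
    """Return hourly trace counts and the first time each (app, exception) was seen."""
    per_hour = Counter()
    first_seen = {}
    for ts, app, exc in traces:
        per_hour[ts.replace(minute=0, second=0)] += 1
        key = (app, exc)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return per_hour, first_seen


# Made-up sample data for illustration.
sample = [
    "2024-01-01T10:05:00 billing ERROR java.lang.NullPointerException: id is null",
    "\tat com.example.Billing.charge(Billing.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "2024-01-01T10:06:00 billing INFO request served",
    "2024-01-01T11:15:00 billing ERROR java.lang.NullPointerException: id is null",
    "\tat com.example.Billing.charge(Billing.java:42)",
]
per_hour, first_seen = summarize(extract_traces(sample))
```

In this sample, two events carry stack traces, one in each hour, and the first-seen time for the `billing` application's `java.lang.NullPointerException` is the earlier of the two. Real logs vary widely in format, so the regular expressions would need to be adapted to your own data.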
Other steps you can take
To maximize their benefit, the how-to articles linked in the previous section likely need to tie into existing processes at your organization or become new standard processes. These processes commonly impact success with this use case:
- Application development process
- DevOps site reliability workflows
Related resources
This use case is also included in the IT Essentials Learn app, which provides more information about how to implement the use case successfully in your IT maturity journey. In addition, these Splunk resources might help you understand and implement this use case:
- Conf talk: Zipkin and Splunk: Tracing transactions across your ecosystem
- Conf talk: How Kronos Consolidated Logging and Infrastructure Monitoring with Splunk
- Blog: Application performance redefined: Meet the new SignalFx Microservices APM
- Add-on: Splunk Add-on for IBM WebSphere Application Server
- Add-on: Splunk Add-on for Tomcat
How to assess your results
Measuring impact and benefit is critical to assessing the value of IT operations. The following are example metrics that can be useful to monitor when implementing this use case:
- Error rates by software version
- Error rate trends over time
- Reduction in outages or slowdowns identified by users
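As a rough illustration of the first metric in this list, the following Python sketch computes an error rate per software version from a list of events. The event shape (a version string paired with an error flag), the function name `error_rate_by_version`, and the sample values are hypothetical and are not taken from any Splunk data model.

```python
from collections import defaultdict


def error_rate_by_version(events):
    """events: iterable of (version, is_error) pairs. Returns {version: error fraction}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for version, is_error in events:
        totals[version] += 1
        if is_error:
            errors[version] += 1
    return {v: errors[v] / totals[v] for v in totals}


# Made-up events: (software version, whether the event was an error).
events = [
    ("1.4.0", False), ("1.4.0", True), ("1.4.0", False), ("1.4.0", False),
    ("1.5.0", True), ("1.5.0", True), ("1.5.0", False), ("1.5.0", False),
]
rates = error_rate_by_version(events)
```

Comparing the resulting rates across versions is one way to see whether a release introduced more errors than its predecessor. What counts as an event and as an error depends on your own logs.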