Splunk Lantern

Troubleshooting application issues

 

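Now that we have metrics, logs, and traces in Splunk Observability Cloud, let’s explore how to troubleshoot application issues. This article demonstrates three processes:

  • Enable feature flags to generate errors
  • Investigate application errors using traces and logs
  • Use Related Content to jump between logs, traces, and infrastructure data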

This article assumes you have configured Splunk Log Observer Connect to send logs from Splunk Cloud Platform to Splunk Observability Cloud. If you don't have that integration set up, follow the steps in this article first.

Enable feature flags to generate errors

  1. First, we’ll run the following command to make the feature flag service endpoint available on our local machine.
    kubectl port-forward svc/opentelemetry-demo-frontendproxy 8080:8080 -n otel-demo
    
  2. Then we can connect to the feature flag service by navigating to the following URL with our browser: http://localhost:8080/feature/
  3. Next, let’s enable both the productCatalogFailure and adServiceFailure feature flags.

The final result should look like this, with Enabled set to true on the feature flags of interest.
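
Before opening the browser, it can help to confirm that the port-forward is actually serving the feature flag page. The following is a minimal Python sketch, not part of the demo itself: it assumes the port-forward from step 1 is still running and uses the URL from step 2. The helper name `is_reachable` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: the port-forward from step 1 is running, so the feature flag UI
# is served at the URL given in step 2.
FEATURE_FLAG_URL = "http://localhost:8080/feature/"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` ends in a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    if is_reachable(FEATURE_FLAG_URL):
        print("Feature flag UI is reachable")
    else:
        print("Feature flag UI is not reachable; is the port-forward running?")
```

If the check fails, restarting the `kubectl port-forward` command from step 1 is a reasonable first thing to try.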

Investigate application errors using traces and logs

We know that the ad service is experiencing errors in our demo application because we enabled the adServiceFailure feature flag. To investigate what might be causing these errors, let’s start by using Trace Analyzer to find traces that involve the ad service and contain errors.

Let’s click the Trace ID to drill into the trace and take a closer look. The waterfall view of the trace shows us which components are taking the most time. We can see that an error occurs when the frontend service calls adservice, and we can click the span for further details. At the bottom of the trace, there’s a button that we can use to find log events related to this particular trace.

Click the button to go to Log Observer, which is pre-filtered on the trace_id of interest.

We can select a log event to drill into it for further details. The log event details show that GetAds failed with a status code of “RESOURCE_EXHAUSTED”.

Scrolling down further on the list of fields, we can see that the trace_id, span_id, and trace_flags attributes were included with this log event. The presence of these attributes allows Splunk Observability Cloud to correlate traces with logs.
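
To illustrate why these attributes matter, here is a minimal Python sketch of the join they make possible. It is not how Splunk Observability Cloud implements correlation internally. The sample spans and log events are invented for illustration (real trace IDs are 32 hex characters), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; IDs are shortened and the messages are made up.
SPANS = [
    {"trace_id": "aaa111", "span_id": "s1", "service": "frontend", "error": False},
    {"trace_id": "aaa111", "span_id": "s2", "service": "adservice", "error": True},
    {"trace_id": "bbb222", "span_id": "s3", "service": "frontend", "error": False},
]
LOGS = [
    {"trace_id": "aaa111", "span_id": "s2", "trace_flags": "01",
     "message": "GetAds failed with status RESOURCE_EXHAUSTED"},
    {"trace_id": "bbb222", "span_id": "s3", "trace_flags": "01",
     "message": "request served"},
    {"message": "log line without trace context"},
]


def index_logs_by_span(log_events):
    """Group log events by (trace_id, span_id), skipping events without both."""
    index = defaultdict(list)
    for event in log_events:
        key = (event.get("trace_id"), event.get("span_id"))
        if None in key:
            continue  # no trace context, so this event cannot be correlated
        index[key].append(event)
    return index


def logs_for_error_spans(spans, log_events):
    """Map each error span's (trace_id, span_id) to its related log events."""
    index = index_logs_by_span(log_events)
    return {
        (span["trace_id"], span["span_id"]): index.get(
            (span["trace_id"], span["span_id"]), []
        )
        for span in spans
        if span["error"]
    }


if __name__ == "__main__":
    for key, events in logs_for_error_spans(SPANS, LOGS).items():
        for event in events:
            print(key, event["message"])
```

The third sample log event carries no trace context, so it is dropped from the index. The same applies in practice: a log event without these attributes cannot be tied back to a trace.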

Use Related Content to jump between logs, traces, and infrastructure data

While the first troubleshooting scenario started by looking at traces, we could also start troubleshooting by looking at logs. For example, let’s use Splunk Log Observer to search for all log events associated with the recommendation service.

Click one of the log entries to see further details.

At the bottom of the screen, we can see three buttons that take us to Related Content. Specifically, the buttons link to:

  • The service map for the recommendation service.
  • The trace associated with this particular log entry.
  • The Kubernetes pod that this instance of the recommendation service runs on.
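
Each of these links depends on the log event carrying a field that identifies the target. The Python sketch below shows the idea under stated assumptions: the field names `service.name` and `k8s.pod.name` follow OpenTelemetry naming conventions, `trace_id` is the attribute shown earlier, and the mapping itself is illustrative rather than the product's actual configuration. The sample values and the function name `available_links` are hypothetical.

```python
# Assumed mapping from a Related Content destination to the log field that
# could identify it. Illustrative only, not the product's actual rules.
RELATED_CONTENT_FIELDS = {
    "service map": "service.name",
    "trace": "trace_id",
    "Kubernetes pod": "k8s.pod.name",
}


def available_links(log_event):
    """Return the destinations for which the log event has a non-empty field."""
    return [
        destination
        for destination, field in RELATED_CONTENT_FIELDS.items()
        if log_event.get(field)
    ]


if __name__ == "__main__":
    # Hypothetical log event; all values are made up for illustration.
    event = {
        "service.name": "recommendationservice",
        "trace_id": "aaa111",
        "k8s.pod.name": "recommendation-pod-example",
        "message": "example log message",
    }
    print(available_links(event))
```

A log event that lacks one of these fields would simply yield fewer destinations in this sketch.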

Click the first button to go to the service map for the recommendation service, which has its own set of related content at the bottom of the screen.

With metric, log, and trace data flowing to Splunk Observability Cloud, we can leverage the Related Content bar to seamlessly navigate from one view to another. Maintaining context as we jump from one signal to another allows us to troubleshoot issues more quickly.

Cleanup

If you want, you can clean up the application deployment by running the following commands:

kubectl delete --namespace otel-demo -f ./splunk/opentelemetry-demo.yaml
helm uninstall <helm release name>

Summary

In this article, we demonstrated how correlated log, trace, and metric data in Splunk Observability Cloud can be used to rapidly troubleshoot application issues.