Debug Problems in Microservices

If you're a developer building software in microservices, the issue you're debugging determines which combination of application data you need: logs, metrics, traces, and profiles for your services. However, each data type traditionally requires its own purpose-built tool, with its own data ingest, interface, and workflows, even when the tools come from the same vendor. This makes it difficult to put together a clear picture of the issue you're addressing. As a result, you might have a harder time debugging problems, and teams often send the same log data to multiple vendors, increasing telemetry costs and toil.

A lengthy and complicated debugging process adds stress, takes away time that you could invest in developing new applications, and, until the issue is resolved, can result in poor user experiences. All of this costs your organization money through lost customers, slower rollout of new revenue-generating features, and unneeded spending on sending the same log data twice. By consolidating with Splunk software, your engineering teams get everything they need to debug issues in microservices, and your organization gets closer to achieving its cost-reduction goals.

How can Splunk platform, Splunk Infrastructure Monitoring, and Splunk Application Performance Monitoring help with debugging problems in microservices?

Receive granular and accurate alerts on service issues

Developers often instrument custom metrics to improve detection and isolation of problems in their services. The more detailed and granular a metric is, the more it helps developers understand the issue, but handling detailed metrics at scale is difficult. The metrics engine in Splunk software is designed from the ground up for large-scale deployments. Because detailed metrics can get noisy, you can use SignalFlow to program your own alerts and to smooth a signal through normal business fluctuations, ensuring the alerts you receive are accurate.
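
For example, a SignalFlow program can smooth a noisy custom metric with a rolling mean before the alert condition evaluates. The following is a minimal sketch that streams such a detector with the open source signalfx Python client; the metric name, threshold, and detector label are illustrative assumptions, not defaults:

```python
import signalfx

# SignalFlow program (illustrative metric and threshold): smooth a noisy
# latency metric with a 10-minute rolling mean, then alert only if it
# stays above 500 for 5 minutes.
program = """
latency = data('checkout.latency.p99').mean(over='10m')
detect(when(latency > 500, lasting='5m')).publish('checkout-latency-high')
"""

# Stream the detector's output; each message reflects an evaluation tick.
with signalfx.SignalFx().signalflow('YOUR_ACCESS_TOKEN') as flow:
    computation = flow.execute(program)
    for msg in computation.stream():
        print(msg)
```

In practice, you would save this program as a detector in Splunk Observability Cloud rather than stream it from a script, but the smoothing-before-alerting pattern is the same.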

Have unified telemetry and visibility for each service

Splunk Observability Cloud brings together all the telemetry data that you need to debug issues in your services. Splunk provides out-of-the-box RED metrics (Rate, Error, Duration) alongside infrastructure dashboards. With Related Content, whenever you view the performance of a service, you can easily switch between metrics, traces, and logs of that service with the same context and filtering. All dashboards are customizable, so you can adjust your view to align with your preferences.

Isolate whether the application, infrastructure, or business logic is causing a problem

Tag Spotlight groups traces based on attributes they have in common, such as the host they're running on, the service version, or HTTP errors (if any exist). It then uses a visual representation to show errors and latency for each group of traces. With this global view, you can identify the cause of a problem more easily because you can immediately see what the problematic traces have in common.
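
Conceptually, the grouping works like the following Python sketch. The span records and tag names are hypothetical stand-ins for the attributes that real traces carry:

```python
from collections import defaultdict

# Hypothetical spans; real traces carry these values as span attributes.
spans = [
    {"version": "v2", "http.status_code": 500, "duration_ms": 812},
    {"version": "v1", "http.status_code": 200, "duration_ms": 95},
    {"version": "v2", "http.status_code": 503, "duration_ms": 901},
]

# Group spans by one tag (here, "version") ...
groups = defaultdict(list)
for span in spans:
    groups[span["version"]].append(span)

# ... then compare errors and latency per group. Here every error sits in
# v2, which points straight at the problematic deployment.
for tag_value, members in groups.items():
    errors = sum(1 for s in members if s["http.status_code"] >= 500)
    avg_ms = sum(s["duration_ms"] for s in members) / len(members)
    print(f"version={tag_value}: {errors}/{len(members)} errors, avg {avg_ms:.0f} ms")
```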

In cases where you learn of a specific issue from another source, such as a customer complaint, you can use Trace Analyzer to filter all traces down to just the ones relevant to that issue, and then drill down through the waterfall view to better understand the problem.

Accurately diagnose root cause within a service

Through Splunk Infrastructure Monitoring, you can understand issues caused by the infrastructure (such as low host memory) or the network.

Within the waterfall view, you can understand the impact of upstream and downstream services on your own service and identify poor database query performance. With AlwaysOn Profiling, you can also see how much memory and CPU each line of code consumes (in Java, .NET, and Node.js) to identify problematic code.
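
As a rough illustration of the analysis the waterfall view makes visual, the following sketch ranks the spans of a hypothetical trace by their share of total duration, which is how a slow database query stands out:

```python
# Hypothetical trace; span names, services, and durations are illustrative.
trace = [
    {"service": "frontend",  "name": "GET /checkout", "duration_ms": 1300},
    {"service": "orders-db", "name": "SELECT orders", "duration_ms": 1050},
    {"service": "payments",  "name": "charge card",   "duration_ms": 180},
]

# The root span's duration bounds the trace; rank spans by their share of it.
total_ms = max(s["duration_ms"] for s in trace)
for span in sorted(trace, key=lambda s: -s["duration_ms"]):
    share = 100 * span["duration_ms"] / total_ms
    print(f'{span["service"]:<10} {span["name"]:<14} {span["duration_ms"]:>5} ms ({share:.0f}%)')
```

Here the database call accounts for most of the request's time, so that downstream dependency, not the service's own logic, is where to dig.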

Use logs from advanced Splunk use cases for troubleshooting app services

Splunk Log Observer Connect automatically pulls relevant logs from the Splunk platform, so your engineering teams can send logs once, to a single vendor, and use them for multiple use cases. Logs in dashboards then give teams logs in the context of the metrics, infrastructure, and traces they are viewing, so they can more easily understand the root cause of issues.
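
Under the hood, this kind of correlation keys off shared trace context. The following is a minimal sketch of the idea, with hypothetical log records and field names:

```python
# Hypothetical log records; real logs emitted by instrumented services
# carry the trace ID of the request that produced them.
logs = [
    {"trace_id": "abc123", "level": "ERROR", "message": "payment declined"},
    {"trace_id": "def456", "level": "INFO",  "message": "cache warmed"},
    {"trace_id": "abc123", "level": "WARN",  "message": "retrying charge"},
]

# Keep only the lines that belong to the trace you're inspecting.
current_trace_id = "abc123"
for entry in (e for e in logs if e["trace_id"] == current_trace_id):
    print(entry["level"], entry["message"])
```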

Instrument for the last time

One reason that developers are hesitant to change observability vendors is that each vendor requires its own instrumentation. OpenTelemetry is the de facto open standard for instrumentation, and Splunk Observability Cloud is OpenTelemetry-native. With Splunk Observability Cloud, developers have the peace of mind of knowing that after they instrument their code with OpenTelemetry, they can send their data to any observability vendor without re-instrumenting, whether they change tools or build new applications.
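
For instance, a service instrumented once with the OpenTelemetry SDK only needs its exporter endpoint changed to target a different backend. A minimal Python sketch, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and an OTLP-compatible collector (such as the Splunk Distribution of the OpenTelemetry Collector) is listening locally:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# The endpoint is the only backend-specific detail: swap it to point at a
# different OTLP-compatible vendor without touching the instrumentation.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)

# Application code stays vendor-neutral.
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("charge-card") as span:
    span.set_attribute("payment.amount", 42.0)
```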

Use case guidance