Splunk Lantern

Using OpenTelemetry annotations to lower MTTR

OpenTelemetry allows you to easily capture metrics from your applications and add custom dimensions for later analysis. In this article you'll learn how to use annotations to attach contextual information about your distributed workloads to your captured measurements. For example, you can add a version annotation to a metric to find all requests made by a given version anywhere in your application.

Adding annotations to your spans adds cardinality to your telemetry, which helps you understand your application better and get answers to what went wrong and why. Annotations added to your traces help you narrow your data down to the parts most relevant to your application's development and deployment, ultimately reducing your MTTR.

About OpenTelemetry

OpenTelemetry data pipelines are built with the OpenTelemetry Collector. The Collector is responsible for aggregating workload telemetry and exporting this data to an analysis system like Splunk, or open-source systems like Prometheus. I’ll provide a brief introduction to annotations and configuration of the OpenTelemetry Collector below.

Annotations, also known as tags, are key-value pairs of data associated with recorded measurements. They provide contextual information and let you distinguish and group metrics during analysis and inspection. When measurements are aggregated to become metrics, annotations are used as labels to break down the metrics.

The OpenTelemetry Collector configuration file is written using YAML and a full pipeline contains the following components: 

  • Receivers: How to get data in. Receivers can be push or pull-based.
  • Processors: What to do with received data.
  • Exporters: Where to send received data. Exporters can be push or pull-based.
  • Extensions: Provide capabilities on top of the primary functionality of the collector.

Each of these components is defined within its respective section and must also be enabled within the service (pipeline) section.
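The four component types and the service section can be sketched as a minimal configuration skeleton. The specific components chosen here (otlp, batch, debug, health_check) are assumptions for illustration only and are not taken from this article; older Collector versions may ship a logging exporter instead of debug.

```yaml
# Illustrative skeleton only; component choices are assumptions.
extensions:
  health_check:            # extension: exposes a health endpoint for the Collector

receivers:
  otlp:                    # receiver: accepts pushed OTLP data over gRPC and HTTP
    protocols:
      grpc:
      http:

processors:
  batch:                   # processor: groups telemetry into batches before export

exporters:
  debug:                   # exporter: writes received telemetry to the Collector's own output

service:
  extensions: [health_check]   # extensions are enabled here, outside any pipeline
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

A component that is defined but not listed under service is not active, which is why every processor introduced later in this article also has to be added to a pipeline.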

Adding an annotation for the deployment environment

You can add a deployment environment to your workloads by adding the resource/add_environment processor to the Splunk OpenTelemetry Collector’s configuration file. The resource/add_environment processor adds the deployment.environment annotation to all spans to help you quickly identify your workloads within your analysis system, such as Splunk APM.

Here is an example of collected traces in Splunk APM with no named environment. Without a named environment, production and testing or staging data could be mixed together, making analysis difficult.

The following addition to the processors section of the configuration file, the resource/add_environment entry, sets the deployment.environment annotation to CloudProduction, the name of the specific deployment environment:

 
processors:
  resourcedetection:
    detectors: [system,env,gce,ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: CloudProduction
        key: deployment.environment

You can then enable the resource/add_environment processor by adding it to the traces and logs pipelines in the pipelines section of the configuration file. Here is an example configuration file showing the resource/add_environment processor enabled:
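As a minimal sketch, the service section might look like the following. The receiver and exporter names (otlp, otlphttp, splunk_hec) are assumptions and must match whatever is defined in the receivers and exporters sections of your own file; only the processor names come from the configuration above.

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]                # assumed receiver name
      processors:
        - resourcedetection
        - resource/add_environment     # adds deployment.environment to trace data
      exporters: [otlphttp]            # assumed exporter name
    logs:
      receivers: [otlp]                # assumed receiver name
      processors:
        - resourcedetection
        - resource/add_environment     # adds deployment.environment to log data
      exporters: [splunk_hec]          # assumed exporter name
```

Processors run in the order listed, so resource/add_environment is placed after resourcedetection here.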

When this configuration is done, Splunk APM shows the CloudProduction annotation, and you can use it to filter throughout the backend based on which environment is handling the request. This is one of the default troubleshooting MetricSets, which Splunk APM automatically indexes.

In addition to the deployment environment, you can add any other annotations that help you identify application performance bottlenecks.

You can do this using the attributes/newenvironment processor, which adds a span annotation to any spans that don’t already have the annotation. This is useful for adding metadata to your spans, like version numbers or deployment color when using blue/green deployments. Implementing the attributes/newenvironment processor works the same way as implementing the resource/add_environment processor or any other processor in OpenTelemetry.

Here is an example of what the attributes/newenvironment processor and the resource/add_environment processor look like as part of the same configuration. 

In the configuration file below, you can see the attributes/newenvironment processor added to the previous configuration to include both the version of your microservice application and deployment color:

processors:
  resourcedetection:
    detectors: [system,env,gce,ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: CloudProduction
        key: deployment.environment
  attributes/newenvironment:
    actions:
      - key: version
        value: "v1.0.1"
        action: insert
      - key: deploymentcolor
        value: "green"
        action: insert
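The insert action used above only sets a key when the span does not already carry it, so a value set by the application itself is preserved. The attributes processor also supports update (change the key only if it already exists) and upsert (set the key either way). As a sketch, a processor that overwrites the deployment color regardless of what the span already contains could look like this; the name attributes/forcecolor is hypothetical and not part of the original configuration.

```yaml
processors:
  attributes/forcecolor:         # hypothetical processor name, for illustration only
    actions:
      - key: deploymentcolor
        value: "green"
        action: upsert           # insert if missing, overwrite if already present
```

Like any other processor, it only takes effect once it is listed under the processors of the traces pipeline in the service section.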

When you look at the trace in Splunk APM, you can see that the version and deployment color are now included as part of each span collected for your microservice application:

Why annotations and cardinality have an impact on MTTR

Adding annotations to your spans adds cardinality to your telemetry, allowing you to better understand more about your application and get answers to what went wrong and why.

For example, with Splunk APM, you can create MetricSets, which are categories of metrics about traces and spans that you can use for real-time monitoring and troubleshooting. MetricSets are specific to Splunk but are effectively aggregates of metrics and metric time series, enabling you to populate charts and generate alerts. Creating custom MetricSets from the annotations identified in the previous examples allows you to use specific filters to narrow down any bottleneck affecting application performance. For example, with Splunk Infrastructure Monitoring, you can narrow down all hosts particular to an application environment, such as a region or datacenter.

This example shows how you can use the annotation for your deployment environment CloudProduction as a filter to create a custom dashboard showing all hosts within the CloudProduction environment:

Since all of your data is tagged with these annotations and created as MetricSets, you can also use them within Splunk APM. You can see from the example below that the annotations are now available as part of Splunk APM’s Tag Spotlight and Dynamic Service Map:

You can then filter your application telemetry by these specific annotations so you can get a clear map of service dependencies and find granular trends contributing to possible application performance issues.

Next steps

The content in this article comes from a previously published blog, one of the thousands of Splunk resources available to help users succeed. In addition, these resources might help you understand and implement this guidance:

Still need help with this use case? Most customers have OnDemand Services per their license support plan. Engage the ODS team at OnDemand-Inquires@splunk.com if you require assistance.