Splunk Lantern

Summarizing high-cardinality metrics by using metrics pipeline management

 

You are a site reliability engineer (SRE) in charge of monitoring observability ingest usage for your team, and you need to make sure you stay within your company’s budget.

You notice your team's metrics usage has recently increased. You obtain a detailed metrics usage report that gives you insights into the metrics volume, high cardinality dimensions, usage of the metrics in charts and detectors, and distribution of metrics.

The metrics usage report shows your team sends 128 metric time series (MTS) for the k8s.container.restarts metric to Splunk Observability Cloud. You know based on discussions with your team that not all the data is necessary at full granularity. To understand more about the cardinality of different dimensions, you review the report and notice that the container.id dimension is the highest cardinality dimension for k8s.container.restarts.
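To see why one dimension can dominate the MTS count, it can help to count distinct values per dimension yourself. The following is a minimal sketch, not the report's actual logic: the sample data (4 container names with 32 container IDs each, giving 128 MTS) and the helper name `dimension_cardinality` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample: 128 MTS, each identified by its dimension values.
# The split of 4 container names x 32 container IDs is an assumption.
mts = [
    {"k8s.container.names": f"app-{n}", "container.id": f"id-{n}-{i}"}
    for n in range(4)
    for i in range(32)
]


def dimension_cardinality(series):
    """Return the number of distinct values seen for each dimension."""
    values = defaultdict(set)
    for dims in series:
        for key, value in dims.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


cardinality = dimension_cardinality(mts)

# List dimensions from highest to lowest cardinality.
for name, count in sorted(cardinality.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count} distinct values")
```

In this made-up data set, every MTS has its own container.id value, so that dimension alone accounts for the full series count.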

You know your team cares most about Kubernetes (k8s) container names when it comes to k8s restarts, so they only need to monitor the k8s.container.names dimension. The container.id dimension is not information they need to monitor. 

You need to discard the container.id from the data being sent to Splunk Observability Cloud.

Solution

In Splunk Observability Cloud, you can use metrics pipeline management to create an aggregation rule that reduces the cardinality of k8s.container.restarts by keeping the k8s.container.names dimension and discarding container.id.
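Conceptually, an aggregation rule of this kind rolls up every series that shares the same kept dimension values into a single series. The sketch below illustrates the idea only. The sample data points, the `aggregate` helper, and the choice of summing as the rollup function are all assumptions, and it does not represent how Splunk Observability Cloud implements the rule.

```python
from collections import defaultdict

# Hypothetical data points: (dimensions, restart count).
points = [
    ({"k8s.container.names": "web", "container.id": "a1"}, 2),
    ({"k8s.container.names": "web", "container.id": "b2"}, 1),
    ({"k8s.container.names": "db", "container.id": "c3"}, 0),
    ({"k8s.container.names": "db", "container.id": "d4"}, 3),
]


def aggregate(points, keep):
    """Sum values across all series that share the same kept dimensions."""
    totals = defaultdict(int)
    for dims, value in points:
        # Only the kept dimensions form the identity of the new series;
        # every other dimension (here, container.id) is discarded.
        key = tuple((k, dims[k]) for k in keep)
        totals[key] += value
    return dict(totals)


rolled_up = aggregate(points, keep=["k8s.container.names"])

print(len(points), "input series ->", len(rolled_up), "aggregated series")
for key, total in rolled_up.items():
    print(dict(key), total)
```

Four input series collapse to two here because only two distinct container names exist. The same effect is what reduces the MTS count when container.id is dropped.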

  1. In the left navigation pane, click Metrics Pipeline Management, then + Create new rules.

    image2.png

  2. Search for the k8s.container.restarts metric and click OK.

    image3.png

  3. Click Add aggregation rule.

    add agg rule.png

  4. Under Show related dimensions, select container.id.

    image4.png

  5. In Specify dimensions, select Drop.
  6. In the New aggregated metric name field, enter k8s.container.restarts_name and click Generate Name.
  7. Next, download the list of charts and detectors that use the k8s.container.restarts metric. Click View list of charts and detectors and then, in the pop-up window, click Download.

    view list of charts and detectors (3).png

  8. For each chart and detector identified in the list, replace k8s.container.restarts with k8s.container.restarts_name by editing the associated chart or detector in Splunk Observability Cloud. You now have a new aggregated k8s.container.restarts_name metric that yields an acceptable MTS level.
  9. You can now drop the unaggregated raw metric that the team no longer needs to monitor. Do this by selecting k8s.container.restarts on the Metrics Pipeline Management page to view current rules for the metric, and change Keep data to Drop data. Then click Save.

    image1.png

  10. Verify the new metric volume after dropping the data you don’t need, and save the rules.

By combining aggregation and data dropping rules, you have successfully summarized a high-cardinality metric, creating a more focused monitoring experience for your team while minimizing storage costs for the company.

Next steps

These resources might help you understand and implement this guidance: