Splunk Lantern

Managing a large number of metrics sources in Splunk InfraMon

Applicability

  • Product: Splunk Infrastructure Monitoring
  • Feature: Metric time series
  • Function: Plotting metrics

Problem

With a standard subscription, a single Splunk Infrastructure Monitoring plot line (represented by one letter or row in a chart or detector builder view) can process up to 5,000 metric time series. A metric time series is the unique combination of a measurement, denoted by the metric name (such as memory.free), and a set of one or more dimensions that describe the scope of that measurement, denoted as key:value pairs (such as hostname:host1234). If a plot line exceeds this limit, Splunk Infrastructure Monitoring cannot perform its computations accurately.
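To make the definition concrete, the following Python sketch models a metric time series as a metric name plus its exact set of dimensions, then counts the distinct series in a batch of datapoints. It is a plain simulation, not a Splunk API. The `count_mts` and `mts_key` functions, the `PER_PLOT_LINE_LIMIT` constant, and the host names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical model: a datapoint is a metric name plus a dict of dimensions.
Datapoint = Tuple[str, Dict[str, str]]

PER_PLOT_LINE_LIMIT = 5000  # per-plot line limit for a standard subscription


def mts_key(metric: str, dimensions: Dict[str, str]) -> Tuple[str, FrozenSet[Tuple[str, str]]]:
    """Identify a series by its metric name plus its exact set of key:value dimensions."""
    return metric, frozenset(dimensions.items())


def count_mts(datapoints: Iterable[Datapoint]) -> int:
    """Count distinct series; repeated datapoints for the same series count once."""
    return len({mts_key(metric, dims) for metric, dims in datapoints})


# 17,000 hypothetical hosts, each reporting memory.free, produce 17,000 series.
points = [("memory.free", {"hostname": f"host{i}"}) for i in range(17000)]
# A second datapoint from the same host does not create a new series.
points.append(("memory.free", {"hostname": "host0"}))
total = count_mts(points)
print(total, total <= PER_PLOT_LINE_LIMIT)  # 17000 False
```

Adding or changing any dimension, for example tagging one host with a region, yields a different key and therefore a separate series.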

Solution

Break down the metric time series into groups that each fit within the per-plot line limit. For example, if you have 17,000 hosts emitting the free memory metric memory.free and you want to sum the total free memory across all of them, use a dimension to filter the metrics into groups of 5,000 or fewer hosts. Common choices for such a dimension are datacenter or, in the case of Amazon Web Services, region or availability zone, each of which might contain 3,000 to 4,000 hosts.
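The grouping step can be sketched in Python as a check you might run against your own host inventory before building charts. This is a hypothetical helper, not part of Splunk Infrastructure Monitoring. It assumes each host's dimensions are available as a dictionary and uses an illustrative `region` dimension with four example AWS regions and an evenly spread, made-up inventory.

```python
from collections import defaultdict
from typing import Dict, List

PER_PLOT_LINE_LIMIT = 5000  # per-plot line limit for a standard subscription


def partition_hosts(host_dims: Dict[str, Dict[str, str]], dimension: str,
                    limit: int = PER_PLOT_LINE_LIMIT) -> Dict[str, List[str]]:
    """Group hosts by one dimension's value and check that every group fits in one plot line.

    host_dims maps a hostname to its dimensions, for example
    {"host1": {"region": "us-east-1"}}. Raises ValueError if a host lacks the
    dimension or if any group exceeds the limit.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, dims in host_dims.items():
        if dimension not in dims:
            # Such a host would match no per-group filter and drop out of the total.
            raise ValueError(f"{host} has no {dimension!r} dimension")
        groups[dims[dimension]].append(host)
    for value, hosts in groups.items():
        if len(hosts) > limit:
            raise ValueError(
                f"{dimension}={value} has {len(hosts)} series, over the {limit} limit")
    return dict(groups)


# Hypothetical inventory: 17,000 hosts spread evenly over four AWS regions.
regions = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-2"]
inventory = {f"host{i}": {"region": regions[i % len(regions)]} for i in range(17000)}
groups = partition_hosts(inventory, "region")
print({region: len(hosts) for region, hosts in groups.items()})
# {'us-east-1': 4250, 'us-west-2': 4250, 'eu-west-1': 4250, 'ap-southeast-2': 4250}
```

The helper rejects hosts that lack the grouping dimension because, in this model, such a host would match none of the per-group filters and would silently be left out of the total.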

The same limit applies to each plot line used to compose a detector signal or dynamic threshold. For example, to make sure that your aggregate free memory across 17,000 hosts stays above a certain threshold, break the metric down as described in the previous paragraph and assign each group to its own plot line, such as A, B, C, and D. Then add another plot line, E, defined as E = A + B + C + D, and use E as the signal for your detector.
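The arithmetic behind E = A + B + C + D can be simulated in Python. This sketch is illustrative only: the per-host free memory values, region names, threshold, and the `plot_line_sum` and `combine` helpers are hypothetical. In the product, each plot line and the formula are defined in the chart or detector builder rather than in code.

```python
from typing import Dict, List

PER_PLOT_LINE_LIMIT = 5000  # per-plot line limit for a standard subscription


def plot_line_sum(free_memory: Dict[str, int], hosts: List[str],
                  limit: int = PER_PLOT_LINE_LIMIT) -> int:
    """Model one plot line (A, B, C, or D): sum memory.free over one group of hosts."""
    if len(hosts) > limit:
        raise ValueError(f"group has {len(hosts)} series, over the {limit} limit")
    return sum(free_memory[host] for host in hosts)


def combine(plot_lines: Dict[str, int]) -> int:
    """Model the formula plot line E = A + B + C + D."""
    return sum(plot_lines.values())


# Hypothetical fleet: 17,000 hosts spread evenly over four regions, 2 GiB free on each.
regions = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-2"]
groups: Dict[str, List[str]] = {region: [] for region in regions}
free_memory: Dict[str, int] = {}
for i in range(17000):
    host = f"host{i}"
    groups[regions[i % len(regions)]].append(host)
    free_memory[host] = 2

# Each of A-D reads 4,250 series, within the limit; E reads only the four results.
plot_lines = {label: plot_line_sum(free_memory, hosts)
              for label, hosts in zip("ABCD", groups.values())}
e = combine(plot_lines)

threshold_gib = 40000  # hypothetical alert threshold: fire when E drops below it
print(plot_lines, e, e < threshold_gib)
# {'A': 8500, 'B': 8500, 'C': 8500, 'D': 8500} 34000 True
```

Splitting the sum this way gives the same total as a single sum over all 17,000 series because the groups are disjoint and together cover every host. The same reasoning does not carry over directly to an average: a mean of per-group means is only correct when the groups are the same size.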

Additional resources

These additional Splunk resources might help you understand and implement these recommendations:
