Managing a large number of metrics sources

With a standard subscription, a single Splunk Infrastructure Monitoring plot line (represented by one letter or row in a chart or detector builder view) can process up to 5,000 metric time series. A metric time series is identified by the metric name, which denotes the measurement being taken (such as memory.free), together with a unique combination of one or more dimensions that describe the scope of the measurement (expressed as key:value pairs, such as hostname:host1234). If a plot line exceeds this limit, Splunk Infrastructure Monitoring cannot perform computations accurately.
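To make the definition of a metric time series concrete, the following minimal Python sketch counts distinct time series per metric from a list of datapoints. It is an illustration only and does not call any Splunk API; the function name count_time_series, the sample datapoints, and the region dimension are assumptions introduced for this example.

```python
from collections import defaultdict

# Per-plot-line limit stated above for a standard subscription.
PLOT_LINE_LIMIT = 5000


def count_time_series(datapoints):
    """Count distinct metric time series per metric name.

    Each datapoint is a (metric_name, dimensions) pair. A time series is
    identified by the metric name plus its full set of key:value dimensions.
    """
    series = defaultdict(set)
    for metric, dims in datapoints:
        # frozenset makes the dimension set hashable and order-independent.
        series[metric].add(frozenset(dims.items()))
    return {metric: len(keys) for metric, keys in series.items()}


# Hypothetical sample data for illustration.
datapoints = [
    ("memory.free", {"hostname": "host1234"}),
    ("memory.free", {"hostname": "host1234"}),  # same series, second datapoint
    ("memory.free", {"hostname": "host5678"}),
    ("memory.free", {"hostname": "host5678", "region": "us-east-1"}),
]

counts = count_time_series(datapoints)
print(counts)  # {'memory.free': 3}
print(counts["memory.free"] <= PLOT_LINE_LIMIT)  # True
```

Note that adding a dimension to an existing host produces a new time series, which is why the last datapoint is counted separately.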

To stay within the limit, break the metric time series into groups that each fit within the per-plot line limit. For example, if you have 17,000 hosts emitting the free memory metric memory.free and you want to sum the total free memory across all of them, use a dimension to filter the metric into groups of 5,000 or fewer hosts. Common choices for such a dimension are datacenter, region (in the case of Amazon Web Services), or availability zone, each of which might contain 3,000 to 4,000 hosts.
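One way to check whether a candidate dimension splits the hosts into small enough groups is sketched below in plain Python. The host inventory, the five region names, and the helper names group_sizes and oversized_groups are hypothetical; in practice the counts would come from your own inventory or metadata.

```python
from collections import Counter

# Per-plot-line limit stated above for a standard subscription.
PLOT_LINE_LIMIT = 5000


def group_sizes(hosts, dimension):
    """Return how many hosts fall under each value of the given dimension."""
    return Counter(host[dimension] for host in hosts)


def oversized_groups(sizes, limit=PLOT_LINE_LIMIT):
    """Return the groups that would still exceed the per-plot-line limit."""
    return {value: n for value, n in sizes.items() if n > limit}


# Hypothetical inventory: 17,000 hosts spread evenly across five regions.
regions = ["region-a", "region-b", "region-c", "region-d", "region-e"]
hosts = [
    {"hostname": f"host{i}", "region": regions[i % len(regions)]}
    for i in range(17000)
]

sizes = group_sizes(hosts, "region")
print(dict(sizes))              # 3,400 hosts in each of the five regions
print(oversized_groups(sizes))  # {} means every group fits in one plot line
```

If oversized_groups returns a non-empty result, the chosen dimension is too coarse and a finer one (for example, availability zone instead of region) might be needed.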

Each plot line used to compose a detector signal or dynamic threshold must also conform to the time series limit. For example, to ensure that aggregate free memory across 17,000 hosts stays above a certain threshold, break the metric down into groups as described above and assign each group to its own plot line, say A, B, C, and D. Then add another plot line, E, defined as E = A + B + C + D, and use E as the signal for the detector.
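The arithmetic behind E = A + B + C + D can be sketched as follows. This is a plain Python simulation under stated assumptions, not detector or SignalFlow code: the group sizes (4,250 hosts each), the 2 GiB of free memory per host, and the 30,000 GiB threshold are made-up values, and plot_sum is a hypothetical helper that mimics a single plot line refusing more than 5,000 time series.

```python
# Per-plot-line limit stated above for a standard subscription.
PLOT_LINE_LIMIT = 5000
GIB = 1024 ** 3


def plot_sum(values, limit=PLOT_LINE_LIMIT):
    """Sum one plot line's time series, rejecting inputs over the limit."""
    if len(values) > limit:
        raise ValueError(f"{len(values)} time series exceeds limit of {limit}")
    return sum(values)


# Hypothetical data: four groups of 4,250 hosts, each host reporting 2 GiB free.
groups = {name: [2 * GIB] * 4250 for name in "ABCD"}

# Plot lines A through D each aggregate one group within the limit.
partial = {name: plot_sum(values) for name, values in groups.items()}

# Plot line E combines the four partial sums: E = A + B + C + D.
E = partial["A"] + partial["B"] + partial["C"] + partial["D"]

# Hypothetical detector condition: alert when aggregate free memory
# falls below the threshold.
threshold = 30000 * GIB
alert = E < threshold

print(E // GIB)  # 34000
print(alert)     # False
```

Because E combines only four already-aggregated values, it stays far below the limit even though it represents all 17,000 hosts.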

Next steps

These additional Splunk resources might help you understand and implement these recommendations:

Splunk Docs: Guidance for metric and dimension names
Splunk Docs: System limits for infrastructure monitoring
Splunk Docs: Create charts