Preventing premature bucket rolling in metrics indexes

Certain forms of metric data ingestion can cause Splunk indexers to roll buckets prematurely. The issue is triggered by metrics with a high cardinality of dimension values, which exhausts the maxMetaEntries limit inside an individual bucket, and it is not always obvious how the situation can be improved.

Within the internals of a metric bucket there is a Strings.data file, which stores indexed fields in the Splunk platform under specific conditions. Unfortunately, there is limited official documentation on its impact on search performance for event or metrics indexes. The Strings.data file can grow large in certain metric indexes, typically those that contain a large number of unique dimension values. If your environment has a large number of unique devices reporting data, this file can grow significantly and trigger a bucket roll based on the number of unique dimension values.

If this issue arises, you might notice warnings related to the maxMetaEntries limit in a bucket. A typical log message looks like this:

finished moving hot to warm bid=indexname~3060~48ADFA56-6906-4D62-A998-7580450C2036 
idx=indexname from=hot_v1_3060 to=db_1724494222_1723608788_3060_48ADFA56-6906-4D62-A998-7580450C2036 
size=165703680 caller=strings_metadata entries=1000202 exceeds max=1000000, 
you may want to disable ANNOTATE_PUNCT in props.conf

In production environments, this growth of metadata in the Strings.data file can result in premature bucket rolling, even at relatively small bucket sizes such as 158MB, because the default maxMetaEntries limit of one million is reached. If your buckets roll before reaching their configured size limit, your environment will accumulate a large number of buckets, potentially degrading both indexing and search performance.

You can detect this issue using the alert “IndexerLevel — strings_metadata triggering bucket rolling” in the Alerts for Splunk Admins app. Additionally, a related message will appear in the splunkd logs on your indexers, which you can search for directly (see the sketch below). Rather than discovering the issue after the fact, however, it is better to prevent it.
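A minimal detection search, as a sketch, assuming the key=value pairs in the splunkd message (caller, idx) are extracted automatically at search time:

index=_internal sourcetype=splunkd caller=strings_metadata "exceeds max"
| stats count AS premature_rolls BY host, idx
| sort - premature_rolls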

Solution

The Support Portal article Hot bucket rolling is frequently caused by Strings.data bloating suggests increasing the metadata entry limit. However, alternative ingestion methods could prevent this issue altogether.
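For reference, that workaround amounts to raising the per-bucket metadata limit in indexes.conf on the indexers. A minimal sketch follows; the index name and new value are illustrative only, and the change permits more metadata entries per bucket rather than reducing the underlying growth:

# indexes.conf (index name and value are illustrative)
[test_fields]
maxMetaEntries = 2000000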

Our testing revealed that the way metric data is ingested into the Splunk platform significantly influences bucket size. The following experiment was conducted:

import json
import random
import time

import requests

HEC_URL = "https://localhost:8088/services/collector/event"
HEC_HEADERS = {"Authorization": "Splunk 479ce488-HEC-TOKEN"}

for x in range(1000):
    # Build a fresh batch each iteration so previously sent events are not re-posted.
    payload1 = []
    payload2 = []
    for i in range(1000):
        event = {
            "region": f"us-west-{i}",
            "datacenter": f"dc{i}",
            "rack": i,
            "os": "Ubuntu16.10",
            "arch": "x64",
            "team": "LON",
            "service": x,
            "service_version": x,
            "service_environment": "test",
            "path": "/dev/sda1",
            "fstype": "ext3",
            "metric_name:cpu.usr": random.random() * 100,
            "metric_name:cpu.sys": random.random() * 100,
            "metric_name:cpu.idle": random.random() * 100
        }
        timey = float(time.time() + i + (x * 1000))
        # Payload 1: the "fields" method: dimensions and metric_name values travel
        # in the "fields" object and the event body is just the string "metric".
        payload1.append({
            "time": timey,
            "event": "metric",
            "source": "metrics",
            "index": "test_fields",
            "sourcetype": "perflog",
            "host": "host_1.splunk.com",
            "fields": event
        })
        # Payload 2: the same data sent as a JSON event body, relying on the _json
        # source type (INDEXED_EXTRACTIONS=json) to extract the dimensions.
        payload2.append({
            "time": timey,
            "event": event,
            "source": "metrics",
            "index": "test_event",
            "sourcetype": "_json",
            "host": "host_1.splunk.com"
        })

    # HEC batching expects the JSON events concatenated back to back rather than a
    # JSON array; verify=False avoids TLS errors with the default self-signed
    # HEC certificate.
    r = requests.post(
        HEC_URL,
        data="".join(json.dumps(e) for e in payload1 + payload2),
        headers=HEC_HEADERS,
        verify=False
    )
    print(r.text)

Both payloads use the same HTTP Event Collector (HEC) endpoint. The first payload utilizes the fields method, as described in Splunk Docs.

The second payload is nearly identical but uses _json as the source type, which enables INDEXED_EXTRACTIONS=json.

Although the data remained identical across both test indexes, test_fields resulted in a significantly larger Strings.data file than test_event. This confirms that using the fields parameter in HEC leads to increased Strings.data file sizes, causing premature bucket rolling.
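One way to quantify the effect is to compare bucket counts and sizes for the two test indexes. The following is a minimal search sketch, assuming the index names from the script above:

| dbinspect index=test_fields index=test_event
| stats count AS buckets, sum(sizeOnDiskMB) AS size_on_disk_mb BY index, state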

Does INDEXED_EXTRACTIONS improve performance?

Benchmarking search performance on the test indexes indicated minimal differences. Searches over warm buckets showed up to a 2% performance improvement, while searches over hot buckets exhibited negligible differences.

Introspection data, specifically phases.phase_0.elapsed_time_aggregations.avg, showed no measurable downside to changing the ingestion method.

How does Strings.data affect search time?

Although further investigation was conducted, no definitive answer was found regarding how Strings.data impacts search performance in a metrics index. A pending knowledge base article addresses this topic, but the key takeaway is:

  • If the fields argument is used in the HEC payload, dimensions are stored in Strings.data instead of the main event data.
  • Using INDEXED_EXTRACTIONS ensures dimensions are stored outside of Strings.data, reducing metadata bloat.

Conclusion

If your data includes a large number of dimension values, configuring the source type with INDEXED_EXTRACTIONS and using the event payload method is recommended. The fields option can lead to excessive Strings.data growth, resulting in premature bucket rolling.
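As a sketch, this can be as simple as using the built-in _json source type (as in the experiment above) or defining a source type with the same behavior; the stanza name below is hypothetical:

# props.conf (source type name is hypothetical)
[perflog_json]
INDEXED_EXTRACTIONS = json

Pair this source type with the event payload method shown in payload2 above, so that dimensions travel in the event body rather than in the fields object.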

No performance drawbacks were identified when switching to INDEXED_EXTRACTIONS, making it an effective way to optimize indexing behavior.

Next steps

Feedback has been submitted to Splunk Docs regarding additional HEC examples, and a response is pending.