Performance tuning the indexing tier
This article explores various tuning options for the indexing tier in the Splunk platform. Optimizing this tier is crucial for efficient data ingestion and overall system performance.
There are a number of settings you can tune on the indexing tier. This article shares tips for ensuring optimal performance in each of the following areas:
- Queue size tuning
- Asynchronous load balancing
- Parallel ingestion pipelines
- Maximizing indexer disk performance
- OS and Splunk platform version
- Indexer cluster size
- Disabling the KV Store
- Disabling unnecessary services
- Cluster manager — server.conf tuning
- Indexer settings — limits.conf and server.conf
- Indexer settings — indexes.conf
- SmartStore issues
- Linux-specific options
- Data parsing
- Avoiding hot buckets on a single indexer
To determine how many indexers might be required, refer to the Splunk Validated Architectures Topology selection guidance. You might also want to read the Lantern article Planning for infrastructure and resource scalability.
This article is one of three that explore performance improvements for the indexing tier, forwarding tier, and search head tier. Check out the other articles to gain insights into optimizing each tier.
Queue size tuning
You can tune queueSize in inputs.conf for the data receiving queue, and server.conf settings for most other in-memory queues. Larger values than those shown in the examples can be used if that suits your needs, although opting for lower values might encourage upstream forwarders to shift toward less congested indexers when queues begin to fill.
Using very large queues can mask issues and result in longer ingestion times overall.
server.conf settings
[queue]
# default 1MB
maxSize = 4MB

[queue=aggQueue]
# default 6MB
maxSize = 10MB

[queue=parsingQueue]
# default not set (500KB?)
maxSize = 10MB

[queue=indexQueue]
maxSize = 20MB
To monitor the queues, you can use the monitoring console in the Splunk platform, or the “indexer_max_data_queue_sizes_by_name” dashboard in Alerts for Splunk Admins.
If you are using a Splunk platform instance of version 9.4.0 or newer you might wish to use the auto queue sizing option in server.conf:
[queue]
autoAdjustQueue = true
The community post How to improve indexing thruput if replication queue is full? also mentions adjusting the indexes.conf file to set maxMemMB = 100 under the [default] stanza.
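As a sketch of that suggestion, assuming you apply it globally via the [default] stanza in indexes.conf (test the memory impact before rolling this out, since the buffer applies per hot bucket):
# indexes.conf
[default]
# memory, in MB, used to buffer a hot bucket's tsidx data before flushing to disk
maxMemMB = 100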
Asynchronous load balancing
Asynchronous load balancing can dramatically improve performance on the indexing tier by keeping data evenly balanced among indexer peers, both in terms of ingestion performance and replication queues (which can be used as a proxy measure for ingestion health). The improved distribution of data can also enhance search performance.
You can find more information on asynchronous load balancing in Performance tuning the forwarding tier.
Parallel ingestion pipelines
The parallelIngestionPipelines setting is documented for the indexing tier in terms of index parallelization. There are limits to how many pipelines can usefully run on the indexing tier, and testing has shown that migrating to Kubernetes can improve performance beyond what parallelIngestionPipelines alone can achieve. See the article Understanding how to use the Splunk Operator for Kubernetes for more information.
The Splunk Docs guidance Manage pipeline sets for index parallelization provides further detail on how parallel ingestion pipelines work. Each pipeline set has its own hot buckets and its own queues, so tuning settings such as queue sizes apply per pipeline set.
Although the Splunk Operator for Kubernetes can optimize hardware utilization, be aware that Kubernetes is a complex technology and the decision to move into that space should involve careful consideration.
If you are running version 9.4.0 or newer, there is an option to automatically adjust the ingestion pipelines using the server.conf parameter pipelineSetAutoScale.
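As a sketch, both options sit in server.conf; parallelIngestionPipelines lives under the [general] stanza, and the placement shown for pipelineSetAutoScale is an assumption to verify against the server.conf spec for your version:
# server.conf
[general]
# fixed number of pipeline sets (each set has its own queues and hot buckets)
parallelIngestionPipelines = 2
# 9.4.0+ alternative: let the platform adjust the number of pipeline sets automatically
# pipelineSetAutoScale = true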
Maximizing indexer disk performance
Maintaining free disk space is crucial for performance. In server.conf, the minFreeSpace setting accepts a percentage, and combining it with the eviction_padding setting can prevent temporary pauses in searching during SmartStore bucket evictions.
server.conf example
[diskUsage]
minFreeSpace = 5%

[cachemanager]
eviction_padding = 2180170
Benchmarking has shown that ext4 might have lower latency than XFS for production workloads, although conducting your own testing is advised. The article Benchmarking filesystem performance on Linux-based indexers provides insights into this comparison. In that article and others, 20% free space was the cutoff point where latency started to degrade as the disk filled further. On newer hardware we purchased in 2025, 10% free space caused no degradation in performance, so test this on your own hardware.
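If you want to run a quick comparison of your own, a simple fio random-write test can expose latency differences between filesystems and at different disk fill levels. The parameters below are illustrative only and are not the methodology used in the linked article:
# run against the Splunk data volume; adjust size/jobs to suit your hardware
fio --name=randwrite --directory=/opt/splunk/var/lib/splunk \
    --rw=randwrite --bs=4k --size=2g --numjobs=4 --iodepth=16 \
    --ioengine=libaio --direct=1 --runtime=120 --time_based --group_reporting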
OS and Splunk platform version
Kernel versions across indexers can impact disk write latency, with newer versions tending to correlate with lower latency. Keeping versions up-to-date, for example by using an n-1 version, generally improves performance.
Indexer cluster size
Larger indexer clusters can take longer to recover to a valid and complete state. Running multiple indexer clusters with fewer nodes each is advised to mitigate issues with high bucket counts and single-threaded components in the cluster manager.
Giving each indexer cluster a unique set of indexes and minimizing the number of search heads that need to access those indexes can enhance performance. Clusters with identical configurations can also distribute workloads effectively.
Advantages of this approach include:
- When an indexer restarts, only its own cluster is affected, so you can restart multiple indexers in a single window as long as each one is in a different cluster.
- Cluster managers recover more quickly from restarts of the cluster manager, and also from indexer restarts, due to the smaller total bucket count per cluster.
- The impact of a single cluster manager failure is reduced, as it only affects a subset of the overall environment.
One disadvantage of this approach is that you'll have more cluster managers to watch and maintain, but this might not cause you any real issues.
Disabling the KV Store
The KV Store is required on search heads and on forwarders for checkpoint tracking, but it is generally not needed on indexers. You can disable it in server.conf:
[kvstore]
disabled = true
Disabling unnecessary services
Newer versions of the Splunk platform, including later 9.3.x releases, 9.4.x, and version 10, introduced the concept of sidecars.
The sidecar configuration settings page provides settings that you can use to disable services you are not using. On forwarders, you might wish to disable the following in server.conf:
[teleport_supervisor]
disabled = true
enable_splunk_spotlight = false

[postgres]
disabled = true
In versions 9.4.0 and newer you will also want to disable Prometheus, as per Linear memory growth with Splunk 9.4.0 and above:
[prometheus]
disabled = true
The scheduler is also optional on push-based forwarders, although I usually leave it enabled; on a pull-based forwarder you do want to leave the scheduler enabled. You can use an application to disable the scheduler, or you can create a default-mode.conf file with:
[pipeline:scheduler]
disabled = true
Finally, you might wish to disable certain applications on all forwarders, such as the Splunk secure gateway application. To help with this I created the AppDisabler application on Splunkbase; if you prefer to disable individual reports, you can use the ReportDisabler.
Cluster manager — server.conf tuning
Several parameters can be tuned for a moderate-sized indexer cluster (for example, 80 indexers). Research each setting before changing it:
[clustering]
max_peers_to_download_bundle = 20
send_timeout = 300
rcv_timeout = 300
cxn_timeout = 300
heartbeat_timeout = 120
restart_timeout = 120
percent_peers_to_restart = 6
heartbeat_period = 10
backup_and_restore_primaries_in_maintenance = True
rolling_restart_condition = up
constrain_singlesite_buckets = false
searchable_rolling_peer_state_delay_interval = 120
localization_based_primary_selection = auto
# You will want these settings to be lower, such as 2 or 3, if you want a slower
# recovery with better performance. I've prioritised restoring the rep/search factor.
max_peer_build_load = 20
max_peer_rep_load = 50
#max_peer_sum_rep_load = 2
# throttle the amount of bandwidth used for non-hot (warm/cold) replication
# defaults to 0 or unlimited
#max_nonhot_rep_kBps = 10000
Settings like backup_and_restore_primaries_in_maintenance and localization_based_primary_selection might enhance performance when using SmartStore.
Indexer settings — limits.conf and server.conf
limits.conf
[search]
# this relates to a support case, added for consideration only
max_rawsize_perchunk = 500000000
# also related to a support case of buckets failing to localize in time
bucket_localize_max_timeout_sec = 600
# related to regex issues
idle_process_regex_cache_hiwater = 210000

# Increase to 1000MB to avoid the indexing of lookups unless we really need to
[lookup]
max_memtable_bytes = 1048576000

# Allow the indexers to read the various on-disk files they now track (such as telemetry)
[inputproc]
max_fd = 4000

# spath is distributed and it does not work as expected from the SH if the setting is not on the indexer
[spath]
extraction_cutoff = 300000
server.conf
[clustering]
# these can be increased if you are seeing indexer-to-indexer timeouts
#rep_max_send_timeout = 180
#rep_max_rcv_timeout = 180

[general]
# regex related, default is 2500
regex_cache_hiwater = 210000

[httpServer]
# allow just under 5GB of bundle to be uploaded
max_content_length = 5000000000
# these timeouts are based on older versions and may no longer be required
streamInWriteTimeout = 30
busyKeepAliveIdleTimeout = 180
Indexer settings — indexes.conf
Here are example default settings for all indexes:
[default]
tsidxWritingLevel = 4
journalCompression = zstd
repFactor = auto
For non-SmartStore setups, review how indexer clusters handle report and data model acceleration summaries; the summary_replication setting might be useful here.
On SmartStore clusters, the summaries are uploaded and this setting is not required. For details, see How is the replication of summary bucket managed in Splunk Smartstore?
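If you choose to enable it on a non-SmartStore cluster, a minimal sketch is to set summary_replication in server.conf on the cluster manager (confirm the behavior for your version and cluster topology before enabling):
# server.conf on the cluster manager
[clustering]
summary_replication = true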
SmartStore issues
Within SmartStore environments, downloads can be tuned:
[cachemanager]
max_concurrent_downloads = <unsigned integer>
You should test how different download settings (for example, increasing to 12 from the default of 8) affect your system performance. Heavy SmartStore downloads can max out CPU or block ingestion queues, although the effects are less noticeable in newer Splunk platform versions.
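For example, to trial the higher value mentioned above:
[cachemanager]
max_concurrent_downloads = 12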
You can visit the Getting smarter about Splunk SmartStore GitHub repository for dashboards. The “SmartStore Stats” dashboard in Alerts for Splunk Admins and the s2_traffic_report dashboard on GitHub are also recommended.
To track problematic searches, use the “SearchHeadLevel — SmartStore cache misses — combined” or “IndexerLevel — SmartStore cache misses — remote_searches” dashboards to detect an actively running search downloading many buckets.
Linux-specific options
The newer systemd unit files set high limits. If you are not using systemd, set the equivalent limits in the operating system's /etc/security/limits.conf file instead.
Settings to adjust within the systemd unit file are:
LimitCORE=infinity
LimitDATA=infinity
LimitNICE=0
LimitFSIZE=infinity
LimitSIGPENDING=385952
LimitMEMLOCK=65536
LimitRSS=infinity
LimitMSGQUEUE=819200
LimitRTPRIO=0
LimitSTACK=infinity
LimitCPU=infinity
LimitAS=infinity
LimitLOCKS=infinity
LimitNOFILE=1024000
LimitNPROC=512000
You can also set the following on all Splunk Enterprise instances via sysctl:
kernel.core_pattern = /opt/splunk/%e-%s.core
This ensures core dumps are written to a directory the process can write to. Use /opt/splunk/var on Kubernetes instances, as the var partition is persisted to a larger disk. This allows investigation into core dumps of Splunk Enterprise.
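One way to apply this persistently is a drop-in sysctl file; the filename below is illustrative:
# /etc/sysctl.d/90-splunk-core.conf
kernel.core_pattern = /opt/splunk/%e-%s.core

# apply without a reboot
sysctl -p /etc/sysctl.d/90-splunk-core.conf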
Consider disabling transparent huge pages, as recommended in Splunk Docs.
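As a sketch, THP can be disabled at runtime with the commands below; persist the change via a systemd unit, a tuned profile, or a kernel boot parameter, per the Splunk Docs guidance:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag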
Splunk Enterprise 9.3 also recommends changing the NUMA settings on servers. I have tested this by disabling NUMA at the BIOS level on one indexer with 2 NUMA nodes, and this appeared to result in higher CPU utilization for the same or a lower workload compared with another identically configured indexer. I have also tested dynamically changing the NUMA balancing setting on 3 indexers with 2 NUMA nodes, and this appeared to result in no measurable difference compared to 3 identically configured peers in the same cluster.
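For reference, the dynamic NUMA balancing toggle on most Linux distributions is the kernel.numa_balancing sysctl (shown as a sketch; whether to change it should be driven by your own testing):
# check the current value, then disable automatic NUMA balancing at runtime
sysctl kernel.numa_balancing
sysctl -w kernel.numa_balancing=0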
Data parsing
The quality of data parsing directly impacts indexer performance. Articles like Improving data onboarding with props.conf configurations and Clara-fication: Data onboarding best practices cover this topic well.
Setting SHOULD_LINEMERGE to false and using an appropriate LINE_BREAKER setting can relieve pressure on the aggregation queue.
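A minimal props.conf sketch for a single-line sourcetype; the sourcetype name and timestamp settings are placeholders you would adjust to match your data:
# props.conf, applied where parsing occurs (indexers or heavy forwarders)
[my:custom:sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 30
TRUNCATE = 10000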
Use the “indexer_max_data_queue_sizes_by_name” dashboard in Alerts for Splunk Admins, or dashboards in the monitoring console to view queue-based performance.
Avoiding hot buckets on a single indexer
Automating the rolling of buckets can help avoid issues with data loss when instances using local NVMe disks fail and the bucket is not replicated. Consider creating roll_and_resync_buckets_v2.sh.
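The script itself is environment-specific, but the core of this kind of automation is typically the roll-hot-buckets REST endpoint, invoked per index. The example below is a hedged sketch rather than the referenced script, and the credentials and index name are placeholders:
# roll hot buckets for a single index via the management port
curl -k -u admin:changeme -X POST \
  https://localhost:8089/services/data/indexes/main/roll-hot-buckets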
Next steps
The primary challenge on the indexing tier is ensuring you have enough indexers with reasonable hardware capacity, where the required capacity is driven by both search and ingestion workloads.
Tuning involves testing different filesystems, setting valid limits in configurations, and adjusting queue sizes to improve performance.
If issues arise with cluster managers in terms of the “all data is searchable” state, consider creating smaller indexer clusters or relocating the cluster manager to a faster CPU.