Benchmarking filesystem performance on Linux-based indexers
A project whose primary goal was optimizing the I/O performance of the indexing tier had a secondary output: a benchmark of filesystem performance in a Splunk environment with Linux-based Splunk Enterprise indexers.
Based on previous Splunk .conf presentations, the plan was to switch from ext4 to XFS to maximize disk performance. However, after the change to XFS, I/O performance decreased over time rather than increased.
The Splunk indexer workloads tested included around a million searches per day and ingestion of around 350GB of data per indexer per day. The ext4 filesystem consistently outperformed XFS on the introspection measure avg_total_ms across multiple indexer clusters. Maintaining 20 percent free disk space rather than 10 percent had an even more significant impact on performance.
If you are interested in benchmarking your Linux-based Splunk Enterprise indexers, this article can help you decide which data, processes, and settings to use. If you only want the findings, skip ahead to the Results section.
Measuring Linux I/O performance
There are multiple ways to measure I/O in Linux. Here are a few options.
iostat
For an excellent discussion of iostat usage, refer to Digging Deep into Disk Diagnoses (.conf 2019).
Pros
- Very flexible
- Provides all required statistics accurately
Cons
- You may need per-second measurements so that you do not miss the short latency spikes that affect indexing.
- iostat is a great CLI utility. However, you need to get the data into another tool to graph or compare it.
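Because iostat prints plain text, one way to get its output into another tool is to convert it to CSV. The following Python sketch assumes `iostat -d -x -y` output in which each per-device table starts with a `Device` header line (older sysstat releases print `Device:`). The function names are hypothetical, and because column names vary between sysstat versions, the parser reads them from the header instead of hard-coding them.

```python
import csv
import io
import os
import subprocess


def parse_iostat_extended(text):
    """Parse `iostat -dx` text into one dict per device row.

    Each report starts with a header whose first column is "Device"
    (or "Device:" in older sysstat releases); device rows follow until
    a blank line. Column names come from the header itself.
    """
    rows = []
    header = None
    report = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current report
            continue
        if fields[0].rstrip(":") == "Device":
            header = [name.rstrip(":") for name in fields]
            report += 1
            continue
        if header and len(fields) == len(header):
            row = {"report": report, header[0]: fields[0]}
            for name, value in zip(header[1:], fields[1:]):
                row[name] = float(value)
            rows.append(row)
    return rows


def sample_iostat(interval=1, count=5):
    """Run iostat for `count` reports of `interval` seconds each.

    -y skips the first report (averages since boot); LC_ALL=C keeps
    "." as the decimal separator so float() can parse the values.
    """
    result = subprocess.run(
        ["iostat", "-d", "-x", "-y", str(interval), str(count)],
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},
    )
    return parse_iostat_extended(result.stdout)


def rows_to_csv(rows):
    """Serialize parsed rows as CSV text for a graphing or comparison tool."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv(sample_iostat(1, 60))` would capture one minute of per-second reports, keeping the report number on each row so that short spikes stay visible after import.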
Linux kernel I/O statistics
According to the kernel documentation for I/O statistics fields, iostat measures I/O by taking the difference between successive readings of the cumulative counters in /proc/diskstats.
Because the counters are cumulative, iostat compares each new reading with the previous one for as long as it keeps running. This is why the first iostat report covers the period since system boot unless the -y flag is used.
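The same calculation can be reproduced directly from /proc/diskstats. This Python sketch takes two readings and divides the change in time spent on reads and writes by the change in completed reads and writes, which approximates iostat's combined await. It is a simplification: it ignores the discard and flush counters that newer kernels also report, and the helper names are hypothetical.

```python
import time

# 0-based positions after splitting a /proc/diskstats line on whitespace,
# following the kernel's I/O statistics fields documentation:
# 0 major, 1 minor, 2 device name, 3 reads completed, 6 ms spent reading,
# 7 writes completed, 10 ms spent writing.
READS, READ_MS, WRITES, WRITE_MS = 3, 6, 7, 10


def parse_diskstats(text):
    """Return {device: (reads, read_ms, writes, write_ms)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue
        stats[fields[2]] = tuple(int(fields[i]) for i in (READS, READ_MS, WRITES, WRITE_MS))
    return stats


def await_ms(before, after):
    """Average ms per completed I/O between two readings, similar to iostat's await.

    The counters are cumulative since boot, so only the difference between
    two readings describes the interval between them.
    """
    result = {}
    for device, (r1, rms1, w1, wms1) in after.items():
        if device not in before:
            continue
        r0, rms0, w0, wms0 = before[device]
        ios = (r1 - r0) + (w1 - w0)
        result[device] = ((rms1 - rms0) + (wms1 - wms0)) / ios if ios else 0.0
    return result


def sample(interval=1.0, path="/proc/diskstats"):
    """Take two readings `interval` seconds apart and return per-device await."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return await_ms(before, after)
```

Calling `sample()` once shows only the most recent interval. Unlike a single since-boot report, it reflects the latency spikes that affect indexing.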
Splunk Add-on for Unix and Linux
Pros
- Easy to set up
- Runs iostat from a shell script
Cons
- This add-on measures iostat data incorrectly because it does not keep iostat running between samples. Idea APPSID-I-573 raises this issue. Splunk developers and Support were also given a detailed description of the issue through a support case in 2023, but as of July 2024 it has not been resolved.
Metricator application for Nmon
The Nmon utility appears to produce accurate I/O data, although its measurements often differ from iostat's. For example, Nmon's disk service time is the average time to complete an I/O. It is similar to await or svctm in iostat, but Nmon reports a larger value. The two values do, however, correlate as expected.
Pros
- Metricator provides useful graphs of the I/O data
- Accurate data
Cons
- Measurements differ from those of other utilities
- Might provide more data than some users need
Splunk Enterprise _introspection data
Splunk Enterprise records I/O data in the _introspection index by default, and this data correlates with the Nmon/iostat data as expected. At the time of writing, there was no documentation on the introspection I/O metrics.
In Alerts for Splunk Admins, the dashboard splunk_introspection_io_stats displays this data. The Splunk Monitoring Console also has views for this data.
Measurement summary
Measurement tool of choice
This project used Nmon and _introspection. The metrics from the Splunk Add-on for Unix and Linux did not match the iostat, Nmon, or _introspection data, so results from this add-on were excluded.
Variation in I/O performance
Splunk user searches affect I/O performance. In particular, SmartStore downloads and other I/O spikes changed disk service times. For an example query, see the report "SearchHeadLevel — SmartStore cache misses — combined" in Alerts for Splunk Admins, or the SmartStore stats dashboard.
Per-server variance
I/O performance also varied from server to server, regardless of tuning settings. For reasons that were never determined, some servers' NVMe drives were slower than others' under a similar I/O workload.
Choice of measurement statistic
There are many statistics for disk performance in the _introspection index. We used:
- data.avg_service_ms (XFS performed better)
- data.avg_total_ms (ext4 performed better)
In the Nmon data, DGREADSERV/DGWRITESERV were lower on ext4, which correlated with data.avg_total_ms from the _introspection index in Splunk Enterprise. These values also appeared to correlate with the await time reported by iostat.
Additional measurements
DGBACKLOG (backlog time in ms) from Nmon was lower on ext4, although disk busy time was higher. ext4 also resulted in more write operations and more disk write merges.
The total service time for an I/O operation was consistently lower under ext4 than under XFS, which is why ext4 was chosen and is recommended going forward.
Filesystem tuning & testing
/etc/fstab settings
- ext4: noatime,nodiratime (also tested with defaults)
- XFS: noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,nobarrier (also tested with defaults)
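After editing /etc/fstab and remounting, it is worth confirming which options the kernel actually applied. This Python sketch reads /proc/mounts, and the mount point /opt/splunk in the usage example is hypothetical. The kernel may normalize or omit some options in /proc/mounts, for example defaults or options implied by others. Treat a reported difference as a prompt to investigate rather than proof of a problem.

```python
def read_mounts(path="/proc/mounts"):
    """Return the text of /proc/mounts (one mount per line)."""
    with open(path) as f:
        return f.read()


def mount_options(mount_point, mounts_text):
    """Return (fstype, options) for mount_point; the last matching line wins."""
    found = None
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fstype, comma-separated options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = (fields[2], set(fields[3].split(",")))
    if found is None:
        raise KeyError(mount_point)
    return found


def missing_options(mount_point, wanted, mounts_text):
    """Return (fstype, sorted list of wanted options the kernel did not report)."""
    fstype, options = mount_options(mount_point, mounts_text)
    return fstype, sorted(set(wanted) - options)
```

For example, `missing_options("/opt/splunk", ["noatime", "nodiratime"], read_mounts())` returns the filesystem type plus any requested options that do not appear in the kernel's view of the mount.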
Changing filesystems
To switch filesystems, I reformatted the partition with the required filesystem (a complete wipe), and I let SmartStore downloads re-populate the cache over time.
Metricator/Nmon and Splunk's _introspection data were used to compare the performance of the filesystems on each server.
Performance initially improved after the switch to XFS. However, we later determined that the improvement was related to the percentage of the partition left free: after the wipe, the cache started almost empty and refilled gradually. Response times increased noticeably once free space on the partition dropped below 20 percent and approached the 10 percent minimum set in the Splunk platform's server.conf settings.
Keeping 10 percent of the disk free is often recommended online for SSDs. We increased the minFreeSpace setting in server.conf to 20 percent to maximize performance.
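One simple way to watch how close a filesystem is to the 20 percent threshold is Python's shutil.disk_usage. This is a sketch and not how Splunk itself measures free space. disk_usage reports the space available to unprivileged users, which on ext4 excludes root-reserved blocks, so its figure can differ from Splunk's. The function names are hypothetical.

```python
import shutil


def free_percent(path):
    """Percent of the filesystem holding `path` that unprivileged users can still write."""
    usage = shutil.disk_usage(path)  # named tuple of (total, used, free) in bytes
    return 100.0 * usage.free / usage.total


def below_target(path, target_percent=20.0):
    """True once free space drops below target_percent (20 in this project)."""
    return free_percent(path) < target_percent
```

For example, `below_target("/opt/splunk/var")` (a hypothetical path) returns True once the filesystem holding that path falls under 20 percent free, which is a reasonable point to alert before response times degrade.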
Server setup
- All servers were located on-premises (bare metal), with 68 indexers in total.
- Four 3.84TB read-intensive NVMe drives per server, combined with Linux software RAID (mdraid) in RAID 0
- Total disk space was 14TB per indexer, on a single filesystem holding both the SmartStore cache and the indexes/DMA data
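The 14TB figure is consistent with four 3.84TB drives striped without redundancy. In the decimal units drive vendors use, that is 15.36TB, or about 13.97TiB in the binary units that tools such as df -h report. The Python sketch below works through that arithmetic and shows how much space a 10 or 20 percent free-space floor holds back. It assumes identical drives and ignores mdraid metadata and filesystem overhead, so real usable space will be somewhat lower.

```python
TB = 10**12   # decimal terabyte, as drive vendors quote capacity
TIB = 2**40   # binary tebibyte, as df -h reports sizes
DRIVE_BYTES = 3_840 * 10**9  # one 3.84TB NVMe drive


def raid0_capacity_bytes(drives, drive_bytes):
    """RAID 0 stripes data across every member with no redundancy, so capacities add up."""
    return drives * drive_bytes


def reserve_bytes(total_bytes, free_percent):
    """Bytes held back when a minimum free-space percentage is enforced."""
    return total_bytes * free_percent / 100


total = raid0_capacity_bytes(4, DRIVE_BYTES)
print(f"raw: {total / TB:.2f} TB = {total / TIB:.2f} TiB")
for pct in (10, 20):
    reserved = reserve_bytes(total, pct)
    print(f"{pct}% free: {reserved / TIB:.2f} TiB reserved, {(total - reserved) / TIB:.2f} TiB usable")
```

Moving from 10 to 20 percent free gives up roughly another 1.4TiB of cache space per indexer. That cost is the trade-off for the lower response times described above.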
Results
Graphs for ext4 versus XFS
The graphs below show the "average total ms" value (avg_total_ms) for an equivalent read/write workload. This value is labeled "average wait time" in the graphs. They show the total response time (the sum across the four disks on each server) for multiple servers, along with alternative ways to aggregate this value, such as the 95th percentile (perc95) of response times across the four disks.
ext4 appeared to be faster in all cases.
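The two aggregations described above can be sketched in Python: the sum across a server's four disks and the 95th percentile across them. The percentile uses linear interpolation, which is one common definition. Splunk's perc95 may compute percentiles differently, particularly for a set as small as four values. The function names and disk names are hypothetical.

```python
def percentile(values, pct):
    """Linear-interpolation percentile of `values`; pct is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize_server(disk_ms):
    """Aggregate one time bucket of per-disk avg_total_ms values for a single server."""
    return {"sum": sum(disk_ms.values()), "perc95": percentile(disk_ms.values(), 95)}
```

The sum rises when any disk slows down. With only four disks, the perc95 value is dominated by the slowest disk, which makes it useful for spotting a single lagging drive.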
The following graphs show a similar read/write workload over a 24-hour timespan.
The following graphs show reads and writes per second. In some instances ext4 had more writes per second, yet XFS still had longer wait times.
What about OS version changes?
The general trend was that a newer kernel version resulted in lower service times on ext4.
Servers running CentOS 7 (kernel 3.10) generally performed worse than servers running Red Hat Enterprise Linux 8.5 (kernel 4.18.x), which in turn were slower than servers running Oracle Linux 8 with UEK kernel 5.4.x.
There was not enough data to draw a firm conclusion, but servers with newer kernel versions showed a definite trend toward lower latency at the disk level.
Conclusion
For our Splunk indexer workload, which involved around 1 million searches per day and around 350GB of data per indexer per day, the ext4 filesystem was generally faster than XFS as measured by avg_total_ms. Leaving 20 percent of the filesystem's disk space free made a greater difference to performance, and this was true for both ext4 and XFS. Finally, newer kernel versions also appear to improve I/O performance with ext4, although this comparison was not done with XFS.
If you are running a Splunk indexer cluster on XFS, try ext4. Let me know what you find by leaving a comment.
These additional resources might help you understand and implement this guidance:
- Splunk Docs: What does platform instrumentation log?
- Splunkbase: Alerts for Splunk Admins