Comparing Intel and AMD hardware performance for the indexing tier

Historically, your on-premises Splunk platform indexing tier might have run exclusively on Intel-based hardware, including Kubernetes nodes managed through the Splunk Operator for Kubernetes (SOK).

A hardware procurement cycle can present an opportunity to evaluate AMD-based alternatives within the same server generation. This article compares two distinct processor configurations:

INTEL(R) XEON(R) PLATINUM 8558P with a base clock speed of 2.7 GHz, 4 GHz max turbo speed, and 260 MB L3 cache.
AMD EPYC 9475F 48-Core Processor with a base clock speed of 3.65 GHz, 4.4 GHz max all-core boost, maximum boost of 4.8 GHz, and 256 MB L3 cache.

This article shows you the testing methodology and findings from comparing these two platforms in a Splunk platform indexing tier, including search performance, non-uniform memory access (NUMA) balancing behavior, disk I/O performance, and CPU utilization.

The findings in this article reflect a specific hardware comparison carried out by the author, Gareth Anderson, in his own on-premises environment, so your results might vary depending on workload, hardware generation, firmware, OS configuration, and indexer sizing. You should treat these results as a reference point and benchmark against your own environment before making procurement decisions.

Did Splunk platform search performance improve?

All servers operated under light load during testing — total CPU utilization remained below 30%, and disk I/O was minimal. Because the disk cache had not yet filled on any server, cache thrashing was not an issue.

The AMD processor is approximately 35% faster than the Intel equivalent in terms of base clock speed. Each server runs a dual-socket configuration, resulting in 192 logical processors with hyper-threading enabled. Using the SOK, 6 Kubernetes pods run per physical server (6 indexers).
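The arithmetic behind these two figures can be checked with a short sketch. The clock speeds come from the processor list above. The 48 cores per socket and 2 threads per core are assumptions taken from the AMD model name and the stated hyper-threading setup; the function names are illustrative only.

```python
# Base clock speeds in GHz, taken from the processor list in this article.
INTEL_BASE_GHZ = 2.7
AMD_BASE_GHZ = 3.65


def percent_faster(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical processors the OS sees for a given socket/core/thread layout."""
    return sockets * cores_per_socket * threads_per_core


print(round(percent_faster(AMD_BASE_GHZ, INTEL_BASE_GHZ), 1))  # 35.2
# Assumed layout: dual socket, 48 cores per socket, 2 threads per core.
print(logical_processors(2, 48, 2))  # 192
```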

Search results were generally between 10% and 30% faster on the AMD hardware in terms of output per second, although there is large variance between indexers as expected due to the data volumes returned. In some cases, an AMD server can perform more slowly than an Intel server when using this benchmark:

| rest splunk_server=local /services/search/jobs/<sid>
| fields performance.dispatch.stream.remote.*.duration_secs performance.dispatch.stream.remote.*.output_count performance.dispatch.stream.remote.*.invocations 
| untable perf field value 
| rex field=field "performance\.dispatch\.stream\.remote\.(?<indexer>.*)\.(?P<type>.*)" 
| eval combined=type . "!" . value 
| stats values(combined) AS combined by indexer 
| eval combined_count=mvcount(combined) 
| where combined_count==3 
| eval duration_secs=mvindex(split(mvindex(combined,0),"!"),1) 
| eval output_count=mvindex(split(mvindex(combined,2),"!"),1)
| eval invocations=mvindex(split(mvindex(combined,1),"!"),1)
| eval output_per_duration=output_count/duration_secs 
| fields - combined, combined_count
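
For readers who prefer to post-process the job details outside of the search language, the following is a minimal Python sketch of the same calculation. It assumes the job's fields have already been retrieved into a flat dictionary keyed by field name; the sample indexer names and values are hypothetical, and the guard against a zero duration is an addition that the search above does not have.

```python
from collections import defaultdict

PREFIX = "performance.dispatch.stream.remote."
METRICS = {"duration_secs", "output_count", "invocations"}


def output_per_duration(job_fields: dict) -> dict:
    """Group flattened job fields by indexer and compute output_count / duration_secs."""
    per_indexer = defaultdict(dict)
    for name, value in job_fields.items():
        if not name.startswith(PREFIX):
            continue
        # Split on the last dot so indexer names that contain dots stay intact.
        indexer, _, metric = name[len(PREFIX):].rpartition(".")
        if indexer and metric in METRICS:
            per_indexer[indexer][metric] = float(value)

    results = {}
    for indexer, metrics in per_indexer.items():
        # Mirrors "where combined_count==3": skip indexers missing any metric.
        # The zero-duration check is an extra safeguard, not part of the search.
        if set(metrics) != METRICS or metrics["duration_secs"] == 0:
            continue
        results[indexer] = metrics["output_count"] / metrics["duration_secs"]
    return results


# Hypothetical sample data; idx-060 lacks invocations and is therefore dropped.
sample = {
    PREFIX + "idx-053.duration_secs": "2.0",
    PREFIX + "idx-053.output_count": "1000",
    PREFIX + "idx-053.invocations": "12",
    PREFIX + "idx-049.duration_secs": "2.5",
    PREFIX + "idx-049.output_count": "900",
    PREFIX + "idx-049.invocations": "11",
    PREFIX + "idx-060.duration_secs": "1.0",
    PREFIX + "idx-060.output_count": "400",
}

print(output_per_duration(sample))  # {'idx-053': 500.0, 'idx-049': 360.0}
```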

The conclusion at this point is that the AMD hardware is generally faster for these indexers. The remainder of this article discusses other findings related to CPU usage, NUMA balancing, and I/O performance differences.

NUMA balancing

Splunk Help's Splunk Enterprise 10.2 release notes recommend disabling NUMA. However, the author's prior testing on the Intel-based servers indicated that disabling NUMA at the BIOS level had a negative impact on performance for this hardware configuration.

This testing was repeated on the AMD-based servers; however, the BIOS server performance profiles introduced variability in NUMA topology — ranging from no NUMA nodes to as many as four nodes per processor, making a controlled comparison difficult. To ensure a consistent configuration across all AMD servers, NUMA was disabled at the kernel level by passing numa=off as a boot parameter on Red Hat Enterprise Linux 8. This setting resulted in the OS seeing 1 NUMA node instead of 4.
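
One way to confirm that the boot parameter took effect is to check the kernel command line and count the NUMA nodes the OS exposes. The sketch below is illustrative rather than the author's method: the helper names are hypothetical, and the sample inputs are made up instead of being captured from the servers in this article.

```python
import re


def numa_off_requested(cmdline: str) -> bool:
    """Return True if the kernel command line contains the numa=off parameter."""
    return "numa=off" in cmdline.split()


def count_numa_nodes(node_dir_entries) -> int:
    """Count nodeN entries as they appear under /sys/devices/system/node."""
    return sum(1 for entry in node_dir_entries if re.fullmatch(r"node\d+", entry))


# Hypothetical sample inputs for illustration only.
sample_cmdline = "BOOT_IMAGE=/vmlinuz ro quiet numa=off"
sample_entries = ["node0", "possible", "online", "has_cpu"]

print(numa_off_requested(sample_cmdline))  # True
print(count_numa_nodes(sample_entries))    # 1
```

On a live Linux host, the inputs would be the contents of /proc/cmdline and the result of os.listdir("/sys/devices/system/node").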

The server setup for the test was as follows:

idx-053: NUMA balancing disabled at runtime
idx-054: NUMA balancing disabled at runtime
idx-055: NUMA disabled at OS level (numa=off)
idx-056: no changes
idx-057: NUMA disabled at OS level (numa=off)
idx-058: no changes
idx-059: no changes
idx-060: no changes

The configuration was enabled on Monday the 20th of April around 13:30, and the test completed on Thursday the 23rd of April around 13:00. The following graph shows the CPU load across the AMD servers:

Highlighting the two nodes with NUMA disabled shows a clear trend:

Before the NUMA setting was disabled, the servers were consuming a similar amount of CPU. After disabling NUMA on the 20th of April, the CPU utilization increased. The CPU utilization returned to the baseline after re-enabling NUMA on the 23rd of April.

Disabling automatic NUMA balancing using the setting /proc/sys/kernel/numa_balancing did not appear to make a noticeable difference in CPU utilization.
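
The runtime setting can be inspected by reading that file, where 0 means automatic NUMA balancing is disabled and 1 means it is enabled. The following sketch uses hypothetical helper names and treats any non-zero value as enabled, which is a simplifying assumption for kernels that define additional modes.

```python
from pathlib import Path


def numa_balancing_state(raw: str) -> str:
    """Interpret the contents of the numa_balancing sysctl file."""
    value = int(raw.strip())
    # Assumption: any non-zero value is reported as enabled.
    return "disabled" if value == 0 else "enabled"


def read_numa_balancing(path: str = "/proc/sys/kernel/numa_balancing") -> str:
    """Read the sysctl file on a Linux host and interpret its value."""
    return numa_balancing_state(Path(path).read_text())


print(numa_balancing_state("0\n"))  # disabled
print(numa_balancing_state("1\n"))  # enabled
```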

A comparison of output_count versus duration for selected Splunk platform searches suggested a possible small correlation between disabling NUMA and slower search results. However, the searches examined were not conclusive enough to say whether performance improved or degraded.

NUMA was kept enabled, which is the default setting. Test your environment to determine whether disabling NUMA improves performance.

Disk setup

Disk setup is critical in Splunk platform indexer workloads, so a comparison was run between the servers' disk performance using the Metricator for Nmon application. The I/O stats from this tool have previously been confirmed as accurate; more details are available in the article Benchmarking filesystem performance on Linux-based indexers.

`dgreadserv` disk group read service times and disk read IOPS

Comparing dgreadserv (disk group read service time) and disk reads in terms of IOPS shows that while the read I/O traffic is similar, the AMD servers appear to be slightly faster when it comes to reading. Both servers have excellent response times.

The idx-053 to idx-060 lines on the graph below show AMD-based disk group read service times, and idx-049 to idx-052 show Intel-based disk group read service times:

The idx-053 to idx-060 lines on the graph below show AMD-based disk read IOPS, and idx-049 to idx-052 show Intel-based disk read IOPS:

When comparing another set of indexers (pods) on the same AMD servers with a different set of Intel servers, the read service times are extremely fast on this set of pods, likely due to a lower workload. The AMD-based servers are generally doing fewer read IOPS.

The idx-053 to idx-060 lines on the graph below show AMD-based disk group read service times, and idx-036 to idx-039 show Intel-based disk group read service times:

The idx-053 to idx-060 lines on the graph below show AMD-based disk read IOPS, and idx-036 to idx-039 show Intel-based disk read IOPS:

`dgwriteserv` disk group write service times and disk write IOPS

Comparing dgwriteserv (disk group write service times) and write IOPS shows that the disk write IOPS are slightly lower on the AMD servers, and the disk write service times are also lower, staying closer to 1 ms or less. The Intel-based servers are close to a 4 ms average.

The idx-053 to idx-060 lines on the graph below show AMD-based disk group write service times, and idx-049 to idx-052 show Intel-based disk group write service times:

The idx-053 to idx-060 lines on the graph below show AMD-based disk write IOPS, and idx-049 to idx-052 show Intel-based disk write IOPS:

When comparing another set of indexers (pods) on the same AMD servers with a different set of Intel servers, the AMD servers are doing less writing and have a much lower I/O latency for writes.

The idx-053 to idx-060 lines on the graph below show AMD-based disk group write service times, and idx-036 to idx-039 show Intel-based disk group write service times:

The idx-053 to idx-060 lines on the graph below show AMD-based disk write IOPS, and idx-036 to idx-039 show Intel-based disk write IOPS:

In summary:

Write latency is under 1 ms on AMD; Intel-based servers are 2–8 ms, although they do have about 30% more write activity.
Reads are very fast: Intel is 0.25 ms vs 0.18 ms for the AMD servers.

Why are the I/O service times lower on the AMD servers?

The reason for the lower I/O service times on the AMD servers was not determined. The NVMe drives are in theory the same, although the AMD servers arrived 6 to 9 months after the Intel servers, so perhaps they received faster disks.

There are also some potential advantages in the AMD architecture related to PCIe bus setup and memory bandwidth. HBA controllers sit in front of the NVMe disks on both sets of servers, which might also affect I/O latency. This discussion is speculative because the NVMe drives cannot be confirmed as identical. From a specification standpoint, both platforms use PCIe 5.0 and equivalent memory speeds.

The AMD servers had a clear benefit here, but it cannot be confirmed whether this relates to the processor.

Other notes on disk usage

On older, read-intensive NVMe drives, performance degraded once disk usage rose above 80–90%, so to be on the safe side, 20% free space was reserved.

On AWS, this experiment was repeated on i3en.6xlarge instances, where free space could drop to around 3.5% before performance degradation occurred.

On the more modern on-premises 14 TB NVMe drives, no performance degradation was found when decreasing free space to 5%. The threshold was kept at 5% because going lower might cause ext4 fragmentation or premature disk wear.
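
A free-space reserve like this can be monitored with a simple percentage check. The sketch below is illustrative only: the function names are hypothetical, the 5% default mirrors the threshold described above, and the drive sizes in the examples are round numbers rather than measurements from these servers.

```python
import shutil


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes * 100.0


def below_reserve(total_bytes: int, free_bytes: int, reserve_percent: float = 5.0) -> bool:
    """Return True when free space has fallen below the reserve threshold."""
    return free_percent(total_bytes, free_bytes) < reserve_percent


TB = 10**12
print(below_reserve(14 * TB, 1 * TB))    # about 7.1% free, so False
print(below_reserve(14 * TB, TB // 2))   # about 3.6% free, so True

# Check the volume that holds the current working directory.
usage = shutil.disk_usage(".")
print(f"{free_percent(usage.total, usage.free):.1f}% free on the current volume")
```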

Server CPU usage

The AMD servers are using roughly half the CPU of the Intel servers overall, though the workloads differ because each server has a different mix of indexers. All nodes are running six indexer pods each. As noted earlier, disk usage is still low — around 25% of the disk is used. CPU utilization is expected to rise as ingestion rates per indexer increase and approach the cache limits.

Using kubectl top pods to show the pods' millicores CPU usage for all indexers on AMD and Intel hardware, the average usage was:

AMD: 3536.83
Intel: 5148.02

The Intel pods are using approximately 45% more CPU on average, which is slightly more than the 35% base clock speed difference.
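
As a rough illustration of how such averages can be derived, the sketch below parses the tabular text that kubectl top pods prints (NAME, CPU(cores), MEMORY(bytes) columns) and compares two averages. The pod names and sample values are hypothetical; only the two averages in the final calculation come from this article.

```python
def parse_millicores(cpu: str) -> float:
    """Convert a CPU value such as '3536m' or '2' (whole cores) to millicores."""
    cpu = cpu.strip()
    if cpu.endswith("m"):
        return float(cpu[:-1])
    return float(cpu) * 1000.0


def average_millicores(top_output: str) -> float:
    """Average the CPU column of `kubectl top pods` text output."""
    values = []
    for line in top_output.strip().splitlines():
        cols = line.split()
        # Skip blank lines and the header row.
        if len(cols) < 2 or cols[0] == "NAME":
            continue
        values.append(parse_millicores(cols[1]))
    if not values:
        raise ValueError("no pod rows found")
    return sum(values) / len(values)


def percent_more(a: float, b: float) -> float:
    """Return how much larger `a` is than `b`, as a percentage."""
    return (a / b - 1.0) * 100.0


# Hypothetical sample output for illustration only.
sample_top = """
NAME            CPU(cores)   MEMORY(bytes)
indexer-amd-0   3000m        20000Mi
indexer-amd-1   4000m        21000Mi
"""

print(average_millicores(sample_top))               # 3500.0
# Averages reported in this article: Intel versus AMD.
print(round(percent_more(5148.02, 3536.83), 1))     # 45.6
```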

The graph below shows server CPU usage, with the upper lines showing Intel servers, and the lower lines showing AMD servers:

The chart below shows a per-pod CPU comparison from kubectl top pods, with the purple lines showing AMD pods and the pink lines showing Intel pods:

Conclusion

The CPUs used in this article were quite different in terms of base clock speed, as this is what the hardware vendor sold as part of their solution. The faster CPU clock speed does appear to benefit the Splunk platform when returning search results, and no drawbacks have been identified to using AMD instead of the traditional Intel-based hardware choice.

For the current indexer setup described here, AMD-based processors appear to provide a clear advantage.

Additional resources

These resources might help you understand and implement this guidance:

GitHub: Millicore CPU usage from kubectl top pods CSV file
Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their Success Plan. Engage the ODS team at ondemand@cisco.com if you would like assistance.