Using Edge Processor to save Splunk Virtual Compute
Splunk Virtual Compute (SVC) utilization is a measurement of the resources consumed by a Splunk stack. Efficiency is one of several aspects of a healthy Splunk Cloud Platform stack, and the more efficiently a stack runs, the more value you get out of your Splunk deployment. In this article, we explain how Splunk Edge Processor can, in theory, be used to reduce the SVC cost of Data Model Acceleration (DMA) and so optimize your SVC utilization.
Background
Splunk Edge Processor (EP) is a data processing solution that works at the edge of your network. You can use it to filter, mask, and route your data close to its source, before the data reaches the Splunk platform or other external environments. For example, EP can create indexed fields from the raw data. Indexed fields are written to the index at ingest time rather than extracted at search time. Such fields can improve the efficiency of compute-heavy Splunk processes, which in turn can reduce overall SVC usage.
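To make the idea of "creating indexed fields at the edge" concrete, here is a minimal Python sketch. Edge Processor pipelines themselves are written in SPL2, so this only models the concept: a few fields are extracted once from the raw text and attached to the event, while `_raw` is forwarded unchanged. The flattened key=value event and the function name `add_indexed_fields` are hypothetical; real WinEventLog:Security events are multi-line and need source-type-specific parsing.

```python
import re

# Hypothetical, flattened key=value event format; real WinEventLog:Security
# events are multi-line and need source-type-specific parsing.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def add_indexed_fields(raw: str, wanted: set[str]) -> dict:
    """Extract the selected key=value pairs once and attach them to the event."""
    fields = {key: value for key, value in KV_PATTERN.findall(raw) if key in wanted}
    # _raw is forwarded unchanged; the extracted values travel alongside it so the
    # indexer can store them as indexed fields instead of re-extracting at search time.
    return {"_raw": raw, "fields": fields}


event = add_indexed_fields(
    "EventCode=4625 Account_Name=jdoe Source_Network_Address=10.0.0.5 Logon_Type=3",
    wanted={"EventCode", "Account_Name", "Source_Network_Address"},
)
print(event["fields"])
```

Only the fields listed in `wanted` are promoted, which reflects the usual practice of indexing a small, deliberate set of fields rather than everything in the event.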
One such compute-heavy process is Data Model Acceleration (DMA). DMA is used by the Splunk Common Information Model (CIM) add-on, which Splunk Enterprise Security (ES) relies on but which can also be used independently. DMA builds summaries of CIM data that make searches on this data faster. To do this, DMA periodically runs SPL queries in the background, and if these queries are complex or the dataset is large, DMA can consume a considerable amount of compute power. In this article, we demonstrate how Splunk Edge Processor can extract CIM fields as indexed fields, reducing the need for DMA. This approach can lead to more efficient use of SVCs.
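To give a sense of how much background work a periodic acceleration schedule creates, the sketch below counts the acceleration searches started per day for a schedule like Setup 1 in the test settings (19 data models, every 5 minutes). It assumes one acceleration search per data model per interval on a `*/N`-minute cron-style schedule and ignores runs that are skipped or overlap; the function name is hypothetical.

```python
def acceleration_runs_per_day(model_count: int, interval_minutes: int) -> int:
    """Acceleration searches started per day, assuming one run per model per interval.

    Only intervals that divide an hour evenly are accepted, so the count matches
    a */N-minute cron schedule (N runs per hour at minutes 0, N, 2N, ...).
    """
    if interval_minutes <= 0 or 60 % interval_minutes:
        raise ValueError("interval_minutes must divide 60 evenly")
    return model_count * 24 * (60 // interval_minutes)


# Values from Setup 1 in the test settings: 19 data models, every 5 minutes.
print(acceleration_runs_per_day(19, 5))
```

Each of those runs has to scan and summarize newly arrived events, which is why the search-side SVC cost of DMA grows with both the number of accelerated models and the schedule frequency.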
In the following sections, we compare two test setups, with and without Edge Processor, and show how the Edge Processor setup can significantly reduce SVC usage.
The following tests do not compare the performance of searches that currently depend on DMA, nor do they estimate the effort required to convert existing DMA-based searches into non-DMA searches. The primary focus of this analysis is to provide a proof of concept for SVC optimization with Edge Processor, rather than to re-engineer existing processes. Actually implementing these suggestions could involve significant changes to established workflows (for example, assessing the impact on downstream detections in Splunk Enterprise Security), which fall outside the scope of this study.
Test settings
To measure the potential SVC savings from eliminating the DMA process, we created the following setups:
- Setup 1 (with DMA, without Edge Processor): Data is sent directly to an indexer cluster, with the Splunk Add-on for Microsoft Windows installed on the search head. DMA is scheduled to run every 5 minutes and accelerates the 19 default CIM data models.
- Setup 2 (without DMA, with Edge Processor): Data is sent to two Edge Processor instances, where CIM fields are extracted. Both the raw event and the extracted fields are then forwarded to the same indexer cluster as in Setup 1. Because the CIM fields have already been extracted, the DMA process is redundant and is therefore disabled for this experiment.
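The CIM extraction step in Setup 2 can be pictured as renaming vendor fields to CIM field names and deriving normalized values. The Python sketch below is a hypothetical, heavily simplified mapping: the left-hand names are Windows Security field names as they commonly appear in Splunk, the right-hand names are CIM Authentication data model fields, and the 4624/4625 action mapping follows the usual meaning of those logon events. It is not the logic of the Splunk Add-on for Microsoft Windows, and the function name `to_cim_fields` is illustrative.

```python
# Hypothetical, simplified mapping from Windows Security field names to CIM
# Authentication field names; the real add-on logic is far more extensive.
FIELD_RENAMES = {
    "Account_Name": "user",
    "Source_Network_Address": "src",
    "ComputerName": "dest",
    "EventCode": "signature_id",
}

# Illustrative EventCode-to-action table (4624 = successful logon, 4625 = failed logon).
ACTION_BY_EVENTCODE = {"4624": "success", "4625": "failure"}


def to_cim_fields(vendor_fields: dict[str, str]) -> dict[str, str]:
    """Rename vendor fields to CIM names and derive `action` from the event code."""
    cim = {FIELD_RENAMES[k]: v for k, v in vendor_fields.items() if k in FIELD_RENAMES}
    action = ACTION_BY_EVENTCODE.get(vendor_fields.get("EventCode", ""))
    if action is not None:
        cim["action"] = action
    return cim


print(to_cim_fields({
    "EventCode": "4625",
    "Account_Name": "jdoe",
    "Source_Network_Address": "10.0.0.5",
    "ComputerName": "dc01",
}))
```

Event codes missing from the action table simply produce no `action` field, rather than a guessed value.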
We used 34 Windows events with the source type WinEventLog:Security.
Results
| Setup | DMA count | Throughput (events/sec) | Indexer throughput (events/sec) | Total SVC usage | Ingest SVC | Search SVC | Shared SVC |
|---|---|---|---|---|---|---|---|
| Edge Processor CIM field extraction | N/A | 20k | 20k | 11.18 | 10.07 | 1.04 | 0.07 |
| Accelerated data models | 19 | 20k | 20k | 19.42 | 10.00 | 8.04 | 1.38 |
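The headline figures can be re-derived from the results table. The sketch below recomputes each setup's total from its ingest, search, and shared SVC values, the relative reduction of the Edge Processor setup versus the DMA setup, and the per-consumer difference. The dictionary keys are simply the table's row labels.

```python
# SVC figures copied from the results table.
setup_svc = {
    "Edge Processor CIM field extraction": {"ingest": 10.07, "search": 1.04, "shared": 0.07},
    "Accelerated data models": {"ingest": 10.00, "search": 8.04, "shared": 1.38},
}

totals = {name: round(sum(parts.values()), 2) for name, parts in setup_svc.items()}
ep = setup_svc["Edge Processor CIM field extraction"]
dma = setup_svc["Accelerated data models"]

# Relative reduction of the Edge Processor setup compared with the DMA setup.
reduction = 1 - totals["Edge Processor CIM field extraction"] / totals["Accelerated data models"]

# Per-consumer difference (positive = SVCs saved by the Edge Processor setup).
savings_by_consumer = {k: round(dma[k] - ep[k], 2) for k in dma}

print(totals)
print(f"{reduction:.0%}")
print(savings_by_consumer)
```

Of the 8.24 SVC difference between the two totals, 7.00 comes from search and 1.31 from shared usage, while ingest usage is 0.07 higher in the Edge Processor setup.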
Conclusion
In our test, the Edge Processor setup used 42% fewer SVCs than the DMA setup (11.18 versus 19.42). Most of these savings came from the search process, where SVC usage dropped from 8.04 to 1.04 once Edge Processor extracted the CIM fields. However, not all CIM fields can be easily extracted at ingest time, because some require lookup-based or calculated values. While Edge Processor offers significant benefits, replicating this logic within the pipeline may require additional effort from the customer.
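As an example of the extra effort involved, a lookup-based field would have to be reproduced in the pipeline itself. The Python sketch below copies hypothetical asset attributes onto an event at ingest time, keyed on the CIM `dest` field. The lookup contents, the `dest_category`/`dest_priority` column names, and the function name are illustrative assumptions, not a real Splunk lookup or Edge Processor API.

```python
# Hypothetical asset inventory that a search-time lookup might otherwise provide.
ASSET_LOOKUP = {
    "dc01": {"dest_category": "domain_controller", "dest_priority": "critical"},
    "ws042": {"dest_category": "workstation", "dest_priority": "low"},
}


def enrich_with_lookup(cim_fields: dict[str, str],
                       lookup: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return a copy of the event with lookup columns added, keyed on `dest`."""
    enriched = dict(cim_fields)  # leave the caller's dict untouched
    enriched.update(lookup.get(cim_fields.get("dest", ""), {}))
    return enriched


print(enrich_with_lookup({"dest": "dc01", "user": "jdoe", "action": "failure"}, ASSET_LOOKUP))
```

One design consequence to weigh: values enriched at ingest time are frozen with the event, whereas a search-time lookup always reflects the lookup's current contents, so the pipeline's copy of the table must be kept in sync.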
It's also important to note that only the Windows add-on (TA) was installed when DMA ran in our experimental setup. In a typical Splunk deployment, dozens or even hundreds of TAs are installed, which makes the searches run by DMA significantly more complex and more expensive in processing power and SVCs. However, if your existing TAs or content are based on DMA searches, transitioning to Edge Processor will require modifications to your current setup. That said, the potential benefits of using Edge Processor in such scenarios could be considerable, provided the necessary adjustments are made.
Running Edge Processor nodes does come with associated costs, such as the cloud services or physical hosts that run the EP nodes. These costs vary from company to company, as does the value of the SVCs saved. It's crucial to consider these factors when determining the most suitable architecture for your needs.
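One way to frame that trade-off is a simple net-savings and break-even calculation. In the Python sketch below, 8.24 is the difference between the two SVC totals in the results table and 2 is the number of Edge Processor instances in Setup 2; the per-SVC and per-node prices are invented placeholders, not Splunk or cloud list prices. Treating a difference in measured SVC usage as "SVCs saved" is itself a simplifying assumption, and the function names are hypothetical.

```python
def net_savings(svc_saved: float, cost_per_svc: float,
                ep_nodes: int, cost_per_node: float) -> float:
    """Value of the SVCs saved minus the cost of running the Edge Processor nodes.

    All inputs must use the same currency and the same billing period.
    """
    return svc_saved * cost_per_svc - ep_nodes * cost_per_node


def break_even_cost_per_svc(svc_saved: float, ep_nodes: int, cost_per_node: float) -> float:
    """SVC price above which the Edge Processor nodes pay for themselves."""
    if svc_saved <= 0:
        raise ValueError("svc_saved must be positive")
    return ep_nodes * cost_per_node / svc_saved


# Placeholder prices only; replace them with your own SVC and infrastructure costs.
print(net_savings(svc_saved=8.24, cost_per_svc=100.0, ep_nodes=2, cost_per_node=150.0))
print(break_even_cost_per_svc(svc_saved=8.24, ep_nodes=2, cost_per_node=150.0))
```

A positive net value, or an SVC price above the break-even figure, suggests the Edge Processor nodes cost less than the compute they free up in that scenario.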
Finally, your results might vary from one use case to another. The test described above is not meant to be comprehensive or to account for all the variables associated with a real-world Splunk deployment. We recommend conducting a similar test in your own environment, with your data and add-ons, to determine how Edge Processor might help optimize resource utilization for your unique setup and specific use case.
Next steps
To get access to Splunk Edge Processor, email edgeprocessor@splunk.com or reach out to your account team.
In addition, these Splunk resources will help you better understand SVCs and Edge Processor.
- Blog: Workload pricing and SVCs: What you can see and control
- Blog: What is Splunk Virtual Compute (SVC)?
- Blog: Cloud Monitoring Console’s Health Dashboard
- Getting Started: Getting Started with Splunk Edge Processor
- Brochure: Splunk pricing options
- Video: Splunk workload management
- .conf Talk: Busting the curve: Specific techniques to decouple Splunk cost from your exploding machine data volumes
- .conf Talk: So now you have workload pricing, how does it make cents?