Accelerating an implementation of Splunk Edge Processor
Splunk Edge Processor (EP) is a solution Splunk provides to all Splunk Cloud Platform customers. It processes data at the edge of your environment and forwards it to other destinations, such as Splunk Cloud-managed indexers, Bring-Your-Own-License (BYOL) indexers (whether in a cloud environment or on-premises), or AWS S3 buckets.
This Splunk Edge Processor accelerator is a way to integrate an EP into your environment rapidly, with defined and repeatable outcomes. The result is a working EP that receives data from your chosen forwarders and outputs it to various destinations. The process also supports continued development and implementation of use cases.
This Splunk Edge Processor (EP) Accelerator is available as an engagement with Splunk Professional Services. If you do not feel comfortable completing this accelerator on your own, or would like hands-on training with any of the concepts and processes included in this accelerator, contact our Professional Services experts.
Before beginning, obtain buy-in and participation agreement from stakeholders, including Splunk admins, SMEs for proposed data sources, and network, security, and infrastructure administrators. Then, you and your team will work through the following stages:
- Familiarize yourself with knowledge prerequisites
- Identify compute resources
- Identify network resources
- Establish Splunk Cloud Platform tenant and role-based access control
- Create pipeline elements
- Create pipelines
- Enable pipelines
- Next steps
Familiarize yourself with knowledge prerequisites
Before proceeding with this accelerator, you should have some familiarity with the following:
- SPL2
- Getting data in (for Splunk Enterprise or Splunk Cloud Platform)
- Transport Layer Security (TLS) certificate requirements and how certificates are provisioned
- Definition of sources (including all forms: UF, HF, HEC, and other), filtering, and destination routing (including using S3 as a destination), as described in Create pipeline elements below
- RE2 regular expression syntax used in filters
Identify compute resources
The components of an EP implementation include:
- the Splunk Cloud Platform which the EP tenant uses for Single Sign-On (SSO) authentication and authorization
- the EP tenant, which is the control interface used to install, manage, and configure the EP software on your node
- the compute platform for the EP node itself, either virtual or physical
- the various network resources that provide access to the components, like switches and firewalls
In this first stage, you will review your infrastructure and preparedness to implement Splunk Edge Processor. Use the Installation requirements for Edge Processors documentation as guidance. Be sure to do the following:
- Identify the core Splunk environment components (a Splunk Cloud Platform tenant in the correct zones, or an on-premises deployment) and validate their versions.
- Identify the virtual machine or hardware system for the Splunk Edge Processor installation and confirm its availability.
- Validate Splunk Cloud Platform control plane availability as well as Splunk Cloud Platform tenant or Splunk Enterprise installation.
Nodes
The EP node can be either a physical or virtual host, and it must run one of the supported operating systems listed in Operating system support. If the resources do not meet the requirements, the EP behaves unpredictably and the instance is unsupported. Meeting these requirements is essential for the EP to process data as efficiently as possible. Configuring and setting up the EP is described in Set up an Edge Processor.
Currently, performance appears to scale linearly with resources. Even a node that meets only the base system requirements can handle a significant workload, which is likely enough for the initial installation of the EP software in your environment. As load builds on the node, you can easily add resources if the node is a virtual machine (VM).
Configure the system so that the address your sources send to can later be moved off the VM and onto a load balancer. This makes the transition easier as the workload builds. In other words, stack a secondary IP address on the node's interface and use that address as the destination for data from your sources. To further speed up EP deployment, create a DNS alias or CNAME that resolves to the secondary IP address. For more information, see Configuring IP Networking with ip Commands (RHEL) or Multiple IP addresses on one Interface (Debian).
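Before you repoint any sources, you might check that the DNS alias resolves to the stacked secondary address and not to the node's primary address. The following Python sketch shows one way to do this. The alias name and IP addresses are hypothetical placeholders. The resolver can be swapped out so the check runs without DNS access.

```python
import socket
from typing import Callable, Iterable


def resolve_ipv4(name: str) -> list[str]:
    """Return the IPv4 addresses a name resolves to (CNAMEs are followed)."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def check_send_to_alias(
    alias: str,
    secondary_ip: str,
    primary_ip: str,
    resolver: Callable[[str], Iterable[str]] = resolve_ipv4,
) -> list[str]:
    """Return a list of problems; an empty list means the alias looks correct."""
    addresses = set(resolver(alias))
    problems = []
    if secondary_ip not in addresses:
        problems.append(f"{alias} does not resolve to secondary address {secondary_ip}")
    if primary_ip in addresses:
        problems.append(f"{alias} resolves to primary address {primary_ip}; sources would be tied to the node")
    return problems


# Hypothetical example: a stand-in resolver so this runs without DNS access.
fake_dns = {"ep-ingest.example.com": ["10.0.10.50"]}
print(check_send_to_alias("ep-ingest.example.com", "10.0.10.50", "10.0.10.21", resolver=fake_dns.__getitem__))
```

On a host that uses your real DNS, call `check_send_to_alias` with the default resolver and your own alias and addresses.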
Identify network resources
When configuring the EP, it is crucial to allocate the necessary network resources to ensure data flows smoothly and efficiently. Proper planning minimizes production issues and provides a clear path for predictable growth.
To future-proof your network architecture, configure the "send-to" address for your data sources as a secondary address on the node's network interface. This simplifies scaling and improves resilience, because moving a secondary address to a load balancer is easier than re-addressing an actual node.
To ensure you are accessing up-to-date and accurate information for configuring your network and firewall infrastructure, refer to Network requirements.
- IP/names of indexers. List the indexers that will make up the destination definition. For each indexer or S3 bucket, include its DNS-resolvable name and its IP address.
- Certificates. Either extract the destination certificates from the Splunk-provided Splunk Cloud Universal Forwarder app (splunkclouduf.spl) or retrieve them from wherever you already store them. Note the name of the certificate file for the indexer, along with either the location or the actual contents of the certificate.
- S3 destinations. S3 destinations require somewhat different handling and therefore have different configuration requirements, including a choice between two forms of authentication. If all S3 bucket storage locations are in Amazon Web Services, the recommendation is to use an IAM role. Otherwise, in a “mixed” destination environment, configure the AWS access key ID and AWS secret access key. You need the following information: destination name, key name (including bucket name, folder name, and file prefix), region, authentication, security, and output data format. For more information, refer to Send data from Edge Processors to Amazon S3.
- Network routes. How will data route from the sources to the EP, and then from the EP to each defined destination? Discuss this carefully with your stakeholders so that the inbound-to-EP route is understood and crosses boundaries only at specific, well-understood points. Understand the outbound-to-destination pathway thoroughly so that you know of any constraints, including performance issues that might need to be addressed. If an outbound route is already saturated, address that before you enable the EP and send data. Network access from the EP cloud tenant to the EP node is essential, and so is understanding how that traffic will route. Note the source subnet or device, the intermediate devices, and the destination subnet or device. To reiterate, this is an inbound-from-the-internet (Splunk Cloud Platform) connection to an internal device, so the networking and security teams must be involved to make sure it is fully understood. If you are already a Splunk Cloud Platform customer, some of this analysis has probably been done already, although that work will have covered mostly outbound traffic to Splunk Cloud Platform indexers.
- Network access control lists (ACLs). If route ACLs must be modified or added to permit inbound-to-EP or outbound-to-destination traffic, identify and document those changes during the design process. This supports a rapid and successful deployment of the first EP node. Clearly define the ACL entries that permit traffic from the EP cloud tenant to the EP node. Note the following for each entry: Type (SSH, HTTPS), Protocol (should be TCP), Source (inbound only), Port (range), Destination (outbound only), and Allow/Deny. For more information, refer to Network requirements.
- Firewall rules - boundary traversal. Traffic will likely traverse firewalls on both the inbound-to-EP and the outbound-to-destination paths, whether those firewalls are internal or external. The firewalls must permit “management” traffic inbound from the EP cloud tenant to the EP node. They must also permit traffic from the sources to the EP and from the EP to the destinations. Documenting the required rules helps ensure that any firewall changes are implemented without delay. Note the following for each rule: Type (SSH, HTTPS), Protocol (should be TCP), Source (inbound only), Port (range), Destination (outbound only), and Allow/Deny. For more information, refer to Firewall settings.
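To keep ACL and firewall documentation consistent, you might record each rule as a structured entry that uses the fields listed above. The following Python sketch shows one hypothetical way to do this. It checks that each rule uses TCP, has a valid port range, and names the source or destination that matches its direction. Rule names, subnets, and ports are illustrative placeholders, not required values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkRule:
    """One ACL or firewall rule, using the fields suggested in this accelerator."""
    name: str
    rule_type: str   # for example "SSH" or "HTTPS"
    protocol: str    # should be "TCP"
    direction: str   # "inbound" (to the EP node) or "outbound" (from the EP node)
    peer: str        # the source for inbound rules, the destination for outbound rules
    port_low: int
    port_high: int
    action: str      # "Allow" or "Deny"


def validate(rule: NetworkRule) -> list[str]:
    """Return a list of problems with the rule; an empty list means it looks complete."""
    problems = []
    if rule.protocol.upper() != "TCP":
        problems.append(f"{rule.name}: protocol is {rule.protocol}, expected TCP")
    if rule.direction not in ("inbound", "outbound"):
        problems.append(f"{rule.name}: direction must be 'inbound' or 'outbound'")
    if not (1 <= rule.port_low <= rule.port_high <= 65535):
        problems.append(f"{rule.name}: invalid port range {rule.port_low}-{rule.port_high}")
    if rule.action not in ("Allow", "Deny"):
        problems.append(f"{rule.name}: action must be Allow or Deny")
    if not rule.peer:
        missing = "source" if rule.direction == "inbound" else "destination"
        problems.append(f"{rule.name}: missing {missing}")
    return problems


def as_table_row(rule: NetworkRule) -> str:
    """Render Type | Protocol | Source | Port | Destination | Allow/Deny."""
    ports = str(rule.port_low) if rule.port_low == rule.port_high else f"{rule.port_low}-{rule.port_high}"
    source = rule.peer if rule.direction == "inbound" else ""
    destination = rule.peer if rule.direction == "outbound" else ""
    return " | ".join([rule.rule_type, rule.protocol, source, ports, destination, rule.action])


# Hypothetical rule: HEC clients on an internal subnet sending to the EP.
hec_in = NetworkRule("hec-to-ep", "HTTPS", "TCP", "inbound", "10.20.0.0/16", 8088, 8088, "Allow")
print(validate(hec_in), as_table_row(hec_in))
```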
Establish Splunk Cloud Platform tenant and role-based access control
The Splunk EP tenant uses Splunk Cloud Platform for single sign-on (SSO) and role-based access control (RBAC). All users with access to Splunk Cloud Platform have access to the Splunk EP tenant. However, only groups assigned to roles with either the admin_all_objects or the edit_edge_processor capability can administer the EP. Splunk recommends a least-privilege access model, and some employees will certainly not require admin_all_objects. Consider these capabilities carefully before granting them broadly. The standard Splunk Cloud Platform customer admin role, sc_admin, should have admin_all_objects. Splunk administrators are expected to have sc_admin and can therefore administer the EP.
The most scalable solution is to create a group in Active Directory (AD), Lightweight Directory Access Protocol (LDAP), or a third-party SSO provider. Then map that group in Splunk Cloud Platform to a role with the appropriate capabilities. AD or LDAP users can be assigned to roles directly, but this is not recommended. For more information, see Create and manage roles with Splunk Web.
- Roles. Identify the roles that will have the admin_all_objects or edit_edge_processor capabilities.
- AD/LDAP groups. Identify which AD/LDAP groups will be associated with which Splunk Cloud Platform roles.
- Users. Identify the users that will have the roles defined in the previous bullets to administer the EP.
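After you identify the roles, groups, and users, a quick cross-check shows who will actually be able to administer the EP. The following Python sketch works on mappings you would transcribe from your identity provider and your Splunk Cloud Platform role configuration. Only the capability names and sc_admin come from this accelerator. The user, group, and other role names are hypothetical. For simplicity, the sketch does not model role inheritance.

```python
# Capabilities that allow EP administration, per this accelerator.
EP_ADMIN_CAPABILITIES = {"admin_all_objects", "edit_edge_processor"}


def ep_administrators(
    user_groups: dict[str, set[str]],
    group_roles: dict[str, set[str]],
    role_capabilities: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each user who can administer the EP to the roles that grant it.

    Simplification: role inheritance is not modeled, so list each role's
    effective capabilities in role_capabilities.
    """
    admins: dict[str, set[str]] = {}
    for user, groups in user_groups.items():
        granting = {
            role
            for group in groups
            for role in group_roles.get(group, set())
            if role_capabilities.get(role, set()) & EP_ADMIN_CAPABILITIES
        }
        if granting:
            admins[user] = granting
    return admins


# Hypothetical mappings; replace with your own groups, roles, and users.
user_groups = {"alice": {"splunk-admins"}, "bob": {"ep-operators"}, "carol": {"soc-analysts"}}
group_roles = {"splunk-admins": {"sc_admin"}, "ep-operators": {"ep_operator"}, "soc-analysts": {"soc_user"}}
role_capabilities = {
    "sc_admin": {"admin_all_objects"},
    "ep_operator": {"edit_edge_processor"},
    "soc_user": {"search"},
}
print(ep_administrators(user_groups, group_roles, role_capabilities))
```

If the result includes users you did not expect, revisit the group-to-role mapping before granting access.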
Create pipeline elements
The data components of the EP environment include sources, destinations, and filters. Clear and consistent naming conventions for these components are essential for designing an effective EP architecture. To ensure clarity, it's important to carefully plan and discuss naming conventions during the EP deployment process.
Consider any existing naming standards in your environment, and incorporate Domain Name System (DNS) records, such as CNAMEs or aliases. Doing this makes future modifications and scaling easier.
Sources
Sources are the originators of the data that the EP ingests and then forwards. They can include:
- Generators such as a universal forwarder sending application or OS logs
- A network device sending data directly or through a syslog server
- An HTTP Event Collector (HEC) client, such as a medical device or any other event generator
Defining which you will use early on helps reduce duplication of effort and ensures that security is integrated into your design, including implementing TLS certificates. If you do not already have a standard in place, the following links can help you decide which to use:
Create sources by following the instructions in Add source types for Edge Processors. Make sure to do the following:
- Ensure that data sources ingested will be in text format and accessible to the Splunk platform.
- Configure the appropriate access levels for data sources that require vendor-specific output configuration.
- Identify add-ons that are designed for specific versions of a data source and might become incompatible when changes to the source technology affect that data source.
Source definition and configuration are described in Get data from a forwarder into an Edge Processor. The following table shows examples of how to define sources while capturing all of the information needed for pipeline creation.
Data source | Index name | Details | Priority | Consumer | Collection method |
---|---|---|---|---|---|
Windows system logs | winevent | Local log data forwarded through EP | 1 | SOC/NOC | UF > EP > Indexer |
PAN | network-firewall | Syslog data via SC4S | 2 | SOC | PAN > Panorama > UF/SC4S > EP > Indexer |
In addition, you should capture the following information:
- S2S. Certificate names or class, and the location or contents of each certificate
- HEC. HEC token (or where it is stored) and the HEC endpoint
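If you keep the source inventory in a script or a spreadsheet export, a structure like the following Python sketch can flag rows that are missing details needed for pipeline creation. The field names mirror the table columns and the S2S and HEC notes above. The validation rules are assumptions drawn from this section, not a Splunk-defined schema:

- the collection method passes through the EP
- S2S sources have certificate details
- HEC sources have a token location and an endpoint

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceEntry:
    """One row of the source inventory (hypothetical structure)."""
    data_source: str
    index_name: str
    details: str
    priority: int
    consumer: str
    collection_method: str                    # for example "UF > EP > Indexer"
    transport: str                            # "S2S" or "HEC"
    certificate: Optional[str] = None         # S2S: certificate name/class and location
    hec_token_location: Optional[str] = None  # HEC: where the token is stored
    hec_endpoint: Optional[str] = None


def missing_details(entry: SourceEntry) -> list[str]:
    """Return a list of gaps; an empty list means the row is ready for pipeline design."""
    problems = []
    hops = [hop.strip() for hop in entry.collection_method.split(">")]
    if "EP" not in hops:
        problems.append("collection method does not pass through the EP")
    if entry.transport == "S2S" and not entry.certificate:
        problems.append("S2S source has no certificate details")
    if entry.transport == "HEC" and not (entry.hec_token_location and entry.hec_endpoint):
        problems.append("HEC source needs a token location and an endpoint")
    if entry.transport not in ("S2S", "HEC"):
        problems.append(f"unknown transport {entry.transport!r}")
    return problems


# Example based on the first table row; the certificate value is hypothetical.
windows = SourceEntry("Windows system logs", "winevent", "Local log data forwarded through EP",
                      1, "SOC/NOC", "UF > EP > Indexer", "S2S", certificate="ep_s2s_cert.pem")
print(missing_details(windows))
```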
Destinations
Destinations are the locations that events are forwarded to. An Edge Processor can forward to:
- A Splunk HTTP Event Collector (HEC) endpoint
- A Splunk platform Splunk-to-Splunk (S2S) destination
HEC indexer acknowledgment behaves differently from S2S. The HEC receiver automatically responds with HTTP status 200 when the event payload arrives, but that response does not mean the events have been written to an index. For more information, refer to Sending data from Edge Processors to Splunk Cloud Platform or Splunk Enterprise.
Indexer acknowledgment capability differs between Splunk Cloud Platform and Splunk Enterprise.
- Splunk Cloud Platform does not support events with indexer acknowledgment enabled, whether HEC or S2S.
- On-premises Splunk Enterprise supports indexer acknowledgment to confirm that events are transmitted successfully to indexers. It is enabled with the useACK setting in the outputs.conf file on a universal forwarder. However, Edge Processor does not support indexer acknowledgment from sources, so do not enable it in the HEC configuration files or on the UFs sending to the EP.
For more information, refer to About HTTP Event Collector Indexer Acknowledgment.
Additionally, a destination can be an AWS S3 bucket.
Configure destinations, along with a default destination or null queue if required, by following the instructions in Add or manage destinations. Deciding early which destinations you will use helps reduce duplication of effort and ensures that security is integrated into your design, including implementing TLS certificates.
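The EP does not support indexer acknowledgment from sources, so it can help to scan the outputs.conf files of the forwarders you plan to point at the EP for useACK before cutover. The following Python sketch uses the standard library configparser. It handles simple stanza-and-key files like outputs.conf, but it is not a full Splunk .conf parser. In particular, it does not model setting inheritance between [tcpout] and [tcpout:<group>] stanzas or configuration layering across apps. Instead, it flags every stanza that sets useACK to true. The stanza names and hosts in the example are hypothetical.

```python
import configparser


def stanzas_with_ack_enabled(conf_text: str) -> list[str]:
    """Return the names of stanzas that set useACK to a true value."""
    # interpolation=None: Splunk values can contain '%' characters.
    # strict=False: tolerate repeated stanzas or keys instead of raising.
    # delimiters=("=",): values such as "host:9997" contain colons.
    parser = configparser.ConfigParser(interpolation=None, strict=False, delimiters=("=",))
    parser.read_string(conf_text)
    flagged = []
    for stanza in parser.sections():
        # configparser lowercases option names, so "useACK" matches "useack".
        if parser.getboolean(stanza, "useACK", fallback=False):
            flagged.append(stanza)
    return flagged


# Hypothetical outputs.conf content for a forwarder being moved to the EP.
example = "\n".join([
    "[tcpout]",
    "defaultGroup = edge_processor",
    "",
    "[tcpout:edge_processor]",
    "server = ep-ingest.example.com:9997",
    "useACK = true",
])
print(stanzas_with_ack_enabled(example))
```

Any stanza the script reports should have useACK removed or set to false before that forwarder sends to the EP.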
Filters
Filters can take many forms, including:
- adding a field with new data
- transforming data in a field (for example, to ensure CIM compliance)
- dropping data that matches your criteria
- routing data to different destinations based on source type, source, host, or information within the event
- obfuscating or masking data or other transformations as required
Configure filters and field destinations by following the instructions in Filter and mask data using an Edge Processor.
At this stage, the filters do not need to be written in SPL2. They should, however, be defined in enough detail that the SPL2 can be written from them. Each filter definition should list, as completely as possible, the new fields to be created and how those fields are to be populated. Also define any transformations, filters, and other actions the filter performs at this stage. For more information, see Edge Processor pipeline syntax.
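Before you write the SPL2 for a masking filter, you might prototype the regular expression offline. The following Python sketch uses the standard re module and adds a rough pre-check that rejects lookaround and numeric backreferences, which RE2 does not support. Python's re and RE2 agree on simple constructs like the ones used here, but they are different engines. Always confirm the final expression in your pipeline against sample data. The sample event and pattern are hypothetical.

```python
import re

# Rough heuristic, not a full parser: flags lookahead/lookbehind groups
# ("(?=", "(?!", "(?<=", "(?<!") and numeric backreferences ("\1" to "\9"),
# which RE2 rejects. It can misfire on an escaped backslash followed by a digit.
_RE2_UNSUPPORTED = re.compile(r"\(\?<?[=!]|\\[1-9]")


def check_re2_compatible(pattern: str) -> None:
    """Raise ValueError if the pattern uses constructs RE2 does not support."""
    match = _RE2_UNSUPPORTED.search(pattern)
    if match:
        raise ValueError(f"RE2 does not support {match.group(0)!r} in {pattern!r}")


def mask(event: str, pattern: str, replacement: str) -> str:
    """Replace every match of pattern in event, after the RE2 pre-check."""
    check_re2_compatible(pattern)
    return re.sub(pattern, replacement, event)


# Hypothetical example: mask values shaped like US Social Security numbers.
SSN_PATTERN = r"\b\d{3}-\d{2}-\d{4}\b"
print(mask("user=jdoe ssn=123-45-6789 action=login", SSN_PATTERN, "XXX-XX-XXXX"))
```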
Create pipelines
Use the information collected in the previous stages to define the source, filter, and destination configurations that route, filter, and transform data in the EP. The combination of these configurations is called a “pipeline” and is the unit of control in which the EP operates. Each pipeline accepts data from a source, optionally filters or transforms it, and then forwards it to the defined destination. Also use the pipeline table to define the testing matrix, which confirms that data flows to the correct destination and that filters and transformations work as expected.
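One way to turn the pipeline table into a testing matrix is to record what you expect from each pipeline: the test source, the destination, and the field values. You then compare those expectations with what your validation searches return. The following Python sketch assumes you have already collected one observed result per pipeline, for example by exporting search results. The pipeline names, destinations, and field values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineTest:
    """One row of the testing matrix (hypothetical structure)."""
    pipeline: str
    source: str
    expected_destination: str
    expected_fields: dict[str, str] = field(default_factory=dict)


@dataclass
class Observation:
    """What validation searches showed for one pipeline."""
    pipeline: str
    destination: str
    fields: dict[str, str]


def evaluate(tests: list[PipelineTest], observations: list[Observation]) -> dict[str, list[str]]:
    """Return failures per pipeline; a pipeline with an empty list passed."""
    by_pipeline = {obs.pipeline: obs for obs in observations}
    results: dict[str, list[str]] = {}
    for test in tests:
        failures = []
        obs = by_pipeline.get(test.pipeline)
        if obs is None:
            failures.append("no data observed")
        else:
            if obs.destination != test.expected_destination:
                failures.append(f"routed to {obs.destination}, expected {test.expected_destination}")
            for name, value in test.expected_fields.items():
                actual = obs.fields.get(name)
                if actual != value:
                    failures.append(f"field {name}={actual!r}, expected {value!r}")
        results[test.pipeline] = failures
    return results
```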
For more information, refer to Create pipelines for Edge Processors. Best practices for building pipelines include the following:
- Always configure a test index and use it to test and debug your pipeline.
- If possible, use a test forwarder or test source so that you can test the default destination and the pipeline before reconfiguring the live forwarder.
- If possible, temporarily configure data cloning so that the live forwarder also forwards to the EP without changing the existing routes in outputs.conf. If the EP-forwarded copy is sent to a test index, you can compare it with the existing index to confirm that all data is indexed as expected.
- Create a field in your pipeline called EP_pipeline_ver and set it to the name and version of your pipeline. This makes debugging easier and lets you compare outputs from different pipeline versions.
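When you clone data to a test index, compare per-source-type event counts between the existing index and the test index over the same time range. This can reveal dropped or duplicated data. The following Python sketch assumes you have exported those counts into dictionaries, for example from a search that counts events by sourcetype in each index. The source type names and counts are hypothetical. The 1% tolerance is an arbitrary placeholder. Set it to reflect what your filters are expected to drop.

```python
def compare_counts(
    production: dict[str, int],
    test: dict[str, int],
    tolerance: float = 0.01,
) -> dict[str, str]:
    """Compare per-sourcetype event counts and return a note for each mismatch."""
    mismatches = {}
    for sourcetype in sorted(set(production) | set(test)):
        prod = production.get(sourcetype, 0)
        cloned = test.get(sourcetype, 0)
        if prod == 0 and cloned == 0:
            continue
        # Relative difference against production (or the test count if production is 0).
        baseline = prod if prod else cloned
        diff = abs(prod - cloned) / baseline
        if diff > tolerance:
            mismatches[sourcetype] = f"production={prod} test={cloned} ({diff:.1%} apart)"
    return mismatches


# Hypothetical counts exported from the existing index and the test index.
prod_counts = {"WinEventLog:Security": 10000, "WinEventLog:System": 2000}
test_counts = {"WinEventLog:Security": 9995, "WinEventLog:System": 1500}
print(compare_counts(prod_counts, test_counts))
```

If you also group the counts by EP_pipeline_ver, you can compare pipeline versions the same way.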
Enable pipelines
After the pipelines are defined, they must be applied to each Edge Processor to enable the data processing to begin. When the pipeline is applied to the EP, the EP is ready to process the data flowing through the pipeline and send it to the defined destination. However, as the sources have not been modified to send to the EP, no work occurs on the EP yet.
At this point, the final step is to enable at least the first test data source to route data through the EP. Use whichever method suits the environment (direct modification, Deployment Server, or a third-party orchestrator) to modify the selected source's outputs.conf, or the service generating HEC events, so that it sends data to the EP. Validate the data passing through the pipeline and on to the destination by following the instructions in Verify your Edge Processor and pipeline configurations. Run searches on the appropriate search head to validate the sources, filters, and destinations of the implemented use cases. Also confirm that data not processed by the EP still flows as it did before.
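For a universal forwarder, the change usually amounts to a tcpout group that points at the EP. The following Python sketch renders such an outputs.conf fragment using the defaultGroup and server settings, and it sets useACK to false because the EP does not support indexer acknowledgment from sources. The group name, the DNS alias, and port 9997 are hypothetical placeholders. TLS settings are deliberately left out; add them according to your certificate plan.

```python
def render_outputs_conf(group: str, servers: list[str]) -> str:
    """Render an outputs.conf fragment that sends a forwarder's data to the EP.

    group and servers are placeholders you supply; servers are "host:port" strings.
    """
    if not servers:
        raise ValueError("at least one EP host:port is required")
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        "",
        f"[tcpout:{group}]",
        f"server = {','.join(servers)}",
        "# The EP does not support indexer acknowledgment from sources.",
        "useACK = false",
        "# Add TLS settings here according to your certificate plan.",
        "",
    ]
    return "\n".join(lines)


# Hypothetical DNS alias bound to the EP node's secondary IP address.
print(render_outputs_conf("edge_processor", ["ep-ingest.example.com:9997"]))
```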
Do this testing with development or test data before any production data cutover, to validate that the EP is working and sending data to the destinations correctly. After this is satisfactorily established, you can modify each remaining source.
Next steps
Edge Processor development / test nodes
Establishing a test environment, for either Splunk Cloud Platform or Splunk Enterprise, is a best practice. A test or development Edge Processor lets you develop pipelines with less risk of a failure affecting production data. Route initial test data, or even dual-routed production data, through the test EP and validate the designed behavior before promoting the configured pipelines to production.
High availability / scaling
EPs can be scaled to provide sufficient computing resources for data processing workloads by adding more EP instances and routing the network traffic through a load balancer. Presenting a secondary IP address on the initial EP node that is not the IP address of the node itself permits the migration of that IP to the load balancer as growth occurs.
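If throughput does scale roughly linearly, as this accelerator suggests, a first sizing estimate is simple arithmetic. Divide the expected peak volume by the throughput one instance sustained in your own testing, round up, and add spare capacity so the remaining instances can absorb the load of a lost one. You must measure the per-instance figure yourself, because it depends on your pipelines. The numbers below are hypothetical.

```python
import math


def instances_needed(peak_gb_per_day: float, measured_gb_per_day_per_instance: float, spare: int = 1) -> int:
    """Estimate the EP instance count, assuming throughput scales linearly with instances."""
    if measured_gb_per_day_per_instance <= 0:
        raise ValueError("per-instance throughput must be positive")
    base = math.ceil(peak_gb_per_day / measured_gb_per_day_per_instance)
    # Spare instances absorb the load of a failed instance behind the load balancer.
    return max(base, 1) + spare


# Hypothetical figures: 900 GB/day peak, 400 GB/day measured per instance.
print(instances_needed(peak_gb_per_day=900, measured_gb_per_day_per_instance=400))
```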
EPs currently cannot be clustered the way indexers or search heads can. Because there is no cluster, EPs do not communicate with each other. If an EP is lost, no other EP knows the state of its in-flight data, and only TCP error checking applies. If you require high availability and need assurance of EP availability, build separate monitoring into the environment.
For more information, refer to Add more instances to an Edge Processor.
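Because EPs do not share state, one building block for separate monitoring is a periodic check that each instance accepts connections on its data port. The following Python sketch tests only TCP reachability with the standard library. It does not confirm that pipelines are processing data. Replace the hypothetical host names and port with your own, and feed the result into your existing alerting, for example with `unreachable_instances([("ep-node-1.example.com", 9997), ("ep-node-2.example.com", 9997)])`.

```python
import socket


def unreachable_instances(instances: list[tuple[str, int]], timeout: float = 3.0) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that did not accept a TCP connection."""
    down = []
    for host, port in instances:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection accepted; it is closed immediately.
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            down.append((host, port))
    return down
```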
Additional resources
Splunk Lantern offers additional self-help guidance for using Splunk Edge Processor, including information on running EP in containers, scaling your EP infrastructure, and load balancing traffic. Click here for a comprehensive list.
In addition, Splunk Professional Services can provide hands-on Splunk Edge Processor guidance for you and your team. Click here to learn more about working with Professional Services.