 
Splunk Lantern

Configuring and deploying Splunk Edge Processor

 

Splunk Edge Processor is a data processing solution that helps you optimize costs around data access, storage, and transfer. Because you work with smaller, enriched, and more contextual data sets at search time, it also significantly improves your time to value. Edge Processor runs at the edge of your network, where you can use it to filter, mask, and transform your data close to its source before routing the processed data to supported destinations. Learn more about how the Splunk Edge Processor solution works.

This article walks you through how to configure and deploy Splunk Edge Processor so you can create a pipeline to transform and manage data. It also introduces some of the core features of this powerful tool and shows how it can help you manage your data more efficiently.

Note: You are currently at Phase 2 in the Splunk Edge Processor getting started guide. Navigate to Phase 1 for an overview of getting started with Edge Processor.

 

Prerequisites 

  1. Verify that you have access to the Splunk Edge Processor solution (learn more). 
  2. Connect your tenant to a Splunk Cloud Platform deployment. If you are the first Edge Processor user on your tenant, you need to complete the first-time setup instructions to connect your tenant. This connection will provide indexes for storing the logs and metrics passing through the processors. 

Splunk Edge Processor is included with your Splunk Cloud Platform, available at no additional cost. Learn more about the requirements to use Edge Processor and how to request access if you do not already have it. 

 

Navigating the homepage tabs

[Image: Splunk Edge Processor - navigating the homepage tabs]

  • Data Management. Get started with Edge Processor and stay up-to-date on what’s new, view release notes, monitor your system logs, get documentation, or provide your feedback.
  • Sourcetypes. Configure custom event breaking for inbound data.
  • Edge Processors. Add new Edge Processor nodes and view currently deployed nodes.
  • Pipelines. Use SPL2 to construct filtering, masking and routing logic for your inbound data.
  • Destinations. View the routing destinations that are currently available. You should see the Splunk Cloud Platform account that you specified, and you can configure additional Splunk or Amazon S3 destinations from here.

Configure and deploy Splunk Edge Processor

Log in to Splunk Cloud Platform and navigate to the Splunk Data Management console to start using Edge Processor. To install your first Edge Processor node, you only need to copy and paste a command line into your Linux machine.

Watch the video or follow the steps below to create a basic Splunk Edge Processor and pipeline setup that receives data from a forwarder, processes the data as needed, and then sends the processed data to an index or an Amazon S3 bucket for low-cost storage. 

  1. Access Splunk Edge Processor:
    1. Navigate directly to the Edge Processor console using the following link: https://px.scs.splunk.com/<your Splunk cloud tenant name>/data-management/
    2. Log in with the same username and password you use for your Splunk Cloud Platform deployment.
  2. From the Splunk Edge Processor landing page, click the Edge Processors tab on the left. 
    1. You can deploy Edge Processors on the infrastructure of your choice, including an Amazon EC2 instance, or a virtual or physical server in your data center. 
  3. Create a new Edge Processor cluster.
  4. Give it a name and description.
  5. Optional steps: 
    1. Customize the default destination for storing unprocessed data, such as Amazon S3 (see Add or manage destinations for more information).
    2. Configure settings relating to the transport protocol such as S2S or HEC, and enable TLS encryption for additional security. 
  6. Copy and paste the output script onto a machine in your network (for example, a Linux server) to install the Edge Processor. After the instance is registered, you can monitor its status from the data management console. 

Create a pipeline 

Pipelines are SPL2 statements that specify what data to process, how to process it, and what destination to send the processed data to. This is how you define your desired filtering, masking, and routing logic. Pipelines let you optimize data storage and transfer costs while also producing a more contextual dataset for search. Create a new pipeline to specify how you want to process and route your data using SPL2 (see pipeline syntax and the SPL2 search manual for more information).
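For illustration, an Edge Processor pipeline in SPL2 generally follows a from-source, transform, into-destination shape. The sketch below is a minimal example, not a definitive implementation; the sourcetype value and index name are hypothetical:

```spl2
// Minimal pipeline sketch: read from the attached source,
// keep only events with a hypothetical "linux_syslog" sourcetype,
// assign them to a hypothetical "linux_logs" index, and send them on.
$pipeline = | from $source
| where sourcetype == "linux_syslog"
| eval index = "linux_logs"
| into $destination;
```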

  1. Click the Pipelines tab on the left and select New pipeline. You have two options to build your pipeline:
    1. Use a pre-built template to easily create a new pipeline for Linux, syslog, or Windows data. Templates are Splunk-built, customizable SPL2 for preprocessing, and new templates will be added over time.

    2. Write your own SPL2 scripts by clicking New pipeline. 
  2. After you've built your pipeline, you can use the preview feature to test it and see the data before and after your logic is applied. 
  3. Apply the pipeline to an Edge Processor cluster where you can monitor the volume of data coming in and out. You can also track the performance of your Edge Processor instances to decide when to scale up or down. 
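As a sketch of the masking logic mentioned above, a pipeline can rewrite sensitive values in the raw event text before routing. This example assumes SPL2's replace() function; the pattern shown (US Social Security numbers) and the replacement string are illustrative:

```spl2
// Mask anything that looks like an SSN in the raw event text
// before the event leaves your network. Regex is illustrative.
$pipeline = | from $source
| eval _raw = replace(_raw, /\d{3}-\d{2}-\d{4}/, "XXX-XX-XXXX")
| into $destination;
```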

Example: Create a pipeline to transform and manage data

This video walks you through how to create a pipeline to filter, enrich, and route data. In this demo, a pipeline is created to only retain events where the event code matches a certain value, and enrich those events with additional fields. You can then route the data to a Splunk index or an Amazon S3 bucket. See implementing use cases in Edge Processor for more use case ideas.
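The demo's logic might be sketched in SPL2 roughly as follows. The sourcetype, event code value, and enrichment field names below are hypothetical stand-ins for whatever your own data uses:

```spl2
// Keep only Windows security events with a specific event code,
// then enrich the surviving events before routing them.
// All field values here are illustrative assumptions.
$pipeline = | from $source
| where sourcetype == "XmlWinEventLog" and EventCode == "4624"
| eval environment = "production", processed_by = "edge_pipeline"
| into $destination;
```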

Detailed cluster view 

The detailed Splunk Edge Processor cluster view gives you a panoramic view of everything the cluster is servicing, including:

  • Information relating to incoming sources and pipelines applied
  • The volume of incoming and outgoing data
  • Metrics around the performance of your Edge Processor instances to help you make decisions on scaling up or scaling down
  • Visual graphs to help you understand the data that is being processed and sent downstream