Getting started with Splunk Data Management Pipeline Builders
Splunk’s Data Management Pipeline Builders are the latest innovation in data processing. They offer more efficient, flexible data transformation – helping you reduce noise, optimize costs, and gain visibility and control over your data in motion.
Splunk Data Management offers two pipeline builders with a choice of deployment model:
- Edge Processor is a customer-hosted offering for greater control over data before it leaves your network boundaries. You can use it to filter, mask, and transform your data close to its source before routing the processed data to the environment of your choice.
- Ingest Processor is a Splunk-hosted SaaS offering ideal for customers who are all-in on cloud and prefer that Splunk manage the infrastructure for them. In addition to filtering, masking, and transforming data, it adds a new capability: converting logs to metrics.
Both Edge Processor and Ingest Processor let you configure SPL2-based pipelines to filter, mask, transform, and route data to destinations. They support most SPL2 commands for pre-ingest data processing (e.g., regex, eval, etc.). Learn more about SPL2 profiles and view a command compatibility matrix by product for SPL2 commands and eval functions.
Data Management Pipeline Builders allow you to:
- Filter: Easily filter low-value or noisy data, such as DEBUG logs, heartbeat messages or repetitive health check messages, and focus on data that matters the most.
- Mask PII: Support organizational data compliance and data privacy by masking or encrypting personally identifiable information (PII).
- Enrich and extract: Enrich events with contextual data before sending them to Splunk for high-value search, monitoring, and analysis by ITOps and SecOps teams.
- Route: Route different “slices” of data to the Splunk platform, Splunk Observability Cloud* and Amazon S3 for low-cost storage and have granular control over your data placement.
- Logs-to-Metrics*: Transform your logs into real-time metrics for faster MTTD and MTTR.
*Currently applies to Ingest Processor only.
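To make the filter, mask, and enrich steps above more concrete, here is a minimal conceptual sketch in Python. It is not SPL2 and does not reflect how the pipeline builders are implemented. The event shape, the `_raw` and `team` field names, the SSN-style pattern, and the `process` helper are all assumptions chosen for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical patterns, chosen only for illustration.
NOISE = re.compile(r"\b(DEBUG|heartbeat|health[ _-]?check)\b", re.IGNORECASE)
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def process(events: Iterable[dict]) -> Iterator[dict]:
    """Filter noisy events, mask SSN-like values, and add a context field."""
    for event in events:
        raw = event["_raw"]
        if NOISE.search(raw):
            continue  # Filter: drop low-value events
        masked = SSN.sub("XXX-XX-XXXX", raw)  # Mask: redact PII-like values
        # Enrich: attach context (a hypothetical static field here)
        yield {**event, "_raw": masked, "team": "secops"}


sample = [
    {"_raw": "DEBUG cache warm-up complete"},
    {"_raw": "ERROR login failed for ssn=123-45-6789"},
    {"_raw": "INFO health check ok"},
]
result = list(process(sample))
```

In this sketch only the ERROR event survives, with its SSN-like value redacted. In an actual pipeline, the equivalent logic would be written in SPL2 and applied before the data reaches its destination.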
Benefits and value of Data Management Pipeline Builders
- Reduce data noise and costs
- Gain increased visibility into streaming data
- More efficient, flexible data transformation
- Accelerate MTTD with real-time metrics
- Centralized control through cloud control plane
- Leverage SPL2 for advanced data processing
- A guided pipeline builder to simplify data routing
- Process data faster, and with fewer compute resources, than ingest actions or heavy forwarders
- Reduce search time
Edge Processor is included with your Splunk Cloud Platform subscription at no additional cost, as is the Ingest Processor “Essentials” tier. Learn more about the requirements to use Edge Processor or Ingest Processor and how to request access if you do not already have it.
How Splunk Edge Processor works
Splunk Edge Processor combines Splunk-managed cloud services, on-premises data processing software, and SPL2 to support data processing at the edge of your network. It allows you to ingest data into the Splunk platform, Amazon S3, or other systems. The service is delivered through a cloud control plane, while Edge Processor nodes installed and managed in your own infrastructure do the data processing (that is, they form the data plane). Learn more about the Edge Processor system architecture.
Using simple-to-deploy nodes, Splunk Edge Processor allows you to filter, route, and process data generated by Splunk forwarders and other sources before it is ingested into Splunk Enterprise or Splunk Cloud Platform. You decide where to deploy the Edge Processor nodes, and you define each node's name, description, and tags.
Once Splunk Edge Processor nodes are deployed, you control the destinations to which your Edge Processors and pipelines send data. You can also configure a “default destination” per Edge Processor node to receive unprocessed data. If you don't specify a default destination, Edge Processors drop unprocessed data.
The capabilities and limitations of Edge Processor (as of Splunk Cloud Platform version 9.2.2403) are:
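The default-destination behavior described above can be sketched conceptually in Python. This is an illustration of the routing rule only, not Edge Processor's implementation. The `route` function, the `(predicate, destination)` pipeline representation, and the destination names are hypothetical.

```python
from typing import Callable, Optional

# A pipeline here is a (predicate, destination) pair; both are hypothetical
# stand-ins for a pipeline's selection condition and its configured destination.
Pipeline = tuple[Callable[[dict], bool], str]


def route(
    event: dict,
    pipelines: list[Pipeline],
    default_destination: Optional[str] = None,
) -> list[str]:
    """Return the destinations an event is sent to; an empty list means dropped."""
    matched = [dest for applies, dest in pipelines if applies(event)]
    if matched:
        return matched
    # Unprocessed data: use the default destination if configured, else drop.
    return [default_destination] if default_destination is not None else []


pipelines = [(lambda e: e.get("sourcetype") == "syslog", "splunk_index")]
syslog_event = {"sourcetype": "syslog"}
other_event = {"sourcetype": "json"}
```

With these assumptions, `route(other_event, pipelines)` returns an empty list (the event is dropped), while `route(other_event, pipelines, "s3_archive")` sends it to the default destination. This is why it is worth setting a default destination before deploying pipelines if you cannot afford to lose unmatched data.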
- Supported actions: Filtering, transforming, masking, routing (stateless, lightweight operations), lookups, cryptographic functions, and stats functions
- On the roadmap: Dedup, logs to metrics, metrics processing, and summarizing
- Not supported: Data decryption
How Splunk Ingest Processor works
Splunk Ingest Processor combines Splunk-hosted cloud services and SPL2 to support processing of data that has been ingested into your Splunk Cloud Platform deployment. Ingest Processor is a cloud service offering that provides a centralized console for managing Ingest Processor pipelines.
By using Ingest Processor, you can process, manage and monitor your data ingest ecosystem from a Splunk-hosted cloud service. This requires no infrastructure setup, making it easy to get started. You can also collect, pre-process, and route metrics to the Splunk Observability Cloud for infrastructure and application monitoring.
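As a rough illustration of what converting logs to metrics means, the following Python sketch counts log lines per time window and status code and emits one metric point per group. It is a conceptual example, not Ingest Processor's implementation or SPL2. The log line shape, the metric name `http.requests.count`, and the `count_status_metrics` helper are assumptions made for this sketch.

```python
import re
from collections import Counter

# Hypothetical access-log shape: "<epoch_seconds> <method> <path> <status>"
LINE = re.compile(r"^(?P<ts>\d+) \S+ \S+ (?P<status>\d{3})$")


def count_status_metrics(lines: list[str], window_seconds: int = 60) -> list[dict]:
    """Count log lines per (time window, status) and emit one metric point each."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed shape
        window_start = int(m["ts"]) // window_seconds * window_seconds
        counts[(window_start, m["status"])] += 1
    return [
        {
            "metric": "http.requests.count",  # hypothetical metric name
            "timestamp": ts,
            "value": n,
            "dimensions": {"status": status},
        }
        for (ts, status), n in sorted(counts.items())
    ]


points = count_status_metrics([
    "1700000005 GET /cart 200",
    "1700000010 GET /cart 200",
    "1700000020 POST /pay 500",
    "1700000070 GET /cart 200",
])
```

Four log lines collapse into three metric points here. At real volumes, the reduction is what makes metrics cheaper to store and faster to alert on than the raw logs they were derived from.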
When to use which data processing capability
| | Edge Processor | Ingest Processor | Ingest Actions |
|---|---|---|---|
| Capabilities | Filter, mask, and route data before indexing | Filter, mask, and route data before indexing | Filter, mask, and route data before indexing |
| Processing method | SPL2-based pipelines | SPL2-based pipelines | UI over props and transforms |
| Availability | Splunk Cloud Platform (AWS) | Splunk Cloud Platform (AWS) | Splunk Cloud Platform (AWS/GCP) & Splunk Enterprise |
| Deployment model | Process data on customer-hosted edge using SPL2 processing engine | Process data using Splunk-managed SPL2 processing engine | Process data on HWF or Indexer using rulesets |
| Supported sources (ingest data from) | S2S, HEC, RawHEC, and Syslog | Any Splunk Cloud Platform (Victoria) input | Any Splunk supported data input |
| Data Preview | | | |
| Supported destinations (route to) | | | |
| Cost | No additional costs with Splunk Cloud Platform | | No additional costs with Splunk Enterprise or Splunk Cloud Platform |
How to get started
Log in to Splunk Cloud Platform and navigate to the Splunk Data Management console to start using Edge Processor or Ingest Processor today. You can access Data Management in the following ways:
- In Splunk Web, from the homepage, click Settings > Add data > Data Management Experience.
- You can also navigate directly to Data Management using the following link: https://px.scs.splunk.com/<your Splunk cloud tenant name>
If you are the first user on your Edge Processor or Ingest Processor tenant, you need to complete the first-time setup instructions to allow your tenant to access Splunk Cloud Platform indexes for storing the logs and metrics passing through the processors.
- First-time setup instructions for Edge Processor.
- First-time setup instructions for Ingest Processor.
The first-time setup instructions are very similar for both pipeline builders. However, because Splunk Edge Processor is customer-hosted, you also need to set up the Edge Processor node in your environment, which Splunk has simplified to running one command on a Linux machine.
Next steps
Review the additional resources below, then click the Next step button to learn how to configure and deploy your Splunk Data Management Pipeline Builders with step-by-step guidance.
- Join the #edge-processor Slack channel for direct support (request access: http://splk.it/slack)
- Website: Data Management resource hub
- Blog: Introducing Edge Processor: Next Gen Data Transformation
- Tech Talk: Introducing Edge Processor
- .conf23: Getting Data in More Efficiently Using the Splunk® Edge Processor (session slides)
- Blog: Data Preparation Made Easy: SPL2 for Edge Processor
- Blog: Addition of Syslog in Splunk Edge Processor Supercharges Security Operations with Palo Alto Firewall Log Reduction
- Blog: Splunk Edge Processor Enhancements Offer Greater Data Access and Improve Data Management
- Stay up to date with release notes for Edge Processor and Ingest Processor