
Implementing use cases with Splunk Data Management Pipeline Builders

 

Splunk Data Management Pipeline Builders give you new capabilities to filter, mask, and otherwise transform your data before routing it to supported destinations.

You are currently at phase 3 of the Splunk Data Management Pipeline Builders getting started guide. Navigate to phase 1 for an overview of getting started with Pipeline Builders, or to phase 2 for step-by-step guidance on configuring and deploying Pipeline Builders.

About SPL2

What gives the Splunk Edge Processor and Ingest Processor pipeline builders their data transformation power is SPL2, Splunk’s next-generation data search and preparation language. SPL2 provides a powerful, flexible, and intuitive way for Splunk admins and data stewards to shape, enrich, filter, transform, and route data – in a manner familiar to Splunk’s SPL users, while also introducing optional SQL syntax known to users around the world.

Pipelines allow you to use SPL2 to construct filtering, masking, and routing logic for your inbound data, so you ingest only the data you need – nothing more, nothing less. Pipelines specify what data to process, how to process it, and the destination to which the processed data should be sent. They help you optimize data storage and transfer costs while also producing a more contextual dataset for search. For more information, see the pipeline syntax documentation and the SPL2 Search Manual.
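To make that anatomy concrete, here is a minimal sketch of what an SPL2 pipeline looks like in the pipeline builders. The $source and $destination placeholders are bound to a real source and destination when you save the pipeline in the builder; the sourcetype and enrichment field shown here are illustrative assumptions rather than values from your environment.

    // Minimal pipeline sketch: read from the bound source, keep only the events
    // you care about, optionally enrich them, and send them on.
    $pipeline = | from $source
        | where sourcetype == "cisco:asa"      // illustrative sourcetype; replace with your own
        | eval environment = "production"      // optional enrichment field, purely for illustration
        | into $destination;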

The pipeline builders support most SPL2 commands for pre-ingest data processing (for example, regex and eval). Learn more about SPL2 profiles and view the per-product compatibility matrix for SPL2 commands and eval functions.

Common pipeline builder use cases

The links below walk you through common use cases that Edge Processor and Ingest Processor can address. These can help you reduce ingest volume to optimize costs around data storage and transfer, protect sensitive information, and significantly improve your time to value. 

Because Edge Processor and Ingest Processor both use SPL2 pipelines, many of the use cases below can be applied with either pipeline builder unless otherwise stated.

Use case prerequisites 

Before you can implement use cases with Edge Processor or Ingest Processor, make sure you have:

  1. Connected your Edge Processor or Ingest Processor tenant to your Splunk Cloud Platform deployment by following the first-time setup instructions for Edge Processor or Ingest Processor.
  2. Created an Edge Processor or Ingest Processor instance by following the steps under “Configure and deploy Data Management Pipeline Builders”.

Edge Processor is included with your Splunk Cloud Platform subscription at no additional cost, as is the Ingest Processor “Essentials” tier. Learn more about the requirements to use Edge Processor or Ingest Processor and how to request access if you do not already have it.

Use cases to filter and route data

Use cases to transform, mask, and route data

  • Lantern: Enrich data via real-time threat detection with KV Store lookups. By creating and applying a pipeline that uses a lookup, you can configure an Edge Processor to add more information to the received data before sending that data to a destination (Splunk Docs). In this case, the objective is to use the event fields present in your ingested data to preemptively identify and flag malicious activity. A brief sketch of this pattern appears after this list.
  • Video: Modify raw events to remove fields and reduce storage. Splunk Edge Processor is an effective tool for reducing the size of the payload and only indexing fields that provide high value. Watch the video to learn how to remove unwanted fields from a raw event and reconstruct it with a reduced number of fields to optimize storage in the Splunk platform. Similar logic can be used to drop as many fields as needed to reduce your storage footprint and improve performance. A simplified sketch of this approach appears after this list.

  • Lantern: Convert complex data into metrics. This article refers to Edge Processor, but the same process can also be applied to Ingest Processor. This step-by-step guide walks you through how to transform complex, bloated data into metrics by pre-processing your data with Edge Processor so you can cut storage costs. For a simplified version of this process, see Converting logs into metrics with Edge Processor for beginners.
  • Lantern: Route root user events to a special index. This use case provides step-by-step guidance to filter any events relating to the “root” user in your Linux authentication data and send them to a dedicated index called admin. A routing sketch along these lines appears after this list.
  • Lantern: Mask IP addresses from a specific range. There are multiple ways of achieving this IP masking use case with SPL2, depending on how flexible you want your pipeline to be. The article looks at two possible methods: 1) using eval replace, and 2) using rex and cidrmatch. Both are sketched after this list.
  • Video: Mask sensitive credit card information. Splunk Edge Processor can help protect sensitive information by masking incoming data, allowing your business to comply with data privacy regulations while keeping the data secure. Watch the video for a demonstration of how masking logic can be applied to credit card information to extract the card number field and replace its value with a string of your choosing. Organizations can use similar masking logic to protect any sensitive information, such as personally identifiable information (PII), from unauthorized access before the data is indexed in the Splunk platform.
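The sketches below are simplified illustrations of what some of these pipelines can look like, not drop-in implementations. First, the lookup-based enrichment: the lookup name (threat_intel), its fields (src_ip and threat_level), and the event format are all assumptions made for illustration – the linked article and Splunk Docs explain how to make a KV Store lookup available to your Edge Processor tenant and how to adapt the logic to your data.

    // Enrichment sketch: add threat context from an assumed KV Store lookup named "threat_intel".
    $pipeline = | from $source
        | rex field=_raw /src=(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/   // assumed event format
        | lookup threat_intel src_ip OUTPUT threat_level
        | eval threat_flag = if(isnotnull(threat_level), "suspicious", "ok")   // flag known-bad sources
        | into $destination;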
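Next, a simplified version of the field-reduction approach from the video. The source event format, sourcetype, and field names are assumptions; the idea is to extract only the fields that matter and rebuild _raw from them so the original, larger payload is never indexed.

    // Field-reduction sketch. Assumed inbound event:
    //   ts=... user=alice action=login status=success debug_blob=<large text>
    $pipeline = | from $source
        | where sourcetype == "my_app:events"                                      // hypothetical sourcetype
        | rex field=_raw /user=(?<user>\S+) action=(?<action>\S+) status=(?<status>\S+)/
        | eval _raw = "user=" + user + " action=" + action + " status=" + status   // rebuild a smaller event
        | into $destination;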
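For the root-user routing use case, the logic can be as small as a filter plus an index assignment. The sourcetype and regular expression here are assumptions; the linked article walks through the exact steps, including creating the admin index and configuring the destination.

    // Routing sketch: send root-related Linux authentication events to the "admin" index.
    $pipeline = | from $source
        | where sourcetype == "linux_secure" and match(_raw, /\broot\b/)   // assumed sourcetype and pattern
        | eval index = "admin"                                             // target index for these events
        | into $destination;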
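Finally, the two IP-masking methods mentioned above look roughly like this. The address range (192.168.1.0/24) and the replacement strings are assumptions; choose one method per pipeline.

    // Method 1: eval with replace – rewrite any address in 192.168.1.x directly.
    $pipeline = | from $source
        | eval _raw = replace(_raw, /192\.168\.1\.\d{1,3}/, "192.168.1.x")
        | into $destination;

    // Method 2: rex plus cidrmatch – extract a candidate address, then mask it only
    // when it falls inside the target range. The extracted value is passed to replace()
    // as a pattern, which is acceptable for a sketch because IP addresses contain no
    // regex metacharacters other than the dot.
    $pipeline = | from $source
        | rex field=_raw /(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
        | eval _raw = if(cidrmatch("192.168.1.0/24", src_ip), replace(_raw, src_ip, "x.x.x.x"), _raw)
        | into $destination;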

Additional resources