Sampling data with ingest actions for data reduction
As a Splunk admin, there are many reasons you might not want to index all the data sent to your Splunk instance. Common reasons include saving costs (on storage or ingest license) and reducing the noise of certain log sources (to make search-time exploration a bit easier). With ingest actions (Splunk Enterprise or Splunk Cloud Platform), you can set up sampling through a UI that enables both the creation of the sampling logic and the deployment of these changes, so they can take immediate effect at whatever tier you want to sample. You want to learn how to implement a few different sampling strategies available with ingest actions.
How to use Splunk software for this use case
A direct sampling strategy
In the simplest case, you might want to index 10% of your events. This would reduce ingest volume by 90%, which could be quite a large cost saving.
With the Filter using Eval Expression rule, you can do a 10% sample of data with this eval expression:
(random() % 10) > 0
There are two things happening in this expression:
- (random() % 10) generates a random number between 0 and 9.
- > 0 returns false if the number is 0, and true otherwise.
Because the Filter using Eval Expression rule drops events matching the expression, we need to invert the check to get a 10% sample. We want to drop 90% of the data, so we match every event whose number is NOT 0. If we had written = 0 instead, we'd be keeping 90% of the data.
What this ruleset does:
- Calculates a 10% filter and drops 90% of the events - the Filter using Eval Expression rule.
- Indexes the remaining 10% that passed through the filter - the Final Destination rule.
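To see why the expression keeps roughly one event in ten, here's a small Python sketch. This is an illustration only, not something you deploy; Python's random.randrange stands in for Splunk's random() eval function, which returns an integer from 0 to 2^31 - 1.

```python
import random

def drop_event() -> bool:
    """Mimic the eval expression (random() % 10) > 0.

    random.randrange(2**31) stands in for Splunk's random(),
    which returns an integer from 0 to 2**31 - 1. The modulo
    yields 0-9, and any value other than 0 drops the event.
    """
    return (random.randrange(2**31) % 10) > 0

# Roughly 10% of events survive the filter.
kept = sum(1 for _ in range(100_000) if not drop_event())
```

Running this repeatedly, `kept` lands close to 10,000 out of 100,000, which is the 10% sample the ruleset is after.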
Expanded sampling strategies
The above example is great in its simplicity; however, it is very rarely sufficient as a general sampling strategy. You could have compliance requirements for all data to be stored, you could have keywords that you always want indexed, or you might need to meet both of these requirements and others at the same time. With ingest actions, you can meet these requirements by adding additional rules.
Store all data in S3, index a sample
As an example of how to implement the requirement to store all data, while still saving on indexing storage costs, we can write all data to S3 but only index a sample. This can be accomplished by adding a Route to Destination rule to our sampling strategy from before.
- Add a Route to Destination rule to the ruleset, before the filter.
- Set the condition to None because you want to send everything to S3.
- Set the Immediately send to option to a bucket you want to receive these events, under the S3 heading.
- Toggle the Clone events and apply more rules option.
That process creates a rule that sends all data to your configured S3 bucket while also keeping the data to be processed by the other rules. The prior Filter using Eval Expression rule is unchanged: it filters out 90% of the data, which is exactly what you want for this example.
What this ruleset does:
- Sends all data to the configured S3 bucket - the Route to Destination rule.
- Calculates a 10% filter and drops 90% of the events - the Filter using Eval Expression rule.
- Indexes the remaining 10% that passed through the filter - the Final Destination rule.
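The clone-then-filter behavior can be sketched in Python (again, purely illustrative: the S3 bucket and index here are plain lists, not real destinations):

```python
import random

def route_with_clone(events):
    """Sketch of this ruleset: every event is cloned to the S3
    destination, then a ~10% sample continues on to the index."""
    s3_bucket, index = [], []
    for event in events:
        s3_bucket.append(event)  # Route to Destination, with clone toggled
        # Filter using Eval Expression drops events where (random() % 10) > 0,
        # so an event reaches the index only when the modulo result is 0.
        if (random.randrange(2**31) % 10) == 0:
            index.append(event)  # Final Destination
    return s3_bucket, index

s3_bucket, index = route_with_clone([f"event {i}" for i in range(10_000)])
```

Every event lands in the S3 list, while only about a tenth of them also land in the index list.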
Always index certain kinds of data, sample the rest
Sampling all data is great in its simplicity, but it is admittedly a blunt method. Realistically, there are certain kinds of data whose indexing, and subsequent detection in monitoring, you can never leave to chance. For example, let's set up the ingest actions ruleset to always index events containing the keyword error in any case pattern, for example, Error, ERROR, or eRroR.
- Start with a Route to Destination rule with a Regex condition of (?i)error.
- Set the Immediately send to option to Default Destination. This should already be pre-filled, but is under the SPLUNK heading in the drop-down menu for this field.
- Leave the Clone events and apply more rules option not toggled.
- Keep the original Filter using Eval Expression sample rule in place.
What this ruleset does:
- Checks whether an event matches the case-insensitive regex (?i)error and, if it does, indexes the event - the Route to Destination rule.
- For the events that don't have some variant of error, calculates a 10% filter and drops 90% of those events - the Filter using Eval Expression rule.
- Indexes the remaining 10% that passed through the above filter - the Final Destination rule.
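The keyword-plus-sample logic boils down to a single keep-or-drop decision per event, which can be sketched in Python (illustrative only; event strings stand in for real events):

```python
import random
import re

ERROR_PATTERN = re.compile(r"(?i)error")

def keep_event(event: str) -> bool:
    """Mirror the two rules above: always index events containing
    any case variant of 'error', and sample the rest at ~10%."""
    if ERROR_PATTERN.search(event):  # Route to Destination, Regex condition
        return True
    # Filter using Eval Expression: survive only when the modulo is 0.
    return (random.randrange(2**31) % 10) == 0
```

An event like "Disk ERROR on /dev/sda" is always kept, while events without the keyword are kept only about one time in ten.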
Store all data in S3, index all of certain kinds of data, sample the rest
Combining all of the above, we can use the following rules to make a ruleset that stores all data in S3, always indexes certain kinds of data, and then indexes a sample of the remaining data.
- A Route to Destination rule with the following configuration:
  - Set the condition to None.
  - Set the Immediately send to option to a bucket you want to receive these events, under the S3 heading.
  - Toggle the Clone events and apply more rules option.
- A Route to Destination rule with Regex toggled and this regular expression: (?i)error
- A Filter using Eval Expression rule with Drop Events Matching Eval Expression set to: (random() % 10) > 0
- The Final Destination rule that is inherent to all ingest actions rulesets. There is no action for you to take for this.
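Putting the whole ruleset together, the end-to-end flow can be sketched in Python (illustrative only; the S3 bucket and index are plain lists standing in for real destinations):

```python
import random
import re

ERROR_PATTERN = re.compile(r"(?i)error")

def process(events):
    """Sketch of the combined ruleset: clone everything to S3,
    always index error events, and sample the rest at ~10%."""
    s3_bucket, index = [], []
    for event in events:
        s3_bucket.append(event)                    # rule 1: clone all events to S3
        if ERROR_PATTERN.search(event):            # rule 2: route errors to the index
            index.append(event)
        elif (random.randrange(2**31) % 10) == 0:  # rule 3: 10% survive the drop filter
            index.append(event)                    # rule 4: Final Destination
    return s3_bucket, index
```

Every event reaches S3, every error event reaches the index, and roughly 10% of everything else reaches the index as well.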
Next steps
The content in this guide is just one of the thousands of Splunk resources available to help users succeed. These additional resources might help you understand ingest actions and implement data reduction strategies:
- Splunk Lantern Article: Reducing low-value data ingestion to improve license usage
- Tech Talk: Introducing ingest actions: Filter, mask, route, repeat
- Splunk Blog: Ingest actions: Data access when, where and how you need it

