Using file system destinations with file system as a buffer
Splunk Enterprise 9.3 adds ingest actions capabilities that write events to the file system, either as a copy alongside local indexing or directly, without indexing them locally.
This new feature has a number of benefits, including:
- Acting as a buffer when the destination is unreachable, preventing disruptions to local indexing.
- Supporting remote NFS shares, Azure blob storage, and other file system mounts.
- Enabling file copying via data diodes from high side to low side.
- Supporting tactical IT platforms with intermittent or no connectivity, allowing local indexing and later transmission once online.
- Managing data transmission outside of peak hours.
Data remains unchanged from its original raw payload, ensuring compatibility with Splunk add-ons and apps on both sending and receiving sides.
Introducing Splunk Enterprise 9.3 file system destinations
With Splunk Enterprise 9.3, events selected by source type can be written to the file system in newline-delimited JSON (NDJSON) format, including all indexed fields.
A configuration on the receiving side can then remove the JSON syntax and restore events to their original format. NDJSON is used because it retains all of the indexed fields.
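To make the unwrapping step concrete, here is a minimal Python sketch of what restoring one NDJSON line involves. It assumes the envelope uses the keys that the receiving-side transform reads (host, source, sourcetype, time, fields, event); the sample event itself is invented for illustration, and the real files may carry additional keys.

```python
import json

# Hypothetical NDJSON line. The key names mirror those read by the
# receiving-side transform; the values are made up for this example.
sample_line = json.dumps({
    "time": 1722500000.0,
    "host": "web01",
    "source": "/var/log/app.log",
    "sourcetype": "app:log",
    "fields": {"env": "prod"},
    "event": "2024-08-01 08:13:20 INFO service started",
})


def unwrap(line: str) -> dict:
    """Split one NDJSON envelope into metadata and the original raw event."""
    envelope = json.loads(line)
    return {
        "_raw": envelope["event"],          # original, unmodified payload
        "_time": envelope.get("time"),
        "host": envelope.get("host"),
        "source": envelope.get("source"),
        "sourcetype": envelope.get("sourcetype"),
        "indexed_fields": envelope.get("fields", {}),
    }


restored = unwrap(sample_line)
print(restored["_raw"])
```

The point of the sketch is that the raw event travels inside the envelope untouched, so the receiving side only has to move values back into the right metadata keys.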
The following diagram shows the configurations enabled by this feature.
Because the events are written to the file system, a remote file mount or an external process, such as a shell or Python script, can now also be used to write events to Azure Storage Accounts, Azure Event Hub, or other storage locations not directly supported by Splunk Enterprise.
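As a rough illustration of such an external process, the Python sketch below walks the output directory and hands each file to an upload function. The `upload` callable is a hypothetical placeholder for whatever client you use (it is not a Splunk or Azure API), and the five-minute age threshold is an assumption intended to skip files that may still be open for writing.

```python
import time
from pathlib import Path
from typing import Callable, List


def ship_settled_files(root: Path, upload: Callable[[Path], None],
                       min_age_seconds: float = 300.0) -> List[Path]:
    """Pass files that have not been modified recently to `upload`."""
    cutoff = time.time() - min_age_seconds
    shipped = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Assumption: a recently modified file may still be in progress.
        if path.stat().st_mtime > cutoff:
            continue
        upload(path)
        shipped.append(path)
    return shipped


if __name__ == "__main__":
    # Path taken from the example destination; print stands in for a real uploader.
    ship_settled_files(Path("/dumps/splunk9.3-filesystemout"), upload=print)
```

A production script would also need to record or move files that have already been shipped so that they are not uploaded twice; that bookkeeping is left out here.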
Sending side configuration
There are two options for configuration of the sending side: using the GUI or using configuration files.
Option 1: Using the GUI
- Configure a destination in Settings > Ingest actions.
- Configure a Ruleset to clone the data.
Note that Data Preview does not work for the default source type.
- Add the rule Route to Destination.
- Select the previously created destination. You can also optionally route to Default Destination for local indexing.
- Review the configuration.
Option 2: Using configuration files
These configurations must be added to an indexer or heavy forwarder.
outputs.conf
[rfs:filesystem]
path = file:///dumps/splunk9.3-filesystemout/
description = Test of file system out functionality in 9.3
partitionBy = day, sourcetype
compression = zstd
format = ndjson
format.ndjson.index_time_fields = true
transforms.conf
[_rule:ruleset_Wildcard:route:eval:revxh5fb]
INGEST_EVAL = 'pd:_destinationKey' = if((true()),"rfs:filesystem",'pd:_destinationKey')
STOP_PROCESSING_IF = NOT isnull('pd:_destinationKey') AND 'pd:_destinationKey' != "" AND (isnull('pd:_doRouteClone') OR 'pd:_doRouteClone' == "")
props.conf
[default]
# Can be changed to specific sourcetypes or use default for all data
RULESET-ruleset_wildcard = _rule:ruleset_Wildcard:route:eval:revxh5fb
Receiving side configuration
Add the following configuration to a heavy forwarder or indexer to decompress zstd files and remove the JSON formatting from events.
transforms.conf
[rfs_ndjson_rewrite]
INGEST_EVAL = host:=json_extract(_raw,"host"), sourcetype:=json_extract(_raw,"sourcetype"), source:=json_extract(_raw,"source"), _time:=json_extract(_raw,"time"), indexed_fields:=json_extract(_raw,"fields"), _raw:=json_extract(_raw,"event")

[rfs_ndjson_write_indexed_fields]
SOURCE_KEY = field:indexed_fields
REGEX = (?:\"|\')([^"]*)(?:\"|\')(?=:)(?:\:\s*)(?:\")?(true|false|[-0-9]+[\.]*[\d]*(?=,)|[0-9a-zA-Z_\(\)\@\:\,\/\!\+\-\.\$\ \\\']*)(?:\")?
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

[rfs_ndjson_null_indexed_fields]
INGEST_EVAL = indexed_fields:=null()
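To see what the rfs_ndjson_write_indexed_fields transform does with the extracted fields object, the following Python sketch applies the same regular expression to a sample string and formats each match as key::value, as FORMAT = $1::$2 with REPEAT_MATCH = true would. Python's `re` module stands in for Splunk's regex engine here, so treat this as an approximation; the sample fields are invented.

```python
import re

# Regex copied from the rfs_ndjson_write_indexed_fields stanza.
# Group 1 captures the field name, group 2 the field value.
FIELD_PATTERN = re.compile(
    r'''(?:\"|\')([^"]*)(?:\"|\')(?=:)(?:\:\s*)(?:\")?'''
    r'''(true|false|[-0-9]+[\.]*[\d]*(?=,)|[0-9a-zA-Z_\(\)\@\:\,\/\!\+\-\.\$\ \\\']*)(?:\")?'''
)

# Hypothetical content of the "fields" object from one NDJSON event.
sample_fields = '{"env":"prod","retries":3,"region":"eu-west"}'

# Emulate FORMAT = $1::$2 applied once per match.
pairs = [f"{name}::{value}" for name, value in FIELD_PATTERN.findall(sample_fields)]
print(pairs)
```

Each resulting pair corresponds to one indexed field written back to the event's metadata, after which the temporary indexed_fields value is set to null.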
props.conf
[source::....zstd?(.\d+)?]
unarchive_cmd = zstd --stdout -d
sourcetype = preprocess-zstd
NO_BINARY_CHECK = true

[preprocess-zstd]
invalid_cause = archive
is_valid = False
LEARN_MODEL = false

# NDJSON from Ingest Actions
[rfs_input]
KV_MODE = json
TRUNCATE = 0
TRANSFORMS-rfs_ndjson_rewrite = rfs_ndjson_rewrite, rfs_ndjson_write_indexed_fields, rfs_ndjson_null_indexed_fields
LINE_BREAKER = ([\r\n]+)
disabled = false
pulldown_type = true
Using your new configurations
On your receiving side, set up a monitor:// input to monitor the generated zst files and index them. Set the source type to rfs_input. This will be overridden based on the source type set in the NDJSON events.
To prevent files from accumulating on the file system beyond your desired retention time, write a script that finds and deletes old files, and run it from cron.
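One way to write such a cleanup script is sketched below in Python. The seven-day retention value and the dry-run default are assumptions chosen for the example; the path is the one used in the sample destination.

```python
import time
from pathlib import Path
from typing import List


def delete_old_files(root: Path, max_age_days: float,
                     dry_run: bool = True) -> List[Path]:
    """Delete (or, in dry-run mode, only list) files older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    matched = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched


if __name__ == "__main__":
    # Dry run by default: prints what would be removed without deleting anything.
    for old_file in delete_old_files(Path("/dumps/splunk9.3-filesystemout"), 7):
        print(old_file)
```

Running the script in dry-run mode first makes it easy to confirm that the retention window matches what you expect before enabling deletion in the scheduled job.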
If the flow of data from the sending side to the receiving side must be restricted to specific times of day, a time-based iptables rule can be used to block the forwarding port outside of these hours.
If you want to index data compressed with zstd, make sure the zstd command is installed on your system. If zstd is unavailable, change the compression set in the file system destination to gzip or none.
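A quick way to check for the zstd command before settling on a compression setting is sketched below in Python. The function name and the choice of gzip as the fallback are assumptions for the example; the check needs to run on the host that will decompress the files.

```python
import shutil
from typing import Callable, Optional


def choose_compression(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "zstd" if the zstd binary is on PATH, otherwise "gzip"."""
    return "zstd" if which("zstd") else "gzip"


if __name__ == "__main__":
    print(f"Suggested compression setting: {choose_compression()}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` searches the current PATH.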
Next steps
These additional resources might help you understand ingest actions and implement data reduction strategies:
- Blog: Ingest actions: Improved usability and S3 output optimizations
- Product Tip: Sampling data with ingest actions for data reduction
- Product Tip: Using ingest actions with source types that are renamed with props and transforms
- Medium.com: Throughput of Splunk ingest actions with regular expressions: best practices