Using file system destinations with file system as a buffer

 

Splunk Enterprise 9.3 introduces new ingest actions capabilities that write a copy of ingested events to the file system, or write events directly to the file system without local indexing.

This new feature has a number of benefits, including:

  • Acting as a buffer when the destination is unreachable, preventing disruptions to local indexing.
  • Supporting remote NFS shares, Azure Blob Storage, and other file system mounts.
  • Enabling file copying via data diodes from high side to low side.
  • Supporting tactical IT platforms with intermittent or no connectivity, allowing local indexing and later transmission once online.
  • Managing data transmission outside of peak hours.

Data remains unchanged from its original raw payload, ensuring compatibility with Splunk add-ons and apps on both sending and receiving sides.

Introducing Splunk Enterprise 9.3 file system destinations

Using the features included in Splunk Enterprise 9.3, events selected on a per-source-type basis can be written to the file system in newline-delimited JSON (NDJSON) format, including all indexed fields.

By adding a special configuration on the receiving side, the JSON wrapper can be removed and the events restored to their original format. The NDJSON format is used to retain all of the indexed fields.
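
As an illustration, a single NDJSON event written by the file system destination might look similar to the following. The values are hypothetical, and the host, source, sourcetype, time, fields, and event keys correspond to the extraction rules shown in the receiving side configuration below:

{"time": "1724929315.123", "host": "webserver01", "source": "/var/log/secure", "sourcetype": "linux_secure", "fields": {"region": "us-east-1"}, "event": "Aug 29 12:01:55 webserver01 sshd[1234]: Accepted publickey for admin from 10.0.0.5"}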

The following diagram shows the configurations enabled by this feature.

[Diagram: configurations enabled by the file system destination feature]

Because the events are written to the file system, a remote file mount or an external process, such as a shell or Python script, can also be used to write events to Azure Storage Accounts, Azure Event Hubs, or other storage locations not directly supported by Splunk Enterprise.
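
If you choose the external process approach, a minimal shell sketch along the following lines could upload the generated files to an Azure Storage Account. It assumes the azcopy utility is installed, and the storage account, container, and SAS token are placeholders for values from your environment:

# Hypothetical example: recursively upload the ingest actions output directory to Azure Blob Storage
azcopy copy "/dumps/splunk9.3-filesystemout" \
  "https://<storageaccount>.blob.core.windows.net/<container>?<sas-token>" \
  --recursive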

Sending side configuration

There are two options for configuring the sending side: using the GUI or using configuration files.

Option 1: Using the GUI

  1. Configure a destination in Settings > Ingest actions.


  2. Configure a Ruleset to clone the data.


    Note that Data Preview does not work for the default source type.

  3. Add the rule Route to Destination.


  4. Select the previously created destination. You can also optionally route to Default Destination for local indexing.


  5. Review the configuration.


Option 2: Using configuration files

These configurations must be added to an indexer or heavy forwarder.

outputs.conf

[rfs:filesystem]
path = file:///dumps/splunk9.3-filesystemout/
description = Test of file system out functionality in 9.3
partitionBy = day, sourcetype
compression = zstd
format = ndjson
format.ndjson.index_time_fields = true

transforms.conf

[_rule:ruleset_Wildcard:route:eval:revxh5fb]
INGEST_EVAL = 'pd:_destinationKey':=if((true()),"rfs:filesystem",'pd:_destinationKey')
STOP_PROCESSING_IF = NOT isnull('pd:_destinationKey') AND 'pd:_destinationKey' != "" AND (isnull('pd:_doRouteClone') OR 'pd:_doRouteClone' == "")

props.conf

# Can be changed to a specific source type stanza; default applies the ruleset to all data
[default]
RULESET-ruleset_wildcard = _rule:ruleset_Wildcard:route:eval:revxh5fb

Receiving side configuration

Add the following configuration to a heavy forwarder or indexer to decompress zstd files and remove the JSON formatting from events.

transforms.conf

[rfs_ndjson_rewrite]
INGEST_EVAL = host:=json_extract(_raw,"host"), sourcetype:=json_extract(_raw,"sourcetype"), source:=json_extract(_raw,"source"), _time:=json_extract(_raw,"time"), indexed_fields:=json_extract(_raw,"fields"), _raw:=json_extract(_raw,"event")

[rfs_ndjson_write_indexed_fields]
SOURCE_KEY = field:indexed_fields
REGEX = (?:\"|\')([^"]*)(?:\"|\')(?=:)(?:\:\s*)(?:\")?(true|false|[-0-9]+[\.]*[\d]*(?=,)|[0-9a-zA-Z_\(\)\@\:\,\/\!\+\-\.\$\ \\\']*)(?:\")?
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

[rfs_ndjson_null_indexed_fields]
INGEST_EVAL = indexed_fields:=null()

props.conf

[source::....zstd?(.\d+)?]
unarchive_cmd = zstd --stdout -d
sourcetype = preprocess-zstd
NO_BINARY_CHECK = true

[preprocess-zstd]
invalid_cause = archive
is_valid = False
LEARN_MODEL = false

# NDJSON from Ingest Actions
[rfs_input]
KV_MODE = json
TRUNCATE = 0
TRANSFORMS-rfs_ndjson_rewrite = rfs_ndjson_rewrite, rfs_ndjson_write_indexed_fields, rfs_ndjson_null_indexed_fields
LINE_BREAKER = ([\r\n]+)
disabled = false
pulldown_type = true

Using your new configurations

On the receiving side, set up a monitor:// input to watch the generated .zst files and index them, and set the source type to rfs_input. The source type is then overridden by the source type recorded in each NDJSON event.
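
For example, a minimal inputs.conf stanza could look like the following. The monitored path is an assumption; point it at wherever the .zst files arrive on the receiving side.

inputs.conf

[monitor:///dumps/splunk9.3-filesystemout]
sourcetype = rfs_input
whitelist = \.zst$
disabled = false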

It's recommended to schedule a script with cron that finds and deletes old files, so they don't accumulate on the file system beyond your desired retention period.
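
For example, a simple daily cron job along these lines could enforce a retention period. The path and the seven-day retention are assumptions; adjust them to your environment.

# /etc/cron.daily/cleanup-rfs-output
# Delete output files older than 7 days, then remove any directories left empty
find /dumps/splunk9.3-filesystemout -type f -name '*.zst' -mtime +7 -delete
find /dumps/splunk9.3-filesystemout -mindepth 1 -type d -empty -delete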

If the flow of data from the sending side to the receiving side must be restricted to specific times of day, a time-based iptables rule can be used to block the forwarding port outside of those hours.
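
As a sketch, the iptables time module can express such a window. The example below assumes the default Splunk forwarding port of 9997 and a transfer window of 01:00 to 05:00 UTC:

# Allow forwarding traffic only during the transfer window; reject it the rest of the day
iptables -A OUTPUT -p tcp --dport 9997 -m time --timestart 01:00 --timestop 05:00 -j ACCEPT
iptables -A OUTPUT -p tcp --dport 9997 -j REJECT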

If you want to index data compressed with zstd, make sure the zstd command is installed on your system. If zstd is unavailable, change the compression setting in the file system destination to gzip or none.
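
For example, on most Linux distributions zstd is available from the standard package repositories:

# RHEL/CentOS-based systems
yum install zstd
# Debian/Ubuntu-based systems
apt-get install zstd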