Splunk Lantern

Sending masked PII data to the Splunk platform and routing unmasked data to federated search for Amazon S3 (FS-S3)

 

As a Splunk admin, you want to mask sensitive credit card information in point-of-sale (POS) log data before it is sent to a Splunk Cloud Platform index, and route an unmasked copy of the data to Amazon S3.

This process involves the following steps:

  • Mask PII using Splunk Edge Processor
  • Ingest data from Splunk Edge Processor to Splunk Cloud Platform
  • Select sourcetype to send events to Splunk Edge Processor
  • Route to Splunk Cloud Platform index
  • Route to Amazon S3 bucket

While this example uses credit card data, the same process can be applied to any PII data. This example is also demonstrated in Splunk Edge Processor, but the same SPL2 can be used to create and apply a pipeline in Ingest Processor.

Solution

Splunk Federated Search for Amazon S3 (FS-S3) allows you to search your data in Amazon S3 buckets directly from Splunk Cloud Platform without ingesting it. Edge Processor (EP) and Ingest Processor (IP) can both route data to customer-managed Amazon S3 buckets.

In this article, we’ll explore a compliance use case in which unfiltered, raw data is stored in Amazon S3 for compliance and long-term retention. After sensitive information is redacted and events are enriched, the processed data is sent to Splunk Cloud Platform indexes. Using federated search for Amazon S3, admins can search and retrieve the full raw dataset from Amazon S3 for compliance audits or in-depth analysis without storing all of the data in Splunk Cloud Platform, which reduces both storage and costs. Admins can also perform lookups on historical data, enrich events with tags, and route them to multiple indexes in Splunk Cloud Platform.

Data sources

  • Amazon
  • Credit card PII data

Prerequisites

Make sure you are familiar with Amazon S3, AWS Glue, Amazon Athena, Splunk Edge Processor and Ingest Processor, and Splunk federated search for Amazon S3. If any of these topics are unfamiliar, take a few minutes to review them or keep the documentation handy. You can also find additional information about partitioning in the article Partitioning data in S3 for the best FS-S3 experience.

For more information on sending application data (for example, credit card data) to Splunk Edge Processor or Ingest Processor, refer to the documentation to send data from a forwarder or using the HTTP Event Collector (HEC).

Splunk Edge Processor pipelines support Search Processing Language 2 (SPL2). If the SPL2 syntax is new to you, review the SPL2 Search Reference documentation.

Process

Create pipelines in Splunk Edge Processor to mask and route credit card PII data

Pipeline 1: Mask PII data and route to Splunk Cloud Platform

This Splunk Edge Processor pipeline masks every digit of each credit card number except the last group (the last four digits of a 16-digit card), and then sends the masked data to an index in Splunk Cloud Platform.

The SPL2 below contains the following parameters:

  • $source - implicit; takes input from the preceding pipe.
  • $field - The field to be masked. Defaults to _raw.
  • $show_last - Boolean indicating whether the last group of four or five digits should be shown. Defaults to false. For example, true returns XXXX XXXX XXXX 1111 and false returns XXXX XXXX XXXX XXXX.
  • $format - String indicating the format used for the credit card. Each format uses "X" and a number indicating a number of digits (example: X4 is 4 digits).
  • $start_delimiter - String indicating the delimiter before the beginning of credit card number in event. If not present, pass "".
  • $end_delimiter - String indicating the delimiter after the end of credit card number in event. If not present, pass "".

The allowed credit card formats by card type are:

  • 15-digit card variations for Amex - "X4 X6 X5", "X4-X6-X5"
  • 16-digit card variations for Visa and Mastercard - "X4 X4 X4 X4", "X4-X4-X4-X4"
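
To sanity-check the patterns before deploying a pipeline, the masking logic can be mirrored outside Splunk. The following is a minimal Python sketch, not SPL2, that reuses the regular expressions and mask strings from the pipeline definition in this article. The sample events are made up for illustration, and escaping the delimiters with re.escape is an addition of this sketch rather than something the SPL2 function does.

```python
import re

# Per format: (full regex, full mask, regex without last group, mask without last group).
# The patterns and masks are copied from the case() branches of the SPL2 function.
FORMATS = {
    "X4 X6 X5": (r"[1-5][0-9]{3}\s[0-9]{6}\s[0-9]{5}", "XXXX XXXXXX XXXXX",
                 r"[1-5][0-9]{3}\s[0-9]{6}\s", "XXXX XXXXXX "),
    "X4-X6-X5": (r"[1-5][0-9]{3}-[0-9]{6}-[0-9]{5}", "XXXX-XXXXXX-XXXXX",
                 r"[1-5][0-9]{3}-[0-9]{6}-", "XXXX-XXXXXX-"),
    "X4 X4 X4 X4": (r"[1-5][0-9]{3}\s([0-9]{4}\s){2}[0-9]{4}", "XXXX XXXX XXXX XXXX",
                    r"[1-5][0-9]{3}\s([0-9]{4}\s){2}", "XXXX XXXX XXXX "),
    "X4-X4-X4-X4": (r"[1-5][0-9]{3}-([0-9]{4}-){2}[0-9]{4}", "XXXX-XXXX-XXXX-XXXX",
                    r"[1-5][0-9]{3}-([0-9]{4}-){2}", "XXXX-XXXX-XXXX-"),
}


def mask_ccnumber(text, fmt, show_last=False, start_delimiter="", end_delimiter=""):
    """Mask card numbers in text, mirroring the SPL2 function's parameters."""
    full_re, full_mask, head_re, head_mask = FORMATS[fmt]
    pattern, mask = (head_re, head_mask) if show_last else (full_re, full_mask)
    # Assumption of this sketch: delimiters are treated as literal text.
    pattern = re.escape(start_delimiter) + pattern + re.escape(end_delimiter)
    mask = start_delimiter + mask + end_delimiter
    # A callable replacement keeps the mask from being parsed as a regex template.
    return re.sub(pattern, lambda _match: mask, text)


# Hypothetical sample event, for illustration only.
event = "txn=42 card=4111 1111 1111 1111 amount=9.99"
print(mask_ccnumber(event, "X4 X4 X4 X4"))
# -> txn=42 card=XXXX XXXX XXXX XXXX amount=9.99
print(mask_ccnumber(event, "X4 X4 X4 X4", show_last=True))
# -> txn=42 card=XXXX XXXX XXXX 1111 amount=9.99
```

In this sketch the end delimiter is appended directly after the pattern, so combining show_last=True with a non-empty end delimiter matches nothing, because the visible digits sit between the masked part and the delimiter. It is worth testing that combination against your own data before relying on it.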

In summary, this example shows how to use the following command on the raw events:

$pipeline =
| from $source 
| mask_ccnumber format="X16" start_delimiter="," end_delimiter="," 
| into $destination;

Pipeline definition (SPL2)

function mask_ccnumber($source, $format: string, $field:string=_raw, $show_last:boolean=false, $start_delimiter:string="", $end_delimiter:string=""): string {
    return
    | from $source
    | eval card_regex = case(
        $format="X4 X6 X5" and not($show_last), "[1-5][0-9]{3}\\s[0-9]{6}\\s[0-9]{5}",
        $format="X4-X6-X5" and not($show_last), "[1-5][0-9]{3}\\-[0-9]{6}\\-[0-9]{5}",
        $format="X4 X4 X4 X4" and not($show_last), "[1-5][0-9]{3}\\s([0-9]{4}\\s){2}[0-9]{4}",
        $format="X4-X4-X4-X4" and not($show_last), "[1-5][0-9]{3}\\-([0-9]{4}\\-){2}[0-9]{4}",
        $format="X4 X6 X5" and ($show_last), "[1-5][0-9]{3}\\s[0-9]{6}\\s",
        $format="X4-X6-X5" and ($show_last), "[1-5][0-9]{3}\\-[0-9]{6}\\-",
        $format="X4 X4 X4 X4" and ($show_last), "[1-5][0-9]{3}\\s([0-9]{4}\\s){2}",
        $format="X4-X4-X4-X4" and ($show_last), "[1-5][0-9]{3}\\-([0-9]{4}\\-){2}"
    )
    | eval card_regex=$start_delimiter + card_regex + $end_delimiter
    | eval masked_card_str = case(
        $format="X4 X6 X5" and not($show_last), "XXXX XXXXXX XXXXX",
        $format="X4-X6-X5" and not($show_last), "XXXX-XXXXXX-XXXXX",
        $format="X4 X4 X4 X4" and not($show_last), "XXXX XXXX XXXX XXXX",
        $format="X4-X4-X4-X4" and not($show_last), "XXXX-XXXX-XXXX-XXXX",
        $format="X4 X6 X5" and ($show_last), "XXXX XXXXXX ",
        $format="X4-X6-X5" and ($show_last), "XXXX-XXXXXX-",
        $format="X4 X4 X4 X4" and ($show_last), "XXXX XXXX XXXX ",
        $format="X4-X4-X4-X4" and ($show_last), "XXXX-XXXX-XXXX-"
    )
    | eval masked_card_str=$start_delimiter + masked_card_str + $end_delimiter
    | eval $field=replace($field, card_regex, masked_card_str)
    | fields - card_regex, masked_card_str
}

$pipeline = | from $source
    | mask_ccnumber format="X4 X4 X4 X4" show_last=true
    | mask_ccnumber format="X4 X6 X5" show_last=true
    | eval index = "index_name"
    | into $destination;

$source

sourcetype = pii_records

This is what the final pipeline will look like:

Pipeline 2: Route unmasked copy of all PII data to Amazon S3

This pipeline takes the raw data and routes it directly to Amazon S3 without modifying it.

Pipeline definition (SPL2)

$pipeline = | from $source | into $destination;

$source

sourcetype = pii_records

This is what the final pipeline will look like:

After you have constructed your pipeline using the SPL2 above, follow these instructions to save and apply your pipeline.

  1. Test your pipeline rule. In the top right corner of the screen, click the blue Preview button.
  2. Set the Data destination to the appropriate index.
  3. Click Apply to save the destination.
  4. In the top right corner of the screen, click Save pipeline.
  5. Give your pipeline a suitable name, such as "pii_records_s3_<yourName>".
  6. Click Save to save your pipeline.
  7. To try out the new pipeline, click Pipelines on the top left of the page.
  8. Locate the pipeline you just created, click the three dots next to your new pipeline, and select Apply/remove.
  9. Select the Splunk Edge Processor you created earlier and click Save. You will see a brief message stating that your changes are being saved.

Search data in your Splunk index to verify the pipeline

You can now verify that the pipeline has successfully been applied:

  1. Log into the Splunk platform and open the Search app.
  2. Run the following search and verify that you see the events coming from this pipeline. Replace pii_index with the index name you set in the pipeline's eval index statement:
index=pii_index sourcetype=pii_records credit_card
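
Beyond checking that events arrive, you may also want to confirm that no full card numbers reached the index. The following is a minimal Python sketch of such a check. It assumes you have exported the _raw values of the search results to a list of strings by whatever means you prefer. The find_unmasked helper and the sample events are hypothetical, and the patterns accept a space or a hyphen at each separator, which is slightly broader than the formats listed in this article.

```python
import re

# Full (unmasked) card numbers in the 16-digit and 15-digit layouts used in this
# article, with a space or hyphen between groups.
UNMASKED = re.compile(
    r"\b[1-5][0-9]{3}[ -](?:[0-9]{4}[ -]){2}[0-9]{4}\b"  # 16-digit layout
    r"|\b[1-5][0-9]{3}[ -][0-9]{6}[ -][0-9]{5}\b"        # 15-digit layout
)


def find_unmasked(events):
    """Return the events that still contain a full, unmasked card number."""
    return [event for event in events if UNMASKED.search(event)]


# Hypothetical exported events, for illustration only.
events = [
    "card=XXXX XXXX XXXX 1111 status=ok",
    "card=4111 1111 1111 1111 status=ok",
    "card=XXXX XXXXXX 10005 status=ok",
]
for leaked in find_unmasked(events):
    print("Unmasked card number found:", leaked)
```

An empty result only means that none of these specific layouts were found, so treat it as a quick smoke test rather than proof that the index is free of PII.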

Run a federated search for Amazon S3 to retrieve raw data stored in Amazon S3

To search the raw, untouched PII data that was routed to Amazon S3 for compliance, run a federated search for Amazon S3. To run the federated search, use the sdselect command with fed_search_s3_compliance directly from the Search app in Splunk Cloud Platform.

| sdselect * FROM federated:fed_search_s3_compliance
| rex field=event "(?<device_ip>[\d.]{7,20})\s:\s+(%ASA|%FTD)-\d+-(?P<message_number>\d+):(?<message>.*)"
| eval time = strftime(time,"%m-%d-%Y %I:%M:%S %p")
| fields time, sourcetype, event
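
To see which fields the rex pattern in this search extracts, you can try the same expression outside Splunk. The following is a minimal Python sketch. The only change to the pattern is that every named group is spelled (?P<name>...), which Python requires. The extract_fields helper and the sample event are hypothetical.

```python
import re

# The rex pattern from the search above, with Python-style named groups.
PATTERN = re.compile(
    r"(?P<device_ip>[\d.]{7,20})\s:\s+(%ASA|%FTD)-\d+-(?P<message_number>\d+):(?P<message>.*)"
)


def extract_fields(event):
    """Return the named groups as a dict, or None when the event does not match."""
    match = PATTERN.search(event)
    return match.groupdict() if match else None


# Hypothetical sample event, for illustration only.
print(extract_fields("10.1.1.1 : %ASA-6-302013: Built outbound TCP connection"))
```

The pattern expects events shaped like an IP address, a colon, and then a %ASA or %FTD tag followed by digits and a message number. Events that do not have this shape, such as the card records in this article, return None here, so adjust the expression to match the layout of your own data.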

Next steps

These resources might help you understand and implement this guidance: