
Ingesting VPC flow logs into Edge Processor via Amazon Data Firehose

 

Splunk Data Management has added support for Amazon Data Firehose in Splunk Edge Processor. This enhancement enables you to use Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) as a data source, offering greater flexibility and efficiency in managing data streams. With integration across more than 20 AWS services, you can now easily stream data into Splunk Cloud Platform from sources like Amazon CloudWatch, SNS, AWS WAF, Network Firewall, IoT, and more.

Splunk Edge Processor integration with Amazon Data Firehose

Splunk Edge Processor can now directly ingest logs from Amazon Data Firehose, enabling seamless streaming from various AWS services into Splunk Cloud Platform for real-time analysis and visualization. Whether monitoring cloud infrastructure, applications, or security events, this addition broadens your data source options, enhances your ability to gain real-time insights, and simplifies data pipeline management while both reducing latency and ensuring faster access to critical data.

Acknowledgement for HEC data

Splunk Edge Processor also features receiver acknowledgement for upstream HTTP Event Collector (HEC) data. This preserves data integrity by ensuring HEC events sent to the processor are properly received and acknowledged, adding an additional layer of confidence that no information is lost during transmission between data inputs and Edge Processors.
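
A minimal sketch of what this looks like from a sender's perspective, assuming the receiver follows the standard Splunk HEC indexer acknowledgement flow (the host, token, and channel GUID below are placeholders):

    # 1. Send an event on a client-chosen channel; the response includes an ackId.
    curl "https://edge.example.com:8088/services/collector/event" \
        -H "Authorization: Splunk <your-hec-token>" \
        -H "X-Splunk-Request-Channel: 0aeeac95-ff4b-4465-8f6a-e4d1d89ab12e" \
        -d '{"event": "ack test"}'

    # 2. Poll the ack endpoint with that ackId to confirm the event was received.
    curl "https://edge.example.com:8088/services/collector/ack?channel=0aeeac95-ff4b-4465-8f6a-e4d1d89ab12e" \
        -H "Authorization: Splunk <your-hec-token>" \
        -d '{"acks": [0]}'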

Prerequisites

The following steps assume you already have access to:

  • A Splunk Edge Processor tenant with a paired Splunk Cloud Platform stack
  • A Splunk Edge Processor instance running on a machine with an accessible URL that is under a valid domain
  • An AWS account

To ensure proper data ingestion from Amazon Data Firehose, the HEC receivers for your Edge Processors should also accept data over TLS, not mTLS. The TLS certificate must be signed by a trusted public certificate authority (CA). This certificate can be configured in your tenant’s web UI. For instructions on acquiring a certificate signed by a public CA, see the optional section later in this article.

Ingesting VPC flow logs into Splunk Edge Processor via Firehose streams

In the following sections, we’ll guide you through how to integrate Amazon Data Firehose into your existing Splunk Cloud Platform setup. Specifically, we’ll focus on setting up an HEC token for your Edge Processor, configuring VPC flow log ingestion into Splunk Cloud Platform via Amazon Data Firehose, and achieving network traffic CIM compliance using SPL2 pipelines. An architectural diagram showing the high-level components involved in this setup can be seen below.

Applying a HEC token to your Edge Processor

HEC tokens are used by the HTTP Event Collector to authenticate and authorize data sent to Splunk Cloud Platform. These tokens securely manage data intake from various sources over HTTP/HTTPS, ensuring that only authorized data is accepted and properly categorized for analysis. To generate and set up a token:

  1. Open a web browser and navigate to your Splunk Cloud Platform instance. Then, using the dropdown menus located at the top of the page, select Settings > Data Inputs.
  2. In the table titled Local inputs, locate the HTTP Event Collector row and click the + Add New button on the right-hand side.
  3. You'll be directed to a form requesting various HEC-related information. The only required field is the token name, though you can fill in additional details to better suit your use case. After you are finished, review and submit the form using the navigation buttons in the top-right corner.
  4. Under the Token is being deployed header, you'll see a grayed out text box labeled Token Value. Copy this value to your clipboard, as it’ll be needed shortly.

Now that a valid HEC token has been generated, it’s time to apply it to your Edge Processor:

  1. Navigate to your Splunk Edge Processor tenant in a web browser by visiting console.scs.splunk.com/<tenant-id> and logging in via your user- or company-provided SSO.
  2. On the left side of the landing page, select Edge Processors > Shared settings. This opens a page that is used to configure various receiver settings.
  3. In the Token authentication section, click the New token button on the right-hand side, then paste the previously-copied HEC token value into the HEC token field.
  4. Optionally, you can choose to configure the Source and Source type fields. This is strongly recommended, as doing so assigns default values to incoming data lacking them. This is especially important because source/sourcetype are typically used as partition values in the SPL2 pipelines transforming data within an Edge Processor instance. In this example, we’ll be using default-source and default-sourcetype for demonstration purposes. In your environment, you might choose to use fields like aws:kdf or aws:vpc-flow-log.
  5. After you have configured everything, click Save in the bottom-right corner of the page.
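
Before wiring up Firehose, you can optionally confirm that the Edge Processor's HEC receiver accepts the new token by sending a test event with curl. This is a minimal sketch; the host and token are placeholders, and it assumes the HEC receiver is enabled on port 8088 with a publicly trusted certificate, as described in the prerequisites.

    # Send a single test event to the Edge Processor HEC receiver (placeholder host and token)
    curl "https://edge.example.com:8088/services/collector/event" \
        -H "Authorization: Splunk <your-hec-token>" \
        -d '{"event": "edge processor smoke test", "sourcetype": "default-sourcetype"}'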

Configuring VPC flow log ingestion into Splunk Cloud Platform

VPC flow logs capture essential information about the IP traffic to and from network interfaces in your Virtual Private Cloud. By streaming these logs through Amazon Data Firehose, you can efficiently route the data to Splunk Edge Processor for real-time processing and analysis, enabling deeper insights within your Splunk Cloud Platform environment. To set this up, you’ll first need to create a Firehose stream:

  1. Navigate to your AWS Management Console.
  2. Use the search bar at the top of the page to locate the Amazon Data Firehose service’s homepage, then click Create Firehose stream in the top-right corner.
  3. For Source and Destination, select Direct PUT and Splunk from the input fields’ dropdown menus, respectively. This populates the form with additional configuration settings.
  4. Within the Destination Settings panel, enter the URL of the machine that hosts your Splunk Edge Processor instance in the Splunk cluster endpoint field. This URL should always follow the format https://<host_machine_url>:8088 and should point to your Edge Processor instance, not the tenant.

    In order for this to work properly, the URL for your instance must use HTTPS, and the host machine must be configured to allow incoming HTTP/TCP traffic on the specified HEC receiver port (for example, 8088).

  5. In the Authentication token field of this same panel, copy and paste the value of the HEC token generated previously.
  6. In the Backup settings panel, you must specify an S3 bucket to ensure data recovery in the event of transmission failures or other issues during the streaming process. If you do not already have an S3 bucket set up, follow the instructions provided here.
  7. Click Create Firehose stream in the bottom-right corner of the form.
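
If you prefer to script this step rather than use the console, an equivalent stream can be created with the AWS CLI. The sketch below uses placeholder names, ARNs, and token values; the backup role and bucket must already exist, and HECEndpointType can be Raw or Event depending on how you plan to send data.

    # Minimal sketch: create a Firehose stream with a Splunk destination via the AWS CLI.
    # All names, ARNs, and the token are placeholders.
    aws firehose create-delivery-stream \
        --delivery-stream-name vpc-flow-logs-to-edge-processor \
        --delivery-stream-type DirectPut \
        --splunk-destination-configuration '{
            "HECEndpoint": "https://edge.example.com:8088",
            "HECEndpointType": "Raw",
            "HECToken": "<hec-token-value>",
            "S3BackupMode": "FailedEventsOnly",
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-backup-role",
                "BucketARN": "arn:aws:s3:::example-firehose-backup-bucket"
            }
        }'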

To test whether you’ve configured everything correctly before moving on:

  1. Navigate to your newly-created Firehose stream and expand the panel titled Test with demo data.
  2. Click Start sending demo data. Dummy data should be routed from your Firehose stream through your Splunk Edge Processor instance.
  3. To verify this is working as expected, select the Edge Processors tab on the left-hand side of your tenant’s UI and double-click the row containing your Edge Processor.
  4. Within a minute or two, the Data flowing through in the last 30 minutes metrics in the bottom-right corner of the page should reflect a small amount of inbound data, likely categorized under the default source and sourcetype values specified previously. If this doesn't happen, check your Firehose stream’s destination error logs in Amazon CloudWatch.

With the Firehose stream now configured to send data to your Splunk Edge Processor instance, the final step is to create a VPC flow log and direct it to the Firehose stream. Depending on your use case, you might want to create a new VPC or use an existing one. Instructions for creating a new one can be found in the official AWS documentation. For the purposes of this demonstration, we’ll be using the default VPC provided by AWS.

  1. In the same AWS management console as before, navigate to the VPC service’s homepage using the search bar provided at the top of the page.
  2. Click the VPCs hyperlink in the Resources by region section of the page, then select the VPC ID of the VPC you want to use. This value will be of the format “vpc-<hexstring>”.
  3. The resulting page shows information related to the selected VPC’s configuration. Under the section titled Details, open the Flow logs tab and click Create flow log on the right side of the panel.
  4. For the Destination field, choose the option Send to Amazon Data Firehose in the same account. Then, select your previously-created Firehose stream from the dropdown menu of the resulting Amazon Firehose stream name field.
  5. For the Log record format field, you can choose to use AWS’s default format or customize your own. The fields that you should include are dependent on your use case. However, it’s important to note the format preview shown below. You'll need this when creating a SPL2 pipeline used to transform these logs in the next section of these instructions.
  6. When you are finished, click Create flow log in the bottom-right corner of the form.
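
The same flow log can also be created from the AWS CLI, which is useful for automation. A minimal sketch, with placeholder VPC, account, region, and stream identifiers:

    # Send flow logs for a VPC to an existing Firehose stream in the same account.
    # Add --log-format to customize the record format; the AWS default format is used if omitted.
    aws ec2 create-flow-logs \
        --resource-type VPC \
        --resource-ids vpc-0123456789abcdef0 \
        --traffic-type ALL \
        --log-destination-type kinesis-data-firehose \
        --log-destination arn:aws:firehose:us-east-1:123456789012:deliverystream/vpc-flow-logs-to-edge-processor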

At this point, you should begin to see VPC flow logs populating the destination specified by your Edge Processor. If routing to Splunk Cloud Platform, you can identify these logs by searching for the default source and sourcetype values defined previously. In the event something has gone wrong, start debugging by checking the Firehose stream destination error logs.
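
When error logging is enabled on the stream (the console default), Firehose writes destination errors to a CloudWatch Logs group named after the stream. As a quick check from the command line, assuming the placeholder stream name used earlier:

    # Fetch recent Firehose destination error events for the stream
    aws logs filter-log-events \
        --log-group-name /aws/kinesisfirehose/vpc-flow-logs-to-edge-processor \
        --limit 20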

Achieving Common Information Model (CIM) compliance using SPL2 pipelines

With VPC flow logs now successfully flowing into Splunk Edge Processor, the next step is to transform these logs to align with the CIM Network Traffic data model. By leveraging specific SPL2 commands, you can build and apply a pipeline that maps the flow log fields to their CIM equivalents. This ensures the data is normalized, enabling consistent and effective analysis across search and reporting capabilities in the Splunk platform. To accomplish this, you'll need to first create a SPL2 pipeline:

  1. Navigate to your Splunk Edge Processor tenant in a web browser.
  2. On the left side of the page, select the Pipelines tab and click the + New pipeline button in the top-right corner.
  3. You will be prompted to select a template from which your pipeline will be created. Select Blank pipeline and click Next.
  4. Define the partition(s) for your pipeline. Because we configured our receiver to append default source and sourcetype values to logs without them, it’s best to set the partition to match one or both of these values since VPC flow logs don’t include them by default.
  5. (Optional) In the Enter or upload sample data input box, it might be useful for testing purposes to paste one of the VPC flow logs ingested earlier. Depending on whether you provide sample data, click either the Next or Skip button in the bottom-right corner of the page to continue.

    For example, you could use the AWS default format, which produces a log similar to the following: {"message":"2 215263928837 eni-05a082dab7784e51f 35.203.211.189 172.31.61.177 54623 5800 6 1 44 1723573216 1723573232 REJECT OK"}. You can see this in the Splunk Cloud Platform screenshot above.

  6. Select the desired data destination from the list and click Done to create your pipeline.

Now that a new pipeline has been created, you can use various SPL2 commands to extract information from the flow log and map it to CIM-compliant field names. For AWS flow logs specifically, the default record format referenced in Step 5 of the previous section looks like: ${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}. According to field mapping documentation for the Splunk platform, the following changes need to be made in order to achieve CIM compliance:

  • (not required) version
  • account-id → vendor_account
  • interface-id → dvc
  • srcaddr → src_ip
  • dstaddr → dest_ip
  • srcport → src_port
  • dstport → dest_port
  • protocol → transport
  • (unchanged) packets
  • (unchanged) bytes
  • (calculated) start, end → duration
  • (not required) action
  • (not required) log-status

The next step involves implementing these changes in code. The rex command can be used to parse the raw flow log, extracting only the fields that are essential for compliance. Fields like version, action, and log-status, which are not required, should be intentionally excluded from this extraction, ensuring that only necessary information is retained. Additionally, the pipeline should calculate the duration of the network session from the provided start and end timestamps in order to align with the data model specified by the CIM. Finally, the fields command can remove the start and end fields from the log, as they are no longer needed once duration has been calculated. Here’s an example of what the resulting SPL2 might look like:

$pipeline = | from $source
    | rex field=_raw /{"message":"\S+ (?P<vendor_account>\S+) (?P<dvc>\S+) (?P<src_ip>\S+) (?P<dest_ip>\S+) (?P<src_port>\S+) (?P<dest_port>\S+) (?P<transport>\S+) (?P<packets>\S+) (?P<bytes>\S+) (?P<start>\S+) (?P<end>\S+) \S+ \S+"}/
    | eval duration = end - start
    | fields - start, end
    | into $destination;
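
As a sanity check, running the sample event shown earlier through this pipeline should produce roughly the following fields, with version, action, and log-status dropped and duration calculated as 1723573232 - 1723573216 = 16 seconds:

    vendor_account = 215263928837
    dvc            = eni-05a082dab7784e51f
    src_ip         = 35.203.211.189
    dest_ip        = 172.31.61.177
    src_port       = 54623
    dest_port      = 5800
    transport      = 6
    packets        = 1
    bytes          = 44
    duration       = 16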

Now that all the data transformation logic is in place, the only remaining step is to save the pipeline and apply it to your running Edge Processor:

  1. In the top-right corner of the pipeline editor, click the Save pipeline button, provide a required name and an optional description, and click Save.
  2. After a few seconds, you’ll be met with a popup titled Apply pipeline. Click Yes, apply, select the targeted Edge Processor(s), and click Save in the bottom-right corner.
  3. A notification should appear indicating that the pipeline update might take a few minutes to propagate. To check the status of your processor, click the Splunk icon in the top-left corner to navigate back to the landing page, select the Edge Processors tab on the left-hand side, and monitor its Instance Health. It should eventually reach a healthy (green) status.

Logs routed to your specified destination should now contain the CIM-compliant fields appended above.

Optional: How to get a signed TLS certificate

In Amazon Data Firehose, the Direct PUT to Splunk destination only supports HTTPS endpoints on the public internet with TLS certificate verification. Amazon Data Firehose does not currently support trusting a custom or self-signed certificate. The Edge Processor node must therefore be hosted under a domain with a TLS certificate signed by a public certificate authority (CA).
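
Because Firehose only delivers to endpoints whose certificate chain it can verify, it's worth checking what your HEC endpoint actually presents. A quick check with openssl, using a placeholder host:

    # Print the issuer, subject, and validity window of the certificate served on port 8088
    openssl s_client -connect edge.example.com:8088 -servername edge.example.com < /dev/null 2>/dev/null \
        | openssl x509 -noout -issuer -subject -dates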

Acquiring a domain via Route 53

Route 53 is a native AWS service that you can use to register or transfer a domain name. In this example, we’ll purchase a new domain name.

  1. Navigate to your AWS Management Console.
  2. Use the search bar at the top of the page to locate the Route 53 service’s Dashboard. On the left side panel, expand Domains, then click Registered domains. In the top right corner click Register domains.
  3. Enter your preferred domain name and click Search. This lists the domain’s availability and price, along with suggestions if it is not available.
  4. After you select the domain name, click Proceed to checkout. Route 53 will validate and register the domain for you.
  5. After the domain is registered, it will be listed under Hosted zones in the left side panel.

Binding the domain to your EC2 Edge Processor instance

See this link for more information on this process.

  1. In the AWS Management console, locate the EC2 service.
  2. In the navigation pane, choose Instances.
  3. In the table, choose the instance that you want to route traffic to.
  4. In the bottom pane, on the Description tab, take note of the Elastic IP value. If you didn't associate an Elastic IP with the instance, take note of the IPv4 public IP value instead.
  5. Go to the Route 53 management console, then click Hosted zones in the left side panel.
  6. Choose the name of the hosted zone that matches the name of the domain that you want to route traffic for.
  7. Click Create record.
  8. In the Record name field, enter the domain name that you want to use to route traffic to your EC2 instance. The default value is the name of the hosted zone.
  9. For Record type, select A – Routes traffic to an IPv4 address and some AWS resources.
  10. In Value, enter the Elastic IP or IPv4 address you noted earlier.
  11. Click Create records. This propagates the change to DNS resolvers within 60 seconds. After that, you should be able to access your instance via its domain name.
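
If you'd rather create the record from the command line, the same A record can be added with the AWS CLI; the hosted zone ID, record name, and IP address below are placeholders:

    # Upsert an A record pointing the domain at the instance's Elastic or public IP
    aws route53 change-resource-record-sets \
        --hosted-zone-id Z0123456789EXAMPLE \
        --change-batch '{
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "edge.example.com",
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [{ "Value": "203.0.113.10" }]
                }
            }]
        }'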

Getting a server TLS certificate for the Edge Processor HEC

In this example, we’ll host the Edge Processor HTTP Event Collector under the domain registered previously. We’ll use a free third-party certificate authority, Let's Encrypt, to sign the HTTPS certificate.

To prove control of the domain and to acquire and manage certificates from Let’s Encrypt, we’ll use Certbot, an open source client for Let’s Encrypt.

  1. Install the Snapd package management tool on your system using one of the two options below. For more information on this process see this link.
    For Ubuntu and Debian:
    sudo apt update
    sudo apt install snapd
    

    For CentOS and RHEL:

    sudo yum install snapd
    
  2. Install Certbot via snapd:

    sudo snap install --classic certbot
    sudo ln -s /snap/bin/certbot /usr/bin/certbot
    
  3. By default, Certbot uses the HTTP-01 challenge to verify control of a domain, which requires an HTTP service answering requests on port 80. If that isn't practical, you can use the DNS-01 challenge instead: Let's Encrypt provides a Certbot plugin for domains that are managed by AWS Route 53.
    You can also check https://letsdebug.net/ to see which challenge type your domain needs.
    Install the certbot-dns-route53 plugin using:
    sudo apt install python3-certbot-dns-route53
  4. Prepare the access user for the certbot-dns-route53 plugin:

    1. In the AWS console, navigate to IAM.
    2. In the left side panel, click Policies, then in the top right corner, click Create Policy.
    3. Switch the policy editor to JSON, and add the following policy:
    {
        "Version": "2012-10-17",
        "Id": "certbot-dns-route53 sample policy",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ListHostedZones",
                    "route53:GetChange"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Effect" : "Allow",
                "Action" : [
                    "route53:ChangeResourceRecordSets"
                ],
                "Resource" : [
                    "arn:aws:route53:::hostedzone/YOURHOSTEDZONEID"
                ]
            }
        ]
    }
  5. Click Next and enter a policy name, then click Create Policy.
  6. In the left side panel, click Users, then in the top right corner, click Create User.
  7. Enter a user name and click Next.
  8. In Set Permissions, switch the Permissions options to Attach policies directly, search for the policy you created, and select it.
  9. Click Next to review the user, then click Create User to create it.
  10. In the User list, click the user you just created.
  11. In the top right corner of the user Summary panel, click Create Access Key.
  12. In Use case, select Command Line Interface (CLI), and click Next all the way to the end to create the access key.
  13. After the key is created, download the CSV access key file. This is the only time AWS will show you the secret access key, so keep the file in a safe place.
  14. Add the AWS access key on your Splunk Edge Processor host:
    1. Create a file at ~/.aws/config.
    2. Put the access key ID and the secret access key in that file, using the following format:
    [default]
    aws_access_key_id=AKIAIOSFODNN7EXAMPLE
    aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    

    If you want to use a path other than the default, you can put your access key file path in the AWS_CONFIG_FILE environment variable. For more information, see this link.

  15. Run the following command to get the signed server certificate:

    certbot certonly --dns-route53 --cert-name cert-name --key-type rsa -d example.com -d "*.example.com"
    

    You must specify the key type as RSA, otherwise the key will be in ECDSA format. Edge Processors currently only support RSA format keys.

  16. The certificate and key are saved to /etc/letsencrypt/live/cert-name/ as fullchain.pem and privkey.pem. You can verify these files with openssl, as shown after these steps.
  17. Update Splunk Edge Processor with the certificate and key:
    1. Go to the Splunk Edge Processor list and click your Edge Processor.
    2. In the top right corner, expand the Actions dropdown list, and click Edit Edge Processor.
    3. Enable the HTTP Event Collector.
    4. Select TLS as the connection protocol.
    5. Upload privkey.pem as the Server Private Key, and fullchain.pem as the Server certificate.
    6. Click Save and wait for confirmation that Splunk Edge Processor has deployed the change.
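
To verify the files from step 16 before or after uploading them, you can inspect the certificate and confirm the private key is in RSA format with openssl. This assumes the default Let's Encrypt output path used above:

    # Show the certificate's subject, issuer, and validity window
    sudo openssl x509 -in /etc/letsencrypt/live/cert-name/fullchain.pem -noout -subject -issuer -dates
    # Confirm the private key is a valid RSA key (this fails if the key is ECDSA)
    sudo openssl rsa -in /etc/letsencrypt/live/cert-name/privkey.pem -check -noout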

Next steps

With the introduction of Amazon Data Firehose support in Splunk Edge Processor, managing and analyzing your AWS data streams has never been easier. This update not only expands your data source options but also enhances the reliability of data transmission with receiver acknowledgement for upstream HEC data. Whether you’re monitoring cloud infrastructure, analyzing security events, or ensuring CIM compliance, these new capabilities provide you with the tools needed to optimize your Splunk environment. We encourage you to explore these features and see how they can enhance your data processing workflows.

  • To get started with one (or both!) of our Data Management pipeline builders, fill out the following form.
  • Check out the Data Management resource hub.
  • If you’d like to request a feature or provide any other feedback, we strongly encourage you to create a Splunk Idea and/or send an email to edgeprocessor@splunk.com.
  • You can also join the lively discussion in the #edge-processor channel of the Splunk Community Slack. It’s an excellent forum to learn from the community about the latest Edge Processor use cases.