Amazon Web Services (AWS) has become an integral part of many organizations’ IT infrastructure. An Amazon Virtual Private Cloud (VPC) enables you to launch AWS resources into a virtual network that you've defined, while still benefiting from the scalable infrastructure of AWS. VPC flow log data is collected outside of the path of your network traffic, and therefore does not affect network throughput or latency. You can create or delete flow logs without any risk of impact to network performance. In the Splunk Common Information Model (CIM), VPC flow log data is typically mapped to the Network Traffic data model.
VPC flow logs contain a comprehensive record of network traffic in and out of your AWS environment. By default, each record includes values for the different components of the IP flow, including the source, destination, and protocol. Flow logs are often used for troubleshooting connectivity issues across your VPCs, intrusion detection, or anomaly detection.
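To make the record structure concrete, the following is a minimal Python sketch that splits one flow log record into named fields. It assumes the default (version 2) space-delimited format described in the AWS documentation. The function name `parse_flow_record`, the underscore-style field names, and the sample record are illustrative and are not part of Splunk or AWS tooling.

```python
# Field order of the default (version 2) VPC flow log record format.
# AWS documents these names with hyphens (for example account-id);
# underscores are used here only so they work as Python keys.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_record(line: str) -> dict:
    """Split one space-delimited record into a field-name -> value dict.

    Values are kept as strings because records with a log status of
    NODATA or SKIPDATA carry "-" in place of most fields.
    """
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(DEFAULT_FIELDS, values))


# Illustrative sample record, not taken from a real environment.
sample = (
    "2 123456789010 eni-0123456789abcdef0 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_record(sample)
print(record["srcaddr"], record["dstaddr"], record["protocol"], record["action"])
```

Custom flow log formats can add or reorder fields, so a parser like this only applies when the default format is in use.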
The following sections provide information on configuring Splunk software to ingest this data source. To configure VPC flow logs on the AWS side, we recommend that you use official AWS resources.
Getting AWS VPC Flow data in
Splunk Docs contains extensive guidance on getting data into your Splunk deployment. If your deployment is not already ingesting AWS VPC Flow logs, the following topics can assist you in preparing to work with this data type:
- Splunk Enterprise
- Splunk Cloud
The recommended index is awsflow.
The source type is aws:cloudwatchlogs:vpcflow.
The supported input types are CloudWatch Logs and Kinesis. It is best to collect VPC flow logs and CloudWatch Logs data through Kinesis streams. However, the AWS Kinesis input has the following limitations:
- Multiple inputs collecting data from a single stream cause duplicate events in the Splunk platform.
- The input does not detect dynamic shard repartitioning. When a shard is split or merged, the add-on cannot automatically discover and collect data from the new shards until it is restarted. After you repartition shards, you must restart your data collection node to collect data from the new shards.
In addition, you need the Splunk Add-on for Amazon Web Services. The add-on is available on Splunkbase, and its official documentation is on Splunk Docs. Read and follow the documentation carefully to understand all the essential information you need to work with this data source, including how to install the add-on, configure AWS, and configure Splunk.
The recommended maximum daily indexing volume for a typical VPC flow log source type on a clustered indexer is 25 to 30 GB per indexer. Use this as a rough guideline to plan the number of indexers to deploy in your clustered environment. Adding more indexers to a cluster improves indexing and search retrieval performance. Because additional indexers also generate more data replication traffic within the cluster, adjust the number of indexers based on your actual system performance.
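As a rough planning aid, the guideline above can be turned into a small calculation. This is a sketch only: the function name `indexers_needed` and the 200 GB per day example volume are hypothetical, and the result is a starting point rather than a sizing guarantee.

```python
import math


def indexers_needed(daily_volume_gb: float, per_indexer_gb: float = 25.0) -> int:
    """Estimate the indexer count from daily flow log volume.

    per_indexer_gb is the daily volume one indexer is expected to handle;
    the guideline in this article is roughly 25 to 30 GB.
    """
    if daily_volume_gb < 0 or per_indexer_gb <= 0:
        raise ValueError("volumes must be non-negative and capacity positive")
    # Round up, and never plan for fewer than one indexer.
    return max(1, math.ceil(daily_volume_gb / per_indexer_gb))


# Hypothetical example: 200 GB of VPC flow logs per day.
conservative = indexers_needed(200, per_indexer_gb=25)
optimistic = indexers_needed(200, per_indexer_gb=30)
print(f"Plan for between {optimistic} and {conservative} indexers")
```

Using both ends of the range gives a bracket to validate against observed indexing and search performance.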
AWS limits each account to 10 requests per second, each of which returns no more than 1 MB of data. This means the data ingestion and indexing rate is no more than 10 MB/s. The add-on modular input can process up to 4,000 events per second in a single log stream.
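The arithmetic behind these ceilings, extended to a full day, can be sketched as follows. The constants come from the limits stated above. The daily figures are simple extrapolations that assume the limit is sustained around the clock and that 1 GB equals 1,000 MB.

```python
# Limits as stated in this article.
REQUESTS_PER_SECOND = 10        # per AWS account
MAX_MB_PER_REQUEST = 1          # each request returns at most 1 MB
MAX_EVENTS_PER_SECOND = 4000    # per log stream, add-on modular input
SECONDS_PER_DAY = 24 * 60 * 60

# Upper bound on ingestion rate for one account.
max_mb_per_second = REQUESTS_PER_SECOND * MAX_MB_PER_REQUEST

# Extrapolated daily ceilings (decimal units: 1 GB = 1000 MB).
max_gb_per_day = max_mb_per_second * SECONDS_PER_DAY / 1000
max_events_per_day = MAX_EVENTS_PER_SECOND * SECONDS_PER_DAY

print(f"{max_mb_per_second} MB/s, {max_gb_per_day:.0f} GB/day per account")
print(f"{max_events_per_day:,} events/day per log stream")
```

Real throughput is usually lower than these bounds, because responses are rarely full and collection is not continuous.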
If volume is a concern, configure the only_after parameter to limit the amount of historical data you collect.
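One way to produce a cutoff value for only_after is sketched below. It assumes the parameter expects a UTC timestamp in the form YYYY-MM-DDTHH:MM:SS; confirm the expected format in the add-on documentation before using it. The helper name `only_after_value` and the seven-day window are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def only_after_value(days_back: int, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp string days_back days before now.

    Assumption: only_after takes a UTC time formatted as
    %Y-%m-%dT%H:%M:%S. Verify this against the add-on documentation.
    """
    if days_back < 0:
        raise ValueError("days_back must not be negative")
    current = now if now is not None else datetime.now(timezone.utc)
    cutoff = current.astimezone(timezone.utc) - timedelta(days=days_back)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S")


# Collect only the most recent seven days of history.
print(only_after_value(7))
```

Passing an explicit `now` makes the result reproducible, which helps when the value is generated by a deployment script.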
If you have high-volume VPC flow logs, configure one or more Kinesis inputs to collect them instead of using the CloudWatch Logs input.
You can verify that Splunk has begun ingesting the data from AWS by running a search against the index and source type listed above, for example index=awsflow sourcetype=aws:cloudwatchlogs:vpcflow. The Splunk Add-on for AWS also has a built-in health overview dashboard that provides initial troubleshooting information.