Onboarding AWS CloudTrail data
The primary goal of this Data Source Onboarding Guide is to provide you with a curated, easy-to-digest view of the most common ways that Splunk users ingest data from AWS CloudTrail, including how to configure the systems that will send data. While this guide won't cover every single possible option for installation or configuration, it will give you the most common, easiest way forward.
Because simpler is almost always better when getting started, this guide sticks to straightforward technologies and leaves out more complicated capabilities like Search Head Clustering, Indexer Clustering, and similar.
If at any point you feel like you need more traditional documentation for the deployment or usage of Splunk, you can check Splunk Docs' comprehensive guidance.
Follow the steps below, and keep track of the steps you've completed as you go.
General infrastructure
Scaling
While the Splunk platform scales to hundreds or thousands of indexers with ease, there are usually some pretty serious architecture conversations before ordering lots of hardware. That said, this guide isn't just for lab installs. We've found that this guidance works just fine for most customers in the 5 GB to 500 GB range, and sometimes even larger. Regardless of whether you have a single Splunk box doing everything, or a distributed install with a Search Head and a set of Indexers, you should be able to get the data and the value flowing quickly.
The first request we often get for orchestration as customers scale is to distribute configurations across many different universal forwarders. There are a variety of ways to do this:
- The standard answer is to use the Deployment Server. The deployment server is designed for exactly this task, and is free with the Splunk platform.
- If you are a larger organization, you've probably already got a way to deploy configurations and code, like Puppet, Chef, SCCM, Ansible, etc. All of those tools are used to deploy the Splunk platform on a regular basis. However, you might not want to go down this route if it requires onerous change control, or reliance on other teams, etc. Many large Splunk environments with well-developed software deployment systems prefer to use the Deployment Server because it can be owned by Splunk and is optimized for Splunk's needs. But many customers are very happy with using Puppet to distribute Splunk platform configurations.
Indexes and source types overview
Splexicon (Splunk's Lexicon, a glossary of Splunk-specific terms) defines an index as the repository for data. When the Splunk platform indexes raw event data, it transforms the data into searchable events. Indexes are the collections of flat files on the Splunk platform instance. That instance is known as an Indexer because it stores data. Splunk instances that users log into and run searches from are known as Search Heads. When you have a single instance, it takes on both the search head and indexer roles.
"Sourcetype" is defined as a default field that identifies the data structure of an event. A source type determines how the Splunk platform formats the data during the indexing process. Example source types include access_combined and cisco_syslog.
In other words, an index is where we store data, and the source type is a label given to similar types of data. For example, all Windows Security Logs will have a source type of WinEventLog:Security, which means you can always search for sourcetype=wineventlog:security (when searching, the source type field is rendered as a single word, sourcetype, which is case sensitive, though the value is not).
This matters because the guide steers you toward recommended indexes that make an effective starting point, paired with standardized source types (the ones shared by other customers). Doing so makes the Splunk platform much easier to use and avoids headaches down the road. While the Splunk platform will accept any source type you can imagine, which is great for custom log sources, it is usually easier to stick with the standard source types for common log sources.
Implement indexes.conf on indexers
Below is a sample indexes.conf that will prepare you for all of the data sources used in this guide. OS logs are separated from network logs and security logs from application logs for performance reasons, but also for isolation purposes - you might want to expose the application or system logs to people who shouldn't view security logs, and putting them in separate indexes prevents that.
Installing indexes.conf on Splunk Enterprise | Installing indexes.conf on Splunk Cloud Platform |
---|---|
Download a Splunk app with the indexes.conf and put it in the apps directory. For Windows systems, this will typically be c:\Program Files\Splunk\etc\apps. Once you've extracted the app there, you can restart Splunk via the Services Control Panel applet, or by running c:\Program Files\Splunk\bin\splunk.exe restart. For Linux systems, the apps directory will typically be /opt/splunk/etc/apps/. Once you've extracted the app there, you can restart Splunk by running /opt/splunk/bin/splunk restart. You can view the indexes.conf below, but it's easiest to just click here to download a Splunk app with the indexes.conf. | You won't copy the files onto your Splunk servers because you don't have access. You could go one-by-one through the UI and create all of the indexes below, but it might be easiest if you download the app and open a ticket with CloudOps to have it installed. |
Sample indexes.conf:
    # Overview. Below you will find the basic indexes.conf settings for
    # setting up your indexes in Splunk. We separate into different indexes
    # to allow for performance (in some cases) or data isolation in others.
    # All indexes come preconfigured with a relatively short retention period
    # that should work for everyone, but if you have more disk space, we
    # encourage (and usually see) longer retention periods, particularly
    # for security customers.

    # Endpoint Indexes used for Splunk Security Essentials.
    # If you have the sources, other standard indexes we recommend include:
    # epproxy - Local Proxy Activity

    [epav]
    coldPath = $SPLUNK_DB/epav/colddb
    homePath = $SPLUNK_DB/epav/db
    thawedPath = $SPLUNK_DB/epav/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [epfw]
    coldPath = $SPLUNK_DB/epnet/colddb
    homePath = $SPLUNK_DB/epnet/db
    thawedPath = $SPLUNK_DB/epnet/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [ephids]
    coldPath = $SPLUNK_DB/epmon/colddb
    homePath = $SPLUNK_DB/epmon/db
    thawedPath = $SPLUNK_DB/epmon/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [epintel]
    coldPath = $SPLUNK_DB/epweb/colddb
    homePath = $SPLUNK_DB/epweb/db
    thawedPath = $SPLUNK_DB/epweb/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [oswin]
    coldPath = $SPLUNK_DB/oswin/colddb
    homePath = $SPLUNK_DB/oswin/db
    thawedPath = $SPLUNK_DB/oswin/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [oswinsec]
    coldPath = $SPLUNK_DB/oswinsec/colddb
    homePath = $SPLUNK_DB/oswinsec/db
    thawedPath = $SPLUNK_DB/oswinsec/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [oswinscript]
    coldPath = $SPLUNK_DB/oswinscript/colddb
    homePath = $SPLUNK_DB/oswinscript/db
    thawedPath = $SPLUNK_DB/oswinscript/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [oswinperf]
    coldPath = $SPLUNK_DB/oswinperf/colddb
    homePath = $SPLUNK_DB/oswinperf/db
    thawedPath = $SPLUNK_DB/oswinperf/thaweddb
    frozenTimePeriodInSecs = 604800 #7 days

    [osnix]
    coldPath = $SPLUNK_DB/osnix/colddb
    homePath = $SPLUNK_DB/osnix/db
    thawedPath = $SPLUNK_DB/osnix/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [osnixsec]
    coldPath = $SPLUNK_DB/osnixsec/colddb
    homePath = $SPLUNK_DB/osnixsec/db
    thawedPath = $SPLUNK_DB/osnixsec/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [osnixscript]
    coldPath = $SPLUNK_DB/osnixscript/colddb
    homePath = $SPLUNK_DB/osnixscript/db
    thawedPath = $SPLUNK_DB/osnixscript/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [osnixperf]
    coldPath = $SPLUNK_DB/osnixperf/colddb
    homePath = $SPLUNK_DB/osnixperf/db
    thawedPath = $SPLUNK_DB/osnixperf/thaweddb
    frozenTimePeriodInSecs = 604800 #7 days

    # Network Indexes used for Splunk Security Essentials
    # If you have the sources, other standard indexes we recommend include:
    # netauth - for network authentication sources
    # netflow - for netflow data
    # netids - for dedicated IPS environments
    # netipam - for IPAM systems
    # netnlb - for non-web server load balancer data (e.g., DNS, SMTP, SIP, etc.)
    # netops - for general network system data (such as Cisco iOS non-netflow logs)
    # netvuln - for Network Vulnerability Data

    [netdns]
    coldPath = $SPLUNK_DB/netdns/colddb
    homePath = $SPLUNK_DB/netdns/db
    thawedPath = $SPLUNK_DB/netdns/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [mail]
    coldPath = $SPLUNK_DB/mail/colddb
    homePath = $SPLUNK_DB/mail/db
    thawedPath = $SPLUNK_DB/mail/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [netfw]
    coldPath = $SPLUNK_DB/netfw/colddb
    homePath = $SPLUNK_DB/netfw/db
    thawedPath = $SPLUNK_DB/netfw/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [netops]
    coldPath = $SPLUNK_DB/netops/colddb
    homePath = $SPLUNK_DB/netops/db
    thawedPath = $SPLUNK_DB/netops/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [netproxy]
    coldPath = $SPLUNK_DB/netproxy/colddb
    homePath = $SPLUNK_DB/netproxy/db
    thawedPath = $SPLUNK_DB/netproxy/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [netvpn]
    coldPath = $SPLUNK_DB/netvpn/colddb
    homePath = $SPLUNK_DB/netvpn/db
    thawedPath = $SPLUNK_DB/netvpn/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    # Splunk Security Essentials doesn't have examples of Application Security,
    # but if you want to ingest those logs, here are the recommended indexes:
    # appwebint - Internal WebApp Access Logs
    # appwebext - External WebApp Access Logs
    # appwebintrp - Internal-facing Web App Load Balancers
    # appwebextrp - External-facing Web App Load Balancers
    # appwebcdn - CDN logs for your website
    # appdbserver - Database Servers
    # appmsgserver - Messaging Servers
    # appint - App Servers for internal-facing apps
    # appext - App Servers for external-facing apps
Validation
Once this is complete, you can find the list of indexes that the system is aware of by logging into the Splunk platform and going to Settings > Indexes.
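If you prefer to validate from the search bar, a quick check against the index configuration endpoint works too. This is a minimal sketch; it simply lists every index the instance knows about so you can confirm the new ones appear:

    | rest /services/data/indexes | table title frozenTimePeriodInSecs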
Install the forwarder on Linux systems
Installing the Linux forwarder is a straightforward process, similar to installing any other Linux program. These instructions walk you through a manual installation (perfect for a lab, a few servers, or when you're just getting started). You have three options for how to proceed:
- Using an RPM package (easiest for any Red Hat or similar system with rpm)
- Using a DEB package (easiest for any Ubuntu or similar system with dpkg)
- Using the compressed .tgz file (will work across Linux platforms)
For full and latest information on installing a forwarder, please follow the instructions in the Linux installation manual.
Implementation
First, check that you have the elevated permissions needed to install and configure the software correctly. Then pick one of the following options:
- Installation using an RPM package
Make sure you have downloaded the universal forwarder package and have it on the system you want to install the Splunk platform on.
Run: rpm -i splunkforwarder<version>.rpm
This will install the Splunk forwarder into the default directory of /opt/splunkforwarder.
To enable the Splunk platform to run each time your server is restarted use the following command: /opt/splunkforwarder/bin/splunk enable boot-start
- Installation using a DEB package
Make sure you have downloaded the universal forwarder package and have it on the system you want to install the Splunk platform on.
Run: dpkg -i splunkforwarder<version>.deb
This will install the Splunk forwarder into the default directory of /opt/splunkforwarder.
To enable the Splunk platform to run each time your server is restarted use the following command: /opt/splunkforwarder/bin/splunk enable boot-start
- Installation using the .tgz file
Make sure you have copied the tarball (or the appropriate package for your system) to the system, then extract it into the /opt directory.
Run: tar zxvf <splunk_tarball_file.tgz> -C /opt
    [root@ip-172-31-94-210 ~]# tar zxvf splunkforwarder-7.0.1-2b5b15c4ee89-Linux-x86_64.tgz -C /opt
    splunkforwarder/
    splunkforwarder/etc/
    splunkforwarder/etc/deployment-apps/
    splunkforwarder/etc/deployment-apps/README
    splunkforwarder/etc/apps/
Check your extraction:
Run: ls -l /opt
    [root@ip-172-31-94-210 apps]# ls -l /opt
    total 8
    drwxr-xr-x 8 splunk splunk 4096 Nov 29 20:21 splunkforwarder
If you would like the Splunk platform to run at startup then execute the following command: /opt/splunkforwarder/bin/splunk enable boot-start
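Whichever option you chose, a typical first start looks something like the sketch below. The license-acceptance flags and the dedicated splunk user are assumptions about a common setup rather than requirements:

    # accept the license and start the forwarder for the first time
    /opt/splunkforwarder/bin/splunk start --accept-license --answer-yes --no-prompt

    # run at boot as the splunk user instead of root (run this command as root)
    /opt/splunkforwarder/bin/splunk enable boot-start -user splunk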
After following any of the above three options, you will have a fully installed Splunk forwarder. There are three more steps you’ll want to take before you can see the data in Splunk:
- You will need an outputs.conf to send data from forwarders to indexers.
- You will need an inputs.conf to tell the forwarder what data to send (a minimal sketch follows this list).
- Make sure you've also fully completed putting an indexes.conf on the indexers to tell them where to put the data received. (You just passed that section.)
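For the inputs.conf piece, a minimal sketch might look like the following. The app name and monitored path are assumptions for illustration; point it at whatever you actually need to collect (the osnix index comes from the sample indexes.conf earlier):

    # deployed to $SPLUNK_HOME/etc/apps/my_linux_inputs/local/inputs.conf (hypothetical app name)
    [monitor:///var/log/messages]
    index = osnix
    sourcetype = syslog
    disabled = 0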
Sending data from forwarders to indexers
Every Splunk system in the environment that is not an indexer (that is, any system that doesn't store its data locally) should have an outputs.conf that points to your indexers. This applies whether it's a Universal Forwarder on a Windows host, a Linux heavy forwarder pulling the more difficult AWS logs, or even a dedicated Search Head that dispatches searches to your indexers.
Implementation
The outputs.conf will be the same across the entire environment, and is fairly simple. There are three steps:
- Generate your own outputs app (Splunk Cloud Platform customers should use the app they received from Splunk Cloud Platform).
- Extract the downloaded file (it arrives as a zip file).
- Place it in the etc/apps directory.
For Windows systems, this will typically be: c:\Program Files\Splunk\etc\apps. Once you've extracted the app there, you can restart the Splunk platform via the Services Control Panel applet, or by running "c:\Program Files\Splunk\bin\splunk.exe" restart.
For Linux systems, this will typically be /opt/splunkforwarder/etc/apps/. Once you've extracted the app there, you can restart Splunk by running /opt/splunkforwarder/bin/splunk restart.
Sample outputs.conf:

    [tcpout]
    defaultGroup = default-autolb-group

    [tcpout:default-autolb-group]
    server = MySplunkServer.mycompany.local:9997

    [tcpout-server://MySplunkServer.mycompany.local:9997]
Validation
Run a search in the Splunk environment for the host you've installed the forwarder on. For example, index=* host=mywinsystem1*
You can also review all hosts that are sending data by running: | metadata type=hosts index=*
System Configuration
Amazon Web Services (AWS) has become an integral part of many organizations’ IT infrastructure. Splunk offers an easy method to ingest various data sources from the AWS platform, which Splunk Security Essentials (SSE) uses to enhance your overall security posture. This overview will provide step-by-step guidance on setting up this integration—specifically for the CloudTrail and VPC Flow Logs data sources. CloudTrail provides a comprehensive trail of account activity related to actions across your AWS infrastructure. VPC Flow Logs contain a comprehensive record of network traffic in and out of your AWS environment.
To set up this integration on your supported platform, follow these steps:
- Configure your AWS accounts and services or confirm your existing configurations.
- Configure accounts or EC2 roles with IAM permissions to match those required by the add-on.
- Install the add-on.
- On your data-collection node, configure the AWS accounts you want to use to collect data with the add-on.
- Configure your inputs to get your AWS data into the Splunk platform.
Comprehensive documentation for configuring both your AWS and Splunk environments can be found on the Splunk Docs website.
Performing all the steps below requires administrator access to your AWS account. If you do not have the required permissions to perform all the actions yourself, work with an AWS admin to complete all steps, including creating the accounts or EC2 IAM roles, with the permissions that the Splunk Add-on for AWS uses to connect.
Set up AWS Identity and Access Management - IAM
Correctly configuring the AWS IAM policy is required for ingesting the subsequent data streams into your Splunk environment. Splunk Docs contains comprehensive information on how to set up IAM roles in AWS, either for individual data sources or globally for all AWS data sources.
Within the AWS IAM configuration menu, create a new user named splunk_access. Attach the SplunkAccess policy and grant the user only programmatic access. Once complete, download the user credentials.
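If you would rather script this than click through the console, the equivalent AWS CLI calls look roughly like the sketch below. The account ID is a placeholder, and the SplunkAccess policy is assumed to already exist as described above:

    # create the programmatic-access user
    aws iam create-user --user-name splunk_access

    # attach the existing SplunkAccess policy (placeholder account ID)
    aws iam attach-user-policy \
      --user-name splunk_access \
      --policy-arn arn:aws:iam::123456789012:policy/SplunkAccess

    # generate the access key and secret key that the add-on will use
    aws iam create-access-key --user-name splunk_access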
Set Up AWS Simple Notification Service - SNS
You need to grant permissions to the AWS accounts or EC2 IAM roles that the add-on uses to connect to the Amazon SNS API.
If you plan to use the Simple Queue Service (SQS)-based S3 input, you must enable Amazon S3 bucket events to send notification messages to an SQS queue whenever the events occur. See the AWS documentation on enabling and configuring event notifications using the Amazon S3 console and on configuring Amazon S3 Event Notifications.
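As a rough illustration of that wiring with the AWS CLI (the bucket name, queue ARN, and region are placeholders), the bucket can be told to notify an SQS queue whenever a new object lands. Note that the queue's access policy must also allow S3 to send messages to it, as covered in the AWS documentation above:

    aws s3api put-bucket-notification-configuration \
      --bucket my-cloudtrail-bucket \
      --notification-configuration '{
        "QueueConfigurations": [
          {
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:splunk-cloudtrail-queue",
            "Events": ["s3:ObjectCreated:*"]
          }
        ]
      }'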
Set up AWS Simple Queueing Service - SQS
You need to grant permissions to the AWS accounts or EC2 IAM roles that the add-on uses to connect to the Amazon SQS API.
If you plan to use the SQS-based S3 input, you must perform the following (a rough CLI sketch follows this list):
- Set up a dead-letter queue for the SQS queue to be used for the input for storing invalid messages. Read more about SQS dead-letter queues and how to configure them.
- Configure the SQS visibility timeout to prevent multiple inputs from receiving and processing messages in a queue more than once. We recommend that you set your SQS visibility timeout to 5 minutes or longer. If the visibility timeout for a message is reached before the message has been fully processed by the SQS-based S3 input, the message will reappear in the queue and will be retrieved and processed again, resulting in duplicate data. Read more about SQS visibility timeout and how to configure it.
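As mentioned above, here is a rough AWS CLI sketch covering both items. The queue names, account ID, and region are placeholders, and the five-minute visibility timeout matches the recommendation above:

    # create the dead-letter queue and the main queue
    aws sqs create-queue --queue-name splunk-cloudtrail-dlq
    aws sqs create-queue --queue-name splunk-cloudtrail-queue

    # point the main queue at the dead-letter queue and set a 5-minute visibility timeout
    aws sqs set-queue-attributes \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/splunk-cloudtrail-queue \
      --attributes '{
        "VisibilityTimeout": "300",
        "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:splunk-cloudtrail-dlq\",\"maxReceiveCount\":\"5\"}"
      }'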
Set up AWS CloudTrail
Enabling AWS CloudTrail inputs into Splunk allows you to record AWS API calls for your account and ingest the resulting dataset. This data can then be used for searching, visualization, and correlation.
The Splunk Add-on for AWS collects events from an SQS queue that subscribes to the SNS notifications produced by CloudTrail. Configure CloudTrail to produce these notifications, then create an SQS queue in each region for the add-on to read them from (a rough CLI sketch of this wiring follows the steps below).
- Enable CloudTrail. Follow the instructions in the AWS documentation.
- Create an S3 Bucket in which to store the CloudTrail events. Follow the AWS documentation to ensure the permissions for this bucket are correct.
- Enable SNS Notifications. See the AWS documentation for instructions.
- Create a new SQS queue.
- If you are in the China region, explicitly grant DeleteMessage and SendMessage permissions to the SQS queue you just created. This step is not necessary in commercial regions.
- Subscribe the SQS queue to the SNS notifications that you enabled in step 3.
- Grant IAM permissions to access the S3 bucket and SQS to the AWS account that the add-on uses to connect to your AWS environment. See Configure AWS Permissions for details.
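Here is the CLI sketch mentioned above for the core of steps 1 through 6. Every name, bucket, and ARN is a placeholder, and the bucket and queue policies still need to allow CloudTrail and SNS to write to them (see the AWS documentation linked in the steps):

    # 1-3. create the trail, log to an existing S3 bucket, and publish SNS notifications
    aws cloudtrail create-trail \
      --name splunk-trail \
      --s3-bucket-name my-cloudtrail-bucket \
      --sns-topic-name splunk-cloudtrail-topic
    aws cloudtrail start-logging --name splunk-trail

    # 4. create the SQS queue the add-on will poll
    aws sqs create-queue --queue-name splunk-cloudtrail-queue

    # 6. subscribe the queue to the SNS topic
    aws sns subscribe \
      --topic-arn arn:aws:sns:us-east-1:123456789012:splunk-cloudtrail-topic \
      --protocol sqs \
      --notification-endpoint arn:aws:sqs:us-east-1:123456789012:splunk-cloudtrail-queue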
Splunk configuration for data source
Where to collect logs from
Pulling logs from AWS means calling web service APIs rather than reading local log files or events, so it must be configured via the Splunk Add-on for Amazon Web Services on a Splunk instance with a web UI. It's deployed in one of the following two ways:
- Single instance: Splunk customers who have a smaller Splunk load that fits on a single system often add the Technology Add-on (TA) to the same system. Sizing here is environment specific, so you will want to ensure adequate performance (although this setup is usually quite workable in smaller environments). If you need to, you can always redo the configuration later, using a dedicated heavy forwarder.
- Heavy forwarder: In most environments, customers install the TA on a dedicated heavy forwarder. A heavy forwarder is a full Splunk Enterprise install (that is, not a universal forwarder) whose only role is to pull in data from special sources and send it to the indexers.
The Splunk Add-on for Amazon Web Services requires Internet connectivity to send queries to the AWS APIs.
While it is generally Splunk best practice to install TAs across all parts of your Splunk environment (particularly props and transforms), in the case of the Splunk Add-on for Amazon Web Services, we reach out to a cloud service, which makes the configuration slightly different. We separate out installing the TA from configuring the inputs.
When configuring the inputs, you will only configure the inputs on one system in your environment, such as a heavy forwarder or a single instance. (See "Overview" for more detail.)
When installing the TA, the TA itself should reside wherever you configure the inputs (since the TA is the mechanism that allows you to configure the inputs). If you have a larger or more advanced environment where you configure the inputs on a heavy forwarder, you should also install the TA on your search heads, so you can see the Splunk Add-on for Amazon Web Services field extractions.
You can hide the app on your search heads so you don't accidentally reconfigure and duplicate your data later. To do this, click the app dropdown in the upper left-hand corner of the screen, select Manage Apps, then click Edit Properties next to the Splunk Add-on for Amazon Web Services. Next, set Visible to No and click Save.
The following table provides a reference for installing this specific add-on to a distributed deployment of Splunk Enterprise:
Splunk Platform component | Supported? | Required |
---|---|---|
Universal forwarders | No | No |
Search heads | Yes | Yes |
Heavy forwarders | Yes | Depends on size |
Indexers | Yes | No |
Installing the Technology Add-on - TA
Installing the TA on Splunk Enterprise | Installing the TA on Splunk Cloud Platform |
---|---|
Install the add-on the same way as the indexes.conf app earlier: extract it into the etc/apps directory on the instance where you will configure the inputs (and on your search heads), then restart Splunk. | You won't be copying any files or folders to your indexers or search heads. Even though the Splunk Add-on for Amazon Web Services is not Cloud Self-Service Enabled, you can still open a ticket with Cloud Ops and be ready to go in short order. |
AWS indexes and source types
While you can use any source types or indexes that you want, we've found that the most successful customers follow specific patterns, as it sets them up for success moving forward.
The most common AWS data types relevant to Splunk Security Essentials are CloudTrail and VPC Flow Logs, but there are many others available to you. The following is an overview of the SSE-relevant AWS data types and the recommended indexes and source types. Other AWS data sources are outlined in more detail later.
Data source | Description | Source type | Index |
---|---|---|---|
CloudTrail | AWS API call history from the AWS CloudTrail service. | aws:cloudtrail | awscloudtrail |
CloudWatch Logs | VPC Flow Logs from the CloudWatch Logs service. | aws:cloudwatchlogs:vpcflow | awsflow |
To support your AWS data sources, add new indexes for the data you will be bringing in (it's generally easiest to just create awscloudtrail, awsflow, awscloudwatch, and so on, as shown in the sketch below).
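If you're managing indexes.conf by hand on Splunk Enterprise, the new stanzas can follow the same pattern as the earlier sample. The 30-day retention below is an assumption, so size it to your own needs:

    [awscloudtrail]
    coldPath = $SPLUNK_DB/awscloudtrail/colddb
    homePath = $SPLUNK_DB/awscloudtrail/db
    thawedPath = $SPLUNK_DB/awscloudtrail/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days

    [awsflow]
    coldPath = $SPLUNK_DB/awsflow/colddb
    homePath = $SPLUNK_DB/awsflow/db
    thawedPath = $SPLUNK_DB/awsflow/thaweddb
    frozenTimePeriodInSecs = 2592000 #30 days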
As mentioned, there are several other AWS data sources you could opt to bring in. If you wish to ingest those, here are our recommended source types for that data:
Data source | Description | Source type |
---|---|---|
Config | Configuration snapshots and historical configuration data from the AWS Config service | aws:config |
 | Configuration change notifications from the AWS Config service | aws:config:notification |
Description | Descriptions of your AWS EC2 instances, reserved instances, and EBS snapshots. Used to improve dashboard readability | aws:description |
Config rules | Compliance details, compliance summary, and evaluation status of your AWS Config rules | aws:config:rule |
Inspector | Assessment runs and findings data from the Amazon Inspector service | aws:inspector |
CloudTrail | AWS API call history from the AWS CloudTrail service | aws:cloudtrail |
CloudWatch logs | Data from the CloudWatch Logs service | aws:cloudwatchlogs |
 | VPC Flow Logs from the CloudWatch Logs service | aws:cloudwatchlogs:vpcflow |
CloudWatch | Performance and billing metrics from the AWS CloudWatch service | aws:cloudwatch |
Billing | Billing reports that you have configured in AWS | aws:billing |
S3 | Generic log data from your S3 buckets | aws:s3 |
 | S3 access logs | aws:s3:accesslogs |
 | CloudFront access logs | aws:cloudfront:accesslogs |
 | ELB access logs | aws:elb:accesslogs |
 | CloudTrail data | aws:cloudtrail |
Kinesis | Data from Kinesis streams | aws:kinesis |
SQS | Generic data from SQS | aws:sqs |
Configure AWS Account Information
- To configure the Splunk Add-on for AWS to ingest data, first add the account created above. Go to Configuration > Account, and then click Add.
- Supply the information requested, and click Add.
Configure CloudTrail Input
After completing the above steps, configuring the data inputs within the Splunk platform interface is simple.
- In the Splunk add-on for AWS, go to Data inputs, and select Create new input > CloudTrail.
- Select your desired configuration.
The Splunk add-on for AWS will then ingest data from the configured AWS API endpoints, and perform the relevant field extractions and CIM data model mappings for use by the SSE app. You can find more detail on configuration in Splunk Docs.
Finally, you can make sure that the Splunk platform has begun ingesting the data from AWS by running Splunk searches. The Splunk add-on for AWS also has a built-in health-overview dashboard that will provide initial troubleshooting information.
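A couple of quick checks, written as minimal sketches that assume the recommended index and source type names from the table above:

    index=awscloudtrail sourcetype=aws:cloudtrail | stats count by eventSource, eventName

    index=awsflow sourcetype=aws:cloudwatchlogs:vpcflow | stats count by host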
(Optional) Update AWS app, if used
By default, the Splunk App for AWS and the add-on send data into the main (default) index. From there, a saved search runs to populate the summary indexes, which in turn populate the dashboards in the Splunk App for AWS. Sending data to custom indexes requires changes to the macros supporting the app. To make this match Splunk best practices, follow these steps:
- Create a new index for AWS data.
- Update the proper macros so they point at the new index (see the sketch after this list, and http://docs.splunk.com/Documentation/AWS/5.0.2/Installation/Macros).
- Make sure the saved searches are running properly. (See http://docs.splunk.com/Documentation/AWS/5.0.2/Installation/Savedsearches)
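The macro override itself is a standard local macros.conf change. The sketch below shows the general pattern only; the stanza name and app directory are stand-ins, so copy the actual macro names from the Macros page linked above:

    # $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf
    # (stanza name below is a stand-in - use the real macro name from the app)
    [aws-cloudtrail-index]
    definition = index=awscloudtrail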