GDI - Getting data in
Getting data into the Splunk platform involves taking data from inputs, and then indexing that data by transforming it into individual events that contain searchable fields. The Splunk platform can index any kind of data and offers a variety of well-documented solutions for most common data sources.
Splunk apps and add-ons, found on Splunkbase, extend the capability and simplify the process of getting data into your Splunk deployment. Apps typically target specific data types and handle everything from configuring the inputs to generating useful views of the data.
Understand your needs
Before you start adding inputs to your deployment, ask yourself the following questions:
- What kind of data do I want to index?
- Is there an app for that?
- Where does the data reside? Is it local or remote? See Where is my data?
- Should I use forwarders to access remote data? See Use forwarders to get data in.
- What do I want to do with the indexed data? See What is Splunk knowledge?
Data inputs in the Splunk platform
The Splunk platform provides tools to configure many kinds of data inputs, including those that are specific to particular application needs. It also provides the tools to configure arbitrary data input types. In general, you can categorize Splunk inputs as follows:
- Files and directories
- Network events
- Windows sources
- *nix sources
- AWS, GCP, Azure Cloud sources
- HTTP Event Collector (HEC)
- Monitor First In, First Out (FIFO) queues
- Monitor changes to your file system
- Get data from APIs and other remote data interfaces through scripted inputs
- Get data with the Journald input
Files and directories
A lot of data comes directly from files and directories. You can use universal and heavy forwarders to monitor those files and directories and send their data to both Splunk Enterprise and Splunk Cloud Platform. As a best practice for Splunk Cloud Platform, install a universal forwarder on every machine whose files and directories you want to monitor, and send that data through a heavy forwarder into Splunk Cloud Platform. To monitor files and directories, see Get data from files and directories.
Network events
You might want to collect data from network ports, such as network data from machines that run syslog or SNMP. To do this in Splunk Cloud Platform, use a heavy or universal forwarder to collect the network data and then send that data to Splunk Cloud Platform. To get data from network ports, see Get data from TCP and UDP ports.
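When you set up a TCP input on a forwarder, it can help to send a test line to the listening port. The following is a minimal sketch, assuming a forwarder that already listens on a TCP port of your choosing. The function names, host, and port are hypothetical, and the message layout follows the traditional BSD syslog convention, where the priority value is the facility multiplied by 8 plus the severity.

```python
import socket


def format_syslog_line(facility, severity, timestamp, hostname, tag, message):
    """Build one BSD-style syslog line; priority = facility * 8 + severity."""
    priority = facility * 8 + severity
    return f"<{priority}>{timestamp} {hostname} {tag}: {message}\n"


def send_line(host, port, line, timeout=5.0):
    """Send one line over TCP to a listening network input (hypothetical host and port)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8"))


# Example (not executed here): send a test event to a forwarder you control.
# line = format_syslog_line(1, 6, "Oct 11 22:14:15", "web01", "myapp", "started")
# send_line("forwarder.example.com", 5514, line)
```

The formatter is kept separate from the sender so that you can check the message text without a live network input.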
Windows sources
To get data from Windows sources into Splunk Cloud Platform, install the Splunk Add-on for Microsoft Windows on your universal forwarder. In this scenario, you can use a deployment server to deliver the Splunk Add-on for Microsoft Windows to the Windows machines you want to monitor. The add-on collects the data and sends it to Splunk Cloud Platform. For additional information on getting Windows data into Splunk Cloud Platform, see Get Windows data into Splunk Cloud Platform.
Linux and Unix sources
To get data from Linux and Unix sources into Splunk Cloud Platform, install the Splunk Add-on for Unix and Linux. In this scenario, you can use a deployment server to deliver the Splunk Add-on for Unix and Linux to the *nix machines you want to monitor. For additional information, see Get *nix data into Splunk Cloud Platform.
HTTP Event Collector
In Splunk Cloud Platform, you can use the HTTP Event Collector to get data directly from a source with the HTTP or HTTPS protocols. For more information, see The HTTP Event Collector endpoint.
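As an illustration, a client can post a JSON event to the collector's `/services/collector/event` endpoint with an `Authorization: Splunk <token>` header. The following is a minimal sketch using only the Python standard library. The base URL, token, and `build_hec_request` helper are placeholders, and your deployment's host name and port might differ.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype=None, timestamp=None):
    """Build (but do not send) an HTTP Event Collector request for one event."""
    payload = {"event": event}
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    if timestamp is not None:
        payload["time"] = timestamp  # epoch seconds
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example (not executed here): placeholder URL and token, replace with your own.
# request = build_hec_request(
#     "https://splunk.example.com:8088",
#     "00000000-0000-0000-0000-000000000000",
#     {"action": "login", "user": "alice"},
#     sourcetype="my_app",
# )
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easier to inspect the URL, headers, and body before any data leaves the machine.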
Metrics
You can also get metrics data from your technology infrastructure, security systems, and business applications. For more information, see Metrics.
Scripted inputs
A scripted input is useful when combined with some Windows and *nix command-line tools, such as ipconfig, iostat, netstat, and top. You can also use a scripted input to get data from APIs, other remote data interfaces, and message queues. You can then use commands like vmstat and iostat on that data to generate metrics and status data. On Windows platforms, you can enable text-based scripts, such as those in Perl and Python, with an intermediary Windows batch (.bat) or PowerShell (.ps1) file. For more information, see Scripted inputs.
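A scripted input is a program whose standard output the Splunk platform indexes. The following is a minimal sketch of such a script, assuming a timestamped key="value" line per event. The `format_event` helper and the field names are hypothetical and stand in for whatever your API call or command-line tool returns.

```python
import sys
import time


def format_event(epoch_seconds, fields):
    """Render one event as a UTC timestamp followed by key="value" pairs."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S+0000", time.gmtime(epoch_seconds))
    pairs = " ".join(f'{key}="{value}"' for key, value in fields.items())
    return f"{stamp} {pairs}"


def main():
    # Hypothetical measurement; replace with the data you want to collect.
    fields = {"metric": "heartbeat", "python_major": sys.version_info.major}
    # Write one event per line to standard output.
    sys.stdout.write(format_event(time.time(), fields) + "\n")


if __name__ == "__main__":
    main()
```

Starting each line with an explicit timestamp and using key="value" pairs is a design choice that tends to make timestamp recognition and field extraction more predictable.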
Index custom data
The Splunk platform can index any time-series data, usually without additional configuration. If you have logs from a custom application or device, process them with the default configuration first. If you do not get the results you want, you can make adjustments so that the software indexes your events correctly. See Create custom data inputs for Splunk Cloud Platform or Splunk Enterprise on the Splunk Developer Portal.
See Overview of event processing and How indexing works so that you can make decisions about how to make the Splunk platform work with your data.
Then, consider the following scenarios for collecting data:
- Are the events in your data more than one line? See Configure event line breaking.
- Is your data in an unusual character set? See Configure character set encoding.
- Is the Splunk platform unable to determine the timestamps correctly? See How timestamp assignment works.
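To see why multi-line events need line-breaking rules, consider a log in which a stack trace follows an error line. The following conceptual sketch groups lines into events by treating each line that starts with a timestamp as the beginning of a new event. It illustrates the idea only and is not how the Splunk platform implements line breaking. The timestamp pattern and the `split_events` helper are assumptions for this example.

```python
import re

# Assumed timestamp layout for this example: "YYYY-MM-DD HH:MM:SS" at line start.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def split_events(raw_text):
    """Group lines into events; a line that starts with a timestamp opens a new event."""
    events = []
    for line in raw_text.splitlines():
        if EVENT_START.match(line) or not events:
            events.append(line)
        else:
            # Continuation line (for example, a stack frame): attach to the current event.
            events[-1] += "\n" + line
    return events


raw = (
    "2024-01-01 10:00:00 ERROR boom\n"
    "  at foo()\n"
    "  at bar()\n"
    "2024-01-01 10:00:01 INFO ok"
)
for event in split_events(raw):
    print(repr(event))
```

In this sample, the three lines of the error and its stack trace become one event, and the INFO line becomes a second event.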