Bypassing a database for faster processing
Many admins have their Splunk environment configured to write time series data to a log-rotated file and then have a separate process translate those events into rows and columns for ingestion into a relational database. After this extract, transform, and load (ETL) process, they use SQL to query the database records, either for ad hoc search or for aggregate reporting.
Additionally, these users sometimes discover the Splunk DB Connect (DBX) add-on, which lets them move data from the database into the Splunk platform at short intervals for universal indexing of all fields and easy creation of reports without having to know SQL. Their approach looks something like this:
In this approach, a Splunk heavy forwarder with the DBX add-on installed contacts the database to gather its records. The events are then distributed in round-robin fashion to multiple Splunk indexers, which might or might not be in the same data center.
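For reference, the scheduled database input on the heavy forwarder typically looks something like the following sketch. This assumes a DB Connect v3-style `db_inputs.conf`; the stanza name, connection name, and query are hypothetical, and exact parameter names vary by DB Connect version.

```ini
# db_inputs.conf on the heavy forwarder (hypothetical example)
[orders_from_warehouse]
connection = warehouse_db      # a database connection defined in DB Connect
query = SELECT * FROM orders WHERE id > ?
mode = rising                  # fetch only rows newer than the last checkpoint
rising_column = id
interval = 300                 # poll the database every 5 minutes
index = main
sourcetype = db:orders
```

Every polling interval, checkpoint column, and query like this is something someone has to write, schedule, and maintain alongside the ETL job itself.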
Solution
The approach described above might seem reasonable, but if you already have time series data written to files, you can bypass the database steps entirely. The following workflow is easier to maintain.
In this workflow, Splunk universal forwarders are placed on the data center machines that collect this time series data from different applications and send it in near real-time to Splunk indexers that might or might not be in the same data center. With this, you have effectively done the following:
- Eliminated the ETL phase, which someone had to write and maintain.
- Eliminated the need for a schema to collect the data.
- Eliminated the need to constantly react to adjustments to the schema.
- Eliminated the database entirely, along with its need for an administrator and license.
- Introduced near real-time collection of time series data.
- Provided some high availability, because universal forwarders can pick up where they left off if the network connection is severed.
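As a sketch of this replacement workflow, the universal forwarder configuration can be as simple as the following. The log path, index, sourcetype, and indexer hostnames are placeholders for illustration.

```ini
# inputs.conf on the universal forwarder (path and names are placeholders)
[monitor:///var/log/myapp/*.log]
index = main
sourcetype = myapp:events

# outputs.conf on the universal forwarder
[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
# useACK asks the indexers to acknowledge receipt, so the forwarder can
# resume from its last acknowledged point after a network interruption
useACK = true
```

The forwarder automatically load-balances events across the servers listed in the `tcpout` group, which is what distributes the data to multiple indexers without any ETL code.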
Bypass the database, use the universal indexing engine as the platform for search, reporting, and alerts on your textual data, and manage your deployment more efficiently.
Next steps
The following additional resources might help you understand and implement this product tip.
- Splunk Docs: About the universal forwarder
- Splunk Blog: Time series databases (TSDBs) explained
- Product Tip: Receiving and storing queued time series data