Apache Hadoop
Hadoop is a framework for the distributed processing of large data sets across clusters of computers. It scales from single servers to thousands of machines. The library is designed to detect and handle failures at the application layer, delivering a highly available service.
The Splunk integration with Hadoop allows you to search and analyze Hadoop-based data as part of your Splunk Enterprise deployment. You can:
- Interactively query raw data by previewing results and refining searches using the same Splunk Enterprise interface (see the search sketch after this list)
- Quickly create and share charts, graphs and dashboards
- Ensure security with role-based access control and HDFS pass-through authentication
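Searches against Hadoop-backed virtual indexes use the same SPL and APIs as searches against native Splunk indexes. The following is a minimal sketch using the Splunk SDK for Python (the splunk-sdk package); the index name hadoop_vix, the host, and the credentials are placeholder assumptions, not values from this document.

```python
# Minimal sketch: run a search against a Hadoop-backed virtual index
# using the Splunk SDK for Python (pip install splunk-sdk).
# All connection details and the index name below are placeholders.
import splunklib.client as client
import splunklib.results as results

# Connect to the Splunk management port (8089 by default).
service = client.connect(
    host="splunk.example.com",
    port=8089,
    username="admin",
    password="changeme",
)

# A one-shot search over the virtual index, written exactly as it
# would be for a native index.
stream = service.jobs.oneshot('search index="hadoop_vix" | head 10')

# Iterate over the returned events; ResultsReader also yields
# diagnostic messages, so keep only the event dictionaries.
for item in results.ResultsReader(stream):
    if isinstance(item, dict):
        print(item.get("_raw"))
```

Because the query goes through the standard search interface, role-based access control applies to virtual indexes the same way it does to native ones.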
Configuration
Guidance for onboarding data can be found in the Splunk documentation:
- Getting Data In (Splunk Enterprise)
- Getting Data In (Splunk Cloud)
- Get data into Splunk Observability Cloud
In addition, specific configuration information is available in the documentation for the Splunk Analytics for Hadoop add-on.
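For orientation, Splunk Analytics for Hadoop defines virtual indexes in indexes.conf through a provider stanza (how to reach the cluster) and an index stanza (which HDFS paths to search). The sketch below is illustrative only: the stanza names, filesystem paths, NameNode URI, and environment locations are placeholder assumptions that will differ in your environment, and the exact settings depend on your Hadoop distribution and add-on version.

```ini
# indexes.conf -- illustrative sketch; all values are placeholders

# Provider stanza: tells Splunk how to reach the Hadoop cluster.
[provider:hadoop-provider]
vix.family = hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/java-8-openjdk
vix.env.HADOOP_HOME = /usr/lib/hadoop
vix.fs.default.name = hdfs://namenode.example.com:8020
vix.splunk.home.hdfs = /user/splunk/workdir

# Virtual index stanza: maps an HDFS path to a searchable index.
[hadoop_vix]
vix.provider = hadoop-provider
# Trailing /... matches subdirectories recursively.
vix.input.1.path = /data/weblogs/...
```

Once defined, the virtual index can be searched like any other index, as shown in the search sketch above.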
Application
When your Splunk deployment is ingesting Hadoop data, you can use the data to achieve the following: