Well, to start, an indexer stores indexed data in indexes. The real work, though, happens when you run a search and the indexer fetches that data back from the indexes. If you're still with us, great. If not, don't panic: we'll show you how Splunk creates and manages data repositories (indexes), and review the courses designed to help Splunk Administrators keep Splunk installations happy, healthy, and growing.
How multiple indexers help you scale
Believe it or not, the more indexers you have, the better! Slow indexing? Add indexers! Slow searching? Add indexers! That's because Splunk forwarders distribute data across all of your indexers, and that distribution creates opportunities for parallel processing when you search the data. In other words, you win when you have a bunch of machines each working on a portion of your search rather than one machine trying to handle it all on its own.
You may ask, "But what happens to my data if one of those indexers goes down?" Great question! The Splunk indexer clustering feature manages multiple copies of the data to increase resiliency for your Splunk-ed data.
Now that you know enough to be smart and safe, take a moment to understand the relationship between indexers, buckets, and indexer clusters. These concepts will help you effectively plan and scale your deployments with Splunk Enterprise components.
Things to know
Indexers play a key role in how data moves through Splunk deployments.
An indexer is a Splunk Enterprise instance that takes in raw event data, transforms it into searchable events, and stores those events in an index. Each index can contain a variety of data and is made up of buckets: smaller collections of data and their associated index files.
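To make the index-and-buckets relationship concrete, here is a sketch of how an index is defined in indexes.conf. The index name is hypothetical, and the paths shown are the defaults under $SPLUNK_DB:

```ini
# indexes.conf -- hypothetical index definition (a sketch, not a drop-in config)
[web_logs]
homePath   = $SPLUNK_DB/web_logs/db        # hot and warm buckets live here
coldPath   = $SPLUNK_DB/web_logs/colddb    # cold buckets roll here as data ages
thawedPath = $SPLUNK_DB/web_logs/thaweddb  # buckets restored from archive
```

Each subdirectory under these paths is a bucket: a self-contained chunk of raw data plus the index files that make it searchable.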
An indexer cluster, the Splunk implementation of index replication, is a group of indexers configured to replicate each other's data so the system has redundant copies of everything it ingests. By maintaining multiple, identical copies of data, indexer clusters prevent data loss and ensure that data remains available for searching. Key benefits include data availability, data fidelity, data recovery, disaster recovery, and search affinity.
Replication factor is the number of copies of data that the cluster maintains. For example, to ensure that your system can tolerate the failure of two peers, you would configure a replication factor of 3, which means the cluster stores three identical copies of each bucket on separate nodes. As the replication factor increases, you need to run more indexers and provision more storage for the indexed data. The good news is that data replication itself requires little processing power, so you can take advantage of the multiple indexers in a cluster to ingest and index more data.
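For reference, the replication factor is set on the cluster manager node in server.conf. This is a sketch assuming current Splunk Enterprise naming; the security key value is a placeholder:

```ini
# server.conf on the cluster manager node (illustrative values only)
[clustering]
mode               = manager   # named "master" in pre-8.1 releases
replication_factor = 3         # three copies of each bucket; tolerates two peer failures
search_factor      = 2         # how many of those copies are fully searchable
pass4SymmKey       = <your-shared-secret>
```

The search factor is the related knob worth knowing: it controls how many of the replicated copies keep their index files, and so how quickly searches can resume after a peer fails.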
Things to do
Begin your education track for Splunk Administrators. Enroll in the Education Track for Splunk Administrators to learn the concepts, tasks, and best practices for managing a single Splunk instance or a distributed deployment.
Lift the hood on the indexer. Find the buckets that store your data. How many hot buckets do you have? Do you have many warm buckets, or just a few? See Managing Indexers and Clusters of Indexers.
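One quick way to lift the hood is the dbinspect search command, which reports the buckets backing an index. The index name here is hypothetical; this sketch counts buckets by their state:

```
| dbinspect index=web_logs
| stats count by state
```

The result is a small table showing how many hot, warm, and cold buckets the index currently holds.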
Calculate the replication factor you need. Weigh the trade-offs and performance benefits of adding peers to your indexer cluster. For example, moving from a replication factor of 2 to 3 lets the cluster survive two peer failures instead of one, but it also means storing one more full copy of each bucket's raw data.