Data availability describes how often your data is available for use. During an active security incident, not having the correct data or the correct time frame of data ready in Splunk can have severe consequences. Furthermore, unexpected issues or interruptions in data management are inevitable, so your system should be able to work around these issues while still allowing you to access the data you need.
Establishing and maintaining a secure, successful Splunk deployment starts with having the right data. To get the right data in place, you'll need to plan and implement common framework structures around the system and the data itself. Define your requirements, then develop policies that meet them within a structured framework for event management activities, including event generation, transmission, storage, analysis, retention, and disposal.
What are the benefits of effective data availability and retention?
While some businesses are more time-sensitive than others, maintaining data availability is essential for the performance and business continuity of any organization. If you were to lose access to mission-critical data, your IT operations could grind to a halt or you could miss a key piece of evidence that points to a major breach, resulting in financial costs and damage to the reputation of your organization. Benefits of a proper data governance program include:
- Control of your data lifecycle. Storage tiers such as hot, warm, and cold let you manage data through its lifecycle according to its availability needs.
- Continuity of service. Highly available deployments allow for "always on" access.
- High-level performance. Optimize storage and the overall cost of managing and maintaining data, based on usage and demand patterns.
- Secure data. Prevent data from being misused, and keep it secure at rest and in transit.
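As a rough illustration of the lifecycle idea in the list above, the sketch below maps the age of a piece of data to a storage tier. The tier names come from the text; the age limits, the `TIER_LIMITS` table, the `storage_tier` function, and the final `"dispose"` stage are hypothetical. The sketch does not model how Splunk itself moves data between tiers, which can depend on size and other criteria rather than age alone.

```python
from datetime import timedelta

# Hypothetical age limits per tier, ordered from most to least available.
# Real retention policies should come from your own requirements.
TIER_LIMITS = [
    ("hot", timedelta(days=1)),
    ("warm", timedelta(days=30)),
    ("cold", timedelta(days=365)),
]


def storage_tier(age):
    """Return the first tier whose age limit has not yet been reached."""
    for name, limit in TIER_LIMITS:
        if age < limit:
            return name
    # Older than every limit: eligible for disposal under this sketch.
    return "dispose"
```

For example, `storage_tier(timedelta(days=7))` falls past the hot limit but inside the warm one, so it returns `"warm"`.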
What are data availability and retention best practices?
Splunk recommends best practices around event generation and storage to help preserve and protect the confidentiality, integrity, and availability of security data. Along with a well-established data governance program, you should monitor a few essential metrics when evaluating the data availability of your environment:
- Security alerts. Data availability isn’t just about application monitoring and response. It is also about ensuring your information is protected. Proper monitoring of security alerts and warnings should be a priority of any organization. Your applications may be running perfectly while your intellectual property may be walking out the front door.
- Idle connections. Idle connections consume resources, congest networks, and degrade system performance. They can also indicate an underlying problem and create gaps in data availability.
- Long-running queries, commands, or jobs. This applies not just to database queries or jobs, but also to commands and backups. Long-running operations can be an indicator of poor system health, slow disk speeds, CPU or other resource contention, or even deeper systemic problems.
- Disk input/output. Disk I/O refers to the input and output operations of the system related to disk activity. Tracking disk I/O can help identify bottlenecks, poor hardware configurations, improperly sized disks, or poorly tuned disk layouts for a given workload.
- Memory. Monitoring memory helps you investigate bottlenecks or leaks, identify improperly sized systems, and understand loads and spikes in activity. In addition, knowing about memory-intensive patterns can help you anticipate availability demands.
- Disk space. Disk space monitoring is available in many forms, and using it as a metric can help prevent avoidable problems and costly efforts to add more space.
- Errors and alerts. Errors, alerts, and recovery messages in the logs are another good metric to consider. Adding log monitoring for FATAL, PANIC, and key ERROR messages can help you identify issues that your availability solution is frequently recovering from, such as system or application crashes, core dumps, or errors requiring system downtime.
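Two of the metrics above, disk space and error messages in logs, are simple enough to sketch in a few lines. The example below is a minimal illustration, not a Splunk feature: the function names, the regular expression, and the 90% disk threshold are all assumptions chosen for the example. Only the severity keywords (FATAL, PANIC, ERROR) come from the text.

```python
import re
import shutil
from collections import Counter

# Severity keywords named in the text; the regex itself is an assumption.
SEVERITY_PATTERN = re.compile(r"\b(FATAL|PANIC|ERROR)\b")


def count_severities(lines):
    """Count lines by the first severity keyword they contain."""
    counts = Counter()
    for line in lines:
        match = SEVERITY_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def disk_used_fraction(path="."):
    """Return the fraction of space in use on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_availability(lines, path=".", disk_threshold=0.9):
    """Return human-readable warnings; the threshold is hypothetical."""
    warnings = []
    counts = count_severities(lines)
    for level in ("FATAL", "PANIC"):
        if counts[level]:
            warnings.append(f"{counts[level]} {level} message(s) found")
    used = disk_used_fraction(path)
    if used >= disk_threshold:
        warnings.append(f"disk {used:.0%} full on {path}")
    return warnings
```

In practice you would feed `check_availability` the lines of a log file and the path of the volume that stores your data, and tune the threshold and the keyword list to your own environment.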