Deploying use-case based data management solutions
Explosive data growth, a proliferation of data sources and types, and disparate use cases all lead to challenges in how to access and retain data. It can be difficult to determine the right balance between cost and ease of access. A general use-case based guideline is as follows:
- Real time and near-real time data: This data is indexed. It fulfills the following use cases:
- Prevention
- Detection
- Monitoring
- Ad hoc data: This data is sometimes indexed and sometimes kept in object storage. It fulfills the following use cases:
- Incident review
- Investigations
- Threat hunting
- Archive data: This data is kept in archive storage. It fulfills the following use cases:
- Forensics
- Audit
- Compliance
Splunk offers storage management solutions at all these levels, through two main categories:
- Index and time-based data management
- Event level data management
After you understand what the solutions are and what use case each is best applied to, the next step is to clean up your data.
Filter out null values
Challenge: JSON data is verbose, and even when a field contains no value, the field name and empty string still need storage. Storage costs money.
Solution: Use SPL2 to remove empty fields.
Outcomes:
- Between 10 and 50 percent log volume reduction
- More readable events and search results
- Fewer parsed fields lead to a minor reduction in SVC
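To illustrate the logic (the actual solution described here uses SPL2 in an ingest pipeline), the following is a minimal Python sketch that recursively drops empty fields from a JSON event. The event shape is a made-up example:

```python
import json

def drop_empty(value):
    """Recursively remove dict keys whose cleaned values are None,
    empty strings, or empty containers; filter the same out of lists."""
    if isinstance(value, dict):
        cleaned = {k: drop_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", {}, [])}
    if isinstance(value, list):
        return [drop_empty(v) for v in value if v not in (None, "", {}, [])]
    return value

# Hypothetical event with empty fields that only add volume
event = {"src_ip": "10.0.0.1", "user": "", "tags": [],
         "details": {"note": None, "code": 200}}
print(json.dumps(drop_empty(event)))
# → {"src_ip": "10.0.0.1", "details": {"code": 200}}
```

The same removal performed at ingest time means the empty fields never reach the index, which is where the volume and SVC savings come from.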
JSON classification
Challenges:
- Distinct sources of data all produce JSON output, but each contains vastly different fields and values.
- Generic source types negatively affect index and search time capability.
- Index and search time rules are not able to target distinct sources of data.
- The default _json source type uses indexed extractions, which is not usually the desired behavior and leads to index bloat.
Solution: Do not use _json in every case. Detect and apply appropriate classifications by using Ingest Processor, Ingest Actions, and other index and search time transforms to more reliably target specific data. Use per-source type optimization by applying index time extractions (Splunk Enterprise/Splunk Cloud Platform), field aliasing (Splunk Enterprise/Splunk Cloud Platform/Splunk Observability Cloud), or CIM mapping as appropriate.
Outcomes:
- These actions lead to more efficient searches, which in turn reduce SVC consumption.
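One way to think about classification is signature matching on an event's keys. The Python sketch below is purely conceptual; the marker fields and source type names are assumptions for illustration, and in practice this detection would be configured in Ingest Processor or transforms rather than written by hand:

```python
import json

# Hypothetical marker fields that distinguish each source (assumed for illustration)
CLASSIFIERS = {
    "aws:cloudtrail": {"eventVersion", "eventSource"},
    "gcp:auditlog": {"protoPayload", "logName"},
}

def classify(raw_event: str, default: str = "_json") -> str:
    """Return a specific source type when an event's keys match a known signature."""
    keys = set(json.loads(raw_event))
    for sourcetype, markers in CLASSIFIERS.items():
        if markers <= keys:  # all marker fields present
            return sourcetype
    return default

print(classify('{"eventVersion": "1.08", "eventSource": "s3.amazonaws.com"}'))
# → aws:cloudtrail
```

Events that match a signature get source-type-specific index and search time rules; only unmatched events fall back to the generic default.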
Routing and federation
Challenges:
- Data contains fields unlikely to be used for common use cases
- Data contains events considered low value not typically used for detections
- Full fidelity logs are still required for compliance or audit use cases
Solution: First, transform the logs by removing unused headers. You can also remove or filter out unused fields, sending only the optimized events to the Splunk platform while still sending the full log to S3 for lowest-cost object storage. If needed, you can use federated search to find the logs you need in S3 without ingesting them into the Splunk platform.
Outcomes:
- Reduce storage requirements
- Reduce SVC consumption
- Offload data processing from the Splunk platform
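The routing pattern can be sketched as producing two copies of each event: a trimmed copy for the Splunk platform and the full-fidelity original for S3. The field names below are hypothetical, and the real routing would be done by Ingest Processor or Ingest Actions rather than application code:

```python
import json

# Hypothetical low-value fields not typically used for detections
LOW_VALUE_FIELDS = {"raw_header", "debug_info", "session_cookie"}

def route(event: dict):
    """Return (optimized event for the Splunk platform, full event for S3)."""
    optimized = {k: v for k, v in event.items() if k not in LOW_VALUE_FIELDS}
    return optimized, event  # full copy goes to low-cost object storage

event = {"src_ip": "10.0.0.1", "action": "blocked",
         "raw_header": "GET / HTTP/1.1 ..."}
to_splunk, to_s3 = route(event)
print(json.dumps(to_splunk))
# → {"src_ip": "10.0.0.1", "action": "blocked"}
```

Because the S3 copy is complete, compliance and audit needs are still met, and federated search can reach it on demand.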
Windows XML to JSON
Challenges:
- Windows XML logs are verbose and consume a lot of disk
- Require search-time xmlkv, as well as both index time and search time knowledge
- Need additional SVC consumption and search complexity
- Modifying Universal Forwarders to switch formats can be very time consuming and labor intensive
Solution: Convert XML to JSON. JSON is a good intermediate format for further optimization, and you could even convert to CSV if needed.
Outcomes:
- Implicit JSON search parsing with minimal SVC consumption
- JSON is still verbose, but it has a smaller disk footprint than XML, yielding some storage savings
- No changes to upstream data collection needed
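As a conceptual sketch of the conversion, the Python below flattens a simplified Windows-style XML event into compact JSON. It ignores the real Windows event namespace and uses a made-up event for illustration; in practice the conversion happens in the ingest pipeline, not in application code:

```python
import json
import xml.etree.ElementTree as ET

def xml_event_to_json(xml_text: str) -> str:
    """Flatten the Data elements of a simplified XML event into compact JSON."""
    root = ET.fromstring(xml_text)
    fields = {}
    for data in root.iter("Data"):
        name = data.get("Name")
        if name:
            fields[name] = data.text or ""
    return json.dumps(fields, separators=(",", ":"))

xml_event = ("<Event><EventData>"
             '<Data Name="SubjectUserName">alice</Data>'
             '<Data Name="LogonType">3</Data>'
             "</EventData></Event>")
print(xml_event_to_json(xml_event))
# → {"SubjectUserName":"alice","LogonType":"3"}
```

The JSON output parses implicitly at search time, so no xmlkv call is needed, and the compact encoding drops the repeated XML tag overhead.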
Results
Here are examples of results that Splunk customers were able to achieve using the recommendations above.
| Customer | Objective | Action | Outcome |
|---|---|---|---|
| A | Reduce firewall log noise | Filtered events from non-critical systems using IP address | 90% reduction |
| B | Reduce data volume; increase control of data sources | Transformed Zscaler data by using lookups to remove redundant data | 60% reduction without loss of fidelity |
| C | Eliminate low value fields to increase efficiency in DDAS capacity consumption | Filtered null values | Reduced data by 75% |
| D | Reduce data volumes; address PII compliance issues | Masked data and routed it to S3 | Addressed PII compliance; reduced costs by routing to S3 |
Next steps
Now that you have an idea of how to manage your data in a more strategic way, watch the full .conf25 Talk, Strategic Data Mastery. In the talk, you'll learn about additional steps you can take, such as masking, filtering, and transforming your data, plus best practices for metricizing logs for use in Splunk Observability Cloud.
In addition, these resources might help you understand and implement this guidance:
- Splunk Lantern Article: Setting data retention rules in Splunk Cloud Platform
- Splunk Lantern Article: Partitioning data in S3 for the best FS-S3 experience
- Splunk Help: Manage DDSS self storage locations
- .conf Talk: Regeneron, an Ingest Processor/Data Transformation Success Story
- Splunk EDU: Mastering Splunk Data Management Techniques
- Splunk Tech Talk: Advanced Splunk Data Management Strategies

