Deploying use-case based data management solutions
Explosive data growth, a proliferation of data sources and types, and disparate use cases all lead to challenges in how to access and retain data. It can be difficult to determine the right balance between cost and ease of access. A general use-case based guideline is as follows:
- Real time and near-real time data: This data is indexed. It fulfills the following use cases:
- Prevention
- Detection
- Monitoring
- Ad hoc data: This data is sometimes indexed and sometimes kept in object storage. It fulfills the following use cases:
- Incident review
- Investigations
- Threat hunting
- Archive data: This data is kept in archive storage. It fulfills the following use cases:
- Forensics
- Audit
- Compliance
Splunk offers storage management solutions at all these levels, through two main categories:
- Index and time-based data management
- Event level data management
After you understand what the solutions are and what use case each is best applied to, the next step is to clean up your data.
Filter out null values
Challenge: JSON data is verbose, and even when a field contains no value, the field name and empty string still need storage. Storage costs money.
Solution: Use SPL2 to remove empty fields.
Outcomes:
- Between 10 and 50 percent log volume reduction
- More readable events and search results
- Fewer parsed fields lead to a minor reduction in SVC
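To illustrate the logic (the actual solution described here uses SPL2 in an ingest pipeline), the following is a minimal Python sketch that recursively drops empty fields from a JSON event. The event shape is a made-up example:

```python
import json

def drop_empty(value):
    """Recursively remove dict keys whose cleaned values are None,
    empty strings, or empty containers; filter the same out of lists."""
    if isinstance(value, dict):
        cleaned = {k: drop_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", {}, [])}
    if isinstance(value, list):
        return [drop_empty(v) for v in value if v not in (None, "", {}, [])]
    return value

# Hypothetical event with empty fields that only add volume
event = {"src_ip": "10.0.0.1", "user": "", "tags": [],
         "details": {"note": None, "code": 200}}
print(json.dumps(drop_empty(event)))
# → {"src_ip": "10.0.0.1", "details": {"code": 200}}
```

The same removal performed at ingest time means the empty fields never reach the index, which is where the volume and SVC savings come from.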
JSON classification
Challenges:
- Distinct sources of data all produce JSON output, but each contains vastly different fields and values.
- Generic source types negatively affect index and search time capability.
- Index and search time rules are not able to target distinct sources of data.
- The default _json source type uses indexed extractions, which is not usually the desired behavior and leads to index bloat.
Solution: Do not use _json in every case. Detect and apply appropriate classifications by using Ingest Processor, Ingest Actions, and other index and search time transforms to more reliably target specific data. Use per-source type optimization by applying index time extractions (Splunk Enterprise/Splunk Cloud Platform), field aliasing (Splunk Enterprise/Splunk Cloud Platform/Splunk Observability Cloud), or CIM mapping as appropriate.
Outcomes:
- These actions lead to more efficient searches, which in turn reduce SVC consumption.
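One way to think about classification is signature matching on an event's keys. The Python sketch below is purely conceptual; the marker fields and source type names are assumptions for illustration, and in practice this detection would be configured in Ingest Processor or transforms rather than written by hand:

```python
import json

# Hypothetical marker fields that distinguish each source (assumed for illustration)
CLASSIFIERS = {
    "aws:cloudtrail": {"eventVersion", "eventSource"},
    "gcp:auditlog": {"protoPayload", "logName"},
}

def classify(raw_event: str, default: str = "_json") -> str:
    """Return a specific source type when an event's keys match a known signature."""
    keys = set(json.loads(raw_event))
    for sourcetype, markers in CLASSIFIERS.items():
        if markers <= keys:  # all marker fields present
            return sourcetype
    return default

print(classify('{"eventVersion": "1.08", "eventSource": "s3.amazonaws.com"}'))
# → aws:cloudtrail
```

Events that match a signature get source-type-specific index and search time rules; only unmatched events fall back to the generic default.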
Routing and federation
Challenges:
- Data contains fields unlikely to be used for common use cases
- Data contains events considered low value not typically used for detections
- Full fidelity logs are still required for compliance or audit use cases
Solution: First, transform the logs by removing unused headers. You can also remove or filter out unused fields, sending only the optimized events to the Splunk platform while still sending the full log to S3 for lowest-cost object storage. If needed, you can use federated search to find the logs you need in S3 without ingesting them into the Splunk platform.
Outcomes:
- Reduce storage requirements
- Reduce SVC consumption
- Offload data processing from the Splunk platform
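The routing pattern can be sketched as producing two copies of each event: a trimmed copy for the Splunk platform and the full-fidelity original for S3. The field names below are hypothetical, and the real routing would be done by Ingest Processor or Ingest Actions rather than application code:

```python
import json

# Hypothetical low-value fields not typically used for detections
LOW_VALUE_FIELDS = {"raw_header", "debug_info", "session_cookie"}

def route(event: dict):
    """Return (optimized event for the Splunk platform, full event for S3)."""
    optimized = {k: v for k, v in event.items() if k not in LOW_VALUE_FIELDS}
    return optimized, event  # full copy goes to low-cost object storage

event = {"src_ip": "10.0.0.1", "action": "blocked",
         "raw_header": "GET / HTTP/1.1 ..."}
to_splunk, to_s3 = route(event)
print(json.dumps(to_splunk))
# → {"src_ip": "10.0.0.1", "action": "blocked"}
```

Because the S3 copy is complete, compliance and audit needs are still met, and federated search can reach it on demand.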
Windows XML to JSON
Challenges:
- Windows XML logs are verbose and consume a lot of disk
- Require search-time xmlkv, as well as both index time and search time knowledge
- Need additional SVC consumption and search complexity
- Modifying Universal Forwarders to switch formats can be very time consuming and labor intensive
Solution: Convert XML to JSON. JSON is a good intermediate format for further optimization, and you could even convert to CSV if needed.
Outcomes:
- Implicit JSON search parsing with minimal SVC consumption
- JSON is still verbose, but it has a smaller disk footprint than XML, yielding some storage savings
- No changes to upstream data collection needed
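As a conceptual sketch of the conversion, the Python below flattens a simplified Windows-style XML event into compact JSON. It ignores the real Windows event namespace and uses a made-up event for illustration; in practice the conversion happens in the ingest pipeline, not in application code:

```python
import json
import xml.etree.ElementTree as ET

def xml_event_to_json(xml_text: str) -> str:
    """Flatten the Data elements of a simplified XML event into compact JSON."""
    root = ET.fromstring(xml_text)
    fields = {}
    for data in root.iter("Data"):
        name = data.get("Name")
        if name:
            fields[name] = data.text or ""
    return json.dumps(fields, separators=(",", ":"))

xml_event = ("<Event><EventData>"
             '<Data Name="SubjectUserName">alice</Data>'
             '<Data Name="LogonType">3</Data>'
             "</EventData></Event>")
print(xml_event_to_json(xml_event))
# → {"SubjectUserName":"alice","LogonType":"3"}
```

The JSON output parses implicitly at search time, so no xmlkv call is needed, and the compact encoding drops the repeated XML tag overhead.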
Results
Here are examples of results that Splunk customers were able to achieve using the recommendations above.
| Customer | Objective | Action | Outcome |
|---|---|---|---|
| A | Reduce firewall log noise | Filtered events from non-critical systems using IP address | 90% reduction |
| B | Reduce data volume; increase control of data sources | Transformed Zscaler data by using lookups to remove redundant data | 60% reduction without loss of fidelity |
| C | Eliminate low value fields to increase efficiency in DDAS capacity consumption | Filtered null values | Reduced data by 75% |
| D | Reduce data volumes; address PII compliance issues | Masked data and routed it to S3 | Addressed PII compliance; reduced costs by routing to S3 |
Next steps
Now that you have an idea of how to manage your data in a more strategic way, watch the full .conf25 Talk, Strategic Data Mastery. In the talk, you'll learn about additional steps you can take, such as masking, filtering, and transforming your data, plus best practices for metricizing logs for use in Splunk Observability Cloud.
In addition, these resources might help you understand and implement this guidance:
- Splunk Lantern Article: Setting data retention rules in Splunk Cloud Platform
- Splunk Lantern Article: Partitioning data in S3 for the best FS-S3 experience
- Splunk Help: Manage DDSS self storage locations
- .conf Talk: Regeneron, an Ingest Processor/Data Transformation Success Story
- Splunk EDU: Mastering Splunk Data Management Techniques
- Splunk Tech Talk: Advanced Splunk Data Management Strategies

