Splunk Lantern

Deploying use-case based data management solutions

Explosive data growth, a proliferation of data sources and types, and disparate use cases all lead to challenges in how to access and retain data. It can be difficult to determine the right balance between cost and ease of access. A general use-case based guideline is as follows:

  • Real time and near-real time data: This data is indexed. It fulfills the following use cases:
    • Prevention
    • Detection
    • Monitoring
  • Ad hoc data: This data is sometimes indexed and sometimes kept in object storage. It fulfills the following use cases:
    • Incident review
    • Investigations
    • Threat hunting
  • Archive data: This data is kept in archive storage. It fulfills the following use cases:
    • Forensics
    • Audit
    • Compliance

Splunk offers storage management solutions at all these levels, through two main categories: 

  • Index and time-based data management
  • Event level data management


After you understand what the solutions are and what use case each is best applied to, the next step is to clean up your data. 

Filter out null values

Challenge: JSON data is verbose, and even when a field contains no value, the field name and empty string still consume storage. Storage costs money.

Solution: Use SPL2 to remove empty fields. 

Outcomes:

  • Between 10 and 50 percent log volume reduction
  • More readable events and search results
  • Fewer parsed fields, leading to a minor reduction in SVC consumption
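The article recommends SPL2 for this inside the Splunk platform. As a minimal illustration of the same transformation outside Splunk, here is a Python sketch that strips empty fields from a JSON event (the sample event and field names are hypothetical):

```python
import json

def drop_empty_fields(event: dict) -> dict:
    """Recursively drop keys whose values are empty strings, None,
    or empty containers -- the nulls that only cost storage."""
    cleaned = {}
    for key, value in event.items():
        if isinstance(value, dict):
            value = drop_empty_fields(value)
        if value in ("", None, {}, []):
            continue  # skip fields that carry no information
        cleaned[key] = value
    return cleaned

raw = '{"src": "10.0.0.1", "user": "", "geo": {"city": null, "cc": "US"}, "tags": []}'
print(json.dumps(drop_empty_fields(json.loads(raw))))
# -> {"src": "10.0.0.1", "geo": {"cc": "US"}}
```

The same logic applies at ingest time: the fewer empty fields an event carries, the less it costs to store and parse.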

JSON classification

Challenges:

  • Different and distinct sources of data produce JSON output, but each contains vastly different fields and values.
  • Generic source types negatively affect index and search time capability.
  • Index and search time rules are not able to target distinct sources of data.
  • The default _json source type uses indexed extractions, which is not usually the desired behavior and leads to index bloat.

Solution: Do not use _json in every case. Detect and apply appropriate classifications by using Ingest Processor, Ingest Actions, and other index and search time transforms to more reliably target specific data. Use per-source type optimization by applying index time extractions (Splunk Enterprise/Splunk Cloud Platform), field aliasing (Splunk Enterprise/Splunk Cloud Platform/Splunk Observability Cloud), or CIM mapping as appropriate.

Outcomes:

  • These actions lead to more efficient searches, which in turn reduce SVC consumption.
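One way to think about this detection step: match each event's top-level keys against known fingerprints to assign a specific source type instead of the generic _json one. The sketch below is a hedged illustration in Python; the source type names and key fingerprints are examples only, and in practice you would express the rules in Ingest Processor or Ingest Actions:

```python
import json

# Hypothetical fingerprints: top-level keys that identify a source.
# Real deployments would target vendor-specific fields via ingest rules.
CLASSIFIERS = {
    "aws:cloudtrail": {"eventVersion", "eventSource", "awsRegion"},
    "zscaler:web": {"action", "urlcategory", "clienttranstime"},
}

def classify(raw_event: str, default: str = "_json") -> str:
    """Return a specific source type when an event's keys contain a
    known fingerprint; otherwise fall back to the generic one."""
    keys = set(json.loads(raw_event))
    for sourcetype, fingerprint in CLASSIFIERS.items():
        if fingerprint <= keys:  # all fingerprint keys present
            return sourcetype
    return default

print(classify('{"eventVersion": "1.08", "eventSource": "s3.amazonaws.com", '
               '"awsRegion": "us-east-1"}'))
# -> aws:cloudtrail
```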

Routing and federation

Challenges:

  • Data contains fields unlikely to be used for common use cases
  • Data contains low-value events not typically used for detections
  • Full-fidelity logs are still required for compliance or audit use cases

Solution: First, transform the logs by getting rid of unused headers. You can also remove or filter out unused fields before sending only the optimized events to the Splunk platform, while still sending the whole log to S3 for lowest cost object storage. If needed, you can use federated search to find the logs you need in S3 without ingesting them into the Splunk platform. 

Outcomes:

  • Reduce storage requirements
  • Reduce SVC consumption
  • Offload data processing from the Splunk platform
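The routing pattern above can be sketched as a simple split: an optimized copy (unused fields removed) goes to the indexer, while the full-fidelity original goes to object storage. This Python sketch is illustrative only; the dropped field names are hypothetical, and in practice Ingest Processor or Ingest Actions performs the split:

```python
# Fields assumed to be low value for detections; names are illustrative.
# The full event is still preserved for compliance and audit.
DROP_FIELDS = {"request_headers", "user_agent_raw", "debug_info"}

def route(event: dict) -> tuple[dict, dict]:
    """Return (optimized event for the indexer, full-fidelity copy for S3)."""
    optimized = {k: v for k, v in event.items() if k not in DROP_FIELDS}
    return optimized, event

opt, full = route({"src": "10.0.0.1", "action": "allow",
                   "request_headers": "Accept: */*"})
print(opt)   # -> {'src': '10.0.0.1', 'action': 'allow'}
print(full)  # unchanged original, destined for low-cost object storage
```

Federated search then lets you query the full-fidelity copy in S3 on the rare occasions you need it, without ever ingesting it.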

Windows XML to JSON

Challenges:

  • Windows XML logs are verbose and consume a lot of disk space
  • They require search-time xmlkv extraction, along with both index time and search time knowledge objects
  • They add SVC consumption and search complexity
  • Modifying universal forwarders to switch formats can be very time consuming and labor intensive

Solution: Convert XML to JSON. JSON is a good intermediate format for further optimization, and you could even convert to CSV if needed.

Outcomes:

  • Implicit JSON search parsing with minimal SVC consumption
  • JSON is still verbose, but it has a smaller disk footprint than XML, yielding some storage savings
  • No changes to upstream data collection needed
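To make the conversion concrete, here is a minimal Python sketch that flattens a Windows event's XML into compact JSON. The sample event is heavily simplified (real Windows events carry much more structure), and in production the conversion would happen in the ingest pipeline rather than in a script:

```python
import json
import xml.etree.ElementTree as ET

# The standard Windows event XML namespace.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def winevent_to_json(xml_text: str) -> str:
    """Flatten a (simplified) Windows event's XML into compact JSON."""
    root = ET.fromstring(xml_text)
    out = {}
    for elem in root.find("e:System", NS):
        tag = elem.tag.split("}")[-1]  # strip the XML namespace prefix
        if elem.text and elem.text.strip():
            out[tag] = elem.text.strip()
    for data in root.findall("e:EventData/e:Data", NS):
        out[data.get("Name")] = data.text
    return json.dumps(out)

sample = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    '<System><EventID>4624</EventID><Computer>HOST01</Computer></System>'
    '<EventData><Data Name="TargetUserName">alice</Data></EventData>'
    '</Event>'
)
print(winevent_to_json(sample))
# -> {"EventID": "4624", "Computer": "HOST01", "TargetUserName": "alice"}
```

The JSON output then benefits from the implicit JSON parsing described above, and can be optimized further, for example by dropping null fields as in the earlier section.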

Results

Using the recommendations above, here are examples of results that Splunk customers were able to achieve.

| Customer | Objective | Action | Outcome |
|---|---|---|---|
| A | Reduce firewall log noise | Filtered events from non-critical systems using IP address | 90% reduction |
| B | Reduce data volume; increase control of data sources | Transformed Zscaler data by using lookups to remove redundant data | 60% reduction without loss of fidelity |
| C | Eliminate low-value fields to increase efficiency in DDAS capacity consumption | Filtered null values | Reduced data by 75% |
| D | Reduce data volumes; address PII compliance issues | Masking and routing to S3 | Addressed PII compliance; cost reduction |

Next steps

Now that you have an idea of how to manage your data in a more strategic way, watch the full .conf25 Talk, Strategic Data Mastery. In the talk, you'll learn about additional steps you can take, such as masking, filtering, and transforming your data, plus best practices for metricizing logs for use in Splunk Observability Cloud.

In addition, these resources might help you understand and implement this guidance:

  • Written by Paul Davies (Director, Global Architects) and Tolga Tohumcu (Director, Technical Interlock)