Writing better queries in Splunk Search Processing Language
Poorly written queries can lead to slow, inefficient performance. Here are some best practices to improve them.
Solution
- Minimize the number of trips to the indexers.
One of the best ways to minimize the number of trips to the indexers is to avoid using the join and append commands. Although these commands are widely used, they’re not the most efficient. This is because both commands make use of a subsearch (the content between the square brackets). Each subsearch requires additional trips to the indexers, which increases communication and overhead.
Subsearches have additional limitations. By default, they have a timeout of 60 seconds and a limit of 50,000 events (see subsearch_maxtime and subsearch_maxout in limits.conf for Splunk Enterprise or Splunk Cloud Platform). When either limit is reached, the results are truncated, which often goes unnoticed and leads to incorrect answers.
So, what’s the solution?
Combine your subsearch with your primary search and accomplish the join with a stats command instead. Here is an example:
Using join (before)
index=_internal sourcetype=splunkd component=Metrics | stats count AS metric_count BY host | join type=left host [search index=_audit sourcetype=audittrail | stats count AS audit_count BY host] | table host metric_count audit_count
Using stats (after)
(index=_internal sourcetype=splunkd component=Metrics) OR (index=_audit sourcetype=audittrail) | stats count(eval(sourcetype="splunkd")) AS metric_count count(eval(sourcetype="audittrail")) AS audit_count BY host
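The same consolidation applies when append is used to stack two result sets. The following is a minimal sketch, not a drop-in replacement: it reuses the indexes and sourcetypes from the example above and assumes the goal is one event count per sourcetype.

Using append (before):

```
index=_internal sourcetype=splunkd component=Metrics
| stats count BY sourcetype
| append [search index=_audit sourcetype=audittrail | stats count BY sourcetype]
```

Using stats (after):

```
(index=_internal sourcetype=splunkd component=Metrics) OR (index=_audit sourcetype=audittrail)
| stats count BY sourcetype
```

One caveat on the join example above: the stats version also returns hosts that appear only in the audit data (with a metric_count of 0), whereas the left join returns only hosts that have metrics events. If that difference matters, appending | where metric_count > 0 should restrict the output to the same hosts.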
This technique can also be used in place of the append, dedup, and table commands.
- Minimize the amount of data coming back from the indexers.
To lower the amount of data coming back from the indexers, many articles recommend filtering your data early on.
While this does cut down on the number of events (vertical) that are retrieved, you should also focus on cutting down the number of fields (horizontal) that are retrieved.
By using the fields streaming command early on within your SPL, you not only lower the amount of data being pulled from the indexers, but also the amount that has to be transferred to and processed by the search head.
Whenever possible, try using the fields command right after the first pipe of your SPL as shown below.
<base query> | fields <field list> | fields - _raw
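As an illustration only, here is that pattern applied to the splunkd metrics data used earlier. The field names (host, series, kbps) and the group value are assumptions about what is extracted from metrics events in your environment; substitute the fields your search actually needs.

```
index=_internal sourcetype=splunkd component=Metrics group=per_sourcetype_thruput
| fields host series kbps
| fields - _raw
| stats avg(kbps) AS avg_kbps BY host series
```

Only the three named fields (plus internal fields such as _time) travel to the search head, and the second fields command drops the raw event text, which is usually the largest part of each event. Keep _raw if a later command still needs to parse it.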
Here’s a real-life example of how impactful using the fields command can be.
Query | # of Fields | Disk Usage | Events | Time Spent |
---|---|---|---|---|
Query without use of fields | 155 | 18458240 | 498478 | 166s |
Query with use of fields | 18 | 5681152 | 498478 | 103s |
- Perform calculations on the smallest amount of data.
It’s most efficient to save calculations that use commands like eval, lookup, and foreach until after your data set has been made as succinct as possible through the previous steps. It’s also most efficient to combine commands whenever possible. For example, observe how you could combine the following eval statements into one comma-delimited eval statement.
Before
… | eval var1="value1" | eval var2="value2" | eval var3="value3" …
After
… | eval var1="value1", var2="value2", var3="value3" …
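To illustrate saving calculations until the data set is small, the sketch below moves a unit conversion from before the aggregation to after it. It assumes a numeric kb field and a series field on the metrics events; both names are placeholders for whatever your data provides.

Calculation on every event (before):

```
index=_internal sourcetype=splunkd component=Metrics group=per_sourcetype_thruput
| eval mb=kb/1024
| stats sum(mb) AS total_mb BY series
```

Calculation on the summarized rows (after):

```
index=_internal sourcetype=splunkd component=Metrics group=per_sourcetype_thruput
| stats sum(kb) AS total_kb BY series
| eval total_mb=total_kb/1024
```

In the second form, eval runs once per series rather than once per event. This works here because dividing a sum by a constant gives the same total as summing the divided values; calculations that do not distribute over the aggregation this way still need to run before stats.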
- Use non-streaming commands as late in the query as possible.
An additional query best practice is to save non-streaming, transforming commands for last. These are the commands that really give you the answers you’re looking for, such as stats, chart, and timechart.
Next steps
With the above tips in mind, here’s a sample query template to follow.
Step | SPL |
---|---|
Base query | base query |
Minimize data | fields <list of fields> |
Combine/Summarize data | use of stats for join/append/summarizations |
Run calculations | eval, lookup, etc. |
Format the data | stats, chart, timechart, etc. |
But remember — every query is different, so think of these tips as guidelines rather than rules.
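As one possible reading of the template, the sketch below strings the steps together using the indexes from the earlier join example. The total_count field and the final sort are illustrative additions, not part of the original example.

```
(index=_internal sourcetype=splunkd component=Metrics) OR (index=_audit sourcetype=audittrail)
| fields host sourcetype
| fields - _raw
| stats count(eval(sourcetype="splunkd")) AS metric_count count(eval(sourcetype="audittrail")) AS audit_count BY host
| eval total_count=metric_count+audit_count
| sort -total_count
```

The first line is the base query, the two fields commands minimize the data, stats combines and summarizes what would otherwise be a join, eval runs on the already-summarized rows, and sort handles the final formatting.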
If you've implemented the query writing tips in this article, but are still experiencing problems, try troubleshooting your queries using the Job Inspector. You can also read Optimizing search for advanced recommendations that go beyond inefficient search practices.
Need more help? Contact our Splunk Elite Partner, SP6. SP6 is a technology firm specializing in cybersecurity, CMMC compliance, and systems observability. SP6 has built North America’s largest Splunk Services team. Their team of cybersecurity and technology observability specialists ensures that the digital assets of customers are both protected and highly performant. SP6 delivers this expertise through both project-based Professional Services, as well as Managed Services for those organizations that can benefit from additional guidance.