Checking the quality of your data sources


There might be times when you want to check the quality of your data sources to make sure that your source types are being parsed correctly. Incorrect line breaking, timestamp parsing problems, and aggregation problems can cause issues with searches and make it harder for you to get value from your data.

Ideally, check your data in a test instance or environment before you implement changes in production.

Searching on your data

You can check whether your data is being parsed correctly by searching on it, using the index and source type assigned to your data source.

Run the following search in your environment over a time range of at least the last 15 minutes. It is a modified version of a search from the Splunk Monitoring Console (Indexing > Inputs > Data Quality), adapted so that you can run it on any of your search heads:

index=_internal splunk_server=* source=*splunkd.log* (log_level=ERROR OR log_level=WARN) (component=AggregatorMiningProcessor OR component=DateParserVerbose OR component=LineBreakingProcessor)
| rex field=event_message "Context: source(::|=)(?<context_source>[^\\|]*?)\\|host(::|=)(?<context_host>[^\\|]*?)\\|(?<context_sourcetype>[^\\|]*?)\\|"
| eval data_source=if((isnull(data_source) AND isnotnull(context_source)),context_source,data_source), data_host=if((isnull(data_host) AND isnotnull(context_host)),context_host,data_host), data_sourcetype=if((isnull(data_sourcetype) AND isnotnull(context_sourcetype)),context_sourcetype,data_sourcetype)
| stats count(eval(component=="LineBreakingProcessor" OR component=="DateParserVerbose" OR component=="AggregatorMiningProcessor")) as total_issues dc(data_host) AS "Host Count" dc(data_source) AS "Source Count" count(eval(component=="LineBreakingProcessor")) AS "Line Breaking Issues" count(eval(component=="DateParserVerbose")) AS "Timestamp Parsing Issues" count(eval(component=="AggregatorMiningProcessor")) AS "Aggregation Issues" by data_sourcetype
| sort - total_issues
| rename data_sourcetype as Sourcetype, total_issues as "Total Issues"

The search returns one row per source type, showing the number of line breaking issues, timestamp parsing issues, and aggregation issues, along with the total number of issues and the number of affected hosts and sources.

You can drill down into the underlying events by clicking one of the numbers in the results table.

Adjusting settings in props

This section only applies to Splunk Enterprise users.

To parse your data correctly, Splunk recommends that you always define the following settings in the props.conf stanza for each source type:

On Splunk Enterprise:

LINE_BREAKER
SHOULD_LINEMERGE
MAX_TIMESTAMP_LOOKAHEAD
TRUNCATE
TIME_FORMAT
TIME_PREFIX
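As an illustration only, a props.conf stanza that sets all six values might look like the following. It assumes a hypothetical source type named my_custom_sourcetype whose events are single lines that each begin with a timestamp such as 2024-01-15 13:45:02.123. Adjust the regular expression, time format, and lookahead to match your own data.

```ini
# Hypothetical source type name; replace it with your own source type.
[my_custom_sourcetype]
# Assumed format: each event starts on a new line with a YYYY-MM-DD date.
# The newlines matched by the capture group mark the event boundary and are discarded.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Rely on LINE_BREAKER alone rather than merging lines into events.
SHOULD_LINEMERGE = false
# Assumed: the timestamp is at the very start of the event.
TIME_PREFIX = ^
# Matches timestamps such as 2024-01-15 13:45:02.123
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
# Look at most 23 characters past TIME_PREFIX (the length of the assumed timestamp).
MAX_TIMESTAMP_LOOKAHEAD = 23
# Maximum event length in bytes; 10000 is the default value.
TRUNCATE = 10000
```

Pairing SHOULD_LINEMERGE = false with an explicit LINE_BREAKER is generally preferred, because Splunk then breaks events in a single pass instead of running the more expensive line-merging step.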

On the universal forwarder:

EVENT_BREAKER_ENABLE
EVENT_BREAKER
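On the universal forwarder, a matching stanza might look like the following. This sketch reuses the same hypothetical source type and assumes the same event format as the Splunk Enterprise example, so the forwarder splits data at the same boundaries the indexers use.

```ini
# Hypothetical source type name; use the same source type as on your indexers.
[my_custom_sourcetype]
# Let the forwarder recognize event boundaries before sending data.
EVENT_BREAKER_ENABLE = true
# Assumed pattern, kept consistent with LINE_BREAKER: break before each YYYY-MM-DD date.
EVENT_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```

Keeping EVENT_BREAKER consistent with LINE_BREAKER helps the forwarder avoid sending part of an event to one indexer and the rest to another when it switches indexers during load balancing.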

Next steps

These resources might help you understand and implement this guidance:

TekStream Blog: Data onboarding in Splunk
Product Tip: Improving data pipeline processing in Splunk Enterprise

Want to learn more about improving data quality? Contact us today! TekStream accelerates clients’ digital transformation by navigating complex technology environments with a combination of technical expertise and staffing solutions. We guide clients’ decisions, quickly implement the right technologies with the right people, and keep them running for sustainable growth. Our battle-tested processes and methodology help companies with legacy systems get to the cloud faster, so they can be agile, reduce costs, and improve operational efficiencies. And with hundreds of deployments under our belt, we can guarantee on-time and on-budget project delivery. That’s why 97% of clients are repeat customers.

The user- and community-generated information, content, data, text, graphics, images, videos, documents and other materials made available on Splunk Lantern is Community Content as provided in the terms and conditions of the Splunk Website Terms of Use, and it should not be implied that Splunk warrants, recommends, endorses or approves of any of the Community Content, nor is Splunk responsible for the availability or accuracy of such. Splunk specifically disclaims any liability and any actions resulting from your use of any information provided on Splunk Lantern.