
Splunk Lantern

Analyzing nested manufacturing QA data

Manufacturing Quality Assurance (QA) systems generate complex, hierarchical test data that can be challenging to analyze in traditional formats. Each product's test results can span hundreds to thousands of lines in a single event. This data contains both high-level test run information and detailed individual measurement results in a nested structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA data into actionable insights. You'll learn how to handle multi-line events that contain both top-level metadata and repeating nested test measurements, then flatten this structure. This enables detailed analysis of individual measurements, correlation with product identifiers and test conditions, and QA performance trend monitoring.

About QA test data formatting

Manufacturing QA test results from automated test equipment arrive as long, multi-line events, each representing a single product's complete test results. These events can span hundreds to thousands of lines and contain two types of data:

  • Top-level data attributes: Key-value pairs with overall test run information (timestamp, unit serial number, process location/ID). These serve as metadata for the processing unit.
  • Nested data instances: Individual test measurement results enclosed in XML tags, JSON payloads "{ }", or other customer-specific formats. Each instance contains multiple readings and process run statuses. The number of instances varies by unit, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

Start_Test
Start_Header
TIME_START 10/10/2025 08:07:00
UNIT_ID 1234567890
PROD_TYPE |ZTT177|01.04.98
VERSION 876543_00 | 876543
PROCESS_ID 11-56
OVERALL_RESULT PASS
End_Header
Start_Units
UNIT_NAME ANON_UNIT_NAME_1
{ANON_TEST_1, Bad material, 0.0 , 0.0 , 0.0 , 1.0 , PASS}
{ANON_TEST_2, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}
{ANON_TEST_3, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}
{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}
{ANON_TEST_5, CONT1, 5.0 Ohm, 1.8967 Ohm, Open, 10.0 Ohm, PASS}
{ANON_TEST_6, CONT1, 5.0 Ohm, 1.217 Ohm, Open, 20.0 Ohm, PASS}
{ANON_TEST_7, CONT1, 5.0 Ohm, 1.1515 Ohm, Open, 20.0 Ohm, PASS}
{ANON_TEST_8, OPEN, 10.0 KOhm, 9.143 KOhm, 5.0 KOhm, Open, PASS}
{ANON_TEST_9, CONT, 3.0 Ohm, 1.1191 Ohm, Open, 5.0 Ohm, PASS}
{ANON_TEST_10, CONT, 2.0 Ohm, 1.0938 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_11, CONT, 2.0 Ohm, 1.1913 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_12, CONT, 2.0 Ohm, 0.9805 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_13, CONT, 2.0 Ohm, 1.0428 Ohm, Open, 3.0 Ohm, PASS}
End_Units
End_Test

The challenge lies in parsing these large events to extract both top-level context and detailed nested measurements.

How to use Splunk software for this use case

First, you'll apply some specific configurations to the props.conf and transforms.conf files within your environment. You'll then run a search to transform the data into a flat, tabular format suitable for analysis within the Splunk platform while preserving relationships between metadata and nested records.

props.conf configuration

The props.conf file controls how the Splunk platform processes raw data at ingestion. Add these settings under the [qa_data_nested_hierarchies] stanza:

[qa_data_nested_hierarchies]
BREAK_ONLY_BEFORE = Start_Test
MAX_EVENTS = 10000
TRUNCATE = 0
LINE_BREAKER = Start_Test
TIME_PREFIX = TIME_START\s+
REPORT-qa_data_nested_kv_extract = qa_data_nested_kv_extract
REPORT-qa_data_nested_01 = qa_data_nested_all

Setting explanations

  • BREAK_ONLY_BEFORE = Start_Test: Treats lines starting with Start_Test as new event boundaries. Ensures proper event breaking if multiple test blocks appear in one file.
  • MAX_EVENTS = 10000: Sets the maximum number of lines per event.
  • TRUNCATE = 0: Ensures that the Splunk platform ingests the entire event regardless of length. Prevents truncation of events with 5,000+ lines.
  • LINE_BREAKER = Start_Test: Regular expression defining event break locations. Works with BREAK_ONLY_BEFORE to establish event boundaries.
  • TIME_PREFIX = TIME_START\s+: Identifies the string preceding timestamps, enabling correct time parsing.
  • REPORT-qa_data_nested_kv_extract = qa_data_nested_kv_extract: Applies the qa_data_nested_kv_extract transform to extract initial key-value pairs.
  • REPORT-qa_data_nested_01 = qa_data_nested_all: Applies the qa_data_nested_all transform to extract nested data blocks.
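To build intuition for what BREAK_ONLY_BEFORE does at ingestion, the following Python sketch (not Splunk internals, just an illustration on a shortened two-block file) splits raw input immediately before each Start_Test line, so each test block becomes one event:

```python
import re

# Shortened two-block sample file; real events span hundreds of lines.
raw = """Start_Test
Start_Header
UNIT_ID 1111
End_Header
End_Test
Start_Test
Start_Header
UNIT_ID 2222
End_Header
End_Test
"""

# Split immediately *before* each "Start_Test" line; the lookahead keeps
# the delimiter at the head of each event, mirroring BREAK_ONLY_BEFORE.
events = [e for e in re.split(r"(?=^Start_Test$)", raw, flags=re.MULTILINE)
          if e.strip()]

print(len(events))                   # two events, one per test block
print("UNIT_ID 2222" in events[1])   # second block kept intact
```

Each resulting event starts with Start_Test and carries its own header, which is why TIME_PREFIX can then locate the timestamp inside each event independently.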

transforms.conf configuration

The transforms.conf file defines custom transformations for events. Add these settings under the [qa_data_nested_kv_extract] and [qa_data_nested_all] stanzas:

[qa_data_nested_kv_extract]
KV_MODE = auto
REGEX = ^\s*(?<_KEY_1>[^\{]\S+)\s+(?<_VAL_1>\S+)$

[qa_data_nested_all]
REGEX = (?P<qa_data_nested_all>\{[^\}]+\})
MV_ADD = true

Transform explanations

  • [qa_data_nested_kv_extract]: Extracts higher-level metadata attached to each nested record.
    • KV_MODE = auto: Automatically extracts key-value pairs using the REGEX pattern.
    • REGEX = ^\s*(?<_KEY_1>[^\{]\S+)\s+(?<_VAL_1>\S+)$: Captures key-value pairs at line beginnings, excluding lines starting with curly braces.
  • [qa_data_nested_all]: Extracts individual nested records.
    • REGEX = (?P<qa_data_nested_all>\{[^\}]+\}): Identifies and extracts all text within curly braces {...}. Places extracted content into the qa_data_nested_all field.
    • MV_ADD = true: Critical for nested data handling. Creates a multi-value field when multiple {...} blocks exist, preserving all nested instances for processing.
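The effect of the two transforms can be sketched in plain Python against a few lines from the sample event above. This is an illustration of the regex patterns only, not of Splunk's extraction pipeline; note that the key-value pattern captures a single token as the value, so multi-token values such as the timestamp are not matched (the timestamp is handled by TIME_PREFIX instead):

```python
import re

# A few lines from the sample event.
event_lines = [
    "TIME_START 10/10/2025 08:07:00",   # multi-token value: not matched below
    "UNIT_ID 1234567890",
    "OVERALL_RESULT PASS",
    "{ANON_TEST_2, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}",
    "{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}",
]

# [qa_data_nested_kv_extract]: key-value pairs on lines not starting with "{"
kv_re = re.compile(r"^\s*([^\{]\S+)\s+(\S+)$")
kv = {m.group(1): m.group(2) for line in event_lines if (m := kv_re.match(line))}

# [qa_data_nested_all]: every {...} block becomes one value of a
# multivalue field, as MV_ADD = true does in Splunk.
nested_re = re.compile(r"\{[^\}]+\}")
qa_data_nested_all = [m for line in event_lines for m in nested_re.findall(line)]

print(kv["UNIT_ID"])            # top-level metadata
print(len(qa_data_nested_all))  # two nested measurement blocks
```

The multivalue list plays the same role as the qa_data_nested_all field: all nested instances survive ingestion and are flattened later by the search.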

SPL commands for processing nested data

Run the following search to refine the data, flatten the nested structure, and extract individual fields from measurements. You can optimize it by specifying an index and adjusting the time range.

sourcetype="mfg_qa_nested_event"
| rex field=source "(?<file_name>[^\/]+)\.(mdl|tdx)"
| stats values(*) AS * BY file_name
| table file_name SerialnumberDC UName BID TS FIX qa_data_nested_all
| mvexpand qa_data_nested_all
| rex field=qa_data_nested_all "{(?<field_1>[^\,]+),(?<field_2>[^\,]+),(?<field_3>[^\,]+),(?<field_4>[^\,]+),(?<field_5>[^\,]+),(?<field_6>[^\,]+),(?<field_7>[^\,\}]+)}"
| eval _raw=file_name+"|"+SerialnumberDC+"|"+UName+"|"+BID+"|"+TS+"|"+FIX+"|"+qa_data_nested_all
| fields - qa_data_nested_all _raw

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your environment.


sourcetype="mfg_qa_nested_event"

Filters events from the manufacturing QA data source type.

rex field=source "(?<file_name>[^\/]+)\.(mdl|tdx)"

Extracts file_name from the source field as metadata. Demonstrates extracting higher-level values from default Splunk platform metadata. The file_name often contains unique identifiers (product_line, production_line) corresponding to test runs or products.

stats values(*) AS * BY file_name

Groups events by file_name while retaining all field values. Critical for maintaining association between top-level attributes and nested results after flattening.

table file_name SerialnumberDC UName BID TS FIX qa_data_nested_all

Selects relevant fields for processing. qa_data_nested_all contains the raw nested data blocks extracted by transforms.conf.

mvexpand qa_data_nested_all

The pivotal flattening command. Takes the multi-value field qa_data_nested_all and creates separate events for each value. For example, 100 nested measurement blocks become 100 new events, each with duplicated top-level attributes.

rex field=qa_data_nested_all "{(?<field_1>[^\,]+),(?<field_2>[^\,]+),(?<field_3>[^\,]+),(?<field_4>[^\,]+),(?<field_5>[^\,]+),(?<field_6>[^\,]+),(?<field_7>[^\,\}]+)}"

Parses each qa_data_nested_all value into seven named fields (field_1 through field_7). For example: {Panel SetSerialNumbers|MesItac_DoMerge, Wrong material?, 0.0 , 0.0 , 0.0 , 1.0 , PASS}

eval _raw=file_name+"|"+SerialnumberDC+"|"+UName+"|"+BID+"|"+TS+"|"+FIX+"|"+qa_data_nested_all

Reconstructs the _raw field by concatenating key identifiers and qa_data_nested_all. Useful for debugging or custom event representation. Optional based on output requirements.

fields - qa_data_nested_all _raw

Removes the original qa_data_nested_all field (now parsed into field_1-7) and reconstructed _raw field, leaving only structured data.
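The mvexpand-plus-rex flattening step can be mimicked in plain Python to show the resulting row shape. This sketch assumes a minimal set of top-level attributes; the real search carries SerialnumberDC, UName, BID, TS, and FIX the same way:

```python
import re

# Top-level attributes (illustrative subset) and the multivalue nested field.
top_level = {"file_name": "sample_results", "UNIT_ID": "1234567890"}
qa_data_nested_all = [
    "{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}",
    "{ANON_TEST_9, CONT, 3.0 Ohm, 1.1191 Ohm, Open, 5.0 Ohm, PASS}",
]

# Seven comma-separated parts inside the braces, like the rex command.
row_re = re.compile(r"\{([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,\}]+)\}")

rows = []
for block in qa_data_nested_all:        # mvexpand: one row per value
    m = row_re.match(block)
    if m:
        fields = {f"field_{i}": v.strip() for i, v in enumerate(m.groups(), 1)}
        rows.append({**top_level, **fields})  # top-level attributes duplicated

print(len(rows))           # one row per nested block
print(rows[1]["field_7"])  # overall result of the second measurement
```

Each row now pairs one measurement (field_1 through field_7) with the full set of top-level identifiers, which is exactly the structure shown in the results table below.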

Results

The search produces a flattened dataset where each row represents an individual measurement with preserved top-level attributes:

# file_name SerialnumberDC UName BID TS FIX field_1 field_2 field_3 field_4 field_5 field_6 field_7
1 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... Panel SetSerialNumbers|MesItac_DoMerge Wrong material? 0.0 0.0 0.0 1.0 PASS
49 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... CONT DCDC_3V3 @P6 P102 P133 CONT 0.0 0.0 0.0 0.0 PASS
189 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... NET short test 1 0.0 Ohm 15.0 Ohm 10.0 Ohm 100.0 Ohm PASS
... ... ... ... ... ... ... ... ... ... ... ... ... ...

This structure enables direct analysis of test parameters (field_1 to field_7) for specific products using the preserved header information (file_name, SerialnumberDC, BID, etc.).

Next steps

Now that your data has been restructured and transformed, you can perform further analysis, such as:

  • Statistical analysis: Calculate averages, standard deviations, or ranges for measurements (field_3 to field_7) to identify deviations from norms.
  • Trend monitoring: Track test result changes over time for products or batches to detect performance degradation or process shifts.
  • Anomaly detection: Create alerts for test failures (field_7 = "FAIL") or out-of-spec measurements (field_4, field_5, field_6 outside thresholds).
  • Root cause analysis: Correlate test results with operational data (machine logs, operator actions) to identify quality issue causes.
  • Dashboard creation: Build interactive dashboards to visualize QA performance, highlight failing tests, and provide quality overviews.
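As a minimal sketch of the statistical-analysis and anomaly-detection ideas above (in Python rather than SPL, on hypothetical flattened rows where field_4 holds the measured reading and field_7 the result), you could compute a mean reading and collect failing tests like this:

```python
# Hypothetical flattened rows, shaped like the results table above.
rows = [
    {"field_1": "ANON_TEST_9",  "field_4": "1.1191 Ohm", "field_7": "PASS"},
    {"field_1": "ANON_TEST_10", "field_4": "1.0938 Ohm", "field_7": "PASS"},
    {"field_1": "ANON_TEST_99", "field_4": "9.9 Ohm",    "field_7": "FAIL"},
]

def measurement(value):
    """Pull the leading numeric part out of a reading like '1.0938 Ohm'."""
    try:
        return float(value.split()[0])
    except (ValueError, IndexError):
        return None  # non-numeric readings such as 'Open' or 'CONT'

readings = [m for r in rows if (m := measurement(r["field_4"])) is not None]
mean = sum(readings) / len(readings)
failures = [r["field_1"] for r in rows if r["field_7"] == "FAIL"]

print(round(mean, 4))
print(failures)   # tests to alert on
```

In practice you would express the same logic in SPL (for example with stats avg and a field_7="FAIL" filter) so it runs inside the Splunk platform as a scheduled alert or dashboard panel.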

In addition, these resources might help you understand and implement this guidance: