
Splunk Lantern

Analyzing nested manufacturing QA data

Manufacturing Quality Assurance (QA) systems generate complex, hierarchical test data that can be challenging to analyze in traditional formats. Each product's test results can span hundreds to thousands of lines in a single event. This data contains both high-level test run information and detailed individual measurement results in a nested structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA data into actionable insights. You'll learn how to handle multi-line events that contain both top-level metadata and repeating nested test measurements, then flatten this structure. This enables detailed analysis of individual measurements, correlation with product identifiers and test conditions, and QA performance trend monitoring.

About QA test data formatting

Manufacturing QA test results from automated test equipment arrive as long, multi-line events, each representing a single product's complete test results. These events can span hundreds to thousands of lines and contain two types of data:

  • Top-level data attributes: Key-value pairs with overall test run information (timestamp, unit serial number, process location/ID). These serve as metadata for the processing unit.
  • Nested data instances: Individual test measurement results enclosed in XML tags, JSON payloads "{ }", or other customer-specific formats. Each instance contains multiple readings and process run statuses. The number of instances varies by unit, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

Start_Test
Start_Header
TIME_START 10/10/2025 08:07:00
UNIT_ID 1234567890
PROD_TYPE |ZTT177|01.04.98
VERSION 876543_00 | 876543
PROCESS_ID 11-56
OVERALL_RESULT PASS
End_Header
Start_Units
UNIT_NAME ANON_UNIT_NAME_1
{ANON_TEST_1, Bad material, 0.0 , 0.0 , 0.0 , 1.0 , PASS}
{ANON_TEST_2, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}
{ANON_TEST_3, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}
{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}
{ANON_TEST_5, CONT1, 5.0 Ohm, 1.8967 Ohm, Open, 10.0 Ohm, PASS}
{ANON_TEST_6, CONT1, 5.0 Ohm, 1.217 Ohm, Open, 20.0 Ohm, PASS}
{ANON_TEST_7, CONT1, 5.0 Ohm, 1.1515 Ohm, Open, 20.0 Ohm, PASS}
{ANON_TEST_8, OPEN, 10.0 KOhm, 9.143 KOhm, 5.0 KOhm, Open, PASS}
{ANON_TEST_9, CONT, 3.0 Ohm, 1.1191 Ohm, Open, 5.0 Ohm, PASS}
{ANON_TEST_10, CONT, 2.0 Ohm, 1.0938 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_11, CONT, 2.0 Ohm, 1.1913 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_12, CONT, 2.0 Ohm, 0.9805 Ohm, Open, 3.0 Ohm, PASS}
{ANON_TEST_13, CONT, 2.0 Ohm, 1.0428 Ohm, Open, 3.0 Ohm, PASS}
End_Units
End_Test

The challenge lies in parsing these large events to extract both top-level context and detailed nested measurements.

How to use Splunk software for this use case

First, you'll apply some specific configurations to the props.conf and transforms.conf files within your environment. You'll then run a search to transform the data into a flat, tabular format suitable for analysis within the Splunk platform while preserving relationships between metadata and nested records.

props.conf configuration

The props.conf file controls how the Splunk platform processes raw data at ingestion. Add these settings under the [qa_data_nested_hierarchies] stanza:

[qa_data_nested_hierarchies]
BREAK_ONLY_BEFORE = Start_Test
MAX_EVENTS = 10000
TRUNCATE = 0
LINE_BREAKER = Start_Test
TIME_PREFIX = TIME_START\s+
REPORT-qa_data_nested_kv_extract = qa_data_nested_kv_extract
REPORT-qa_data_nested_01 = qa_data_nested_all

Setting explanations

  • BREAK_ONLY_BEFORE = Start_Test: Treats lines starting with Start_Test as new event boundaries. Ensures proper event breaking if multiple test blocks appear in one file.
  • MAX_EVENTS = 10000: Sets the maximum number of lines per event.
  • TRUNCATE = 0: Ensures that the Splunk platform ingests the entire event regardless of length. Prevents truncation of events with 5,000+ lines.
  • LINE_BREAKER = Start_Test: Regular expression defining event break locations. Works with BREAK_ONLY_BEFORE to establish event boundaries.
  • TIME_PREFIX = TIME_START\s+: Identifies the string preceding timestamps, enabling correct time parsing.
  • REPORT-qa_data_nested_kv_extract = qa_data_nested_kv_extract: Applies the qa_data_nested_kv_extract transform to extract initial key-value pairs.
  • REPORT-qa_data_nested_01 = qa_data_nested_all: Applies the qa_data_nested_all transform to extract nested data blocks.
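To build intuition for what BREAK_ONLY_BEFORE does at ingestion, the following Python sketch (not Splunk internals, just an illustration on a shortened two-block file) splits raw input immediately before each Start_Test line, so each test block becomes one event:

```python
import re

# Shortened two-block sample file; real events span hundreds of lines.
raw = """Start_Test
Start_Header
UNIT_ID 1111
End_Header
End_Test
Start_Test
Start_Header
UNIT_ID 2222
End_Header
End_Test
"""

# Split immediately *before* each "Start_Test" line; the lookahead keeps
# the delimiter at the head of each event, mirroring BREAK_ONLY_BEFORE.
events = [e for e in re.split(r"(?=^Start_Test$)", raw, flags=re.MULTILINE)
          if e.strip()]

print(len(events))                   # two events, one per test block
print("UNIT_ID 2222" in events[1])   # second block kept intact
```

Each resulting event starts with Start_Test and carries its own header, which is why TIME_PREFIX can then locate the timestamp inside each event independently.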

transforms.conf configuration

The transforms.conf file defines custom transformations for events. Add these settings under the [qa_data_nested_kv_extract] and [qa_data_nested_all] stanzas:

[qa_data_nested_kv_extract]
KV_MODE = auto
REGEX = ^\s*(?<_KEY_1>[^\{]\S+)\s+(?<_VAL_1>\S+)$

[qa_data_nested_all]
REGEX = (?P<qa_data_nested_all>\{[^\}]+\})
MV_ADD = true

Transform explanations

  • [qa_data_nested_kv_extract]: Extracts higher-level metadata attached to each nested record.
    • KV_MODE = auto: Automatically extracts key-value pairs using the REGEX pattern.
    • REGEX = ^\s*(?<_KEY_1>[^\{]\S+)\s+(?<_VAL_1>\S+)$: Captures key-value pairs at line beginnings, excluding lines starting with curly braces.
  • [qa_data_nested_all]: Extracts individual nested records.
    • REGEX = (?P<qa_data_nested_all>\{[^\}]+\}): Identifies and extracts all text within curly braces {...}. Places extracted content into the qa_data_nested_all field.
    • MV_ADD = true: Critical for nested data handling. Creates a multi-value field when multiple {...} blocks exist, preserving all nested instances for processing.
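The effect of the two transforms can be sketched in plain Python against a few lines from the sample event above. This is an illustration of the regex patterns only, not of Splunk's extraction pipeline; note that the key-value pattern captures a single token as the value, so multi-token values such as the timestamp are not matched (the timestamp is handled by TIME_PREFIX instead):

```python
import re

# A few lines from the sample event.
event_lines = [
    "TIME_START 10/10/2025 08:07:00",   # multi-token value: not matched below
    "UNIT_ID 1234567890",
    "OVERALL_RESULT PASS",
    "{ANON_TEST_2, CONT, 0.0 , 0.0 , 0.0 , 0.0 , PASS}",
    "{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}",
]

# [qa_data_nested_kv_extract]: key-value pairs on lines not starting with "{"
kv_re = re.compile(r"^\s*([^\{]\S+)\s+(\S+)$")
kv = {m.group(1): m.group(2) for line in event_lines if (m := kv_re.match(line))}

# [qa_data_nested_all]: every {...} block becomes one value of a
# multivalue field, as MV_ADD = true does in Splunk.
nested_re = re.compile(r"\{[^\}]+\}")
qa_data_nested_all = [m for line in event_lines for m in nested_re.findall(line)]

print(kv["UNIT_ID"])            # top-level metadata
print(len(qa_data_nested_all))  # two nested measurement blocks
```

The multivalue list plays the same role as the qa_data_nested_all field: all nested instances survive ingestion and are flattened later by the search.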

SPL commands for processing nested data

Run the following search to refine the data, flatten the nested structure, and extract individual fields from measurements. You can optimize it by specifying an index and adjusting the time range.

sourcetype="mfg_qa_nested_event"
| rex field=source "(?<file_name>[^\/]+)\.(mdl|tdx)"
| stats values(*) AS * BY file_name
| table file_name SerialnumberDC UName BID TS FIX qa_data_nested_all
| mvexpand qa_data_nested_all
| rex field=qa_data_nested_all "{(?<field_1>[^\,]+),(?<field_2>[^\,]+),(?<field_3>[^\,]+),(?<field_4>[^\,]+),(?<field_5>[^\,]+),(?<field_6>[^\,]+),(?<field_7>[^\,\}]+)}"
| eval _raw=file_name+"|"+SerialnumberDC+"|"+UName+"|"+BID+"|"+TS+"|"+FIX+"|"+qa_data_nested_all
| fields - qa_data_nested_all _raw

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your environment.


sourcetype="mfg_qa_nested_event"

Filters events from the manufacturing QA data source type.

rex field=source "(?<file_name>[^\/]+)\.(mdl|tdx)"

Extracts file_name from the source field as metadata. Demonstrates extracting higher-level values from default Splunk platform metadata. The file_name often contains unique identifiers (product_line, production_line) corresponding to test runs or products.

stats values(*) AS * BY file_name

Groups events by file_name while retaining all field values. Critical for maintaining association between top-level attributes and nested results after flattening.

table file_name SerialnumberDC UName BID TS FIX qa_data_nested_all

Selects relevant fields for processing. qa_data_nested_all contains the raw nested data blocks extracted by transforms.conf.

mvexpand qa_data_nested_all

The pivotal flattening command. Takes the multi-value field qa_data_nested_all and creates separate events for each value. For example, 100 nested measurement blocks become 100 new events, each with duplicated top-level attributes.

rex field=qa_data_nested_all "{(?<field_1>[^\,]+),(?<field_2>[^\,]+),(?<field_3>[^\,]+),(?<field_4>[^\,]+),(?<field_5>[^\,]+),(?<field_6>[^\,]+),(?<field_7>[^\,\}]+)}"

Parses each qa_data_nested_all value into seven named fields (field_1 through field_7). For example: {Panel SetSerialNumbers|MesItac_DoMerge, Wrong material?, 0.0 , 0.0 , 0.0 , 1.0 , PASS}

eval _raw=file_name+"|"+SerialnumberDC+"|"+UName+"|"+BID+"|"+TS+"|"+FIX+"|"+qa_data_nested_all

Reconstructs the _raw field by concatenating key identifiers and qa_data_nested_all. Useful for debugging or custom event representation. Optional based on output requirements.

fields - qa_data_nested_all _raw

Removes the original qa_data_nested_all field (now parsed into field_1-7) and reconstructed _raw field, leaving only structured data.
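The mvexpand-plus-rex flattening step can be mimicked in plain Python to show the resulting row shape. This sketch assumes a minimal set of top-level attributes; the real search carries SerialnumberDC, UName, BID, TS, and FIX the same way:

```python
import re

# Top-level attributes (illustrative subset) and the multivalue nested field.
top_level = {"file_name": "sample_results", "UNIT_ID": "1234567890"}
qa_data_nested_all = [
    "{ANON_TEST_4, 1, 0.0 Ohm, 15.0 Ohm, 10.0 Ohm, 100.0 Ohm, PASS}",
    "{ANON_TEST_9, CONT, 3.0 Ohm, 1.1191 Ohm, Open, 5.0 Ohm, PASS}",
]

# Seven comma-separated parts inside the braces, like the rex command.
row_re = re.compile(r"\{([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,\}]+)\}")

rows = []
for block in qa_data_nested_all:        # mvexpand: one row per value
    m = row_re.match(block)
    if m:
        fields = {f"field_{i}": v.strip() for i, v in enumerate(m.groups(), 1)}
        rows.append({**top_level, **fields})  # top-level attributes duplicated

print(len(rows))           # one row per nested block
print(rows[1]["field_7"])  # overall result of the second measurement
```

Each row now pairs one measurement (field_1 through field_7) with the full set of top-level identifiers, which is exactly the structure shown in the results table below.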

Results

The search produces a flattened dataset where each row represents an individual measurement with preserved top-level attributes:

# file_name SerialnumberDC UName BID TS FIX field_1 field_2 field_3 field_4 field_5 field_6 field_7
1 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... Panel SetSerialNumbers|MesItac_DoMerge Wrong material? 0.0 0.0 0.0 1.0 PASS
49 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... CONT DCDC_3V3 @P6 P102 P133 CONT 0.0 0.0 0.0 0.0 PASS
189 1010023314361000-... 1010023314361000 Unit_101... 1010023314361000 HWS|MTS300... HWBA|I99;... NET short test 1 0.0 Ohm 15.0 Ohm 10.0 Ohm 100.0 Ohm PASS
... ... ... ... ... ... ... ... ... ... ... ... ... ...

This structure enables direct analysis of test parameters (field_1 to field_7) for specific products using the preserved header information (file_name, SerialnumberDC, BID, etc.).

Next steps

Now that your data has been restructured and transformed, you can perform further analysis, such as:

  • Statistical analysis: Calculate averages, standard deviations, or ranges for measurements (field_3 to field_7) to identify deviations from norms.
  • Trend monitoring: Track test result changes over time for products or batches to detect performance degradation or process shifts.
  • Anomaly detection: Create alerts for test failures (field_7 = "FAIL") or out-of-spec measurements (field_4, field_5, field_6 outside thresholds).
  • Root cause analysis: Correlate test results with operational data (machine logs, operator actions) to identify quality issue causes.
  • Dashboard creation: Build interactive dashboards to visualize QA performance, highlight failing tests, and provide quality overviews.
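As a minimal sketch of the statistical-analysis and anomaly-detection ideas above (in Python rather than SPL, on hypothetical flattened rows where field_4 holds the measured reading and field_7 the result), you could compute a mean reading and collect failing tests like this:

```python
# Hypothetical flattened rows, shaped like the results table above.
rows = [
    {"field_1": "ANON_TEST_9",  "field_4": "1.1191 Ohm", "field_7": "PASS"},
    {"field_1": "ANON_TEST_10", "field_4": "1.0938 Ohm", "field_7": "PASS"},
    {"field_1": "ANON_TEST_99", "field_4": "9.9 Ohm",    "field_7": "FAIL"},
]

def measurement(value):
    """Pull the leading numeric part out of a reading like '1.0938 Ohm'."""
    try:
        return float(value.split()[0])
    except (ValueError, IndexError):
        return None  # non-numeric readings such as 'Open' or 'CONT'

readings = [m for r in rows if (m := measurement(r["field_4"])) is not None]
mean = sum(readings) / len(readings)
failures = [r["field_1"] for r in rows if r["field_7"] == "FAIL"]

print(round(mean, 4))
print(failures)   # tests to alert on
```

In practice you would express the same logic in SPL (for example with stats avg and a field_7="FAIL" filter) so it runs inside the Splunk platform as a scheduled alert or dashboard panel.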

In addition, these resources might help you understand and implement this guidance: