Analyzing nested XML manufacturing QA data
Manufacturing Quality Assurance (QA) systems generate complex, nested XML data that presents unique challenges for analysis. These XML events can span 5,000 to 10,000 lines, containing both top-level metadata and multiple nested measurement results within a hierarchical structure.
This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA XML data into actionable insights. You'll learn how to handle multi-line XML events that contain both top-level attributes and repeating nested measurement blocks, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.
Data required
About QA XML data formatting
Manufacturing QA test results arrive as single, long multi-line XML events, each representing a complete test session. These events typically span 5,000 to 10,000 lines and are characterized by:
- Top-level data attributes: XML elements containing overall test session information (for example, test run identifiers, dates, session IDs). These serve as metadata for the test session.
- Multiple nested measurement instances: Individual test measurement results embedded within <Measurement> tags, each containing its own set of attributes such as designator, measurement ID, value, and status. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.
Sample data structure
<IPC2547Event>
  <ProcessSessionStart sessionId="SESSION-2024-001">
    <Entity stationId="ST-001"/>
  </ProcessSessionStart>
  <ItemProcessStatus>
    <ItemEventCount eventType="QA-TEST"/>
  </ItemProcessStatus>
  <Measurements>
    <Measurement>
      <designator>R101</designator>
      <measurementId>M001</measurementId>
      <value>85.3</value>
      <units>um</units>
      <type>DX</type>
      <status>PASS</status>
    </Measurement>
    <Measurement>
      <designator>C201</designator>
      <measurementId>M002</measurementId>
      <value>63.1</value>
      <units>um</units>
      <type>DY</type>
      <status>PASS</status>
    </Measurement>
  </Measurements>
</IPC2547Event>
The challenge lies in extracting these nested measurement blocks as individual records while preserving their relationship to the parent session context.
How to use Splunk software for this use case
First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper XML field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.
props.conf configuration
The props.conf file controls how the Splunk platform processes XML data at ingestion and defines field extraction rules. Add these settings under the [qa_data_xml] stanza:
[qa_data_xml]
BREAK_ONLY_BEFORE = <IPC2547Event>
KV_MODE = xml
REPORT-parse_xml_multi_fields = parse_xml_multi_fields
REPORT-xml_short_rename = xml_short_rename
FIELDALIAS-eventType = "IPC2547Event.ItemProcessStatus.ItemEventCount{@eventType}" AS qa_xml_eventType
Setting explanations
- BREAK_ONLY_BEFORE = <IPC2547Event>: Ensures the Splunk platform treats each occurrence of <IPC2547Event> as the beginning of a new event. This is critical for correctly segmenting potentially large, multi-line input into logical events. The setting applies only when SHOULD_LINEMERGE is true, which is the default. For events of 5,000 to 10,000 lines, also check the MAX_EVENTS (default 256 lines) and TRUNCATE (default 10000 bytes) limits, which might otherwise cut events short.
- KV_MODE = xml: Enables the Splunk platform's built-in XML key-value extraction, which automatically extracts fields from the top-level XML structure.
- REPORT-parse_xml_multi_fields = parse_xml_multi_fields: Links to a stanza in transforms.conf that extracts nested measurement data into a multi-value field.
- REPORT-xml_short_rename = xml_short_rename: Links to another transforms.conf stanza that renames or extracts specific fields from the XML.
- FIELDALIAS-eventType: Creates an alias, qa_xml_eventType, for a specific XML attribute, making it easier to reference in searches.
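To get a feel for what BREAK_ONLY_BEFORE does to a stream of lines, the following Python sketch mimics the rule outside the Splunk platform: lines accumulate into the current event until a line matches the break pattern. It is an illustration under simplifying assumptions only. The function name split_events, the shortened sample stream, and the second session ID are hypothetical, and real ingestion also applies line and byte limits and timestamp rules that are not modeled here.

```python
import re

def split_events(lines, break_before=r"<IPC2547Event>"):
    """Group lines into events, starting a new event at each line that matches."""
    pattern = re.compile(break_before)
    events, current = [], []
    for line in lines:
        # A matching line closes the event collected so far (if any).
        if pattern.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

# Hypothetical, heavily shortened stream containing two test sessions.
stream = """<IPC2547Event>
<ProcessSessionStart sessionId="SESSION-2024-001"/>
</IPC2547Event>
<IPC2547Event>
<ProcessSessionStart sessionId="SESSION-2024-002"/>
</IPC2547Event>""".splitlines()

events = split_events(stream)
print(len(events))
```

Each returned string corresponds to one complete test session, which is the unit the rest of the configuration operates on.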
transforms.conf configuration
The transforms.conf file defines regular expressions and formatting rules for field extraction. Add these settings:
[xml_short_rename]
REGEX = IPC2547Event\.ProcessSessionStart\{@sessionId\}=(\S+)
FORMAT = sessionId::$1
[parse_xml_multi_fields]
REGEX = (?s)(?<qa_xml_measurement_all><Measurement\b.*?<\/Measurement>)
MV_ADD = true
Transform explanations
- [xml_short_rename]:
  - REGEX: Captures the sessionId attribute from the ProcessSessionStart element. The (\S+) group captures non-whitespace characters as the session ID value.
  - FORMAT = sessionId::$1: Assigns the captured value to a new field named sessionId.
- [parse_xml_multi_fields]: Critical for nested data handling.
  - REGEX: Uses the (?s) flag (DOTALL) to match across newlines, capturing entire <Measurement>...</Measurement> blocks into the qa_xml_measurement_all field.
  - MV_ADD = true: Creates qa_xml_measurement_all as a multi-value field, adding each matched measurement block as a separate value.
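Before deploying a measurement regex, it can help to try it against sample data outside the Splunk platform. The Python sketch below is one way to do that, with two caveats: Python spells named groups as (?P<name>...) rather than (?<name>...), and re.findall stands in for the repeated matching that MV_ADD = true enables. The sketch compares a loose pattern, where nothing follows <Measurement, with a tightened variant that adds a \b word boundary. The shortened sample and the names LOOSE and TIGHT are assumptions for illustration.

```python
import re

# Shortened sample: two measurement blocks inside a <Measurements> wrapper.
SAMPLE = """<Measurements>
<Measurement>
<designator>R101</designator>
<status>PASS</status>
</Measurement>
<Measurement>
<designator>C201</designator>
<status>PASS</status>
</Measurement>
</Measurements>"""

# Loose: "<Measurement" is also a prefix of the "<Measurements>" wrapper tag.
LOOSE = re.compile(r"(?s)(?P<qa_xml_measurement_all><Measurement.*?</Measurement>)")
# Tight: \b requires the tag name to end right after "Measurement".
TIGHT = re.compile(r"(?s)(?P<qa_xml_measurement_all><Measurement\b.*?</Measurement>)")

loose_blocks = LOOSE.findall(SAMPLE)
tight_blocks = TIGHT.findall(SAMPLE)

print(len(loose_blocks), len(tight_blocks))
print(loose_blocks[0].splitlines()[0])  # first line of the first loose match
print(tight_blocks[0].splitlines()[0])  # first line of the first tight match
```

Both variants find two blocks, but the first loose match starts at the wrapper tag, while every tight match starts at a <Measurement> tag. The non-greedy .*? matters in both: a greedy .* would swallow everything up to the last closing tag and return a single value.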
SPL commands for processing nested XML data
Run the following search to flatten the nested XML structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.
sourcetype="qa_data_xml"
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all
| mvexpand qa_xml_measurement_all
| eval _raw=qa_xml_measurement_all
| extract access-extractions
| fields - _raw qa_xml_measurement_all
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your XML structure.
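| Splunk Search | Explanation |
|---|---|
| sourcetype="qa_data_xml" | Selects events that have been assigned the qa_data_xml source type. |
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all | Displays only the specified fields: qa_xml_stationId, qa_xml_sessionId, and qa_xml_measurement_all. |
| mvexpand qa_xml_measurement_all | The pivotal flattening command. Transforms each value within the multi-value qa_xml_measurement_all field into a separate result row. |
| eval _raw=qa_xml_measurement_all | Overwrites the _raw field with the contents of qa_xml_measurement_all. |
| extract access-extractions | Extracts additional fields from the modified _raw field. |
| fields - _raw qa_xml_measurement_all | Removes temporary fields (_raw and qa_xml_measurement_all) from the results. |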
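The effect of the search can also be reproduced offline as a sanity check. The Python sketch below flattens one sample event into one row per measurement while copying the session context onto each row. It deliberately swaps in a different technique, the standard library XML parser, instead of the regex-plus-mvexpand approach, so it illustrates the target shape rather than how the Splunk platform works internally. The function name flatten_event is hypothetical, and the output field names are assumed to follow those used in the search.

```python
import xml.etree.ElementTree as ET

EVENT = """<IPC2547Event>
  <ProcessSessionStart sessionId="SESSION-2024-001">
    <Entity stationId="ST-001"/>
  </ProcessSessionStart>
  <Measurements>
    <Measurement>
      <designator>R101</designator>
      <measurementId>M001</measurementId>
      <value>85.3</value>
      <units>um</units>
      <type>DX</type>
      <status>PASS</status>
    </Measurement>
    <Measurement>
      <designator>C201</designator>
      <measurementId>M002</measurementId>
      <value>63.1</value>
      <units>um</units>
      <type>DY</type>
      <status>PASS</status>
    </Measurement>
  </Measurements>
</IPC2547Event>"""

def flatten_event(xml_text):
    """Return one dict per <Measurement>, each carrying the session context."""
    root = ET.fromstring(xml_text)
    session = root.find("ProcessSessionStart")
    context = {
        "qa_xml_stationId": session.find("Entity").get("stationId"),
        "qa_xml_sessionId": session.get("sessionId"),
    }
    rows = []
    # iter("Measurement") matches that exact tag, not the <Measurements> wrapper.
    for measurement in root.iter("Measurement"):
        row = dict(context)  # copy the parent context onto every row
        for child in measurement:
            row[child.tag] = child.text
        rows.append(row)
    return rows

rows = flatten_event(EVENT)
for row in rows:
    print(row)
```

Comparing these rows with the search results for the same event is a quick way to confirm that no measurement was dropped or merged during extraction.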
Results
The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information:
| qa_xml_stationId | qa_xml_sessionId | designator | measurementId | type | units | value | status |
|---|---|---|---|---|---|---|---|
| ST-001 | SESSION-2024-001 | R101 | M001 | DX | um | 85.3 | PASS |
| ST-001 | SESSION-2024-001 | C201 | M002 | DY | um | 63.1 | PASS |
| ST-001 | SESSION-2024-001 | L301 | M003 | DTheta | deg | 1.25 | PASS |
| ST-001 | SESSION-2024-001 | U401 | M004 | 2DSolderJointAverageGreyLevel | GreyLevel (0->255) | 78.2 | PASS |
This structure enables direct analysis of individual measurement results with their associated test session context.
Next steps
Now that your nested XML data has been flattened and transformed, you can perform further analysis, such as:
- Performance monitoring: Track key metrics for individual measurements over time to identify trends or regressions in manufacturing quality.
- Anomaly detection: Identify outliers in test results that may indicate manufacturing defects or process deviations.
- Root cause analysis: Correlate specific measurement failures with top-level attributes like stationId or sessionId to pinpoint problematic equipment or batches.
- Quality control reporting: Generate comprehensive reports and dashboards summarizing pass/fail rates, measurement distributions, and other quality indicators.
- Trend visualization: Create bar charts to compare performance across different test stations, timecharts to observe measurement value trends, or scatter plots to identify correlations between various parameters.
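As a rough illustration of the kind of calculation behind a pass/fail report, the Python sketch below computes a pass rate per station from flattened rows. The rows are hypothetical: station ST-002 and the FAIL result do not come from the sample data and are included only so that the two stations differ. The function name pass_rate_by_station is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical flattened rows; ST-002 and the FAIL status are invented for illustration.
rows = [
    {"qa_xml_stationId": "ST-001", "designator": "R101", "status": "PASS"},
    {"qa_xml_stationId": "ST-001", "designator": "C201", "status": "PASS"},
    {"qa_xml_stationId": "ST-002", "designator": "R101", "status": "FAIL"},
    {"qa_xml_stationId": "ST-002", "designator": "C201", "status": "PASS"},
]

def pass_rate_by_station(rows):
    """Return the fraction of PASS measurements for each station."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in rows:
        station = row["qa_xml_stationId"]
        totals[station] += 1
        if row["status"] == "PASS":
            passes[station] += 1
    return {station: passes[station] / totals[station] for station in totals}

rates = pass_rate_by_station(rows)
print(rates)
```

The same grouping idea, counting by station and status, carries over to the other analyses in the list, such as grouping by designator to find components that fail most often.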
In addition, these resources might help you understand and implement this guidance:
- Splunk Lantern Article: Analyzing nested manufacturing QA data
- Splunk Lantern Article: Analyzing nested JSON manufacturing QA data

