Analyzing nested XML manufacturing QA data


Manufacturing Quality Assurance (QA) systems generate complex, nested XML data that is difficult to analyze in its raw form. A single XML event can span 5,000 to 10,000 lines and contains both top-level metadata and multiple nested measurement results in a hierarchical structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA XML data into actionable insights. You'll learn how to handle multi-line XML events that contain both top-level attributes and repeating nested measurement blocks, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.

Data required

Operational technology data

About QA XML data formatting

Manufacturing QA test results arrive as single, long multi-line XML events, each representing a complete test session. These events typically span 5,000 to 10,000 lines and are characterized by:

Top-level data attributes: XML elements containing overall test session information (for example, test run identifiers, dates, session IDs). These serve as metadata for the test session.
Multiple nested measurement instances: Individual test measurement results embedded within <Measurement> tags, each containing its own set of attributes such as designator, measurement ID, value, and status. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

<IPC2547Event>
  <ProcessSessionStart sessionId="SESSION-2024-001">
    <Entity stationId="ST-001"/>
  </ProcessSessionStart>
  <ItemProcessStatus>
    <ItemEventCount eventType="QA-TEST"/>
  </ItemProcessStatus>
  <Measurements>
    <Measurement>
      <designator>R101</designator>
      <measurementId>M001</measurementId>
      <value>85.3</value>
      <units>um</units>
      <type>DX</type>
      <status>PASS</status>
    </Measurement>
    <Measurement>
      <designator>C201</designator>
      <measurementId>M002</measurementId>
      <value>63.1</value>
      <units>um</units>
      <type>DY</type>
      <status>PASS</status>
    </Measurement>
  </Measurements>
</IPC2547Event>

The challenge lies in extracting these nested measurement blocks as individual records while preserving their relationship to the parent session context.

How to use Splunk software for this use case

First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper XML field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.

props.conf configuration

The props.conf file controls how the Splunk platform processes XML data at ingestion and defines field extraction rules. Add these settings under the [qa_data_xml] stanza:

[qa_data_xml]
BREAK_ONLY_BEFORE = <IPC2547Event>
KV_MODE = xml
REPORT-parse_xml_multi_fields = parse_xml_multi_fields
REPORT-xml_short_rename = xml_short_rename
FIELDALIAS-eventType = "IPC2547Event.ItemProcessStatus.ItemEventCount{@eventType}" AS qa_xml_eventType

Setting explanations

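BREAK_ONLY_BEFORE = <IPC2547Event>: Ensures the Splunk platform treats each occurrence of <IPC2547Event> as the beginning of a new event, which is critical for segmenting large, multi-line input into logical events. This setting takes effect only when SHOULD_LINEMERGE is true, which is the default. The default MAX_EVENTS limit of 256 lines per merged event is well below the 5,000 to 10,000 lines of a typical session, so you will likely need to raise MAX_EVENTS in the same stanza.
KV_MODE = xml: Enables the Splunk platform's built-in XML key-value extraction at search time, which automatically extracts fields from the XML structure and names them by element path (for example, IPC2547Event.ItemProcessStatus.ItemEventCount{@eventType}).
REPORT-parse_xml_multi_fields = parse_xml_multi_fields: Links to a stanza in transforms.conf that extracts nested measurement data into a multi-value field.
REPORT-xml_short_rename = xml_short_rename: Links to another transforms.conf stanza that extracts session-level identifiers, such as the session ID, into short field names.
FIELDALIAS-eventType: Creates the alias qa_xml_eventType for the eventType attribute of the ItemEventCount element, making it easier to reference in searches.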
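
As a rough offline illustration of the event-breaking rule described above, the following Python sketch splits a text stream so that every piece starts at <IPC2547Event>. It is not how the Splunk platform implements line merging; the function name break_events and the second session ID are hypothetical and exist only for this example.

```python
import re

def break_events(stream, marker="<IPC2547Event>"):
    """Split text into events, starting a new event at each marker."""
    # A zero-width lookahead splits *before* the marker, so the marker
    # stays at the start of its event (requires Python 3.7+).
    pieces = re.split(f"(?={re.escape(marker)})", stream)
    return [piece.strip() for piece in pieces if piece.strip()]

# Two abbreviated sessions back to back; SESSION-2024-002 is made up.
stream = (
    "<IPC2547Event>\n"
    '  <ProcessSessionStart sessionId="SESSION-2024-001"/>\n'
    "</IPC2547Event>\n"
    "<IPC2547Event>\n"
    '  <ProcessSessionStart sessionId="SESSION-2024-002"/>\n'
    "</IPC2547Event>\n"
)

events = break_events(stream)
for event in events:
    lines = event.splitlines()
    print(lines[0], "...", len(lines), "lines")
```

Running a similar check against a sample export can be a quick way to confirm that the marker you chose occurs exactly once per test session before you commit it to props.conf.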

transforms.conf configuration

The transforms.conf file defines regular expressions and formatting rules for field extraction. Add these settings:

[xml_short_rename]
REGEX = (?s)<ProcessSessionStart\b[^>]*\bsessionId="([^"]+)".*?<Entity\b[^>]*\bstationId="([^"]+)"
FORMAT = qa_xml_sessionId::$1 qa_xml_stationId::$2

[parse_xml_multi_fields]
REGEX = (?s)(?<qa_xml_measurement_all><Measurement\b.*?<\/Measurement>)
MV_ADD = true

Transform explanations

[xml_short_rename]:
- REGEX: Runs against the raw event text and captures the sessionId attribute of the ProcessSessionStart element and the stationId attribute of the Entity element that follows it. Each ([^"]+) group captures the characters between the attribute's quotes.
- FORMAT = qa_xml_sessionId::$1 qa_xml_stationId::$2: Assigns the captured values to the qa_xml_sessionId and qa_xml_stationId fields that the search in the next section uses.
[parse_xml_multi_fields]: Critical for nested data handling.
- REGEX: Uses the (?s) flag (DOTALL) to match across newlines and the non-greedy .*? to stop at the nearest closing tag, capturing each entire <Measurement>...</Measurement> block into the qa_xml_measurement_all field. The \b after <Measurement keeps a match from starting at the <Measurements> wrapper element.
- MV_ADD = true: Creates qa_xml_measurement_all as a multi-value field, adding each matched measurement block as a separate value.
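
To see how a DOTALL, non-greedy pattern picks out one block per match, the Python sketch below applies an equivalent regular expression to an abbreviated copy of the sample event. Python's re module stands in for the Splunk regex engine here, so treat it as an approximation. The sketch uses <Measurement\b so that a match cannot start at the <Measurements> wrapper; the second pattern, without \b, shows what happens otherwise.

```python
import re

RAW_EVENT = """<IPC2547Event>
  <Measurements>
    <Measurement>
      <designator>R101</designator>
      <status>PASS</status>
    </Measurement>
    <Measurement>
      <designator>C201</designator>
      <status>PASS</status>
    </Measurement>
  </Measurements>
</IPC2547Event>"""

# re.DOTALL plays the role of (?s): "." also matches newlines.
# ".*?" is non-greedy, so each match ends at the nearest closing tag.
block_pattern = re.compile(r"<Measurement\b.*?</Measurement>", re.DOTALL)

# findall returns every match, similar in spirit to MV_ADD = true.
qa_xml_measurement_all = block_pattern.findall(RAW_EVENT)

# Without \b, the first match begins at the <Measurements> wrapper tag.
loose_pattern = re.compile(r"<Measurement.*?</Measurement>", re.DOTALL)
first_loose = loose_pattern.findall(RAW_EVENT)[0]

print(len(qa_xml_measurement_all), "blocks captured")
print("loose match starts at wrapper:", first_loose.startswith("<Measurements>"))
```

Testing the pattern offline like this is a cheap way to check block boundaries before deploying the transform, particularly when element names share a prefix.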

SPL commands for processing nested XML data

Run the following search to flatten the nested XML structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.

sourcetype="qa_data_xml"
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all
| mvexpand qa_xml_measurement_all
| eval _raw=qa_xml_measurement_all
| extract access-extractions
| fields - _raw qa_xml_measurement_all

Search explanation

The following table explains what each part of this search does. Adjust the search to match the specifics of your XML structure.

Splunk Search	Explanation
`sourcetype="qa_data_xml"`	Selects events that have been assigned the `qa_data_xml` source type, ensuring that the props.conf and transforms.conf configurations are applied.
`table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all`	Displays only the specified fields: `qa_xml_stationId` and `qa_xml_sessionId` (top-level attributes extracted by `KV_MODE=xml` and `xml_short_rename`), and `qa_xml_measurement_all` (the multi-value field containing raw XML of each measurement).
`mvexpand qa_xml_measurement_all`	The pivotal flattening command. Transforms each value within the multi-value `qa_xml_measurement_all` field into a separate event. If an original event contained 10 measurement blocks, `mvexpand` creates 10 new events, each representing one measurement while retaining top-level context.
`eval _raw=qa_xml_measurement_all`	Overwrites the `_raw` field with the content of `qa_xml_measurement_all` for each new event, making each measurement's XML content accessible for subsequent processing.
`extract access-extractions`	Runs the transforms.conf stanza named `access-extractions` against the modified `_raw` field. For this step to populate fields like `designator`, `measurementId`, `value`, and `status`, the stanza must parse the XML inside each `<Measurement>` block. In a default installation, the `access-extractions` stanza targets web access logs, so substitute the name of your own XML extraction stanza, or use a command such as `xmlkv` in its place.
`fields - _raw qa_xml_measurement_all`	Removes temporary fields (`_raw` and `qa_xml_measurement_all`), leaving only the newly extracted fields and original top-level fields for cleaner output.
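
The flattening logic in the table above can be mimicked outside the Splunk platform to make the data flow concrete. The Python sketch below is only an analogy: mvexpand and extract_leaf_tags are hypothetical helper names, the event dictionary is a hand-built stand-in for one search result, and the real mvexpand command differs in details such as memory limits and handling of events that lack the field.

```python
import re

def mvexpand(event, field):
    """Return one copy of `event` per value of the multi-value `field`."""
    return [{**event, field: value} for value in event.get(field, [])]

def extract_leaf_tags(xml_block):
    """Collect <tag>text</tag> pairs from a single measurement block."""
    return dict(re.findall(r"<(\w+)>([^<]*)</\1>", xml_block))

# Hand-built stand-in for one event after the props/transforms step.
event = {
    "qa_xml_stationId": "ST-001",
    "qa_xml_sessionId": "SESSION-2024-001",
    "qa_xml_measurement_all": [
        "<Measurement><designator>R101</designator>"
        "<measurementId>M001</measurementId><value>85.3</value>"
        "<units>um</units><type>DX</type><status>PASS</status></Measurement>",
        "<Measurement><designator>C201</designator>"
        "<measurementId>M002</measurementId><value>63.1</value>"
        "<units>um</units><type>DY</type><status>PASS</status></Measurement>",
    ],
}

rows = []
for expanded in mvexpand(event, "qa_xml_measurement_all"):
    # Take the block out of the row (like `fields - ...`) and parse it.
    block = expanded.pop("qa_xml_measurement_all")
    rows.append({**expanded, **extract_leaf_tags(block)})

for row in rows:
    print(row)
```

Each resulting row carries the station and session identifiers from the parent event alongside the fields of exactly one measurement, which is the shape the search is meant to produce.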

Results

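The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information. The example below comes from a session with four measurements, of which the abridged sample data above shows the first two: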

qa_xml_stationId	qa_xml_sessionId	designator	measurementId	type	units	value	status
ST-001	SESSION-2024-001	R101	M001	DX	um	85.3	PASS
ST-001	SESSION-2024-001	C201	M002	DY	um	63.1	PASS
ST-001	SESSION-2024-001	L301	M003	DTheta	deg	1.25	PASS
ST-001	SESSION-2024-001	U401	M004	2DSolderJointAverageGreyLevel	GreyLevel (0->255)	78.2	PASS

This structure enables direct analysis of individual measurement results with their associated test session context.

Next steps

Now that your nested XML data has been flattened and transformed, you can perform further analysis, such as:

Performance monitoring: Track key metrics for individual measurements over time to identify trends or regressions in manufacturing quality.
Anomaly detection: Identify outliers in test results that may indicate manufacturing defects or process deviations.
Root cause analysis: Correlate specific measurement failures with top-level attributes like stationId or sessionId to pinpoint problematic equipment or batches.
Quality control reporting: Generate comprehensive reports and dashboards summarizing pass/fail rates, measurement distributions, and other quality indicators.
Trend visualization: Create bar charts to compare performance across different test stations, timecharts to observe measurement value trends, or scatter plots to identify correlations between various parameters.
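
As one small example of the analyses listed above, the Python sketch below computes a failure rate per station from flattened rows. It is an offline illustration of the calculation, not a Splunk search; the function name fail_rate_by_station is hypothetical, and the ST-002 rows are invented so that the example has a failure to count.

```python
from collections import defaultdict

rows = [
    {"qa_xml_stationId": "ST-001", "designator": "R101", "status": "PASS"},
    {"qa_xml_stationId": "ST-001", "designator": "C201", "status": "PASS"},
    # Hypothetical rows, added only so there is a failure to count.
    {"qa_xml_stationId": "ST-002", "designator": "R101", "status": "FAIL"},
    {"qa_xml_stationId": "ST-002", "designator": "C201", "status": "PASS"},
]

def fail_rate_by_station(rows):
    """Return {stationId: fraction of measurements that did not pass}."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for row in rows:
        station = row["qa_xml_stationId"]
        totals[station] += 1
        if row["status"] != "PASS":
            fails[station] += 1
    return {station: fails[station] / totals[station] for station in totals}

rates = fail_rate_by_station(rows)
for station, rate in sorted(rates.items()):
    print(f"{station}: {rate:.0%} of measurements failed")
```

The same grouping idea carries over to other keys, such as designator or sessionId, when you want to narrow a quality problem down to a component or a batch.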

In addition, these resources might help you understand and implement this guidance:

Splunk Lantern Article: Analyzing nested manufacturing QA data
Splunk Lantern Article: Analyzing nested JSON manufacturing QA data
Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their Success Plan. Engage the ODS team at ondemand@cisco.com if you would like assistance.