Analyzing nested XML manufacturing QA data

Manufacturing Quality Assurance (QA) systems generate complex, nested XML data that presents unique challenges for analysis. These XML events can span 5,000 to 10,000 lines, containing both top-level metadata and multiple nested measurement results within a hierarchical structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA XML data into actionable insights. You'll learn how to handle multi-line XML events that contain both top-level attributes and repeating nested measurement blocks, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.

About QA XML data formatting

Manufacturing QA test results arrive as single, long multi-line XML events, each representing one complete test session. Each event has two parts:

  • Top-level data attributes: XML elements containing overall test session information (for example, test run identifiers, dates, session IDs). These serve as metadata for the test session.
  • Multiple nested measurement instances: Individual test measurement results embedded within <Measurement> tags, each containing its own set of attributes such as designator, measurement ID, value, and status. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

<IPC2547Event>
  <ProcessSessionStart sessionId="SESSION-2024-001">
    <Entity stationId="ST-001"/>
  </ProcessSessionStart>
  <ItemProcessStatus>
    <ItemEventCount eventType="QA-TEST"/>
  </ItemProcessStatus>
  <Measurements>
    <Measurement>
      <designator>R101</designator>
      <measurementId>M001</measurementId>
      <value>85.3</value>
      <units>um</units>
      <type>DX</type>
      <status>PASS</status>
    </Measurement>
    <Measurement>
      <designator>C201</designator>
      <measurementId>M002</measurementId>
      <value>63.1</value>
      <units>um</units>
      <type>DY</type>
      <status>PASS</status>
    </Measurement>
  </Measurements>
</IPC2547Event>

The challenge lies in extracting these nested measurement blocks as individual records while preserving their relationship to the parent session context.

How to use Splunk software for this use case

First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper XML field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.

props.conf configuration

The props.conf file controls how the Splunk platform processes XML data at ingestion and defines field extraction rules. Add these settings under the [qa_data_xml] stanza:

[qa_data_xml]
BREAK_ONLY_BEFORE = <IPC2547Event>
KV_MODE = xml
REPORT-parse_xml_multi_fields = parse_xml_multi_fields
REPORT-xml_short_rename = xml_short_rename
FIELDALIAS-eventType = "IPC2547Event.ItemProcessStatus.ItemEventCount{@eventType}" AS qa_xml_eventType

Setting explanations

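  • BREAK_ONLY_BEFORE = <IPC2547Event>: Ensures the Splunk platform treats each line containing <IPC2547Event> as the beginning of a new event, which is critical for correctly segmenting large, multi-line input into logical events. This setting only takes effect while SHOULD_LINEMERGE is true (the default). Line merging also stops at MAX_EVENTS lines (default 256), so raise that limit for this source type if your events run to thousands of lines.
  • KV_MODE = xml: Enables the Splunk platform's built-in XML key-value extraction at search time, which automatically extracts fields from the XML structure and names each one by its path (for example, IPC2547Event.ItemProcessStatus.ItemEventCount{@eventType}).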
  • REPORT-parse_xml_multi_fields = parse_xml_multi_fields: Links to a stanza in transforms.conf that extracts nested measurement data into a multi-value field.
  • REPORT-xml_short_rename = xml_short_rename: Links to another transforms.conf stanza that renames or extracts specific fields from the XML.
  • FIELDALIAS-eventType: Creates an alias qa_xml_eventType for a specific XML attribute, making it easier to reference in searches.
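
One way to confirm that event breaking behaves as intended is to check that each indexed event starts and ends with the root element. The following search is a sketch for that check, not part of the original configuration. The output names (starts_ok, ends_ok, starts_with_root, ends_with_root, max_lines) are illustrative, and linecount is the default field that holds the number of lines in each event.

```spl
sourcetype="qa_data_xml"
| eval starts_ok=if(match(_raw, "^\s*<IPC2547Event"), 1, 0)
| eval ends_ok=if(match(_raw, "</IPC2547Event>\s*$"), 1, 0)
| stats count AS events, sum(starts_ok) AS starts_with_root, sum(ends_ok) AS ends_with_root, max(linecount) AS max_lines
```

If ends_with_root is lower than events, or max_lines sits exactly at a configured limit, some events were probably cut short or split, and the line-merging limits are worth reviewing.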

transforms.conf configuration

The transforms.conf file defines regular expressions and formatting rules for field extraction. Add these settings:

[xml_short_rename]
REGEX = <ProcessSessionStart\b[^>]*\bsessionId="([^"]+)"
FORMAT = qa_xml_sessionId::$1

[parse_xml_multi_fields]
REGEX = (?s)(?<qa_xml_measurement_all><Measurement\b.*?<\/Measurement>)
MV_ADD = true

Transform explanations

  • [xml_short_rename]:
    • REGEX: Captures the sessionId attribute from the ProcessSessionStart element. REPORT transforms run against _raw by default and are applied before the KV_MODE extraction, so the expression matches the attribute as it appears in the raw XML rather than the dotted field name that KV_MODE produces. The ([^"]+) group captures everything between the quotation marks as the session ID value.
    • FORMAT = qa_xml_sessionId::$1: Assigns the captured value to a new field named qa_xml_sessionId, the name used in the search later in this article.
  • [parse_xml_multi_fields]: Critical for nested data handling.
    • REGEX: Uses the (?s) flag (DOTALL) to match across newlines, capturing entire <Measurement>...</Measurement> blocks into the qa_xml_measurement_all field. The \b after <Measurement keeps the match from starting at the enclosing <Measurements> tag.
    • MV_ADD = true: Creates qa_xml_measurement_all as a multi-value field, adding each matched measurement block as a separate value.
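
To check that the multi-value extraction captured every measurement block, you can compare the number of values in qa_xml_measurement_all with the number of closing </Measurement> tags in the raw event. This is a sketch under the assumption that the configuration above is deployed. The field names extracted_blocks and expected_blocks are made up for this check.

```spl
sourcetype="qa_data_xml"
| eval extracted_blocks=coalesce(mvcount(qa_xml_measurement_all), 0)
| eval expected_blocks=mvcount(split(_raw, "</Measurement>")) - 1
| where extracted_blocks!=expected_blocks
| table _time extracted_blocks expected_blocks
```

An empty result suggests every block was captured. Any rows returned point to events where the regular expression missed blocks or the event itself is incomplete.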

SPL commands for processing nested XML data

Run the following search to flatten the nested XML structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.

sourcetype="qa_data_xml"
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all
| mvexpand qa_xml_measurement_all
| eval _raw=qa_xml_measurement_all
| extract access-extractions
| fields - _raw qa_xml_measurement_all

Search explanation

The following table explains what each part of this search does. Adjust the search to match the specifics of your XML structure.

| Splunk Search | Explanation |
| --- | --- |
| sourcetype="qa_data_xml" | Selects events that have been assigned the qa_data_xml source type, ensuring that the props.conf and transforms.conf configurations are applied. |
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all | Displays only the specified fields: qa_xml_stationId and qa_xml_sessionId (top-level session attributes) and qa_xml_measurement_all (the multi-value field containing the raw XML of each measurement). The station and session fields must already exist under these qa_xml_ names; if your extractions produce different names, alias or rename them first. |
| mvexpand qa_xml_measurement_all | The pivotal flattening command. Transforms each value within the multi-value qa_xml_measurement_all field into a separate event. If an original event contained 10 measurement blocks, mvexpand creates 10 new events, each representing one measurement while retaining top-level context. |
| eval _raw=qa_xml_measurement_all | Overwrites the _raw field with the content of qa_xml_measurement_all for each new event, making each measurement's XML content accessible for subsequent processing. |
| extract access-extractions | Extracts additional fields from the modified _raw field. The argument access-extractions must name a transforms.conf stanza that parses the XML within each <Measurement> block and populates fields like designator, measurementId, value, and status. That stanza is not part of the configuration shown in this article, so define it yourself or substitute the name of your own extraction stanza. |
| fields - _raw qa_xml_measurement_all | Removes temporary fields (_raw and qa_xml_measurement_all), leaving only the newly extracted fields and original top-level fields for cleaner output. |
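
To experiment with the flattening logic before any data is indexed, the same steps can be reproduced on an inline sample. The search below is a self-contained sketch, not the article's exact method: it builds a test event with makeresults, uses rex with max_match=0 in place of the transforms.conf stanza, and swaps in spath and rename for the extract step so that it does not depend on a predefined extraction stanza. The inline XML is a shortened version of the sample data above.

```spl
| makeresults
| eval _raw="<Measurements><Measurement><designator>R101</designator><measurementId>M001</measurementId><value>85.3</value><status>PASS</status></Measurement><Measurement><designator>C201</designator><measurementId>M002</measurementId><value>63.1</value><status>PASS</status></Measurement></Measurements>"
| rex max_match=0 "(?s)(?<qa_xml_measurement_all><Measurement>.*?</Measurement>)"
| mvexpand qa_xml_measurement_all
| spath input=qa_xml_measurement_all
| rename Measurement.* AS *
| table designator measurementId value status
```

The search is expected to return one row per <Measurement> block. Because spath names fields by their XML path, the rename step strips the Measurement. prefix to give the short field names used elsewhere in this article.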

Results

The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information:

| qa_xml_stationId | qa_xml_sessionId | designator | measurementId | type | units | value | status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ST-001 | SESSION-2024-001 | R101 | M001 | DX | um | 85.3 | PASS |
| ST-001 | SESSION-2024-001 | C201 | M002 | DY | um | 63.1 | PASS |
| ST-001 | SESSION-2024-001 | L301 | M003 | DTheta | deg | 1.25 | PASS |
| ST-001 | SESSION-2024-001 | U401 | M004 | 2DSolderJointAverageGreyLevel | GreyLevel (0->255) | 78.2 | PASS |

This structure enables direct analysis of individual measurement results with their associated test session context.
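
As an illustration of that kind of analysis, the following sketch summarizes pass rates by station and measurement type. It assumes the qa_xml_stationId field exists as shown in the results, and it parses each measurement with spath instead of the extract step, which is an assumption about how the measurement fields are produced. The output names measurements, passed, and pass_rate_pct are illustrative.

```spl
sourcetype="qa_data_xml"
| table qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all
| mvexpand qa_xml_measurement_all
| spath input=qa_xml_measurement_all
| rename Measurement.* AS *
| stats count AS measurements, count(eval(status="PASS")) AS passed BY qa_xml_stationId, type
| eval pass_rate_pct=round(100 * passed / measurements, 1)
```

Each output row covers one station and measurement type, so a low pass_rate_pct points to a specific combination worth investigating.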

Next steps

Now that your nested XML data has been flattened and transformed, you can perform further analysis, such as:

  • Performance monitoring: Track key metrics for individual measurements over time to identify trends or regressions in manufacturing quality.
  • Anomaly detection: Identify outliers in test results that may indicate manufacturing defects or process deviations.
  • Root cause analysis: Correlate specific measurement failures with top-level attributes like stationId or sessionId to pinpoint problematic equipment or batches.
  • Quality control reporting: Generate comprehensive reports and dashboards summarizing pass/fail rates, measurement distributions, and other quality indicators.
  • Trend visualization: Create bar charts to compare performance across different test stations, timecharts to observe measurement value trends, or scatter plots to identify correlations between various parameters.
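
For example, the anomaly detection idea could be sketched as a z-score check on each measurement value relative to other measurements with the same designator and type. The threshold of 3 standard deviations, the spath-based parsing, and the field names mean_value, stdev_value, and z_score are all assumptions made for illustration. Tune them to your own data.

```spl
sourcetype="qa_data_xml"
| table _time qa_xml_stationId qa_xml_sessionId qa_xml_measurement_all
| mvexpand qa_xml_measurement_all
| spath input=qa_xml_measurement_all
| rename Measurement.* AS *
| eval value=tonumber(value)
| eventstats avg(value) AS mean_value, stdev(value) AS stdev_value BY designator, type
| eval z_score=if(stdev_value > 0, (value - mean_value) / stdev_value, null())
| where abs(z_score) > 3
| table _time qa_xml_stationId qa_xml_sessionId designator type value z_score
```

The eventstats command keeps every measurement row while attaching the group mean and standard deviation, which lets the where clause filter individual outliers without losing their session context.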

In addition, these resources might help you understand and implement this guidance: