Splunk Lantern

Analyzing nested JSON manufacturing QA data

Manufacturing Quality Assurance (QA) systems generate complex, nested JSON data that presents unique challenges for analysis. These JSON events can span 5,000 to 10,000 lines, containing both top-level metadata and deeply nested measurement results within a hierarchical structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA JSON data into actionable insights. You'll learn how to handle multi-line JSON events that contain both top-level attributes and repeating nested measurement objects, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.

About QA JSON data formatting

Manufacturing QA test results from automated test equipment arrive as single, long multi-line JSON events, each representing a complete test session for discrete products. These events typically span 5,000 to 10,000 lines and contain two types of data:

  • Top-level data attributes: JSON fields with overall test session information (for example, IPC2547Event.ProcessSessionStart.Entity._stationId, IPC2547Event.ProcessSessionStart._sessionId). These serve as metadata for the test session.
  • Nested measurement instances: Individual measurement results embedded within the JSON structure (for example, Measurement objects containing MeasuredNumeric, ExpectedNumeric, Component details). Each instance contains specific test readings and statuses. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

{
  "IPC2547Event": {
    "ProcessSessionStart": {
      "Entity": {
        "_stationId": "ST-001"
      },
      "_sessionId": "SESSION-2024-001"
    },
    "Measurements": [
      {
        "MeasuredNumeric": {
          "_value": "10.5"
        },
        "ExpectedNumeric": {
          "_value": "10.0"
        },
        "Component": {
          "_designator": "R101"
        },
        "status": "PASS",
        "type": "RESISTANCE"
      },
      {
        "MeasuredNumeric": {
          "_value": "3.3"
        },
        "ExpectedNumeric": {
          "_value": "3.3"
        },
        "Component": {
          "_designator": "C201"
        },
        "status": "PASS",
        "type": "CAPACITANCE"
      }
    ]
  }
}

The challenge lies in extracting these nested measurement objects as individual records while preserving their relationship to the parent session context.

How to use Splunk software for this use case

First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper JSON field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.

props.conf configuration

The props.conf file controls how the Splunk platform processes JSON data at ingestion and defines field extraction rules. Add these settings under the [qa_data_json] stanza:

[qa_data_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 300
TRUNCATE = 50000
description = MFG JSON
REPORT-parse_json_multi_fields = parse_json_multi_fields

Setting explanations

  • INDEXED_EXTRACTIONS = json: Instructs the Splunk platform to automatically extract fields from the event as JSON at index time.
  • KV_MODE = none: Disables default key-value pair extraction, as JSON extraction is handled by INDEXED_EXTRACTIONS.
  • LINE_BREAKER = ([\r\n]+): Defines how the Splunk platform identifies event breaks. Because INDEXED_EXTRACTIONS = json is set, the structured-data parser determines the actual event boundaries, keeping each complete multi-line JSON block together as a single event.
  • MAX_TIMESTAMP_LOOKAHEAD = 300: Sets the maximum number of characters the Splunk platform will scan to find a timestamp.
  • TRUNCATE = 50000: Specifies the maximum number of characters an event can have. Critical for large JSON events to prevent truncation.
  • REPORT-parse_json_multi_fields = parse_json_multi_fields: Links to a stanza in transforms.conf, allowing for custom field extractions after initial indexing.
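To apply this sourcetype at ingestion, a monitor input can reference it in inputs.conf. This is a minimal sketch: the monitored path and the index name are placeholders, so substitute the directory your test equipment writes to and an index that exists in your environment:

[monitor:///opt/qa/results/*.json]
sourcetype = qa_data_json
index = manufacturing_qa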

transforms.conf configuration

The transforms.conf file defines how to extract nested measurement objects into a multi-valued field. Add this setting under the [parse_json_multi_fields] stanza:

[parse_json_multi_fields]
REGEX = (?s)(?<qa_json_measurement_all>\{\s+"MeasuredNumeric[^}]+[^{]+[^}]+[^{]+[^}]+}[^}]+})
MV_ADD = true

Transform explanations

  • REGEX: This regular expression captures each individual nested Measurement object that contains a MeasuredNumeric field.
    • The (?s) flag allows the dot . to match newlines.
    • The pattern targets blocks starting with {"MeasuredNumeric and captures them into the qa_json_measurement_all field.
  • MV_ADD = true: Critical for nested data handling. Ensures all regex matches are added as separate values to the qa_json_measurement_all field, creating a multi-valued field.
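Applied to the sample event above, each value of qa_json_measurement_all is one captured measurement block. For example, the first value would be:

{
  "MeasuredNumeric": {
    "_value": "10.5"
  },
  "ExpectedNumeric": {
    "_value": "10.0"
  },
  "Component": {
    "_designator": "R101"
  },
  "status": "PASS",
  "type": "RESISTANCE"
}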

SPL commands for processing nested JSON data

Run the following search to flatten the nested JSON structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| extract access-extractions
| fields - _raw qa_json_measurement_all

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your JSON structure.

sourcetype="qa_data_json"

Initiates the search by selecting events from the qa_data_json source type, which is configured to handle nested JSON data.

table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all

Displays a table of selected fields including top-level attributes (_stationId, _sessionId) and the multi-valued field qa_json_measurement_all containing extracted JSON measurement blocks.

mvexpand qa_json_measurement_all

The pivotal flattening command. Takes the multi-valued field qa_json_measurement_all and creates a separate event for each value. This effectively "flattens" the nested structure, allowing each measurement to be treated as an independent record while retaining top-level context.

eval _raw=qa_json_measurement_all

After mvexpand, overwrites the _raw field for each expanded event with its respective qa_json_measurement_all value (a single JSON measurement block). Critical because the extract command operates on _raw.

extract access-extractions

Applies search-time field extraction to the rewritten _raw value. Because _raw now contains a single JSON measurement block, this extracts fields such as MeasuredNumeric._value, Component._designator, status, and type from each expanded event.

fields - _raw qa_json_measurement_all

Removes temporary fields (_raw and qa_json_measurement_all), leaving only the extracted, relevant fields for analysis.
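If the extract command does not produce the expected dotted field names in your environment, the spath command, the Splunk platform's native JSON path extractor, is a common alternative for the same step. With no arguments, spath parses the JSON in _raw and extracts every field:

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| spath
| fields - _raw qa_json_measurement_all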

Results

The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information:

_stationId  _sessionId        MeasuredNumeric._value  ExpectedNumeric._value  Component._designator  status  type
ST-001      SESSION-2024-001  10.5                    10.0                    R101                   PASS    RESISTANCE
ST-001      SESSION-2024-001  3.3                     3.3                     C201                   PASS    CAPACITANCE
ST-001      SESSION-2024-001  5.0                     5.0                     L301                   PASS    INDUCTANCE

This structure enables direct analysis of individual measurement results with their associated test session context.
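Because the extracted field names contain dots, wrap them in single quotes when referencing them in eval. As a sketch, appending these lines to the flattening search computes each measurement's deviation from its expected value (the 0.1 tolerance is a placeholder, not a value from your QA process):

| eval deviation=tonumber('MeasuredNumeric._value')-tonumber('ExpectedNumeric._value')
| where abs(deviation) > 0.1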

Next steps

Now that your nested JSON data has been flattened and transformed, you can perform further analysis, such as:

  • Performance monitoring: Track pass/fail rates, measurement distributions, and deviations from expected values for specific components or stations.
  • Trend analysis: Use timechart to visualize performance trends over time for different products, stations, or measurement types.
  • Anomaly detection: Identify unusual measurement values or patterns that might indicate equipment malfunctions or process issues.
  • Root cause analysis: Correlate failed measurements with other process parameters to pinpoint potential causes of defects.
  • Dashboard creation: Build interactive dashboards to monitor QA performance in real time and set up alerts for critical deviations or failures.
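As a starting point for performance monitoring, the flattening search can be extended with stats to compute a failure rate per station. This sketch assumes failed measurements carry status="FAIL"; adjust the status value and field names to match your data:

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| extract access-extractions
| fields - _raw qa_json_measurement_all
| stats count AS total_measurements, count(eval(status="FAIL")) AS failures BY IPC2547Event.ProcessSessionStart.Entity._stationId
| eval fail_rate_pct=round((failures/total_measurements)*100, 2)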

In addition, these resources might help you understand and implement this guidance: