Splunk Lantern

Analyzing nested JSON manufacturing QA data

Manufacturing Quality Assurance (QA) systems generate complex, nested JSON data that presents unique challenges for analysis. These JSON events can span 5,000 to 10,000 lines, containing both top-level metadata and deeply nested measurement results within a hierarchical structure.

This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA JSON data into actionable insights. You'll learn how to handle multi-line JSON events that contain both top-level attributes and repeating nested measurement objects, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.

About QA JSON data formatting

Manufacturing QA test results from automated test equipment arrive as single, long multi-line JSON events, each representing a complete test session for discrete products. These events typically span 5,000 to 10,000 lines and contain two types of data:

  • Top-level data attributes: JSON fields with overall test session information (for example, IPC2547Event.ProcessSessionStart.Entity._stationId, IPC2547Event.ProcessSessionStart._sessionId). These serve as metadata for the test session.
  • Nested measurement instances: Individual measurement results embedded within the JSON structure (for example, Measurement objects containing MeasuredNumeric, ExpectedNumeric, Component details). Each instance contains specific test readings and statuses. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.

Sample data structure

{
  "IPC2547Event": {
    "ProcessSessionStart": {
      "Entity": {
        "_stationId": "ST-001"
      },
      "_sessionId": "SESSION-2024-001"
    },
    "Measurements": [
      {
        "MeasuredNumeric": {
          "_value": "10.5"
        },
        "ExpectedNumeric": {
          "_value": "10.0"
        },
        "Component": {
          "_designator": "R101"
        },
        "status": "PASS",
        "type": "RESISTANCE"
      },
      {
        "MeasuredNumeric": {
          "_value": "3.3"
        },
        "ExpectedNumeric": {
          "_value": "3.3"
        },
        "Component": {
          "_designator": "C201"
        },
        "status": "PASS",
        "type": "CAPACITANCE"
      }
    ]
  }
}

The challenge lies in extracting these nested measurement objects as individual records while preserving their relationship to the parent session context.

How to use Splunk software for this use case

First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper JSON field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.

props.conf configuration

The props.conf file controls how the Splunk platform processes JSON data at ingestion and defines field extraction rules. Add these settings under the [qa_data_json] stanza:

[qa_data_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 300
TRUNCATE = 50000
description = MFG JSON
REPORT-parse_json_multi_fields = parse_json_multi_fields

Setting explanations

  • INDEXED_EXTRACTIONS = json: Instructs the Splunk platform to automatically extract fields from the event as JSON at index time.
  • KV_MODE = none: Disables default key-value pair extraction, as JSON extraction is handled by INDEXED_EXTRACTIONS.
  • LINE_BREAKER = ([\r\n]+): Defines how the Splunk platform identifies event breaks. Because INDEXED_EXTRACTIONS = json is set, the structured-data parser determines the actual event boundaries, keeping each complete multi-line JSON block together as a single event.
  • MAX_TIMESTAMP_LOOKAHEAD = 300: Sets the maximum number of characters the Splunk platform will scan to find a timestamp.
  • TRUNCATE = 50000: Specifies the maximum number of characters an event can have. Critical for large JSON events to prevent truncation.
  • REPORT-parse_json_multi_fields = parse_json_multi_fields: Links to a stanza in transforms.conf, allowing for custom field extractions after initial indexing.
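To apply this sourcetype at ingestion, a monitor input can reference it in inputs.conf. This is a minimal sketch: the monitored path and the index name are placeholders, so substitute the directory your test equipment writes to and an index that exists in your environment:

[monitor:///opt/qa/results/*.json]
sourcetype = qa_data_json
index = manufacturing_qa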

transforms.conf configuration

The transforms.conf file defines how to extract nested measurement objects into a multi-valued field. Add this setting under the [parse_json_multi_fields] stanza:

[parse_json_multi_fields]
REGEX = (?s)(?<qa_json_measurement_all>\{\s+"MeasuredNumeric[^}]+[^{]+[^}]+[^{]+[^}]+}[^}]+})
MV_ADD = true

Transform explanations

  • REGEX: This regular expression captures each individual nested Measurement object that contains a MeasuredNumeric field.
    • The (?s) flag allows the dot . to match newlines.
    • The pattern targets blocks starting with {"MeasuredNumeric and captures them into the qa_json_measurement_all field.
  • MV_ADD = true: Critical for nested data handling. Ensures all regex matches are added as separate values to the qa_json_measurement_all field, creating a multi-valued field.
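Applied to the sample event above, each value of qa_json_measurement_all is one captured measurement block. For example, the first value would be:

{
  "MeasuredNumeric": {
    "_value": "10.5"
  },
  "ExpectedNumeric": {
    "_value": "10.0"
  },
  "Component": {
    "_designator": "R101"
  },
  "status": "PASS",
  "type": "RESISTANCE"
}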

SPL commands for processing nested JSON data

Run the following search to flatten the nested JSON structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| extract access-extractions
| fields - _raw qa_json_measurement_all

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your JSON structure.

sourcetype="qa_data_json"

Initiates the search by selecting events from the qa_data_json source type, which is configured to handle nested JSON data.

table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all

Displays a table of selected fields including top-level attributes (_stationId, _sessionId) and the multi-valued field qa_json_measurement_all containing extracted JSON measurement blocks.

mvexpand qa_json_measurement_all

The pivotal flattening command. Takes the multi-valued field qa_json_measurement_all and creates a separate event for each value. This effectively "flattens" the nested structure, allowing each measurement to be treated as an independent record while retaining top-level context.

eval _raw=qa_json_measurement_all

After mvexpand, overwrites the _raw field for each expanded event with its respective qa_json_measurement_all value (a single JSON measurement block). Critical because the extract command operates on _raw.

extract access-extractions

Applies search-time field extraction to the rewritten _raw value. Because _raw now contains a single JSON measurement block, this extracts fields such as MeasuredNumeric._value, Component._designator, status, and type from each expanded event.

fields - _raw qa_json_measurement_all

Removes temporary fields (_raw and qa_json_measurement_all), leaving only the extracted, relevant fields for analysis.
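If the extract command does not produce the expected dotted field names in your environment, the spath command, the Splunk platform's native JSON path extractor, is a common alternative for the same step. With no arguments, spath parses the JSON in _raw and extracts every field:

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| spath
| fields - _raw qa_json_measurement_all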

Results

The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information:

_stationId  _sessionId        MeasuredNumeric._value  ExpectedNumeric._value  Component._designator  status  type
ST-001      SESSION-2024-001  10.5                    10.0                    R101                   PASS    RESISTANCE
ST-001      SESSION-2024-001  3.3                     3.3                     C201                   PASS    CAPACITANCE
ST-001      SESSION-2024-001  5.0                     5.0                     L301                   PASS    INDUCTANCE

This structure enables direct analysis of individual measurement results with their associated test session context.
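Because the extracted field names contain dots, wrap them in single quotes when referencing them in eval. As a sketch, appending these lines to the flattening search computes each measurement's deviation from its expected value (the 0.1 tolerance is a placeholder, not a value from your QA process):

| eval deviation=tonumber('MeasuredNumeric._value')-tonumber('ExpectedNumeric._value')
| where abs(deviation) > 0.1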

Next steps

Now that your nested JSON data has been flattened and transformed, you can perform further analysis, such as:

  • Performance monitoring: Track pass/fail rates, measurement distributions, and deviations from expected values for specific components or stations.
  • Trend analysis: Use timechart to visualize performance trends over time for different products, stations, or measurement types.
  • Anomaly detection: Identify unusual measurement values or patterns that might indicate equipment malfunctions or process issues.
  • Root cause analysis: Correlate failed measurements with other process parameters to pinpoint potential causes of defects.
  • Dashboard creation: Build interactive dashboards to monitor QA performance in real time and set up alerts for critical deviations or failures.
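As a starting point for performance monitoring, the flattening search can be extended with stats to compute a failure rate per station. This sketch assumes failed measurements carry status="FAIL"; adjust the status value and field names to match your data:

sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| extract access-extractions
| fields - _raw qa_json_measurement_all
| stats count AS total_measurements, count(eval(status="FAIL")) AS failures BY IPC2547Event.ProcessSessionStart.Entity._stationId
| eval fail_rate_pct=round((failures/total_measurements)*100, 2)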

In addition, these resources might help you understand and implement this guidance: