Analyzing nested JSON manufacturing QA data
Manufacturing Quality Assurance (QA) systems generate complex, nested JSON data that presents unique challenges for analysis. These JSON events can span 5,000 to 10,000 lines, containing both top-level metadata and deeply nested measurement results within a hierarchical structure.
This article shows you how to configure the Splunk platform to ingest, parse, and transform nested manufacturing QA JSON data into actionable insights. You'll learn how to handle multi-line JSON events that contain both top-level attributes and repeating nested measurement objects, then flatten this structure. This enables detailed analysis of individual measurements, correlation with test sessions and station identifiers, and comprehensive QA performance monitoring.
Data required
About QA JSON data formatting
Manufacturing QA test results from automated test equipment arrive as single, long multi-line JSON events, each representing a complete test session for discrete products. These events typically span 5,000 to 10,000 lines and contain two types of data:
- Top-level data attributes: JSON fields with overall test session information (for example, IPC2547Event.ProcessSessionStart.Entity._stationId and IPC2547Event.ProcessSessionStart._sessionId). These serve as metadata for the test session.
- Nested measurement instances: Individual measurement results embedded within the JSON structure (for example, Measurement objects containing MeasuredNumeric, ExpectedNumeric, and Component details). Each instance contains specific test readings and statuses. The number of instances varies by product, creating a hierarchical structure beneath the top-level attributes.
Sample data structure
{
  "IPC2547Event": {
    "ProcessSessionStart": {
      "Entity": {
        "_stationId": "ST-001"
      },
      "_sessionId": "SESSION-2024-001"
    },
    "Measurements": [
      {
        "MeasuredNumeric": {
          "_value": "10.5"
        },
        "ExpectedNumeric": {
          "_value": "10.0"
        },
        "Component": {
          "_designator": "R101"
        },
        "status": "PASS",
        "type": "RESISTANCE"
      },
      {
        "MeasuredNumeric": {
          "_value": "3.3"
        },
        "ExpectedNumeric": {
          "_value": "3.3"
        },
        "Component": {
          "_designator": "C201"
        },
        "status": "PASS",
        "type": "CAPACITANCE"
      }
    ]
  }
}
The challenge lies in extracting these nested measurement objects as individual records while preserving their relationship to the parent session context.
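To make the target shape concrete before turning to the Splunk configuration, the following is a minimal Python sketch of the same flattening, run outside the Splunk platform on an abbreviated copy of the sample event. The function name flatten_session and the flat column names are illustrative assumptions, not part of any Splunk API.

```python
import json

# Abbreviated copy of the sample event (one session, two measurements).
event = json.loads("""
{"IPC2547Event": {
  "ProcessSessionStart": {"Entity": {"_stationId": "ST-001"},
                          "_sessionId": "SESSION-2024-001"},
  "Measurements": [
    {"MeasuredNumeric": {"_value": "10.5"}, "ExpectedNumeric": {"_value": "10.0"},
     "Component": {"_designator": "R101"}, "status": "PASS", "type": "RESISTANCE"},
    {"MeasuredNumeric": {"_value": "3.3"}, "ExpectedNumeric": {"_value": "3.3"},
     "Component": {"_designator": "C201"}, "status": "PASS", "type": "CAPACITANCE"}
  ]
}}
""")


def flatten_session(event):
    """Return one flat dict per measurement, each carrying the session metadata.

    Hypothetical helper for illustration only.
    """
    root = event["IPC2547Event"]
    start = root["ProcessSessionStart"]
    # Parent context that must be repeated on every measurement row.
    parent = {
        "_stationId": start["Entity"]["_stationId"],
        "_sessionId": start["_sessionId"],
    }
    rows = []
    for m in root["Measurements"]:
        rows.append({
            **parent,
            "MeasuredNumeric._value": m["MeasuredNumeric"]["_value"],
            "ExpectedNumeric._value": m["ExpectedNumeric"]["_value"],
            "Component._designator": m["Component"]["_designator"],
            "status": m["status"],
            "type": m["type"],
        })
    return rows


rows = flatten_session(event)
for row in rows:
    print(row)
```

Each printed row pairs one measurement with the station and session identifiers of its parent event, which is the relationship the Splunk configuration in the next section is meant to preserve.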
How to use Splunk software for this use case
First, you'll apply specific configurations to the props.conf and transforms.conf files to enable proper JSON field extraction and multi-value field creation. You'll then run a search to flatten the nested structure into a tabular format suitable for analysis within the Splunk platform.
props.conf configuration
The props.conf file controls how the Splunk platform processes JSON data at ingestion and defines field extraction rules. Add these settings under the [qa_data_json] stanza:
[qa_data_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 300
TRUNCATE = 50000
description = MFG JSON
REPORT-parse_json_multi_fields = parse_json_multi_fields
Setting explanations
- INDEXED_EXTRACTIONS = json: Instructs the Splunk platform to automatically extract fields from the event as JSON at index time.
- KV_MODE = none: Disables default key-value pair extraction, as JSON extraction is handled by INDEXED_EXTRACTIONS.
- LINE_BREAKER = ([\r\n]+): Defines how the Splunk platform identifies event breaks. For a single long multi-line event, this ensures the entire JSON block is treated as one event.
- MAX_TIMESTAMP_LOOKAHEAD = 300: Sets the maximum number of characters the Splunk platform will scan to find a timestamp.
- TRUNCATE = 50000: Specifies the maximum event length, in bytes. Critical for large JSON events to prevent truncation; raise this value if your events are larger.
- REPORT-parse_json_multi_fields = parse_json_multi_fields: Links to a stanza in transforms.conf, allowing for custom field extractions after initial indexing.
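Because the events described here can run to thousands of lines, it may be worth measuring a representative event before settling on a TRUNCATE value. The sketch below is one way to do that in Python, outside the Splunk platform. The helper names and the synthetic 5,000-line event are assumptions made for illustration; substitute the text of a real event from your test equipment.

```python
TRUNCATE = 50000  # value from the [qa_data_json] stanza


def event_size(raw_event):
    """Return (line_count, utf8_byte_count) for one raw event string."""
    return len(raw_event.splitlines()), len(raw_event.encode("utf-8"))


def exceeds_truncate(raw_event, limit=TRUNCATE):
    """True if the event's UTF-8 size is larger than the configured limit."""
    _, n_bytes = event_size(raw_event)
    return n_bytes > limit


# Synthetic stand-in for a real event: 5,000 short lines (an assumption,
# not real QA data).
sample_event = "\n".join(['"_value": "10.5"'] * 5000)

n_lines, n_bytes = event_size(sample_event)
print(n_lines, n_bytes, exceeds_truncate(sample_event))
```

If a check like this reports that typical events are larger than the configured limit, increase TRUNCATE accordingly so that the closing part of the JSON is not cut off.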
transforms.conf configuration
The transforms.conf file defines how to extract nested measurement objects into a multi-valued field. Add this setting under the [parse_json_multi_fields] stanza:
[parse_json_multi_fields]
REGEX = (?s)(?<qa_json_measurement_all>\{\s+"MeasuredNumeric[^}]+[^{]+[^}]+[^{]+[^}]+}[^}]+})
MV_ADD = true
Transform explanations
- REGEX: This regular expression captures each individual nested Measurement object that contains a MeasuredNumeric field.
  - The (?s) flag allows the dot . to match newlines.
  - The pattern targets blocks starting with {"MeasuredNumeric and captures them into the qa_json_measurement_all field.
- MV_ADD = true: Critical for nested data handling. Ensures all regex matches are added as separate values to the qa_json_measurement_all field, creating a multi-valued field.
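Before deploying the transform, you might want to check what the pattern captures against a copy of your own data. The following Python sketch is one way to do that offline. It is an adaptation, not the Splunk implementation: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and the sample event is rebuilt here from the structure shown earlier. Note that the pattern requires whitespace between the opening brace and "MeasuredNumeric", so it matches pretty-printed JSON but not compact, single-line JSON.

```python
import json
import re

# transforms.conf pattern with the named group rewritten in Python syntax.
PATTERN = re.compile(
    r'(?s)(?P<qa_json_measurement_all>\{\s+"MeasuredNumeric'
    r'[^}]+[^{]+[^}]+[^{]+[^}]+}[^}]+})'
)

event = {
    "IPC2547Event": {
        "ProcessSessionStart": {
            "Entity": {"_stationId": "ST-001"},
            "_sessionId": "SESSION-2024-001",
        },
        "Measurements": [
            {"MeasuredNumeric": {"_value": "10.5"},
             "ExpectedNumeric": {"_value": "10.0"},
             "Component": {"_designator": "R101"},
             "status": "PASS", "type": "RESISTANCE"},
            {"MeasuredNumeric": {"_value": "3.3"},
             "ExpectedNumeric": {"_value": "3.3"},
             "Component": {"_designator": "C201"},
             "status": "PASS", "type": "CAPACITANCE"},
        ],
    }
}

pretty = json.dumps(event, indent=2)   # multi-line, like the QA events
compact = json.dumps(event)            # single line, no whitespace after "{"


def capture_measurements(text):
    """Return every substring captured by the pattern (mimics MV_ADD = true)."""
    return [m.group("qa_json_measurement_all") for m in PATTERN.finditer(text)]


pretty_matches = capture_measurements(pretty)
compact_matches = capture_measurements(compact)

# Each captured value should itself be a complete JSON object.
parsed = [json.loads(value) for value in pretty_matches]
print(len(pretty_matches), len(compact_matches))
print([p["Component"]["_designator"] for p in parsed])
```

The pattern also assumes that each measurement contains exactly three nested objects, in the order MeasuredNumeric, ExpectedNumeric, Component, followed by scalar fields. If your measurements are structured differently, adjust the regular expression and re-run a check like this one.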
SPL commands for processing nested JSON data
Run the following search to flatten the nested JSON structure and extract individual measurement fields. You can optimize it by specifying an index and adjusting the time range.
sourcetype="qa_data_json"
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all
| mvexpand qa_json_measurement_all
| eval _raw=qa_json_measurement_all
| extract access-extractions
| fields - _raw qa_json_measurement_all
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this search based on the specifics of your JSON structure.
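| Splunk Search | Explanation |
|---|---|
| sourcetype="qa_data_json" | Initiates the search by selecting events from the qa_data_json source type. |
| table IPC2547Event.ProcessSessionStart.Entity._stationId IPC2547Event.ProcessSessionStart._sessionId qa_json_measurement_all | Displays a table of selected fields including top-level attributes (_stationId, _sessionId) and the multi-valued field qa_json_measurement_all. |
| mvexpand qa_json_measurement_all | The pivotal flattening command. Takes the multi-valued field qa_json_measurement_all and creates a separate result for each of its values, repeating the top-level fields on every new result. |
| eval _raw=qa_json_measurement_all | After mvexpand, sets _raw to the single measurement object held in qa_json_measurement_all, so that the next extraction runs against that measurement only. |
| extract access-extractions | Triggers field extraction defined by the access-extractions stanza in transforms.conf, applied to the new _raw. |
| fields - _raw qa_json_measurement_all | Removes temporary fields (_raw, qa_json_measurement_all) from the results. |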
Results
The search produces a flattened dataset where each row represents an individual measurement with preserved top-level session information:
| _stationId | _sessionId | MeasuredNumeric._value | ExpectedNumeric._value | Component._designator | status | type |
|---|---|---|---|---|---|---|
| ST-001 | SESSION-2024-001 | 10.5 | 10.0 | R101 | PASS | RESISTANCE |
| ST-001 | SESSION-2024-001 | 3.3 | 3.3 | C201 | PASS | CAPACITANCE |
| ST-001 | SESSION-2024-001 | 5.0 | 5.0 | L301 | PASS | INDUCTANCE |
This structure enables direct analysis of individual measurement results with their associated test session context.
Next steps
Now that your nested JSON data has been flattened and transformed, you can perform further analysis, such as:
- Performance monitoring: Track pass/fail rates, measurement distributions, and deviations from expected values for specific components or stations.
- Trend analysis: Use timechart to visualize performance trends over time for different products, stations, or measurement types.
- Anomaly detection: Identify unusual measurement values or patterns that might indicate equipment malfunctions or process issues.
- Root cause analysis: Correlate failed measurements with other process parameters to pinpoint potential causes of defects.
- Dashboard creation: Build interactive dashboards to monitor QA performance in real-time and set up alerts for critical deviations or failures.
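As a worked example of the performance-monitoring arithmetic, the Python sketch below computes the percent deviation of each measurement from its expected value and the pass rate per station, using rows shaped like the Results table. It is an illustration outside the Splunk platform. The helper names are assumptions, and the deviation formula assumes a non-zero expected value.

```python
from collections import defaultdict

# Rows shaped like the Results table; extracted values arrive as strings.
rows = [
    {"_stationId": "ST-001", "MeasuredNumeric._value": "10.5",
     "ExpectedNumeric._value": "10.0", "Component._designator": "R101",
     "status": "PASS"},
    {"_stationId": "ST-001", "MeasuredNumeric._value": "3.3",
     "ExpectedNumeric._value": "3.3", "Component._designator": "C201",
     "status": "PASS"},
    {"_stationId": "ST-001", "MeasuredNumeric._value": "5.0",
     "ExpectedNumeric._value": "5.0", "Component._designator": "L301",
     "status": "PASS"},
]


def percent_deviation(row):
    """(measured - expected) / expected * 100; assumes expected is non-zero."""
    measured = float(row["MeasuredNumeric._value"])
    expected = float(row["ExpectedNumeric._value"])
    return (measured - expected) / expected * 100.0


def pass_rate_by_station(rows):
    """Return {station_id: fraction of measurements with status PASS}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in rows:
        totals[row["_stationId"]] += 1
        if row["status"] == "PASS":
            passes[row["_stationId"]] += 1
    return {station: passes[station] / totals[station] for station in totals}


deviations = {
    row["Component._designator"]: round(percent_deviation(row), 2)
    for row in rows
}
rates = pass_rate_by_station(rows)
print(deviations)
print(rates)
```

Here R101 reads 5 percent above its expected value while still reporting PASS, which is the kind of drift that deviation tracking is meant to surface before it becomes a failure.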
In addition, these resources might help you understand and implement this guidance:
- Splunk Lantern Article: Analyzing nested manufacturing QA data
- Splunk Lantern Article: Analyzing nested XML manufacturing QA data

