How to Validate Semi-Structured Data (Arrays, Structs, and Nested JSON) Without Flattening

Qualytics introduces native validation for nested JSON, arrays, and structs, enabling comprehensive data quality checks without costly flattening pipelines.

Gorkem Sevinc

CEO & Co-Founder

Mar 16, 2026

5 min read


Introducing native validation for nested data in Qualytics

Validating semi-structured data is one of the hardest challenges in modern data quality.

Events arrive as nested JSON. Orders include arrays of line items. User records contain multiple addresses, devices, and preferences. While lakehouses make it easy to store this data, validating it reliably is much harder.

Most teams face a frustrating tradeoff:

  • Spend engineering time flattening nested data into relational tables, or
  • Accept blind spots in nested fields

Today we’re introducing a new capability in Qualytics that removes that tradeoff.

Qualytics can now profile and validate semi-structured data directly, including arrays, structs, and arrays of structs—without requiring manual flattening.

The Hidden Cost of Flattening Nested Data

One of the toughest challenges we hear from customers is validating nested data structures.

A good example comes from MAPFRE, one of the world’s largest insurance companies. Their systems ingest quote data as complex JSON payloads containing deeply nested structures.

To run data quality checks, the team estimated they would need to manually flatten the data into relational tables—a project that would require nearly 3,000 engineering hours.

Working together, we reduced that effort by 98%. But that raised a bigger question:

What if teams didn’t have to flatten nested data at all?

That question led us to build native support for validating semi-structured data in Qualytics.

Customer Example: MAPFRE

MAPFRE processes quote data delivered as complex nested JSON documents.

Before using Qualytics, validating that data required flattening the JSON payloads into relational tables so traditional quality checks could run. This meant building transformation pipelines to explode arrays, project nested fields, and maintain staging tables for validation.

The engineering team estimated this effort would require approximately 3,000 hours of development work, along with ongoing maintenance whenever schemas changed.

Using Qualytics, MAPFRE was able to profile and validate the nested structures directly without building flattening pipelines.

This reduced the engineering effort required for validation by 98%, while expanding quality coverage across nested fields that previously went unchecked.

Why Semi-Structured Data Is Difficult to Validate

Semi-structured data introduces complexity because validation must occur at multiple levels of the data structure.

Structs

A struct is a column that contains multiple named subfields.

Example:

address = {
  street: "123 Main St",
  city: "Boston",
  zip: "02108"
}

Checking that address is not null is very different from checking that:

  • address.zip exists
  • address.zip matches a valid format

In practice, the validations with the greatest downstream impact occur at the child-field level.
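The difference between a parent-level and a child-field check can be sketched in a few lines of Python. This is an illustration only, not how Qualytics implements its checks: the function name `check_address` and the US-style ZIP pattern are assumptions made for the example.

```python
import re

# Assumed rule for illustration: 5 digits, optionally followed by "-" and 4 digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def check_address(record: dict) -> list:
    """Return the failed checks for the address struct of one record."""
    address = record.get("address")
    if address is None:
        # Parent-level check: the struct itself is missing.
        return ["address is null"]

    failures = []
    zip_code = address.get("zip")
    if zip_code is None:
        # Child-level check 1: the subfield must exist.
        failures.append("address.zip is missing")
    elif not isinstance(zip_code, str) or not ZIP_PATTERN.fullmatch(zip_code):
        # Child-level check 2: the subfield must match the expected format.
        failures.append("address.zip has an invalid format")
    return failures


good = {"address": {"street": "123 Main St", "city": "Boston", "zip": "02108"}}
bad = {"address": {"street": "123 Main St", "city": "Boston", "zip": "2108"}}
print(check_address(good))  # []
print(check_address(bad))   # ['address.zip has an invalid format']
```

Both records pass a plain "address is not null" check; only the child-level rule catches the truncated ZIP code.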

Arrays

Arrays introduce another layer of complexity because validation must occur at two levels:

  • the array container
  • the elements inside the array

For example, a line_items array might be:

  • null
  • present but empty
  • populated but incorrect because one or more elements contain invalid values

Arrays of Structs

Arrays of structs combine both challenges.

Example:

payments = [
  { method: "card", amount: 50 },
  { method: "card", amount: 25 }
]

Validation rules may involve relationships between fields:

if payments[*].method = "card"
then payments[*].card_last4 must exist

Or set-level constraints:

  • no duplicate payments[*].auth_code
  • sum(payments[*].amount) = order_total

To enforce these rules, engineers typically have to:

  • explode arrays into rows
  • validate element records
  • aggregate results
  • reconcile results back to the parent record

This is why validating nested data traditionally requires flattening pipelines.
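To make the four steps concrete, here is a minimal Python sketch of the manual explode-validate-aggregate-reconcile approach, applied to the three payment rules above. The sample orders, the `order_id` key, and the function name `validate_payments` are hypothetical; a real pipeline would typically do this in SQL or Spark rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical sample data: order 1 is clean, order 2 breaks all three rules.
orders = [
    {"order_id": 1, "order_total": 75, "payments": [
        {"method": "card", "amount": 50, "card_last4": "4242", "auth_code": "A1"},
        {"method": "card", "amount": 25, "card_last4": "4242", "auth_code": "A2"},
    ]},
    {"order_id": 2, "order_total": 40, "payments": [
        {"method": "card", "amount": 30, "auth_code": "B1"},
        {"method": "cash", "amount": 5, "auth_code": "B1"},
    ]},
]


def validate_payments(orders):
    # Step 1: explode each payments array into one row per element,
    # carrying the parent key so results can be reconciled later.
    rows = [(o["order_id"], p) for o in orders for p in (o.get("payments") or [])]

    failures = defaultdict(list)
    totals = defaultdict(int)
    seen_codes = defaultdict(set)

    # Steps 2 and 3: validate element records and aggregate per parent.
    for order_id, payment in rows:
        if payment.get("method") == "card" and payment.get("card_last4") is None:
            failures[order_id].append("card payment without card_last4")
        code = payment.get("auth_code")
        if code is not None:
            if code in seen_codes[order_id]:
                failures[order_id].append(f"duplicate auth_code {code}")
            seen_codes[order_id].add(code)
        totals[order_id] += payment.get("amount", 0)

    # Step 4: reconcile the aggregates back to the parent record.
    for order in orders:
        if totals[order["order_id"]] != order["order_total"]:
            failures[order["order_id"]].append("sum of payment amounts != order_total")
    return dict(failures)


print(validate_payments(orders))
```

Even in this toy form, the parent key has to be threaded through every step, which is the bookkeeping that makes hand-built flattening pipelines fragile when schemas change.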

The Tradeoff Most Teams Face

In practice, teams usually make one of two compromises.

Option 1: Flatten Everything

Engineers create views or staging tables so nested fields behave like normal columns.

While effective, this approach introduces:

  • significant engineering effort
  • fragile transformations
  • ongoing maintenance when schemas evolve

Option 2: Accept Partial Coverage

Teams validate top-level scalar fields such as:

  • primary keys
  • timestamps
  • status codes

Nested fields—often where the most critical issues occur—remain unchecked.

Examples include:

  • address.zip
  • line_items.price
  • cross-element validations within arrays of structs

Real-World Example: Detecting Revenue Errors in Nested Order Data

Consider an e-commerce platform storing order records as Parquet files in Amazon S3.

Each order contains an array of line items:

order_id: 84231
line_items: [
  { sku: "SKU-100", quantity: 2, price: 50 },
  { sku: "SKU-204", quantity: -1, price: 30 }
]

The data pipeline ran successfully. There were no schema errors or missing records. But monthly financial reports began showing understated revenue. The issue was buried inside the nested line_items array.

A recent upstream migration introduced a logic error in returns processing. Instead of generating separate return records, the system wrote returned items as negative quantities inside existing orders.

Using Qualytics, the team profiled the files without flattening them.

Nested fields were automatically surfaced, including:

  • customer.country
  • line_items.sku
  • line_items.quantity
  • line_items.price

Qualytics inferred validation rules for these fields. On the next scan, the platform flagged:

  • 23 orders with negative quantities
  • 4 orders with invalid SKU formats

The issue was quickly traced to the upstream migration, and the checks now run automatically on every file.
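As a rough illustration of the kind of element-level rules involved, the Python sketch below scans one order for negative quantities and malformed SKUs. It is not the rule set Qualytics inferred, which the article does not list: the SKU pattern (`SKU-` followed by digits) is an assumption based on the sample values, and `scan_order` is a hypothetical helper.

```python
import re

# Assumed SKU format, inferred only from the sample values "SKU-100" and "SKU-204".
SKU_PATTERN = re.compile(r"SKU-\d+")


def scan_order(order: dict) -> list:
    """Return findings for the line_items array of a single order."""
    findings = []
    for index, item in enumerate(order.get("line_items") or []):
        quantity = item.get("quantity")
        if isinstance(quantity, (int, float)) and quantity < 0:
            findings.append(f"line_items[{index}].quantity is negative")
        if not SKU_PATTERN.fullmatch(str(item.get("sku", ""))):
            findings.append(f"line_items[{index}].sku has an invalid format")
    return findings


order = {
    "order_id": 84231,
    "line_items": [
        {"sku": "SKU-100", "quantity": 2, "price": 50},
        {"sku": "SKU-204", "quantity": -1, "price": 30},
    ],
}
print(scan_order(order))  # ['line_items[1].quantity is negative']
```

The record is structurally valid, which is why schema checks alone did not catch it; the problem only shows up when each element's value is inspected.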

Flattening Pipelines vs Native Nested Validation

When teams need to validate nested data, they typically build flattening pipelines to transform arrays and structs into relational tables.

Traditional Approach

Raw JSON / Parquet
       │
ETL pipeline
       │
explode arrays
       │
flatten structs
       │
create staging tables
       │
run quality checks

Challenges:

  • additional engineering pipelines
  • fragile schema transformations
  • validation happens after transformation
  • nested data coverage is incomplete

Qualytics Approach

Raw JSON / Parquet
       │
Qualytics profiling
       │
automatic schema traversal
       │
nested fields discovered
       │
rules inferred and monitored

Benefits:

  • no flattening views or staging tables
  • immediate nested field coverage
  • resilience to schema evolution
  • faster anomaly detection

How Qualytics Validates Nested Data

Qualytics automatically traverses nested schemas during profiling and surfaces child field paths as explicit subfields.

Each nested field receives:

  • its own profile
  • its own quality checks
  • its own quality score

For arrays of structs, each projected field becomes its own array column and receives both container-level and element-level validation.

Traversal is recursive up to a configurable depth, and generated field names follow the schema’s path notation.
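The idea of depth-limited recursive traversal can be sketched as follows. This is a simplified illustration, not the Qualytics implementation: the schema encoding (a dict for a struct, a one-element list for an array), the `max_depth` parameter name, and the dotted path format are assumptions chosen to match the field names used in this article.

```python
def field_paths(schema, max_depth=3, prefix="", depth=1):
    """Yield (path, type_name) for every field reachable within max_depth levels."""
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        # A one-element list stands for "array of <element type>".
        is_array = isinstance(field_type, list)
        inner = field_type[0] if is_array else field_type
        if isinstance(inner, dict):
            yield path, "array<struct>" if is_array else "struct"
            # Descend into child fields only while the depth limit allows it.
            if depth < max_depth:
                yield from field_paths(inner, max_depth, path, depth + 1)
        else:
            yield path, f"array<{inner}>" if is_array else inner


# Hypothetical schema modeled on the order example in this article.
schema = {
    "order_id": "long",
    "customer": {"country": "string"},
    "line_items": [{"sku": "string", "quantity": "int", "price": "double"}],
}

for path, type_name in field_paths(schema):
    print(path, type_name)
```

With the default depth this lists `customer.country`, `line_items.sku`, `line_items.quantity`, and `line_items.price` alongside their parents; with `max_depth=1` only the three top-level fields are returned.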

The full nested schema is visible in the datastore explorer, where a hierarchical field tree makes complex structures easy to explore.

Figure: Field tree view with type-specific icons

Simplifying Array Validation

Qualytics separates array validation into two contexts.

Container-level checks:

  • Not Null: Is the array column not null?
  • Not Empty: Does the array have at least one element?
  • Min / Max Length: Does the array have the expected number of elements?

Element-level checks:

  • Not Null: Is every element in the array not null?
  • Not Empty: Does every element have a value?
  • Min / Max Length: Does each element have the expected length?

Figure: Not null container and element checks on the same array field

The platform automatically distinguishes between container-level and element-level validation when applying rules.
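The two contexts can be illustrated with a small Python sketch for an array of strings. The function names, the result keys, and the `min_len` / `max_len` parameters are hypothetical and only mirror the check names listed above; they are not the Qualytics API.

```python
def container_checks(value, min_len=1, max_len=None):
    """Checks on the array itself: is it there, and how many elements does it have?"""
    not_null = value is not None
    length = len(value) if not_null else 0
    return {
        "not_null": not_null,
        "not_empty": length > 0,
        "length_in_range": not_null
        and length >= min_len
        and (max_len is None or length <= max_len),
    }


def element_checks(value, min_len=1, max_len=None):
    """Checks on every element of the array (here: string elements)."""
    items = value or []
    return {
        "not_null": all(item is not None for item in items),
        "not_empty": all(item is not None and len(item) > 0 for item in items),
        "length_in_range": all(
            item is not None
            and len(item) >= min_len
            and (max_len is None or len(item) <= max_len)
            for item in items
        ),
    }


tags = ["new", "", None]
print(container_checks(tags, max_len=5))
print(element_checks(tags, max_len=5))
```

For `tags`, every container-level check passes while every element-level check fails, which shows why the same rule name needs two separate contexts. Note that in this sketch an empty array passes the element-level checks vacuously, so the container-level Not Empty check is what catches it.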

Faster Root Cause Analysis

When an anomaly occurs, Qualytics displays the failing record in its original nested structure.

Instead of reconstructing payloads with explode queries or flattening views, teams can inspect raw nested values directly.

This makes it easier to determine whether an issue originated from:

  • the data producer
  • an ingestion mapping
  • an upstream transformation

Figure: Data previews accelerate root cause analysis

By preserving the original structure, Qualytics shortens the path from anomaly detection to root cause resolution.

What This Unlocks for Data Teams

Native validation of semi-structured data changes how teams approach data quality.

Broader data quality coverage

Teams can validate nested fields that were previously skipped, including:

  • child fields inside structs
  • element values inside arrays
  • relationships within arrays of structs

Less engineering overhead

Without flattening pipelines, teams avoid building and maintaining:

  • transformation views
  • staging tables
  • explode-and-reconcile pipelines

This reduces both implementation time and ongoing operational complexity.

Faster incident resolution

Because Qualytics surfaces nested failures directly in their original structure, engineers can identify root causes faster. Instead of rebuilding payloads with custom queries, teams can inspect failing records immediately.

More Coverage Without Flattening

Semi-structured data has traditionally forced teams to choose between engineering effort and incomplete coverage.

Qualytics removes that tradeoff.

The platform provides profiling, rule inference, and monitoring for structs, arrays, and arrays of structs without requiring additional transformation pipelines.

Teams gain:

  • broader coverage across nested data
  • reduced operational overhead
  • faster anomaly detection
  • simpler maintenance as schemas evolve

Get Started

Already a Qualytics customer?
Complex data type support is available today. Learn more in our user guide.

Working with nested JSON or semi-structured data?
Request a demo to see how Qualytics expands data quality coverage without manual flattening.
