Schema Evolution Toolkit
$39
Schema migration scripts, backward/forward compatibility checks, schema registry integration, and versioning strategies.
Python · YAML · Markdown · JSON · Databricks · PySpark · Spark · Delta Lake
📁 File Structure (17 files)
schema-evolution-toolkit/
├── LICENSE
├── README.md
├── configs/
│   ├── schema_policy.yaml
│   └── schemas/
│       ├── v1_customer.json
│       └── v2_customer.json
├── guides/
│   └── schema-evolution-strategy.md
├── notebooks/
│   ├── detect_drift.py
│   └── evolve_schema.py
├── src/
│   ├── compatibility_checker.py
│   ├── schema_detector.py
│   ├── schema_migrator.py
│   ├── schema_registry.py
│   └── schema_validator.py
└── tests/
    ├── conftest.py
    ├── test_compatibility.py
    └── test_schema_detector.py
📖 Documentation Preview (README excerpt)
# Schema Evolution Toolkit
Detect, validate, and migrate schema changes across Delta Lake tables — safely and automatically.
By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $39
---
## What You Get
A complete toolkit for managing schema evolution in Databricks / Delta Lake pipelines:
- Schema Detector — compare live table schemas against expected definitions, detect drift
- Schema Migrator — apply safe migrations (add columns, widen types, rename) with rollback
- Compatibility Checker — verify backward/forward compatibility before deploying changes
- Schema Registry — Delta-table-backed registry for versioned schema definitions
- Schema Validator — validate DataFrames against registered schemas at runtime
- Ready-to-use Notebooks — detect drift and evolve schemas interactively
- Schema Versions — example v1/v2 JSON schema files for a customer table
- Evolution Strategy Guide — comprehensive guide to schema versioning patterns
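To make the drift-detection idea above concrete, here is a minimal, self-contained sketch. It uses plain dicts mapping column names to Spark type strings as a stand-in for PySpark `StructType`; this is an illustration of the concept, not the toolkit's actual `SchemaDetector` API.

```python
# Hypothetical sketch: schemas as {column_name: spark_type_string} dicts.
# The real toolkit presumably operates on PySpark StructTypes.

def detect_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list]:
    """Compare an expected schema against a live one and report drift."""
    added = sorted(set(actual) - set(expected))          # columns only in the live table
    removed = sorted(set(expected) - set(actual))        # columns that disappeared
    changed = sorted(
        c for c in set(expected) & set(actual) if expected[c] != actual[c]
    )                                                    # columns whose type changed
    return {"added": added, "removed": removed, "type_changed": changed}

v1 = {"id": "bigint", "email": "string", "age": "int"}
v2 = {"id": "bigint", "email": "string", "age": "bigint", "segment": "string"}

print(detect_drift(v1, v2))
# → {'added': ['segment'], 'removed': [], 'type_changed': ['age']}
```

A drift report like this is the natural input for the compatibility check: added nullable columns and widened types can pass, while removals flag a breaking change.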
## File Tree
schema-evolution-toolkit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── schema_detector.py           # Detect schema drift between expected and actual
│   ├── schema_migrator.py           # Apply safe migrations with rollback support
│   ├── compatibility_checker.py     # Backward/forward compatibility validation
│   ├── schema_registry.py           # Delta-table-backed schema version registry
│   └── schema_validator.py          # Runtime DataFrame schema validation
├── configs/
│   ├── schema_policy.yaml           # Evolution rules and policies per layer
│   └── schemas/
│       ├── v1_customer.json         # Version 1 customer schema
│       └── v2_customer.json         # Version 2 customer schema (evolved)
├── notebooks/
│   ├── detect_drift.py              # Interactive drift detection notebook
│   └── evolve_schema.py             # Interactive schema migration notebook
├── tests/
│   ├── conftest.py                  # Shared pytest fixtures (SparkSession, sample schemas)
│   ├── test_schema_detector.py      # Detector unit tests
│   └── test_compatibility.py        # Compatibility checker tests
└── guides/
    └── schema-evolution-strategy.md # Versioning & compatibility guide
## Architecture
┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Upstream   │────▶│ Schema Detector  │────▶│  Drift Report   │
│  Data Source │     └──────────────────┘     └────────┬────────┘
└──────────────┘                                       │
                                                       ▼
                     ┌──────────────────┐     ┌─────────────────┐
                     │  Compatibility   │◀────│    Decision:    │
*... continues with setup instructions, usage examples, and more.*
📄 Code Sample (.py preview)
src/compatibility_checker.py
"""
Compatibility Checker — Check forward, backward, and full compatibility
between schema versions using Avro-style rules.
Rules:
- **Backward compatible**: consumers using the *old* schema can read data
written with the *new* schema. Adding nullable columns and widening types
is allowed; removing columns and narrowing types is not.
- **Forward compatible**: consumers using the *new* schema can read data
written with the *old* schema. Removing columns is allowed; adding
required columns is not.
- **Full compatible**: both backward and forward compatible simultaneously.
Part of the Schema Evolution Toolkit by Datanest Digital (https://datanest.dev).
"""
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set
from pyspark.sql.types import StructField, StructType
from schema_detector import ChangeKind, SchemaDetector, SchemaReport
class CompatibilityMode(Enum):
"""Supported compatibility modes."""
BACKWARD = "backward"
FORWARD = "forward"
FULL = "full"
NONE = "none"
# Safe type widening paths (same as migrator)
_SAFE_WIDENINGS: Dict[str, Set[str]] = {
"byte": {"short", "int", "bigint", "double"},
"short": {"int", "bigint", "double"},
"int": {"bigint", "double"},
"bigint": {"double"},
"float": {"double"},
"date": {"timestamp"},
}
@dataclass
class CompatibilityResult:
"""Result of a compatibility check."""
# ... 139 more lines ...
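The rest of the module is elided, but the Avro-style rules stated in the docstring can be sketched in a few lines. The following is a self-contained approximation, not the shipped implementation: schemas are plain `{column: (type, nullable)}` dicts standing in for `StructType`, and the widening table mirrors `_SAFE_WIDENINGS` above.

```python
# Hypothetical sketch of the backward/forward rules described in the docstring.
# Schemas here are {column: (spark_type_string, nullable)} dicts.

SAFE_WIDENINGS = {
    "byte": {"short", "int", "bigint", "double"},
    "short": {"int", "bigint", "double"},
    "int": {"bigint", "double"},
    "bigint": {"double"},
    "float": {"double"},
    "date": {"timestamp"},
}

def is_backward_compatible(old: dict, new: dict) -> bool:
    """Old-schema consumers can read new-schema data: no removed columns,
    no narrowed types, and any column added in `new` must be nullable."""
    for col, (old_type, _) in old.items():
        if col not in new:
            return False                                   # removed column
        new_type = new[col][0]
        if new_type != old_type and new_type not in SAFE_WIDENINGS.get(old_type, set()):
            return False                                   # narrowed or incompatible type
    return all(nullable for col, (_, nullable) in new.items() if col not in old)

def is_forward_compatible(old: dict, new: dict) -> bool:
    """New-schema consumers can read old-schema data: removing columns is
    allowed, but a column added in `new` must not be required."""
    for col, (new_type, nullable) in new.items():
        if col not in old:
            if not nullable:
                return False                               # added required column
            continue
        old_type = old[col][0]
        if old_type != new_type and new_type not in SAFE_WIDENINGS.get(old_type, set()):
            return False                                   # old data can't widen to new type
    return True

v1 = {"id": ("bigint", False), "email": ("string", True)}
v2 = {"id": ("bigint", False), "email": ("string", True), "segment": ("string", True)}

assert is_backward_compatible(v1, v2)   # added nullable column: OK
assert is_forward_compatible(v1, v2)    # added column is optional: OK
```

Full compatibility is then simply the conjunction of the two checks, which is why the `FULL` mode exists as its own member of `CompatibilityMode`.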