
Schema Evolution Toolkit

$39

Schema migration scripts, backward/forward compatibility checks, schema registry integration, and versioning strategies.

📁 17 files · 🏷 v1.0.0

Python · YAML · Markdown · JSON · Databricks · PySpark · Spark · Delta Lake

📁 File Structure (17 files)

schema-evolution-toolkit/
├── LICENSE
├── README.md
├── configs/
│   ├── schema_policy.yaml
│   └── schemas/
│       ├── v1_customer.json
│       └── v2_customer.json
├── guides/
│   └── schema-evolution-strategy.md
├── notebooks/
│   ├── detect_drift.py
│   └── evolve_schema.py
├── src/
│   ├── compatibility_checker.py
│   ├── schema_detector.py
│   ├── schema_migrator.py
│   ├── schema_registry.py
│   └── schema_validator.py
└── tests/
    ├── conftest.py
    ├── test_compatibility.py
    └── test_schema_detector.py

📖 Documentation Preview (README excerpt)

Schema Evolution Toolkit

Detect, validate, and migrate schema changes across Delta Lake tables — safely and automatically.

By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $39

---

What You Get

A complete toolkit for managing schema evolution in Databricks / Delta Lake pipelines:

  • Schema Detector — compare live table schemas against expected definitions, detect drift
  • Schema Migrator — apply safe migrations (add columns, widen types, rename) with rollback
  • Compatibility Checker — verify backward/forward compatibility before deploying changes
  • Schema Registry — Delta-table-backed registry for versioned schema definitions
  • Schema Validator — validate DataFrames against registered schemas at runtime
  • Ready-to-use Notebooks — detect drift and evolve schemas interactively
  • Schema Versions — example v1/v2 JSON schema files for a customer table
  • Evolution Strategy Guide — comprehensive guide to schema versioning patterns
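To make the detector's role concrete, here is a minimal sketch of drift detection. It is not the toolkit's actual API: the real `SchemaDetector` compares Spark `StructType`s, while this illustration represents schemas as plain `{column: type}` dicts, and the `detect_drift` function name and the `v1`/`v2` sample schemas are hypothetical.

```python
def detect_drift(expected: dict, actual: dict) -> dict:
    """Return columns that were added, removed, or changed type.

    Simplified sketch: schemas are plain {column: type} dicts rather
    than Spark StructTypes, and nullability is ignored.
    """
    added = {c: actual[c] for c in actual.keys() - expected.keys()}
    removed = {c: expected[c] for c in expected.keys() - actual.keys()}
    changed = {
        c: (expected[c], actual[c])
        for c in expected.keys() & actual.keys()
        if expected[c] != actual[c]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical v1/v2 customer schemas for illustration.
v1 = {"id": "bigint", "email": "string"}
v2 = {"id": "bigint", "email": "string", "signup_ts": "timestamp"}

report = detect_drift(v1, v2)
# report["added"] == {"signup_ts": "timestamp"}; nothing removed or changed
```

The same shape of report (added/removed/changed) is what downstream compatibility checks consume: an additive-only diff like the one above is the easy case, while removals and type changes need the compatibility rules described below.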

File Tree


schema-evolution-toolkit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── schema_detector.py       # Detect schema drift between expected and actual
│   ├── schema_migrator.py       # Apply safe migrations with rollback support
│   ├── compatibility_checker.py # Backward/forward compatibility validation
│   ├── schema_registry.py       # Delta-table-backed schema version registry
│   └── schema_validator.py      # Runtime DataFrame schema validation
├── configs/
│   ├── schema_policy.yaml       # Evolution rules and policies per layer
│   └── schemas/
│       ├── v1_customer.json     # Version 1 customer schema
│       └── v2_customer.json     # Version 2 customer schema (evolved)
├── notebooks/
│   ├── detect_drift.py          # Interactive drift detection notebook
│   └── evolve_schema.py         # Interactive schema migration notebook
├── tests/
│   ├── conftest.py              # Shared pytest fixtures (SparkSession, sample schemas)
│   ├── test_schema_detector.py  # Detector unit tests
│   └── test_compatibility.py    # Compatibility checker tests
└── guides/
    └── schema-evolution-strategy.md  # Versioning & compatibility guide

Architecture


  ┌───────────────┐     ┌─────────────────┐     ┌─────────────────┐
  │  Upstream     │────▶│ Schema Detector │────▶│  Drift Report   │
  │  Data Source  │     └─────────────────┘     └────────┬────────┘
  └───────────────┘                                      │
                                                         ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  Compatibility  │◀────│  Decision:      │

*... continues with setup instructions, usage examples, and more.*

📄 Code Sample (.py preview)

src/compatibility_checker.py

```python
"""
Compatibility Checker — Check forward, backward, and full compatibility
between schema versions using Avro-style rules.

Rules:
- **Backward compatible**: consumers using the *old* schema can read data
  written with the *new* schema. Adding nullable columns and widening types
  is allowed; removing columns and narrowing types is not.
- **Forward compatible**: consumers using the *new* schema can read data
  written with the *old* schema. Removing columns is allowed; adding
  required columns is not.
- **Full compatible**: both backward and forward compatible simultaneously.

Part of the Schema Evolution Toolkit by Datanest Digital (https://datanest.dev).
"""

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set

from pyspark.sql.types import StructField, StructType

from schema_detector import ChangeKind, SchemaDetector, SchemaReport


class CompatibilityMode(Enum):
    """Supported compatibility modes."""

    BACKWARD = "backward"
    FORWARD = "forward"
    FULL = "full"
    NONE = "none"


# Safe type widening paths (same as migrator)
_SAFE_WIDENINGS: Dict[str, Set[str]] = {
    "byte": {"short", "int", "bigint", "double"},
    "short": {"int", "bigint", "double"},
    "int": {"bigint", "double"},
    "bigint": {"double"},
    "float": {"double"},
    "date": {"timestamp"},
}


@dataclass
class CompatibilityResult:
    """Result of a compatibility check."""

# ... 139 more lines ...
```
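The backward-compatibility rule and the widening table above can be illustrated with a short, self-contained sketch. The `is_backward_compatible` function and the sample schemas here are hypothetical simplifications (plain `{column: type}` dicts, nullability ignored), not the toolkit's actual `CompatibilityChecker` implementation; the widening table mirrors `_SAFE_WIDENINGS` from the excerpt.

```python
# Mirrors the _SAFE_WIDENINGS table shown in the excerpt.
SAFE_WIDENINGS = {
    "byte": {"short", "int", "bigint", "double"},
    "short": {"int", "bigint", "double"},
    "int": {"bigint", "double"},
    "bigint": {"double"},
    "float": {"double"},
    "date": {"timestamp"},
}


def is_backward_compatible(old: dict, new: dict) -> bool:
    """Old-schema readers must be able to read data written with the new
    schema: no column removed, no type narrowed (widening is allowed).
    Simplified sketch — real checks would also consider nullability."""
    for col, old_type in old.items():
        if col not in new:
            return False  # column removed -> old readers break
        new_type = new[col]
        if new_type != old_type and new_type not in SAFE_WIDENINGS.get(old_type, set()):
            return False  # narrowed or otherwise incompatible type change
    return True


old = {"id": "int", "amount": "float"}
widened = {"id": "bigint", "amount": "double", "note": "string"}  # widen + add
narrowed = {"id": "int", "amount": "short"}                       # narrow

print(is_backward_compatible(old, widened))   # True
print(is_backward_compatible(old, narrowed))  # False
```

Note the asymmetry with the forward rule in the docstring: the same added column that is harmless here would break forward compatibility if it were required, which is why the toolkit checks both directions before declaring a change "full" compatible.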