$59
Databricks Workspace Toolkit
Workspace provisioning scripts, cluster policies, notebook templates, Unity Catalog setup, and job orchestration patterns.
Python · YAML · TOML · Shell · JSON · Markdown · Azure · Databricks · Spark
📁 File Structure (17 files)
databricks-workspace-toolkit/
├── LICENSE
├── README.md
├── configs/
│   ├── cluster_policies.json
│   ├── job_templates/
│   │   ├── etl_job.json
│   │   └── ml_training_job.json
│   └── workspace_config.yaml
├── guides/
│   └── workspace-management.md
├── notebooks/
│   └── admin_dashboard.py
├── scripts/
│   ├── export_workspace.sh
│   └── setup_workspace.sh
└── src/
    ├── cluster_manager.py
    ├── job_manager.py
    ├── permissions_manager.py
    ├── secret_manager.py
    ├── unity_catalog_setup.py
    └── workspace_manager.py
📖 Documentation Preview (README excerpt)
Databricks Workspace Toolkit
Automate Databricks workspace management — clusters, jobs, secrets, Unity Catalog, and permissions.
Stop clicking through the UI. Manage your entire Databricks workspace programmatically with production-ready Python wrappers around the Databricks REST APIs.
---
What You Get
- Workspace management — List, create, delete, import/export notebooks programmatically
- Cluster automation — Create clusters, resize, manage pools, enforce auto-termination policies
- Job orchestration — Create multi-task workflows, manage schedules, configure notifications
- Secret management — Create scopes, store secrets, manage ACLs for secure credential handling
- Unity Catalog setup — Bootstrap catalogs, schemas, tables, grants, and external locations
- Permissions manager — Configure RBAC for clusters, jobs, notebooks, and SQL warehouses
- Cluster policies — Cost control templates with instance type restrictions and spot pricing
- Job templates — Ready-to-use ETL and ML training job definitions
- Shell scripts — Bootstrap workspace setup and backup/export notebooks
- Admin dashboard — Databricks notebook showing cluster usage, job status, and costs
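To make the cluster policies item above more concrete, here is a minimal sketch of a cost-control policy definition, built in Python and serialized to JSON. The toolkit's actual `configs/cluster_policies.json` is not shown in this preview, so the helper name `build_cost_control_policy`, the VM sizes, and the limits are illustrative assumptions. The attribute paths and rule types (`allowlist`, `range`, `fixed`) follow the Databricks cluster policy definition format.

```python
import json


def build_cost_control_policy(allowed_node_types, max_idle_minutes=60):
    """Return a cluster policy definition as a plain dict.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    if not allowed_node_types:
        raise ValueError("allowed_node_types must not be empty")
    if max_idle_minutes < 10:
        raise ValueError("max_idle_minutes must be at least 10")
    return {
        # Restrict clusters to an approved list of instance types.
        "node_type_id": {
            "type": "allowlist",
            "values": list(allowed_node_types),
            "defaultValue": allowed_node_types[0],
        },
        # Force auto-termination within a bounded idle window.
        "autotermination_minutes": {
            "type": "range",
            "minValue": 10,
            "maxValue": max_idle_minutes,
            "defaultValue": min(30, max_idle_minutes),
        },
        # Prefer spot capacity on Azure, falling back to on-demand.
        "azure_attributes.availability": {
            "type": "fixed",
            "value": "SPOT_WITH_FALLBACK_AZURE",
        },
    }


# Example VM sizes are assumptions; substitute the sizes your team approves.
policy = build_cost_control_policy(["Standard_DS3_v2", "Standard_DS4_v2"])
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

When a policy is created through the Cluster Policies API, the definition is passed as a JSON string, which is why the sketch ends by serializing the dict.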
File Tree
databricks-workspace-toolkit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── workspace_manager.py       # Notebook CRUD, import/export
│   ├── cluster_manager.py         # Cluster lifecycle, pools, policies
│   ├── job_manager.py             # Jobs API: create, run, notifications
│   ├── secret_manager.py          # Secret scopes and ACLs
│   ├── unity_catalog_setup.py     # UC bootstrap: catalogs, schemas, grants
│   └── permissions_manager.py     # RBAC for workspace resources
├── configs/
│   ├── cluster_policies.json      # Cost control cluster policies
│   ├── workspace_config.yaml      # Environment configuration
│   └── job_templates/
│       ├── etl_job.json           # Multi-task ETL pipeline job
│       └── ml_training_job.json   # ML training with GPU cluster
├── scripts/
│   ├── setup_workspace.sh         # Bootstrap workspace setup
│   └── export_workspace.sh        # Backup notebooks and configs
├── notebooks/
│   └── admin_dashboard.py         # Admin overview dashboard
└── guides/
    └── workspace-management.md    # Best practices guide
Getting Started
1. Configure Your Workspace
# configs/workspace_config.yaml
workspace:
  host: "https://adb-1234567890.12.azuredatabricks.net"
  token_env_var: "DATABRICKS_TOKEN"
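The config stores only the name of the environment variable, not the token itself. As a rough sketch of how such a config could be checked before use, the snippet below takes the already-parsed `workspace` section as a dict, validates the host, and fails fast when the token variable is missing. The helper name `resolve_workspace_settings` and the fail-fast behaviour are assumptions for illustration. The `timeout` key and its default of 30 mirror what the `from_config` method in the code sample reads.

```python
import os
from urllib.parse import urlparse


def resolve_workspace_settings(config, environ=None):
    """Validate the 'workspace' section and look up the token.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    environ = os.environ if environ is None else environ
    ws = config["workspace"]

    # Normalize and sanity-check the workspace URL.
    host = ws["host"].rstrip("/")
    parsed = urlparse(host)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"workspace.host must be an https URL, got {host!r}")

    # The config names the variable; the secret itself stays out of the file.
    env_var = ws.get("token_env_var", "DATABRICKS_TOKEN")
    token = environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set or empty")

    return {"host": host, "token": token, "timeout": ws.get("timeout", 30)}


# Same values as the YAML above, as they would look after parsing.
config = {
    "workspace": {
        "host": "https://adb-1234567890.12.azuredatabricks.net",
        "token_env_var": "DATABRICKS_TOKEN",
    }
}

# A fake environment is injected here so the example runs without a real token.
settings = resolve_workspace_settings(
    config, environ={"DATABRICKS_TOKEN": "dapi-example"}
)
print(settings["host"], settings["timeout"])
```

Raising on a missing token surfaces a misconfigured environment immediately, instead of as a 401 or 403 on the first API call.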
... continues with setup instructions, usage examples, and more.
📄 Code Sample (.py preview)
src/cluster_manager.py
"""
Databricks Cluster Manager
===========================
Create, configure, resize, and manage Databricks clusters and instance pools
via the Clusters API 2.0.

Datanest Digital | https://datanest.dev
"""
from __future__ import annotations

import logging
import os
import time
from dataclasses import dataclass
from typing import Any, Optional

import requests
import yaml

logger = logging.getLogger(__name__)


class ClusterManager:
    """Manage Databricks clusters and instance pools."""

    def __init__(self, host: str, token: str, timeout: int = 30) -> None:
        self._host = host.rstrip("/")
        self._timeout = timeout
        self._session = requests.Session()
        self._session.headers.update({
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        })

    @classmethod
    def from_config(cls, config_path: str) -> ClusterManager:
        """Create ClusterManager from a YAML config file."""
        with open(config_path) as f:
            config = yaml.safe_load(f)
        ws = config["workspace"]
        token = os.environ.get(ws.get("token_env_var", "DATABRICKS_TOKEN"), "")
        return cls(host=ws["host"], token=token, timeout=ws.get("timeout", 30))

    def _api(self, method: str, endpoint: str, **kwargs: Any) -> dict:
        """Make an authenticated API request."""
        url = f"{self._host}/api/2.0{endpoint}"
        resp = self._session.request(method, url, timeout=self._timeout, **kwargs)
        resp.raise_for_status()
# ... 225 more lines ...
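The rest of the module is not shown in this preview. Cluster lifecycle code usually has to wait for a cluster to change state, so the following is a minimal, standalone sketch of a polling loop, not the toolkit's implementation. The function name `wait_for_state` and its parameters are assumptions. `get_state` is a caller-supplied callable that, in practice, might wrap a `GET /api/2.0/clusters/get` call and return the `state` field. The state names are those reported by the Clusters API.

```python
import time

# States from which a cluster will not reach RUNNING without intervention.
FAILURE_STATES = {"TERMINATING", "TERMINATED", "ERROR", "UNKNOWN"}


def wait_for_state(get_state, target="RUNNING", timeout_s=600.0, poll_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target, fails, or times out.

    Hypothetical helper for illustration; sleep and clock are injectable
    so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"cluster entered {state} while waiting for {target}")
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {state} after {timeout_s} seconds")
        sleep(poll_s)


# Simulated state sequence standing in for real API responses.
states = iter(["PENDING", "PENDING", "RUNNING"])
final = wait_for_state(lambda: next(states), sleep=lambda seconds: None)
print(final)  # RUNNING
```

Checking the target before the failure set means the same helper can also wait for `TERMINATED` after a delete request.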