Databricks to Fabric Migration for Manufacturing Analytics

A global manufacturer migrated Databricks workloads to Microsoft Fabric using a pre-built accelerator, enabling faster migration, automated reconciliation, ML pipeline transition, and unified analytics with minimal business disruption.

At a Glance

The Client asked our team to migrate its Databricks analytics estate to Microsoft Fabric without interrupting the business. We ran the migration using the Databricks-to-Fabric pre-built accelerator, sequencing inventory, schema migration, notebook conversion, pipeline translation, MLflow-to-Fabric ML migration, permissions mapping, and reconciliation across eight phases.

Client Background

The Client operates hundreds of plants globally. Its analytics estate had grown on Databricks over five years, supporting operational reporting, supplier quality analytics, supply-chain telemetry, and a small set of production ML models for demand forecasting and quality prediction.

The Client had standardised on Microsoft 365 for collaboration, Power BI for reporting, and Azure as the primary cloud. The case to consolidate analytics onto Microsoft Fabric was as much financial as technical: simpler licensing, tighter integration with the finance and ERP reporting estate, and a single security model across the productivity and analytics stack.

Project Objective

The brief was to migrate the Databricks analytics platform to Microsoft Fabric without disrupting the business. Three measurable goals were set at the outset:

  1. Zero data loss in migration. Every Delta table reconciled within ±0.3% on row counts and aggregate measures.
  2. ML models retrained on Fabric ML with no accuracy regression beyond ±1% on validation sets.
  3. Production cutover within 16 weeks of project start.

Our Approach and Architecture

We structured the work into eight workstreams running in planned overlap, with the Pre-Built Accelerator driving each workstream's repetitive work and engineers owning the judgement calls.

The migration architecture places the Databricks estate on one side, the Pre-Built Migration Accelerator in the middle, and the target Microsoft Fabric estate on the other. The accelerator runs as a sequenced toolchain: inventory and dependency mapping, schema and data migration, notebook conversion, pipeline translation, MLflow to Fabric ML migration, permissions mapping, and validation. Each module emits artefacts that engineers review before they are promoted forward.

Inventory and dependency mapping

The accelerator scans the Databricks workspace and inventories notebooks, jobs, table dependencies, secret scopes, and Unity Catalog permissions. It produces a dependency graph showing which notebooks feed which tables and which jobs orchestrate which sequences. This becomes the migration plan's source of truth and survives as a living document throughout the engagement.
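To make the dependency mapping concrete, here is a minimal sketch of the idea: given an exported inventory of notebooks and the tables they read and write, build a directed graph and take a topological order as the migration sequence. The inventory structure and the use of networkx are illustrative assumptions, not the accelerator's internals.

```python
# Illustrative sketch only: build a notebook/table dependency graph from an
# exported inventory and derive a safe migration order. The inventory fields
# (notebook path, tables read, tables written) are assumptions, not the
# accelerator's actual schema.
import networkx as nx

inventory = [
    {"notebook": "/etl/load_supplier_quality", "reads": [], "writes": ["silver.supplier_quality"]},
    {"notebook": "/etl/aggregate_plant_kpis", "reads": ["silver.supplier_quality"], "writes": ["gold.plant_kpis"]},
    {"notebook": "/ml/train_quality_model", "reads": ["gold.plant_kpis"], "writes": []},
]

graph = nx.DiGraph()
for item in inventory:
    graph.add_node(item["notebook"], kind="notebook")
    for table in item["reads"]:
        graph.add_edge(table, item["notebook"])   # table feeds notebook
    for table in item["writes"]:
        graph.add_edge(item["notebook"], table)   # notebook produces table

# Topological order gives a migration sequence where upstream assets move first.
migration_order = list(nx.topological_sort(graph))
print(migration_order)
```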

Schema and data migration

Delta tables on ADLS were migrated using two patterns. For tables outside the critical reporting path, shortcuts in OneLake pointed at the existing Delta location for a transition period; full physical migration followed once cutover dependencies cleared. For high-criticality tables, the accelerator performed a one-shot migration with row-count and aggregate reconciliation against the source.
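As a rough illustration of the one-shot pattern, the PySpark sketch below copies a Delta table and confirms row counts immediately afterwards. The source and target paths are placeholders, and the accelerator's actual reconciliation is broader (see the validation section further down).

```python
# Illustrative PySpark sketch of the one-shot migration pattern for a
# high-criticality table: copy the Delta table, then confirm row counts match.
# Paths are placeholders, not the Client's actual locations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_path = "abfss://lake@adls.dfs.core.windows.net/delta/gold/plant_kpis"
target_path = "abfss://onelake/workspace/lakehouse/Tables/plant_kpis"

source_df = spark.read.format("delta").load(source_path)
source_df.write.format("delta").mode("overwrite").save(target_path)

source_count = source_df.count()
target_count = spark.read.format("delta").load(target_path).count()
assert source_count == target_count, f"Row count mismatch: {source_count} vs {target_count}"
```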

Notebook conversion

PySpark notebooks were converted with the accelerator's notebook translator. Around 80% of the corpus converted cleanly. The remaining 20%, covering Spark version differences, custom UDFs, deprecated APIs, magic command differences, and secret scope rewrites, needed engineer attention. The accelerator flagged each non-trivial pattern and produced a candidate rewrite that engineers reviewed.
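A simplified sketch of that flagging step: scan a notebook's exported source for constructs known to need engineer attention. The pattern list here is a small illustrative sample, not the translator's actual rule set.

```python
# Illustrative sketch of the kind of pattern flagging the translator performs:
# scan exported notebook source for constructs that need engineer attention.
# The pattern list is an example, not the accelerator's actual rules.
import re

FLAG_PATTERNS = {
    "secret scope access": re.compile(r"dbutils\.secrets\.get\("),
    "dbutils filesystem call": re.compile(r"dbutils\.fs\."),
    "%run magic command": re.compile(r"^\s*%run\s", re.MULTILINE),
    "deprecated DataFrame API": re.compile(r"\.registerTempTable\("),
}

def flag_notebook(source: str) -> list[str]:
    """Return the names of flagged patterns found in a notebook's source."""
    return [name for name, pattern in FLAG_PATTERNS.items() if pattern.search(source)]

with open("load_supplier_quality.py") as f:
    print(flag_notebook(f.read()))
```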

Pipeline and orchestration migration

Databricks Jobs configurations were translated into Fabric Data Pipelines. The accelerator generated the pipeline definition from each source job specification, covering trigger schedules, dependency edges, parameter passing, and retry policies. Engineers reviewed the generated pipelines and added the platform-specific integrations that do not translate cleanly, such as webhook callbacks and custom alerting integrations.
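The translation can be pictured as a structural mapping from the Databricks Jobs specification to a pipeline definition. The sketch below reads standard Jobs API fields (task_key, depends_on, max_retries, the cron schedule) but emits a simplified stand-in structure rather than the real Fabric Data Pipeline JSON schema.

```python
# Illustrative sketch of the translation shape: map a Databricks Jobs spec
# (tasks, dependencies, schedule, retries) to a simplified pipeline definition.
# The output structure is a stand-in, not the Fabric Data Pipeline schema.
def translate_job(job_spec: dict) -> dict:
    activities = []
    for task in job_spec.get("tasks", []):
        activities.append({
            "name": task["task_key"],
            "type": "RunNotebook",
            "notebook": task.get("notebook_task", {}).get("notebook_path"),
            "dependsOn": [d["task_key"] for d in task.get("depends_on", [])],
            "retry": {"count": task.get("max_retries", 0)},
        })
    return {
        "name": job_spec["name"],
        "schedule": job_spec.get("schedule", {}).get("quartz_cron_expression"),
        "activities": activities,
    }
```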

MLflow to Fabric ML migration

The MLflow model registry was migrated to Fabric ML's model registry, preserving versioning and metadata. Training pipelines were retargeted to Fabric ML's experiment tracking. Models were retrained on Fabric ML to confirm parity, and validation against holdout sets showed no accuracy regression beyond the agreed tolerance.
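The export side of that registry migration looks roughly like the sketch below, which walks registered models and their versions with the MLflow client so versioning and metadata can be recreated on the target. Re-registration on Fabric ML is intentionally left out.

```python
# Illustrative sketch of the registry export step: enumerate registered models
# and their versions via the MLflow client so the metadata can be recreated on
# the target registry. Re-registration on Fabric ML is not shown here.
from mlflow.tracking import MlflowClient

client = MlflowClient()
for model in client.search_registered_models():
    for version in client.search_model_versions(f"name = '{model.name}'"):
        print(model.name, version.version, version.current_stage, version.run_id)
```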

Permissions mapping

Unity Catalog's catalog, schema, and table-level grants and groups were mapped to Fabric's workspace and item-level security model. This was the slowest mechanical step. The structures do not map one-to-one, so the accelerator produced a candidate mapping that the security architect and a Client representative reviewed and adjusted.
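The shape of the candidate mapping can be sketched as a lookup from Unity Catalog privileges to suggested Fabric roles, with anything ambiguous flagged for manual review. The privilege-to-role table and the catalog-to-workspace assumption below are illustrative only; the real mapping required architect judgement at each workspace boundary.

```python
# Illustrative sketch of the candidate mapping step: translate Unity Catalog
# grants into suggested Fabric workspace roles for review. The privilege-to-role
# lookup and catalog-to-workspace assumption are simplifications.
PRIVILEGE_TO_ROLE = {
    "SELECT": "Viewer",
    "MODIFY": "Contributor",
    "ALL_PRIVILEGES": "Admin",
}

def map_grant(grant: dict) -> dict:
    securable = grant["securable"]            # e.g. "prod.quality.supplier_scores"
    workspace = securable.split(".")[0]       # assumption: one catalog maps to one workspace
    return {
        "workspace": workspace,
        "principal": grant["principal"],
        "suggested_role": PRIVILEGE_TO_ROLE.get(grant["privilege"], "REVIEW_MANUALLY"),
        "source_grant": grant,
    }

print(map_grant({"securable": "prod.quality.supplier_scores",
                 "principal": "grp-quality-analysts",
                 "privilege": "SELECT"}))
```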

BI redirect and semantic models

Power BI reports previously connecting to Databricks SQL warehouses were retargeted to a Direct Lake semantic model over OneLake. Reports using DirectQuery migrated cleanly. Reports using import mode were rebuilt against the semantic model to take advantage of Direct Lake's no-copy execution. About two-thirds of reports improved noticeably in response time.

Validation, reconciliation and cutover

The accelerator generated reconciliation queries comparing each migrated table to its Databricks source: row counts, hash totals on key columns, and distributional checks on numeric facts. The harness ran daily during a two-week parallel run and produced a tolerance report. Cutover proceeded once every report reconciled within ±0.3%. The cutover itself was a single weekend, with Databricks decommissioning following over the next two months as confidence built.
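A sketch of those checks for one table, assuming PySpark access to both environments: row count, a hash total over the key columns, and simple distribution statistics on a numeric fact, compared against the ±0.3% tolerance. Table names and columns are placeholders.

```python
# Illustrative PySpark sketch of per-table reconciliation: row count, a hash
# total over key columns (a comparable fingerprint, not a strict checksum),
# and basic distribution stats on a numeric fact. Names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def reconcile(table: str, key_cols: list[str], fact_col: str) -> dict:
    df = spark.table(table)
    return (
        df.agg(
            F.count("*").alias("row_count"),
            F.sum(F.xxhash64(*key_cols)).alias("key_hash_total"),
            F.sum(fact_col).alias("fact_sum"),
            F.avg(fact_col).alias("fact_avg"),
            F.stddev(fact_col).alias("fact_stddev"),
        )
        .first()
        .asDict()
    )

source = reconcile("databricks_gold.plant_kpis", ["plant_id", "kpi_date"], "output_units")
target = reconcile("fabric_lakehouse.plant_kpis", ["plant_id", "kpi_date"], "output_units")
drift = abs(source["fact_sum"] - target["fact_sum"]) / source["fact_sum"]
print(f"fact_sum drift: {drift:.4%}")  # compared against the ±0.3% tolerance
```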

Project Plan

The migration ran for 14 weeks of build followed by 4 weeks of hypercare, totalling 18 weeks. Eight phases, parallelised wherever the dependency graph allowed.

Phase modules in detail:

Phase 1 — Discovery and Inventory (Weeks 1–2). Full Databricks workspace inventory. Dependency mapping across notebooks, tables, jobs, and downstream BI. Business prioritisation of migration order. Target architecture sign-off.

Phase 2 — Target Architecture and Security Setup (Weeks 2–4). Fabric workspaces provisioned, OneLake structure laid out, Azure DevOps Git integration, on-premises data gateway, Key Vault, and the Fabric security model.

Phase 3 — Data and Metadata Migration (Weeks 3–8). Delta table migration: shortcuts for low-criticality tables, full physical migration for high-criticality tables, table-by-table reconciliation against Databricks source.

Phase 4 — Notebook and Pipeline Conversion (Weeks 5–11). Accelerator-driven PySpark conversion, Fabric Data Pipeline rewrites from Databricks Jobs, dependency wiring, and parameter handling.

Phase 5 — ML Migration (Weeks 7–12). MLflow registry migration to Fabric ML, retraining of models, validation against holdout sets, and model serving.

Phase 6 — BI Redirect and Semantic Models (Weeks 9–13). Power BI retargeting from Databricks SQL to Direct Lake semantic models, performance testing, and minor dashboard adjustments.

Phase 7 — Validation, Reconciliation and Cutover (Weeks 12–14). Daily reconciliation reports during a two-week parallel run, business UAT, single-weekend cutover.

Phase 8 — Hypercare and Decommission (Weeks 15–18). Defect triage, performance tuning, knowledge transfer, phased decommissioning of the Databricks workspace.

AI Acceleration Across the Migration Lifecycle

The accelerator brings AI assistance into the migration toolchain at each step. The platform built on the other side is conventional Microsoft Fabric. The AI sits in the build.

Notebook conversion. The accelerator parses Databricks PySpark and produces a Fabric-equivalent draft, flagging API differences, magic command rewrites, and deprecated patterns. Engineers review and finalise. Around 80% converted cleanly; the remaining 20% needed real engineering work.

Pipeline translation. Databricks Jobs configurations are translated into Fabric Data Pipeline definitions including triggers, dependencies, parameters, and retry policies. Engineers review and add platform-specific integrations such as webhook callbacks.

Permissions mapping. The accelerator drafts a Fabric security model from Unity Catalog grants and groups. The mapping is a starting point that the security architect adjusted.

Reconciliation query generation. For each migrated table, the accelerator generates row count, hash total, and distribution checks. These run nightly during parallel run and produce a daily tolerance report.

Documentation. Migration runbooks, table-by-table mapping documents, and dependency diagrams are generated from the accelerator's inventory artefacts and refreshed as the migration progresses.

Where we kept humans firmly in front: business prioritisation, target architecture, security model decisions, ML accuracy sign-off, and the cutover go/no-go.

Outcome

  • Migration completed in 18 weeks; the two weeks beyond the original 16-week target were absorbed by additional reconciliation rather than by skipping validation.
  • 180+ notebooks migrated. Around 80% converted without engineer intervention beyond review.
  • 2,400+ Delta tables migrated and reconciled within ±0.3% on row counts and key aggregates.
  • 35 MLflow pipelines retargeted to Fabric ML with no accuracy regression on validation sets.
  • Power BI performance: roughly 65% of reports saw query response improve, attributed primarily to Direct Lake mode against OneLake.
  • Compute and licence cost: estimated 25% reduction in run-rate analytics spend post-cutover.
  • Business interruption during cutover: zero. Reports were available without delay on the Monday following the cutover weekend.

Tech Stack

Source (Databricks): Workspace, Unity Catalog, Delta Lake on ADLS, MLflow, Databricks Jobs, Power BI via Databricks SQL.

Target (Microsoft Fabric): OneLake, Lakehouse, PySpark Notebooks, Data Pipelines, Fabric ML, Direct Lake semantic model, Power BI.

Microsoft Azure: On-premises data gateway, Azure Key Vault, Azure DevOps.

Pre-Built Accelerator: Databricks-to-Fabric Migration Accelerator suite (Inventory & Dependency Mapper, Schema Migrator, Notebook Converter, Pipeline Translator, MLflow-to-Fabric ML Migrator, Permissions Mapper, Validation Harness, Documentation Generator).

Reflection

Three observations from the engagement worth keeping in mind for similar work.

First, around 80% of the notebooks converted cleanly. The remaining 20%, covering Spark version specifics, custom UDFs, deprecated APIs, and magic command differences, is where the engineering hours land. Planning for that ratio rather than the headline 80% kept the schedule honest.

Second, the security model translation was the slowest step. Unity Catalog and Fabric workspace permissions do not map one-to-one. The accelerator drafted a candidate mapping, but the structures behave differently enough that the security architect needed to think through each workspace boundary and item-level grant. We now budget more time for this on subsequent migrations.

Third, the reconciliation discipline mattered more than migration speed. The accelerator moved data quickly. The trust the business needed came from running daily reconciliation reports during parallel run and not skipping a single one. Most of the additional two weeks went into reconciliation, not migration.

Frequently Asked Questions

Answering common questions about this Databricks to Microsoft Fabric migration to help you plan your own modernization journey.

Why did the Client migrate from Databricks to Microsoft Fabric?
To simplify analytics architecture, improve integration with Microsoft services, and reduce long-term platform and operational costs.

Which workloads were migrated?
PySpark notebooks, Delta tables, MLflow pipelines, Databricks Jobs, and Power BI reporting workloads were migrated.

How did AI accelerators support the migration?
AI accelerators supported notebook conversion, pipeline translation, permissions mapping, reconciliation, and migration documentation.

Was the migration completed without business disruption?
Yes. The migration was completed with zero business interruption and validated through parallel reconciliation runs.

Planning a Databricks to Microsoft Fabric Migration?

Accelerate Fabric migration with automated conversion, reconciliation, and validation built for enterprise-scale analytics estates.
