HomeOur WorkClinical-Claims Data Platform on Microsoft Fabric

Clinical-Claims Data Platform on Microsoft Fabric

A US health insurer built a HIPAA-aligned healthcare data platform on Microsoft Fabric, unifying claims, clinical, pharmacy, lab, and provider data to enable cohort analytics, quality measures, secure reporting, and governed self-service insights.

At a Glance

The Client needed a unified clinical and claims data platform to support cohort analytics, prior authorisation reporting, and quality measure tracking. Our team delivered a custom-built platform on Microsoft Fabric using our Custom-Build Accelerator for healthcare. The platform brings together claims, clinical, pharmacy, lab, and member-provider data into a HIPAA-aligned Lakehouse with full audit, row- and column-level controls, and a cohort builder over the conformed data.

Client Background

The Client serves around 1.5 million covered lives across commercial and Medicare Advantage lines, operating in several states. The heritage analytics estate was a mix of SQL Server data marts, Excel-driven reporting, and a long tail of one-off extracts requested from the claims processor.

Strategic decisions, including population health programmes, provider contract negotiations, and prior authorisation policy, required cohort analysis that the existing estate could not support in reasonable time. Most cohort requests took weeks of analyst effort and were re-derived each time. Compliance reporting added a parallel set of extracts that were maintained independently.

Project Objective

The brief was to build a unified, HIPAA-aligned data platform that brings together claims, clinical, pharmacy, lab, member, and provider data with the lineage and audit needed for regulated reporting. Three measurable goals were set at the outset:

  1. Deliver staged value: initial claims-based cohort analytics by month 9, integrated clinical-claims cohort builder by month 16, full quality measures and dashboards by month 20.
  2. Deliver an audit trail meeting regulatory access reporting requirements, with every protected access event logged and queryable from day one of production.
  3. Support industry-standard healthcare quality measures on a monthly cadence with documented validation against current measure specifications.

Our Approach and Architecture

A custom build, but accelerated. Our Custom-Build Accelerator for Healthcare brings starter templates that we then tune to the Client's lines of business, contracted provider network, and quality measure scope. The accelerator is a head start, not a shortcut. The work that follows is real engineering against real healthcare data.

Ingestion

Three patterns cover the heterogeneity of healthcare data. Claims and pharmacy data land via SFTP into a staging area in OneLake. Each file is validated against its expected structure; non-conforming files are quarantined and trigger a quality alert rather than failing silently into Bronze.

Clinical data is pulled from 35+ contracted clinical endpoints. The accelerator's clinical client handles bulk data exports where the endpoint supports it, and falls back to incremental pulls where it does not. Endpoint-specific quirks, including pagination differences, authentication flow variants, and partial implementations, are managed in per-endpoint configuration rather than in code.

Member, provider, and lab data come through paginated REST APIs with watermark-driven incremental extraction. Raw responses land as JSON in Bronze.

Bronze, Silver, Gold

Bronze holds raw, immutable source records organised by subject area: Claims, Clinical, Pharmacy, Lab, and Member & Provider. Partitioned by source and ingestion date. Encrypted at rest with customer-managed keys. Full source lineage tracked, including originating file or endpoint.

Silver parses and standardises by subject area. Claims are normalised into a canonical claim model covering claim lines and adjudications. Clinical data is flattened into typed records covering encounters, conditions, medications, observations, and procedures. Pharmacy holds dispensed prescriptions and PBM-adjudicated transactions. Lab Results are normalised against reference ranges. Member & Provider hold current and historical records.

Clinical code sets, both diagnosis and procedure, are mapped against current versions, with effective dates tracked for accurate point-in-time analysis. Sensitive data is masked or tokenised at this layer based on field-level sensitivity classification.

This is also where the Member-Provider-Claim Linkage layer is built. It is the spine that joins clinical encounters to claims for the same member at the same provider, enabling integrated cohort analytics. Entity resolution here is non-trivial because member identifiers differ between claims and clinical sources, and provider identifiers are not always populated consistently.

Gold is a dimensional model expressed by subject area: Claims, Clinical Encounters, Medications, and Lab Results as fact subject areas; Member, Provider, Code Sets, Date, and Benefit Plan as conformed dimensions. A Cohort Builder sits on Gold as a set of parameterised views, supporting cohort definitions by demographic, diagnosis, procedure, medication, lab value, and benefit attributes. Quality Measures views are maintained alongside, aligned to industry-standard healthcare measure specifications.

Semantic and Consumption

A Direct Lake semantic model surfaces Gold through Power BI. Row-level security scopes line of business and care management programme. Column-level masking protects sensitive data based on user role. Two dashboard sets ship at go-live: Quality Measures and Provider Network Analysis. The cohort builder is exposed as a self-service experience for analyst users with appropriate role grants.

HIPAA security and audit

Private endpoint connectivity into Fabric and OneLake. No public network exposure. Customer-managed keys for encryption at rest. Microsoft Purview applies sensitivity labels and tracks lineage. Conditional Access enforces MFA and device compliance, with break-glass procedures defined and logged. An immutable audit log captures every protected access event with user, role, query, and timestamp. The log is queryable by the compliance team for regulatory access reporting requests.

Orchestration and DataOps

Fabric Data Pipelines orchestrate ingestion and the medallion flow. Notebooks invoked from pipelines run the silver transformations and cohort views. Git integration ties the workspace to Azure DevOps with mandatory peer review and CI checks. Deployment pipelines promote development through test to production. Secrets sit in Azure Key Vault, identities and roles flow from Azure AD.

Project Plan

The programme ran for 20 months of construction followed by 4 months of hypercare and minor enhancements, totalling 24 months. Eight phases, with planned overlap and three internal release milestones at month 9, month 16, and month 20, giving the business incremental value rather than a single big-bang go-live.

Phase modules in detail:

Phase 1 — Discovery, Compliance and Architecture (Months 1–3). Source system assessment with each contracted clinical endpoint and the claims processor. HIPAA control selection and HITRUST mapping. Cohort and quality measure scope. Target architecture and data model sign-off.

Phase 2 — Foundation and HIPAA Security (Months 2–5). Fabric workspaces, OneLake structure, customer-managed keys, private endpoints, Azure AD groups and roles, RLS and column-masking framework, Microsoft Purview classification, audit logging framework.

Phase 3 — Source Ingestion, staged (Months 4–14). Five subject areas brought online in waves: Claims first to enable Release 1, Clinical endpoints onboarded in cohorts, Pharmacy and Lab in parallel, Member and Provider directories last. Per-endpoint configuration is the bulk of the time, not core code.

Phase 4 — Silver and Entity Resolution (Months 8–17). Code-set mapping for diagnosis and procedure codes. Member-provider-claim linkage layer built progressively as each subject area comes online. Sensitive data masking and tokenisation. Reference data management for benefit plans and provider directories.

Phase 5 — Gold and Cohort Builder (Months 13–19). Dimensional model implementation by subject area, parameterised cohort builder views, industry-standard quality measure views with version tracking.

Phase 6 — Semantic Model and BI (Months 16–20). Direct Lake semantic model, RLS and CLS policies, Quality Measures and Provider Network dashboards, analyst training on the cohort builder.

Phase 7 — UAT, Compliance Validation and Go-Live (Months 18–20). Audit log validation, RLS and CLS testing, HIPAA attestation activities, business UAT, full cutover at Release 3.

Phase 8 — Hypercare and Minor Enhancements (Months 21–24). Defect triage, semantic model tuning, measure validation against external benchmarks, knowledge transfer.

Release milestones:

  • R1 (Month 9). Claims-based cohort analytics live. Audit log live in production.
  • R2 (Month 16). Linked clinical-claims cohort builder live. Member-provider-claim linkage operational. Population health and prior authorisation reporting live.
  • R3 (Month 20). Full platform live. Industry-standard quality measures running on a monthly cadence. All five subject areas integrated.

AI Acceleration Across the Build Lifecycle

The Custom-Build Accelerator brings AI assistance into seven stages of the build lifecycle. Each stage produces artefacts that engineers and architects review and harden rather than accept as final.

Exploratory data analysis on the source data. Before any modelling decisions are made, the accelerator profiles every source against the platform's intended use. It examines distributions, cardinality, null rates, key candidates, value frequencies, and drift signals over historical windows. For healthcare data, this surfaces issues teams typically discover months later: a clinical endpoint returning only the last 30 days of history, a claims feed with member identifiers re-keyed across years, a pharmacy stream with inconsistent unit-of-measure conventions. Engineers reviewed the EDA output and decided what was a real data issue versus a representation artefact, and what needed remediation at source versus downstream handling.

Architecture recommendation. Given the EDA findings, the in-scope KPIs, and the platform constraints (Microsoft Fabric, HIPAA-aligned posture, the Client's existing Azure footprint), the accelerator proposed a target architecture with reasoning: medallion structure, partitioning strategy, security perimeter, ingestion pattern per subject area, and orchestration approach. The lead architect treated the recommendation as a starting position for the design review, not a final answer. The final architecture deviated from the recommendation in several places, including workspace boundary decisions, RLS scoping, and the cohort builder layer specifically, for reasons rooted in this Client's regulatory posture.

Base code generation. Once the architecture was agreed, the accelerator generated the foundational platform code: workspace structure in Fabric, CI/CD wiring in Azure DevOps, gateway configuration, secret management, baseline pipeline scaffolding, security framework starter (RLS, CLS, audit table structures), and DataOps standards. This is the boilerplate every Fabric build needs and that no one wants to write by hand for the fifth time. Engineers reviewed, tightened conventions to Client standards, and moved on to the work that actually carries judgement.

Draft data model creation based on KPIs. The KPIs the business actually wanted to track were fed into the accelerator alongside the EDA output. These covered cohort definitions, quality measures, prior authorisation reporting metrics, network analysis cuts, and population health indicators. The accelerator produced a draft conformed data model: subject-area facts, dimensions, and the grain at which each fact should sit, with traceability from each KPI back to the model elements that serve it. The healthcare data SME and BI engineer revised before sign-off. The KPI-to-model traceability survived into the documentation, which made future change requests easier to scope.

Schema mapping. With the data model agreed, the accelerator proposed source-to-target mappings for each subject area: which source field maps to which model attribute, transformation logic where derivation is needed, and gaps where the source does not carry what the model requires. Engineers reviewed and adjusted, especially where source-specific extensions or non-standard usage required handling. Gaps surfaced here became change requests against the source systems, fed back to the clinical vendors and the claims processor.

Code-set mapping. Diagnosis and procedure code sets are version-sensitive and shift over time. The accelerator maintains effective-dated code-set reference tables and proposes mappings for codes encountered in source data, flagging codes that have been retired, replaced, or that never existed. Engineers approved the mappings the compliance team would rely on, with disputed codes escalated to the healthcare data SME.

Documentation. Source-to-target mapping documents, code-set version tracking, lineage diagrams, KPI-to-model traceability, architecture decision records, and measure specifications are generated and refreshed on every deployment. The compliance team reads the docs directly to support attestation and audit, rather than asking the engineering team for handover notes.

Where we kept humans firmly in front: the security model, sensitivity classification, RLS and CLS policy definitions, the data model after review, quality measure specification interpretation, the member master once resolved, business logic decisions, and every HIPAA control decision.

Outcome

  • Platform delivered in 24 months against an initial 30-month vendor proposal, with the time saved attributed largely to the accelerator's healthcare data model templates, clinical ingestion patterns, and the staged release model that brought value forward of go-live.
  • Five subject areas unified, with 35+ clinical endpoints integrated through per-endpoint configuration handling implementation variance.
  • Cohort builder response time for typical queries: under one hour, against a previous baseline measured in days of analyst effort.
  • Immutable audit log in production with 100% of protected access events captured, queryable for regulatory access reporting.
  • Industry-standard quality measures running on a monthly cadence, validated against measure specifications and reviewed by the quality team before publication.
  • HIPAA attestation completed before go-live; HITRUST-aligned control set documented and operational.
  • Provider network analysis enabled by the resolved provider dimension, supporting contract negotiations with claim density and specialty mix views that previously required manual extracts.

Tech Stack

Microsoft Fabric (HIPAA-aligned configuration): OneLake, Lakehouse, Data Pipelines, PySpark Notebooks, Direct Lake semantic model, Power BI.

Microsoft compliance and security: Microsoft Purview (sensitivity labels, data classification, lineage), Azure Key Vault with customer-managed keys, Private Endpoint, Azure AD with Conditional Access, Azure Monitor + Log Analytics.

MicrosoftDataOps: Git integration with Azure DevOps, Fabric deployment pipelines.

Custom-Build Accelerator for Healthcare: Claims Ingestion Accelerator, Clinical Ingestion Accelerator (with endpoint-specific adapters), Healthcare Data Model Templates, Code-Set Mapper, Member/Provider Entity Resolver, HIPAA Audit Framework, Cohort Builder Templates, Quality Measure Library, Documentation Generator.

Reflection

Three observations from the engagement worth keeping in mind for similar work.

First, HIPAA controls baked in from day one made compliance attestation tractable. Retrofitting controls into a working platform is many times more expensive than designing for them. Phase 1 covered the control set; Phase 2 implemented it; every subsequent phase respected it.

Second, the accelerator's healthcare data model templates saved months of work that would otherwise have gone into modelling decisions already made and re-made by every payer build team. What still needed real engineering was this Client's specific contract terms, prior authorisation rules, line-of-business definitions, and quality measure scope. Templates as a starting point, judgement as the work.

Third, clinical APIs look simple in their specifications. Real-world endpoints implement them inconsistently. Some return partial data, some support bulk exports, some require unusual authentication flows, and rate limits vary widely. The accelerator handled the specification cleanly; engineers handled the variance. Budget for per-endpoint configuration work, not for a single clinical client.

Frequently Asked Questions

Answering common questions about 3X Data Engineering to help you get started on your modernization journey.

It unified disconnected healthcare data sources for faster cohort analytics, quality reporting, and regulated data access.
Claims, clinical, pharmacy, lab, member, and provider data were consolidated into a single governed platform.
The platform used private endpoints, audit logging, row- and column-level security, encryption, and Microsoft Purview governance.
Cohort analysis dropped from days of manual effort to under one hour with integrated quality measure reporting.

Building a Healthcare Data Platform on Microsoft Fabric?

Accelerate healthcare analytics with HIPAA-aligned Fabric architectures, governed data models, and scalable cohort analytics.

Request a Demo

Let's talk scale.

Our team of engineering experts and AI architects is ready to help you accelerate your data modernization journey.

Email

Phone / Text

-Select-