Customer 360 on Snowflake for a Multi-Channel Retailer
At a Glance
The Client needed a single customer view that worked the same way for the marketing team running reactivation campaigns, the customer service team handling support calls, and the merchandising team planning assortments. Our team delivered a Customer 360 platform on Snowflake unifying five channels, with identity resolution at the centre, activation feeds back into the marketing stack, and AI accelerators applied across the build lifecycle.
Client Background
The Client operates several hundred stores and a substantial direct-to-consumer e-commerce operation. The data estate had grown organically alongside the business: an e-commerce platform with its own analytics, a POS system feeding nightly extracts into a legacy warehouse, a loyalty programme on a packaged CRM, multiple marketing platforms (email, paid media, retargeting), and a customer service system with its own customer record.
Each system had its own version of the customer. The same person could be a Gold-tier loyalty member, an active e-commerce account, a regular in-store shopper with a credit card record, an email subscriber under a slightly different address, and a customer service ticket holder, with no two systems agreeing on who that person was. Marketing campaigns targeted lists assembled by hand. Service reps relied on whatever the customer told them. Merchandising had to choose which channel's view of the customer to trust.
Project Objective
The brief was to build a unified Customer 360 platform that produced a single, reconciled view of each customer across all channels, and made that view available to the teams who needed it. Three measurable goals were set at the outset:
- Resolve customer identity across the five channels with an end-to-end match rate of at least 80%, validated against a labelled sample.
- Deliver daily-refresh customer segments to the marketing activation stack, replacing the existing monthly batch process.
- Make the unified customer profile available to customer service reps with a sub-second look-up by phone number, email, or loyalty ID.
Our Approach and Architecture
The architecture is Snowflake-native, using Snowpipe and connectors for ingestion, dbt for the transformation layer, Streams and Tasks for orchestration, Dynamic Tables for some derived views, and Cortex for the identity resolution model. The medallion structure is conventional, expressed by subject area rather than table name.
Ingestion
Three patterns covering channel diversity. POS extracts arrive nightly as files in S3 and are auto-ingested into Snowflake via Snowpipe, with schema validation at load time and quarantine on drift. The pattern is well-suited to the POS system's batch behaviour, and Snowpipe's auto-ingest on file arrival removes the need for a separate scheduler.
Loyalty/CRM and marketing platform data is pulled through API connectors. Pagination, watermark-driven incremental extraction, and retry-aware behaviour are handled at the connector layer. Raw payloads land directly into Snowflake tables for downstream parsing.
E-commerce and customer service systems push events as they happen via webhooks, ingested through Snowpipe Streaming. This gives near-real-time visibility into the channels where the freshness matters most.
Raw, Conformed, Curated
Raw holds source-aligned, untransformed records organised by subject area: e-commerce events, POS transactions, loyalty and CRM records, marketing engagement, and service interactions. Streams enable change-data-capture into the Conformed layer without re-reading raw tables.
Conformed parses and standardises by subject area. Transactions, engagement, service history, catalogue (products), and geography are each conformed into a canonical model that downstream consumers join against. Cross-channel attribute reconciliation happens here too.
This is also the layer where identity resolution runs, producing the customer master. The accelerator combines deterministic rules (where a unique identifier exists, such as loyalty ID or e-commerce account ID) with an AI-assisted matcher trained on labelled pairs covering the long tail of fuzzy matches. Low-confidence matches are reviewed in batches; corrections feed back into the model. The resulting customer master is the spine that every downstream subject area joins against.
Curated is the Customer 360 layer. Subject areas here are customer-centric rather than source-centric: Customer Profile, Customer Behaviour, Engagement, Service History, Segments, Lifetime Value, and Churn Indicators.
Identity Resolution and Customer Master
The identity resolution problem is the heart of the build. The same person can be a loyalty member, an e-commerce account, a POS-only customer, a marketing list member, and a service contact. The accelerator builds a candidate graph linking these identifiers, applies deterministic rules where unambiguous matches exist, and uses an AI-assisted matcher for the residual fuzzy cases. The customer data SME and a sample of customer service representatives reviewed batches of low-confidence matches during the build. The match rate stabilised around 85% end-to-end against the labelled validation sample.
The customer master is updated daily. Late-arriving identifiers trigger a re-resolution that merges previously separate records into the master with full lineage preserved.
Semantic and Consumption
Three consumption paths. Power BI connects to Snowflake for analytical reporting: customer dashboards, segment analytics, cohort-based lifetime value views, channel mix analysis. Reverse ETL pipes curated segments to the marketing platforms on a daily schedule, replacing the previous monthly batch hand-off. Customer service reps query the customer master through a sub-second lookup by phone number, email, or loyalty ID.
Privacy and Governance
Customer consent is managed alongside the customer master, with consent state propagated to every downstream consumer. Dynamic data masking protects PII based on user role. Row access policies scope brand and region. Right-to-erasure requests trigger a tracked, auditable deletion flow that propagates across the lake. The audit log captures every access event.
Orchestration and DataOps
Snowflake Tasks orchestrate the transformation flow, with Streams driving incremental processing where appropriate. dbt manages the transformation models with version control in GitHub. CI/CD runs through GitHub Actions. Secrets live in AWS Secrets Manager. Identity is federated via the Client's existing SSO provider.
Project Plan
The build ran for 14 weeks of construction followed by 4 weeks of hypercare, totalling 18 weeks. Eight phases, with the identity resolution and customer master work occupying the longest single phase because it required the most engineer-in-the-loop review.
Phase modules in detail:
Phase 1 — Discovery and Architecture (Weeks 1–3). Source system assessment across all five channels. KPI scope across marketing, customer service, and merchandising. Target architecture on Snowflake including identity resolution approach. Privacy posture and consent management requirements.
Phase 2 — Foundation and Privacy Setup (Weeks 2–5). Snowflake account configuration, role hierarchy, warehouses, and resource monitors. dbt project structure and GitHub repository. CI/CD via GitHub Actions. Dynamic data masking framework. Row access policy framework.
Phase 3 — Source Ingestion (Weeks 3–9). Snowpipe configuration for POS file ingestion from S3. API connectors for loyalty, CRM, and marketing platforms. Snowpipe Streaming for e-commerce and customer service webhooks.
Phase 4 — Identity Resolution and Customer Master (Weeks 6–11). Candidate identifier graph built across all five channels. Deterministic rules for unambiguous matches. AI-assisted matcher tuned on labelled pairs. Batch review with the customer data SME.
Phase 5 — Curated Layer and Customer 360 (Weeks 9–13). Customer Profile, Behaviour, Engagement, Service History, Segments, Lifetime Value, and Churn Indicators built as dbt models. KPI-to-model traceability documented.
Phase 6 — BI and Activation (Weeks 11–14). Power BI dashboards for marketing, customer service, and merchandising. Reverse ETL configured to push daily segments. Customer service desktop integration for sub-second lookup.
Phase 7 — UAT and Go-Live (Weeks 13–14). Identity match validation against labelled sample. Parallel run with the previous monthly batch process. Marketing and customer service UAT. Cutover.
Phase 8 — Hypercare and Minor Enhancements (Weeks 15–18). Defect triage. Match-rate refinement based on production observations. Performance tuning of the customer service lookup. Knowledge transfer.
AI Acceleration Across the Build Lifecycle
The accelerator suite brings AI assistance into seven stages of the build lifecycle. Each stage produces artefacts that engineers and architects review and harden rather than accept as final.
Exploratory data analysis on the source data. Before any modelling decisions, the accelerator profiles every channel against the platform's intended use. Distributions, cardinality, null rates, key candidates, value frequencies, and drift signals over historical windows. For retail data, this surfaced things teams typically discover months later: e-commerce events with timezone drift, POS extracts with re-keyed product IDs after a merchandising system upgrade, loyalty platform records with stale email addresses, marketing platforms with overlapping audience definitions.
Architecture recommendation. Given the EDA findings, the in-scope KPIs (customer 360 attributes, segment definitions, activation requirements), and the platform constraints (Snowflake, dbt, GitHub), the accelerator proposed a target architecture with reasoning: Snowflake account topology, warehouse sizing, role hierarchy, ingestion pattern per channel, identity resolution approach, and activation pattern.
Base code generation. Once the architecture was agreed, the accelerator generated the foundational platform code: Snowflake account setup scripts, dbt project structure, baseline ingestion code, masking framework templates, row access policy templates, and CI/CD wiring.
Draft data model creation based on KPIs. The KPIs the business wanted to track were fed into the accelerator alongside the EDA output. These covered customer lifetime value, segment definitions, churn indicators, channel-mix metrics, and reactivation cohorts. The accelerator produced a draft customer-centric data model: customer subject-area facts and dimensions, the grain at which each entity should sit, and traceability from each KPI back to the model elements that serve it.
Schema mapping. With the data model agreed, the accelerator proposed source-to-target mappings for each channel: which source field maps to which model attribute, transformation logic where derivation is needed, and gaps where the source does not carry what the model requires.
Identity resolution. This was the highest-leverage capability for the build. The accelerator builds the candidate identifier graph across channels and proposes deterministic rules where unambiguous matches exist. For the residual fuzzy cases, an AI-assisted matcher tuned on labelled pairs proposes candidate matches with confidence scores. The customer data SME and customer service reps reviewed low-confidence matches in batches; corrections fed back into the matcher.
Documentation. Source-to-target mapping documents, KPI-to-model traceability, lineage diagrams, identity match rules, and segment specifications are generated and refreshed on every deployment. The marketing, customer service, and merchandising teams read the docs directly to understand what attributes are available and where they come from.
Where we kept humans firmly in front: the target data model after review, segment specifications, the customer master once resolved, consent management policies, dynamic masking rules, and every privacy decision.
Outcome
- Platform delivered in 18 weeks against an initial 22-week scope, with the time saved attributed largely to the accelerator's Snowflake-native code generation, identity resolution scaffolding, and KPI-driven data model generation.
- Five channels unified, with the customer master resolving ~12 million pre-resolution records to ~8.5 million unique customers.
- End-to-end identity match rate of approximately 85% against a labelled validation sample, exceeding the 80% target.
- Marketing activation moved from monthly batch to daily refresh, with segments piped to email and paid media platforms each morning.
- Customer service reps gained sub-second look-up of the unified customer profile by phone, email, or loyalty ID, replacing the previous practice of asking the customer to repeat themselves across systems.
- Merchandising team adopted the Customer 360 lifetime value and segment views for assortment decisions in the season following go-live.
- Right-to-erasure flow operational and audited, with first-pass deletions completing in under 24 hours of request.
Tech Stack
Snowflake: Account, Warehouses, Snowpipe, Snowpipe Streaming, Streams, Tasks, Dynamic Tables, Cortex (for the identity matcher), Dynamic Data Masking, Row Access Policies.
Transformation andDataOps: dbt for transformation models, GitHub for version control, GitHub Actions for CI/CD, dbt deployment pipelines.
Activation and consumption: Power BI, reverse ETL platform for marketing activation, customer service desktop integration via REST API over Snowflake.
Cloud and security: AWS S3 for source staging, AWS Secrets Manager, SSO via the Client's existing identity provider, audit logging through Snowflake's native logging and an external SIEM.
AI accelerators: Source Profiling Accelerator, Architecture Recommendation Accelerator, Base Code Generator (dbt-aware), Data Model Generator (KPI-driven), Schema Mapper, Identity Resolution Accelerator, Documentation Generator.
Reflection
Three observations from the engagement worth keeping in mind for similar work.
First, identity resolution is the work. Everything else is plumbing. The accelerator compressed the schema archaeology and the boilerplate transformation work, but the long tail of fuzzy customer matches still required engineering judgement, SME review, and business input to settle. Plan for the matching work to take the largest single phase of the build.
Second, daily refresh changed how the marketing team operated. Monthly refresh meant the marketing team designed campaigns against a snapshot that was already three weeks old by the time it activated. Daily refresh let them respond to actual behaviour. The platform's technical specifications did not change; the consumption pattern did.
Third, the AI-augmented build had the same fundamental engineering rigour as a traditional build, with the boilerplate compressed. The accelerator did not make the data model easier to design; it made it faster to express. It did not make identity resolution easier to validate; it made the candidate set larger and the review more efficient. AI augmentation is leverage on the work that has to be done well, not a substitute for doing it.