Medallion Architecture is a way to organize and process data in a data platform (often a data lakehouse) so that it becomes cleaner, more reliable, and easier to use as it moves through stages.
Think of it like filtering and refining raw materials into a finished product in steps: Bronze (raw), Silver (cleansed), Gold (business-ready). Snowflake offers several capabilities that make this pattern practical:
Separation of storage & compute – scale independently
Multi-cluster warehouses – handle high concurrency
Automatic scaling – no capacity planning
Time Travel – query old versions of data (see the sketch after this list)
Fail-safe – recovery for dropped data
Zero-copy cloning – instant table/database copies
Streams & Tasks – change tracking + automation
Structured & semi-structured data (JSON, Parquet, Avro)
VARIANT data type for JSON
Snowpipe – continuous data loading
End-to-end encryption
Role-based access control (RBAC)
Network policies
Masking & row-level security
Secure Data Sharing – share live data without copying
Marketplace – access external datasets
Works with BI tools (Power BI, Tableau, Looker)
SQL support
UDFs & Stored Procedures (SQL, JavaScript, Python)
Snowpark – use Python/Scala/Java for data processing
Supports ML pipelines
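To make a couple of these concrete, here is a minimal sketch of Time Travel and zero-copy cloning against an illustrative orders table (the table names and the one-hour offset are assumptions, not from the original):

-- Time Travel: query the table as it looked one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Zero-copy cloning: an instant copy that shares storage with the original
CREATE TABLE orders_dev CLONE orders;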
Bronze (Raw)
This is the first layer where data lands.
It’s essentially the raw data ingested from source systems.
It may be messy, unfiltered, and contain duplicates.
Purpose: Store data exactly as it arrives, preserving the original for audit and completeness.
How in Snowflake:
Tables or external tables
Loaded via Snowpipe / COPY INTO / connectors
Minimal or no transformation
Typical objects:
RAW_DB.BRONZE_SCHEMA.customer_events
RAW_DB.BRONZE_SCHEMA.orders_raw
Characteristics:
JSON/CSV allowed
Duplicates possible
No business rules
Keep metadata: load_time, source_file, batch_id
Example
CREATE TABLE bronze_orders (
    raw_payload VARIANT,
    load_time   TIMESTAMP,
    source_file STRING,
    batch_id    STRING
);
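One way to load files into this table is a COPY INTO with a transformation; the sketch below assumes JSON files and a named stage called @orders_stage (both illustrative). Snowpipe would wrap essentially the same COPY statement for continuous loading.

-- Assumed sketch: bulk-load JSON files from a stage into the Bronze table,
-- capturing the load time and source file name as metadata.
COPY INTO bronze_orders (raw_payload, load_time, source_file)
FROM (
    SELECT $1, CURRENT_TIMESTAMP(), METADATA$FILENAME
    FROM @orders_stage
)
FILE_FORMAT = (TYPE = 'JSON');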
Silver (Cleansed / Conformed)
Data has been cleaned, standardized, and deduplicated.
You fix data quality issues and make it consistent (e.g., consistent date formats, correct types).
You might join related datasets here (e.g., add user names to user IDs).
Purpose: Clean, validate, standardize, and deduplicate data so it is reliable for reporting and exploration.
How in Snowflake:
Use Streams + Tasks or scheduled SQL
Parse JSON
Fix data types
Remove duplicates
Apply basic business rules
Typical objects:
CURATED_DB.SILVER_SCHEMA.orders_clean
CURATED_DB.SILVER_SCHEMA.customers_clean
Example transformation
CREATE TABLE silver_orders AS
SELECT
    raw_payload:order_id::STRING AS order_id,
    raw_payload:amount::NUMBER AS amount,
    raw_payload:order_date::DATE AS order_date,
    load_time
FROM bronze_orders
WHERE raw_payload:order_id IS NOT NULL;
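The "Streams + Tasks" approach mentioned above can drive this incrementally. The sketch below is an assumed example (the stream, task, and warehouse names are illustrative), and it also deduplicates on order_id as described in the list above:

-- Assumed sketch: incremental Bronze-to-Silver processing with a Stream + Task.
CREATE OR REPLACE STREAM bronze_orders_stream ON TABLE bronze_orders;

CREATE OR REPLACE TASK load_silver_orders
  WAREHOUSE = wh_transform
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('BRONZE_ORDERS_STREAM')
AS
INSERT INTO silver_orders
SELECT
    raw_payload:order_id::STRING AS order_id,
    raw_payload:amount::NUMBER AS amount,
    raw_payload:order_date::DATE AS order_date,
    load_time
FROM bronze_orders_stream
WHERE raw_payload:order_id IS NOT NULL
-- keep only the latest record per order_id to deduplicate
QUALIFY ROW_NUMBER() OVER (PARTITION BY raw_payload:order_id ORDER BY load_time DESC) = 1;

ALTER TASK load_silver_orders RESUME;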
Gold (Business-Ready)
Highly refined, aggregated, and optimized for specific use cases.
This could be dashboards, machine learning features, KPIs, reports.
It’s curated for performance and user needs.
Purpose: Optimized for consumption by BI tools, dashboards, reports, and ML applications.
How in Snowflake:
Star schemas
Aggregated tables
Data marts
Secure views for BI tools
Typical objects:
ANALYTICS_DB.GOLD_SCHEMA.sales_daily
ANALYTICS_DB.GOLD_SCHEMA.customer_lifetime_value
Example
CREATE TABLE gold_daily_sales AS
SELECT
order_date,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue
FROM silver_orders
GROUP BY order_date;
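For the "secure views for BI tools" point above, a minimal sketch (the view and role names are assumptions):

-- Assumed sketch: expose the Gold table to BI tools through a secure view.
CREATE OR REPLACE SECURE VIEW gold_daily_sales_v AS
SELECT order_date, total_orders, total_revenue
FROM gold_daily_sales;

-- Reporting roles query the view, not the underlying tables
GRANT SELECT ON VIEW gold_daily_sales_v TO ROLE reporting_role;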
Why this layering helps:
Clarity and structure: Each layer has a clear role.
Quality control: Systematic points to clean and validate.
Performance: Transformations prepare data for fast querying.
Reusability: Silver data can serve multiple gold use cases.
Example end-to-end flow (clickstream analytics):
Ingest clickstream logs into Bronze as soon as they arrive.
Parse events, remove errors, fix timestamps, and join user info in Silver.
Compute session metrics and populate dashboards in Gold.
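A rough sketch of the Gold step for this clickstream flow, assuming a Silver table silver_clickstream with session_id, user_id, and event_time columns (all names are illustrative):

-- Assumed sketch: session metrics for the clickstream example.
CREATE TABLE gold_session_metrics AS
SELECT
    session_id,
    user_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    DATEDIFF('second', MIN(event_time), MAX(event_time)) AS session_seconds,
    COUNT(*) AS event_count
FROM silver_clickstream
GROUP BY session_id, user_id;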
When it fits:
Large data volumes with varied sources.
Teams needing dependable, versioned datasets.
Analytics, ML, and reporting across many consumers.