Medallion Architecture is a way to organize and process data in a data platform (often a data lakehouse) so that it becomes cleaner, more reliable, and easier to use as it moves through stages.
Think of it like filtering and refining raw materials into a finished product in steps: Bronze (raw), Silver (cleansed), Gold (business-ready). Snowflake offers several capabilities that make this pattern practical:
Separation of storage & compute – scale independently
Multi-cluster warehouses – handle high concurrency
Automatic scaling – no capacity planning
Time Travel – query old versions of data (see the sketch after this list)
Fail-safe – recovery for dropped data
Zero-copy cloning – instant table/database copies
Streams & Tasks – change tracking + automation
Structured & semi-structured data (JSON, Parquet, Avro)
VARIANT data type for JSON
Snowpipe – continuous data loading
End-to-end encryption
Role-based access control (RBAC)
Network policies
Masking & row-level security
Secure Data Sharing – share live data without copying
Marketplace – access external datasets
Works with BI tools (Power BI, Tableau, Looker)
SQL support
UDFs & Stored Procedures (SQL, JavaScript, Python)
Snowpark – use Python/Scala/Java for data processing
Supports ML pipelines
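To make a couple of these concrete, here is a minimal sketch of Time Travel and zero-copy cloning against an illustrative orders table (the table names and the one-hour offset are assumptions, not from the original):

-- Time Travel: query the table as it looked one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Zero-copy cloning: an instant copy that shares storage with the original
CREATE TABLE orders_dev CLONE orders;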
Bronze (Raw)
This is the first layer where data lands.
It’s essentially the raw data ingested from source systems.
It may be messy, unfiltered, and contain duplicates.
Purpose: Store data exactly as it arrives, preserving the original for audit and completeness.
How in Snowflake:
Tables or external tables
Loaded via Snowpipe / COPY INTO / connectors
Minimal or no transformation
Typical objects:
RAW_DB.BRONZE_SCHEMA.customer_events
RAW_DB.BRONZE_SCHEMA.orders_raw
Characteristics:
JSON/CSV allowed
Duplicates possible
No business rules
Keep metadata: load_time, source_file, batch_id
Example
CREATE TABLE bronze_orders (
    raw_payload VARIANT,
    load_time   TIMESTAMP,
    source_file STRING,
    batch_id    STRING
);
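One way to load files into this table is a COPY INTO with a transformation; the sketch below assumes JSON files and a named stage called @orders_stage (both illustrative). Snowpipe would wrap essentially the same COPY statement for continuous loading.

-- Assumed sketch: bulk-load JSON files from a stage into the Bronze table,
-- capturing the load time and source file name as metadata.
COPY INTO bronze_orders (raw_payload, load_time, source_file)
FROM (
    SELECT $1, CURRENT_TIMESTAMP(), METADATA$FILENAME
    FROM @orders_stage
)
FILE_FORMAT = (TYPE = 'JSON');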
Silver (Cleansed / Conformed)
Data has been cleaned, standardized, and deduplicated.
You fix data quality issues and make it consistent (e.g., consistent date formats, correct types).
You might join related datasets here (e.g., add user names to user IDs).
Purpose: Clean, validate, standardize, and deduplicate data so it is reliable for reporting and exploration.
How in Snowflake:
Use Streams + Tasks or scheduled SQL
Parse JSON
Fix data types
Remove duplicates
Apply basic business rules
Typical objects:
CURATED_DB.SILVER_SCHEMA.orders_clean
CURATED_DB.SILVER_SCHEMA.customers_clean
Example transformation
CREATE TABLE silver_orders AS
SELECT
    raw_payload:order_id::STRING AS order_id,
    raw_payload:amount::NUMBER AS amount,
    raw_payload:order_date::DATE AS order_date,
    load_time
FROM bronze_orders
WHERE raw_payload:order_id IS NOT NULL;
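The "Streams + Tasks" approach mentioned above can drive this incrementally. The sketch below is an assumed example (the stream, task, and warehouse names are illustrative), and it also deduplicates on order_id as described in the list above:

-- Assumed sketch: incremental Bronze-to-Silver processing with a Stream + Task.
CREATE OR REPLACE STREAM bronze_orders_stream ON TABLE bronze_orders;

CREATE OR REPLACE TASK load_silver_orders
  WAREHOUSE = wh_transform
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('BRONZE_ORDERS_STREAM')
AS
INSERT INTO silver_orders
SELECT
    raw_payload:order_id::STRING AS order_id,
    raw_payload:amount::NUMBER AS amount,
    raw_payload:order_date::DATE AS order_date,
    load_time
FROM bronze_orders_stream
WHERE raw_payload:order_id IS NOT NULL
-- keep only the latest record per order_id to deduplicate
QUALIFY ROW_NUMBER() OVER (PARTITION BY raw_payload:order_id ORDER BY load_time DESC) = 1;

ALTER TASK load_silver_orders RESUME;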
Gold (Business-Ready)
Highly refined, aggregated, and optimized for specific use cases.
This could be dashboards, machine learning features, KPIs, reports.
It’s curated for performance and user needs.
Purpose: Optimized for consumption by BI tools, dashboards, reports, and ML applications.
How in Snowflake:
Star schemas
Aggregated tables
Data marts
Secure views for BI tools
Typical objects:
ANALYTICS_DB.GOLD_SCHEMA.sales_daily
ANALYTICS_DB.GOLD_SCHEMA.customer_lifetime_value
Example
CREATE TABLE gold_daily_sales AS
SELECT
order_date,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue
FROM silver_orders
GROUP BY order_date;
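For the "secure views for BI tools" point above, a minimal sketch (the view and role names are assumptions):

-- Assumed sketch: expose the Gold table to BI tools through a secure view.
CREATE OR REPLACE SECURE VIEW gold_daily_sales_v AS
SELECT order_date, total_orders, total_revenue
FROM gold_daily_sales;

-- Reporting roles query the view, not the underlying tables
GRANT SELECT ON VIEW gold_daily_sales_v TO ROLE reporting_role;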
Why this layering helps:
Clarity and structure: Each layer has a clear role.
Quality control: Systematic points to clean and validate.
Performance: Transformations prepare data for fast querying.
Reusability: Silver data can serve multiple gold use cases.
Example end-to-end flow (clickstream analytics):
Ingest clickstream logs into Bronze as soon as they arrive.
Parse events, remove errors, fix timestamps, and join user info in Silver.
Compute session metrics and populate dashboards in Gold.
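A rough sketch of the Gold step for this clickstream flow, assuming a Silver table silver_clickstream with session_id, user_id, and event_time columns (all names are illustrative):

-- Assumed sketch: session metrics for the clickstream example.
CREATE TABLE gold_session_metrics AS
SELECT
    session_id,
    user_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    DATEDIFF('second', MIN(event_time), MAX(event_time)) AS session_seconds,
    COUNT(*) AS event_count
FROM silver_clickstream
GROUP BY session_id, user_id;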
When it fits:
Large data volumes with varied sources.
Teams needing dependable, versioned datasets.
Analytics, ML, and reporting across many consumers.