This project shows how I work with messy, real‑world marketing and analytics data.
The dataset simulates exports from Meta Ads, Google Ads, GA4, and an e‑commerce platform.
The goal is to turn raw, inconsistent inputs into a clean, analysis‑ready model suitable for BI dashboards and performance reporting.
This page gives a clear overview of the problem, the approach, and the results — with optional dropdowns for deeper technical detail.
SQL cleaning (CAST, COALESCE, TRIM, CASE)
Deduplication using ROW_NUMBER
CTE‑based modeling
Joining spend, sessions, and revenue
Attribution logic using LEFT JOIN
Date normalization (DATE, DATE_TRUNC)
Preparing BI‑ready tables for dashboards
This project unifies marketing spend, web sessions, and transaction data into a single, consistent model.
The dataset includes missing values, inconsistent naming, text‑stored numbers, and duplicated transactions — all cleaned and standardized through SQL.
Github: Full SQL scripts
Dashboard (optional): link to BI view
Marketing data arrives fragmented across platforms with inconsistent naming, missing UTMs, duplicated transactions, and mismatched date formats.
This makes CPC, CPM, ROAS, and conversion reporting unreliable.
Cleaned each table using CTEs
Standardized naming and fixed data types
Removed duplicates with ROW_NUMBER
Joined spend → sessions → revenue
Built daily performance metrics
Duplicate transactions removed
Campaign naming normalized
Missing UTMs reduced
Daily campaign performance table created
BI‑ready dataset prepared
Cleaned SQL tables
Full SQL scripts (schema, cleaning, analysis, advanced)
Data model diagram (coming)
Campaign performance dashboard (coming)
This project summary
============================================
-- 2. Rolling 7-day revenue
-- Demonstrates window functions for time-series smoothing.
-- ============================================
SELECT
date(transaction_time) AS date,
SUM(revenue) AS daily_revenue,
SUM(SUM(revenue)) OVER (
ORDER BY date(transaction_time)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_revenue
FROM cleaned_transactions
GROUP BY date
ORDER BY date;
Each dataset required targeted cleaning to fix inconsistent formats, duplicated rows, and text‑stored numeric values. The cleaning pipeline uses CTEs to keep the logic readable and modular.
Ad Spend
Normalized campaign names (TRIM, LOWER, REPLACE)
Converted cost, impressions, and clicks from text to numeric
Standardized dates using DATE()
Removed rows with missing or invalid spend values
Web Sessions
Cleaned UTM parameters (TRIM, COALESCE)
Fixed inconsistent casing in campaign names
Converted session timestamps to dates
Removed duplicate sessions using ROW_NUMBER()
Transactions
Deduplicated transactions using ROW_NUMBER() on transaction_id
Converted revenue from text to numeric
Normalized date formats
Linked transactions to sessions via session_id
Full cleaning SQL is available in the GitHub repository.
The data model unifies spend, sessions, and revenue into a single structure for daily campaign performance reporting.
Keys & Relationships
Spend keyed by date + campaign_name
Sessions keyed by date + utm_campaign
Transactions keyed by transaction_id, linked to sessions via session_id
Join Logic
Spend → Sessions: matched on normalized campaign name + date
Sessions → Transactions: matched on session_id
Final model aggregates cost, sessions, clicks, impressions, and revenue at the daily campaign level
Full SQL implementation is available in the GitHub repository.