Nexus is a high-performance Data Engineering ecosystem that refactors 180,000+ raw shipment records into an optimized Star Schema. By implementing a Medallion Architecture, the project ensures "Zero-Defect" reporting and full data traceability from raw logs to executive strategy.
All code, including SQL scripts, and DAX measures, is version-controlled and fully documented.
👉 View Live Power BI Dashboard | View Technical Documentation | Data Source (Kaggle)
The transformation layer transitions data from a flat transactional stream into a convergent Star Schema, hosted on a Databricks Pro SQL Warehouse.
Bronze (Raw): Immutable landing zone for 53 variables of shipment and clickstream data.
Silver (Integrated): Sanitization hub using MD5 surrogate keys and dbt-utils for strict type casting.
Gold (Curated): The consumption layer featuring fct_shipping_performance and modular dimensions for 100% referential integrity.
Compute: Leveraged Delta Lake for ACID transactions and high-speed query performance.
CI/CD & Testing: Implemented a multi-layered testing strategy (Unique, Not_Null, and Relationship tests) to eliminate "Blank" values in downstream BI slicers.
Data Flow Architecture (Mediallion)
The Power BI layer translates engineering integrity into Prescriptive Strategy through custom-engineered DAX logic.
The Profitability Matrix: A risk-vs-reward quadrant that identifies "The Danger Zone" (unprofitable/late orders) versus "The Gold Standard" (high-margin/on-time).
The Profit Bridge (Waterfall): A dynamic model that quantifies "Revenue at Risk." It features a "What-If" parameter allowing executives to simulate margin reclamation based on SLA improvements.
SLA Gap Discovery: Root-cause analysis revealing that "First Class" delays are often due to unrealistic policy settings (1-day targets) rather than operational failure.
The Profitability Matrix (Risk vs. Reward Quadrant)
This quadrant analysis identifies exactly where logistics failures are eroding high-value margins.
To go beyond simple averages, I engineered a Profitability Matrix that plots categories/regions into four distinct quadrants. This allows executives to see exactly where margin is being eroded by logistics failures.
High Profit / Low Late Rate (Green): "The Gold Standard" — Efficient regions to be used as internal benchmarks.
High Profit / High Late Rate (Yellow): "The Growth Opportunity" — High-value areas being capped by logistics bottlenecks.
Low Profit / High Late Rate (Red): "The Danger Zone" — Orders that are both unprofitable and damaging to customer trust.
Low Profit / Low Late Rate (Gray): "The Efficiency Trap" — Reliable but low-margin orders requiring cost optimization.
Profit Bridge (Water fall)
Instead of just reporting past performance, this project models future recovery. The Waterfall Bridge calculates how much "Revenue at Risk" can be reclaimed through SLA improvements.
Using a custom-engineered Waterfall Bridge, the report models the path from Current Profit to Target Profit.
Revenue at Risk: Quantifies the total sales volume tied to "Late Delivery" orders.
SLA Recovery Simulation ("What-If"): A parameter-driven forecast showing margin reclamation based on a user-defined improvement percentage (0% - 100%).
Perfect Order Rate: Intersection of profitability and reliability (Orders where Profit > 0 AND Late = 0).
SLA Variance: A dynamic measure calculating [Real Days] - [Scheduled Days] to highlight efficiency leaks.
SLA Gap Discovery (Root Cause Analysis)
A primary discovery of this project was the Logistics Strategy Mismatch. By calculating the delta between "Scheduled" and "Real" shipping days, the dashboard identifies that high late rates are often a result of unrealistic policy settings rather than warehouse failure.
A primary discovery of this project was the Logistics Strategy Mismatch.
By calculating the delta between "Scheduled" and "Real" shipping days, the dashboard identifies that high late rates are often a result of unrealistic policy settings rather than warehouse failure.
The "Voice of the Dashboard" provides real-time analysis based on active filters and simulation parameters.
I implemented a Context-Aware Smart Narrative that serves as the "Voice of the Dashboard." This DAX-driven narrative:
Recognizes active filters for Year, Shipping Mode, and Order Status.
Interprets the "What-If" simulation, categorizing strategies as "Conservative," "Moderate," or "Aggressive."
Custom Report Page Tooltips provide deep-dive insights without cluttering the main view:
SLA Variance Tooltip: Visual comparison of "Promised" vs "Real" transit times.
Profit Leakage Tooltip: Explains the math behind "Recovery Opportunity" directly on the Waterfall bars.
The "Tricked" Waterfall: Utilized DAX SWITCH logic and BLANK() values to allow the built-in Total Column to act as a dynamic "Target Profit" pillar.
Conditional Signaling: HEX-code measures change colors based on performance (e.g., the Recovery bar turns green only when the slider is > 0%).
Performance Signaling: A color-coded status icon (🟢/🟡/🔴) that flags logistics health based on current Late Sales % thresholds.
We implemented a multi-layered testing strategy to ensure dashboard numbers remain credible and "Blank" values are eliminated in BI slicers.
Uniqueness & Null Handling: Applied unique and not_null tests to shipment_item_id and all surrogate keys to prevent row-inflation and duplicate sales reporting.
Financial Integrity: Hardened metrics with not_null constraints on sales_amount and profit_amount.
To support dbt 2.0 standards, all relationship tests use the new arguments pattern:
# Example: Product Dimension Relationship Test
- name: product_key
tests:
- relationships:
arguments:
to: ref('dim_products')
field: product_key
Metric Decoupling: * Financial KPIs (Sales/Profit): Hardened to include only COMPLETE and CLOSED statuses (measures to exclude non-realized revenue streams).
Operational KPIs (Avg Days): Includes all statuses to capture true logistics bottlenecks.
Metric Decoupling: Separated Financial KPIs (Strictly CLOSED status) from Operational KPIs (All statuses) to ensure audit accuracy without hiding root causes.
Upstream Key Gen: Moving surrogate keys to the Silver layer made the architecture DRY (Don't Repeat Yourself) and eliminated data fan-out.