🔹INDEX -
Project Overview
Business Problem Framing
Business Questions & Analytical Insights
Data Architecture (Staging → Core → Mart)
Data Transformation & Validation
Risk Segmentation & Risk Scoring Model
Mart Layer Analytical Views
Power BI Integration
BI Security Layer (Read-Only Role)
Data Model Documentation & ERD
Conclusion
Financial institutions require a structured analytics framework to monitor loan portfolio risk, identify default drivers, evaluate borrower risk across regions and employment segments, and support executive-level credit strategy decisions.
The objective of this project is to design an end-to-end analytics pipeline that transforms raw loan data into structured, decision-ready insights using PostgreSQL and Power BI.
This project was approached from a Business Analyst perspective by first defining the core risk questions stakeholders needed answered. The objective was to translate business requirements into measurable analytical outputs that support credit risk strategy and executive decision-making.
The table below illustrates how stakeholder business questions were translated into measurable KPIs and implemented as analytical views within the PostgreSQL Mart layer.
Business Questions & Analytical Insights-
The following analyses translate key lending risk questions into SQL-based insights to evaluate loan portfolio performance, borrower risk patterns, and regional exposure
BUSINESS QUESTIONS -
Key Insight: The West region recorded the highest loan volume (280 loans) followed closely by the Northeast (258), indicating stronger lending demand and portfolio exposure in these regions.
Key Insight: Most loans are Approved (503) or Paid Off (253), while 144 loans are defaulted, indicating that although the portfolio is largely performing, a noticeable portion of borrowers still present credit risk.
Key Insight: The Northeast region shows the highest default rate (16.28%), suggesting higher borrower risk in this geography and indicating an area that may require stricter credit evaluation.
Key Insight: Borrowers with lower average credit scores show higher default occurrences, confirming that credit score is a strong indicator of borrower repayment risk.
Key Insight: Borrowers with higher average debt-to-income ratios are more likely to default, highlighting DTI as an important financial stability indicator in credit risk assessment.
BUSINESS ANALYSIS -
Key Insight: Business owners show the highest default rate (16.22%), followed by salaried borrowers, indicating that employment type influences borrower repayment behavior.
Key Insight: Borrowers were successfully categorized into High, Medium, and Low risk segments, enabling clearer monitoring of portfolio risk exposure and supporting targeted lending strategies.
Key Insight: Higher-risk borrowers tend to be associated with slightly higher interest rates, suggesting that loan pricing partially reflects borrower risk levels
Key Insight: A composite risk score model was developed using credit score, debt-to-income ratio, and interest rate to provide a standardized metric for comparing borrower risk across the portfolio.
Key Insight: employment categories were ranked by default rate, revealing that business owners contribute the highest risk, helping prioritize borrower segments for monitoring.
Key Insight: More than half of borrowers fall into the High-Risk segment (56%), indicating a portfolio with a significant concentration of higher-risk borrowers.
Key Insight: The Northeast region shows the highest average risk score, confirming it as the most risk-exposed geographic segment in the portfolio.
Key Insight: This consolidated view integrates loan volume, default metrics, and risk indicators into a single decision-ready dataset, enabling executives to monitor portfolio risk and lending performance efficiently.
Data Architecture & Analytical Implementation (PostgreSQL Layer)
To support scalable loan risk analytics, I designed a structured database architecture named loan_risk_db.
I implemented a three-layer schema model (Staging → Core → Mart) to separate raw ingestion, cleaned production data, and business-ready analytical views.
To preserve raw source integrity and prevent schema conflicts during ingestion, I created a dedicated staging table (staging.loan_raw) where all fields were stored as text.
This controlled ingestion layer minimized type mismatch errors and established a clean separation between raw data loading and transformation logic.
Controlled Raw Data Ingestion (Staging Layer Architecture)
I then engineered a structured production table (core.loan) by transforming and type-casting raw staging data into validated business attributes. This table was designed with appropriate data types, constraints, and a primary key to ensure referential integrity and analytical consistency.
After structuring the core table, I implemented performance-optimized indexing strategies on high-selectivity analytical dimensions (loan_status, region, employment_type, credit_score) commonly used in filtering, grouping, and aggregation within BI reporting workloads.
Before enabling downstream risk analytics, I implemented structured data quality controls on the core.loan table to ensure analytical reliability and governance compliance.
I validated primary key uniqueness (loan_id), enforced non-null integrity on critical business attributes (region, employment_type, loan_status), and verified acceptable value ranges for credit_score and debt_to_income_ratio.
These validation controls ensured that all 1,000 records met analytical integrity standards before progressing into the Mart layer.
Data Quality Validation Queries on core.loan (Primary Key, Null Integrity & Range Checks)
With validated and performance-optimized production data in place, I transitioned into building the business-ready analytical Mart layer.
mart.region_risk_dashboard
To evaluate loan performance and default exposure at the regional level.
I aggregated total loans, default counts, default rate %, average credit score, DTI, interest rate, and risk score by region. This view helps identify high-risk geographic segments.
mart.employment_risk_dashboard -
To analyze how employment type correlates with default risk.
I calculated loan counts, default rates, and risk metrics grouped by employment type to detect higher-risk borrower profiles.
After understanding region and employment risk, I introduced structured borrower classification.
mart.risk_segment_distribution
To classify borrowers into High, Medium, and Low risk segments.
I implemented risk segmentation logic using credit score and DTI thresholds and calculated the percentage distribution of each segment.
This segmentation created clear borrower risk buckets that I later used to build a weighted risk scoring model.
mart.avg_risk_score_by_region
To quantify regional risk using a weighted risk scoring model.
I engineered a weighted composite risk scoring model by normalizing credit score, scaling debt-to-income ratio, and incorporating interest rate sensitivity into a bounded 0–100 risk framework using lease and greatest controls.
This model quantified borrower risk at the regional level, enabling comparative risk benchmarking across geographic segments.
mart.executive_risk_summary
To provide leadership with a high-level overview of portfolio risk exposure.
I combined loan volume, default metrics, and risk indicators into a single summarized view designed for strategic decision-making.
This view consolidated regional loan volume, default rates, average credit score, DTI, and composite risk score into a leadership-ready analytical layer.
It provides executives with a high-level portfolio risk overview to support strategic credit allocation and risk mitigation decisions.
mart.master_region_dashboard
To consolidate multiple regional metrics into a structured reporting layer.
This view unified region-level loan volume, default metrics, credit indicators, and composite risk scores into a structured analytical reporting layer.
It standardizes regional KPIs into a consistent format for operational dashboards and comparative performance analysis.
mart.powerbi_master
This view serves as a flattened semantic modeling layer designed specifically for BI consumption.
I consolidated row-level loan attributes, derived risk segments, composite risk scores, and standardized analytical metrics into a denormalized structure to reduce transformation logic inside Power BI.
By pushing business logic into PostgreSQL, I improved dashboard performance, simplified DAX calculations, and established a single source of truth for executive reporting.
This view is now connected directly to Power BI for executive dashboard creation.
Secure BI Access Layer – Read-Only bi_reader Role
To simulate a production-grade BI deployment environment, I implemented a dedicated read-only database role (bi_reader) to securely expose Mart-layer analytical views to Power BI.
This role enforces least-privilege access by granting:
CONNECT access to the database
USAGE on the Mart schema
SELECT permissions on all current and future Mart tables and views
This ensures BI tools can query analytical outputs without write or modification privileges, maintaining data governance and security controls.
Verification: bi_reader Access Validation
I validated role permissions by querying PostgreSQL system catalog tables to confirm bi_reader had SELECT access on all Mart-layer views.
Data Model Documentation & ERD
To document database structure and support architectural transparency, I exported an Entity Relationship Diagram (ERD) illustrating schema separation (Staging → Core → Mart) and logical data flow.
This documentation supports maintainability, onboarding, and analytical traceability across transformation layers.
This project demonstrates the development of an end-to-end loan risk analytics pipeline using PostgreSQL and Power BI. Raw financial data was transformed into structured analytical insights through a layered architecture consisting of staging, core, and mart schemas. By implementing risk segmentation, composite risk scoring, and business-ready analytical views, the project enables stakeholders to monitor loan portfolio performance, identify high-risk borrower segments, and support data-driven credit decision making.