Welcome to my portfolio site! I’m Chioma
This project scrapes and analyzes data from a live Wikipedia table listing the largest U.S. companies by revenue; the goal was to automate data extraction and prepare the results for analysis and visualization.
Python (Requests, BeautifulSoup, Pandas)
Jupyter Notebook
Excel/CSV
Located the correct HTML table on the Wikipedia page using browser tools and inspection;
Wrote Python code that uses requests to fetch the page, BeautifulSoup to parse the HTML, and pandas to read the table (a minimal sketch follows these steps);
Extracted and cleaned the dataset, converting it to a structured format and exporting it as a CSV;
Opened the file in Excel for preview, formatting, and additional presentation polish.
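Below is a minimal sketch of that workflow. The URL matches the live Wikipedia article title, but the table's position on the page and its column names can change over time, so the selector and cleanup steps here are assumptions, not the exact project code.

```python
# Minimal sketch of the scraping workflow; the target table's position
# and column names on the live page are assumptions.
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

url = ("https://en.wikipedia.org/wiki/"
       "List_of_largest_companies_in_the_United_States_by_revenue")
response = requests.get(url)
response.raise_for_status()

# Parse the HTML and locate the first "wikitable" (assumed to be the target).
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find("table", class_="wikitable")

# Let pandas convert the HTML table into a DataFrame.
df = pd.read_html(StringIO(str(table)))[0]

# Light cleanup, then export for review in Excel.
df.columns = [str(col).strip() for col in df.columns]
df.to_csv("largest_us_companies.csv", index=False)
print(df.head())
```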
CSV file of the extracted dataset
Screenshots showing the target website and table
Screenshots of the Jupyter Notebook code
Screenshots of the cleaned and formatted Excel table
This project reinforced my foundational skills in web scraping and real-world data manipulation, and it deepened my appreciation for structured data workflows. I remain open to hands-on opportunities to grow my proficiency in Python, strengthen my data cleaning techniques, and build even more impactful projects as I continue my journey in data analysis.
About this project
This project explores the landscape of data professionals around the world. The goal was to understand the distribution of job roles, average salaries, favorite programming languages, and participants' satisfaction with work-life balance and salary. Key questions included:
What are the most common job titles and their average salaries?
Which countries are represented most?
What programming languages are preferred?
How difficult do people find it to break into the data field?
How satisfied are professionals with their salary and work-life balance?
Source: a publicly available survey dataset from Kaggle/GitHub.
Power BI – For data cleaning, transformation, and dashboard creation.
Data Cleaning:
Removed null or irrelevant entries.
Standardized country names and job titles.
Converted salary and age columns into numerical format.
Data Transformation:
Grouped data by job titles, countries, and programming languages.
Created new calculated fields for happiness scores and difficulty ratings (an equivalent set of transformations is sketched below).
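The cleaning and grouping above were done inside Power BI (Power Query), so no code was involved. Purely as an illustration, here is a pandas sketch of the equivalent transformations; column names such as Country, Job Title, and Salary are assumptions.

```python
# Illustrative pandas equivalent of the Power BI cleaning/grouping steps.
# The tiny inline DataFrame stands in for the survey data.
import pandas as pd

survey = pd.DataFrame({
    "Country": ["United states", "Canada", None],
    "Job Title": ["Data Scientist", "Data Analyst", "Data Engineer"],
    "Salary": ["120000", "85000", "95000"],
})

# Remove null/irrelevant entries.
survey = survey.dropna(subset=["Country", "Job Title"])

# Standardize country names and job titles.
survey["Country"] = survey["Country"].str.strip().str.title()
survey["Job Title"] = survey["Job Title"].str.strip()

# Convert salary to a numeric column.
survey["Salary"] = pd.to_numeric(survey["Salary"], errors="coerce")

# Group: average salary by job title (the aggregation behind the bar chart).
print(survey.groupby("Job Title")["Salary"].mean())
```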
Visualization:
Used Power BI to create:
Tree map for participant distribution by country.
Bar chart for average salary by job title.
Stacked bar for favorite programming languages per job title.
Donut chart for entry difficulty.
Gauge charts for happiness ratings.
Cards for participant count and average participant age.
Country Representation: The United States and 'Other' regions had the highest number of participants.
Salary Insights: Data Scientists earn the highest average salary, followed by Data Engineers and Architects.
Programming Languages: Python was overwhelmingly the most preferred programming language across all roles.
Entry Difficulty: Most respondents (42.7%) found it “Neither easy nor difficult” to break into the data field, while a small group found it “Very Difficult”.
Happiness Ratings:
Work-Life Balance: 5.74/10
Salary: 4.27/10
Interactive dashboard in Power BI with:
Tree map, bar charts, stacked bar chart, donut chart, and gauge charts.
Clean layout with filters and legends for easy understanding.
Conclusion/Recommendations
The data field is competitive but rewarding, with high salaries for roles like Data Scientist and Data Engineer.
There is room for improvement in employee satisfaction, especially around salary.
Python remains a vital skill to master for aspiring data professionals.
Organizations should invest in creating smoother entry paths and better compensation structures to retain talent.
This project analyzes Airbnb data to uncover pricing trends, listing diversity, and regional performance using Excel and Tableau. I created interactive dashboards highlighting key metrics like average price per bedroom and revenue trends. It sharpened my data visualization skills and showcased how insights can guide pricing and investment decisions.
This project explores trends in pricing, regional performance, and listing variety. Using Excel for data preparation and Tableau for analysis, I built interactive dashboards that provide clear insights for hosts, renters, and stakeholders. The project demonstrates my ability to extract meaningful insights from raw data and present them visually to support decision-making.
How can we understand Airbnb pricing trends, regional performance, and listing diversity to help hosts, renters, and stakeholders make more informed decisions?
Data Cleaning:
Removed unnecessary columns and rows to meet Tableau Public’s data size requirement
Ensured consistency in date formats and location names
Data Filtering:
Filtered for the most relevant variables including price, zip code, number of bedrooms, and dates
Analysis & Visualization:
Designed visualizations to highlight pricing patterns, revenue changes over time, and bedroom type distribution
Added dynamic filters for better user interaction
Customized axis labels, tooltips, and color themes to enhance clarity
Objectives:
Explore pricing patterns across different bedroom counts
Compare listing prices by zip code
Analyze revenue trends over time
Determine the diversity of bedroom listings on the platform
Average Price per Bedroom
Price Distribution by Zip Code
Total Revenue by Year
Distinct Count of Bedroom Listings (the aggregations behind these views are sketched below)
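The four views above were built as Tableau calculations; the pandas sketch below only illustrates the underlying aggregations. Column names like price, zipcode, bedrooms, and date are assumptions, and summing price over dates is a simplification used as a revenue proxy.

```python
# Illustrative pandas versions of the four dashboard metrics.
import pandas as pd

listings = pd.DataFrame({
    "price": [150, 95, 210, 120],
    "zipcode": ["98101", "98101", "98109", "98109"],
    "bedrooms": [1, 1, 2, 3],
    "date": pd.to_datetime(["2015-01-10", "2015-06-02",
                            "2016-03-15", "2016-07-20"]),
})

print(listings.groupby("bedrooms")["price"].mean())   # average price per bedroom
print(listings.groupby("zipcode")["price"].mean())    # price distribution by zip code
print(listings.groupby(listings["date"].dt.year)["price"].sum())  # revenue by year
print(listings["bedrooms"].nunique())                 # distinct bedroom listing types
```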
Zip codes with fewer listings often had higher average prices
One and two-bedroom listings were the most common
Revenue patterns suggested seasonality or location-based demand
Some regions showed consistent pricing while others varied significantly
Hosts in high-demand zip codes could consider adjusting their pricing strategy based on nearby competition
Investors can focus on 1–2 bedroom properties, which dominate the listing pool
Future studies could incorporate booking frequency or guest reviews for deeper insight
This project aimed to explore customer purchasing behavior, specifically whether individuals purchased a bicycle based on various demographic and regional variables. The original dataset contained raw, unstructured information, which required careful cleaning and transformation before meaningful insights could be drawn.
What factors influence whether a customer purchases a bike? Can patterns be identified based on demographic and regional data to better understand consumer behavior?
The dataset used for this project was sourced from GitHub, shared publicly by Alex Freberg (Alex The Analyst). It includes information such as age, income, marital status, education, region, and commute distance, along with a binary indication of whether the individual purchased a bike.
Here’s what I did:
✅ Cleaned and prepared the dataset by removing null values and fixing inconsistencies.
✅ Organized and structured the data to make it ready for analysis in Excel.
✅ Used Pivot Tables to analyze patterns based on Region, Marital Status, Income, Education, and Commute Distance (an equivalent pivot is sketched after this list).
✅ Added Slicers for interactive filtering, allowing users to quickly explore trends by demographic segments.
✅ Designed a clean and intuitive Excel Dashboard featuring Bar Charts and Line Graphs to highlight patterns and trends.
✅ Calculated and displayed the percentage of bike buyers vs. non-buyers, offering a quick snapshot of overall behavior.
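The analysis itself ran entirely on Excel Pivot Tables. As an illustration, the sketch below reproduces the same pivot logic in pandas; the column names mirror the bike-buyers dataset but are assumptions here.

```python
# Illustrative pandas equivalent of the Excel pivot analysis.
import pandas as pd

bikes = pd.DataFrame({
    "Region": ["Europe", "Pacific", "Europe", "North America"],
    "Marital Status": ["Married", "Single", "Single", "Married"],
    "Income": [40000, 30000, 70000, 60000],
    "Purchased Bike": ["Yes", "No", "No", "Yes"],
})

# Average income by region, split by purchase status
# (the pivot behind the bar chart).
pivot = bikes.pivot_table(index="Region", columns="Purchased Bike",
                          values="Income", aggfunc="mean")
print(pivot)

# Share of buyers vs. non-buyers (the KPI cards).
print(bikes["Purchased Bike"].value_counts(normalize=True))
```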
48.1% of the surveyed customers purchased a bicycle, while 51.9% did not. This split reveals interesting behavioral trends when segmented by region, marital status, and commute distance.
For example:
Married individuals were slightly more likely to purchase a bike.
Certain regions showed a higher inclination toward bike purchases, possibly due to infrastructure or commuting culture.
Income level and commute distance also played roles in influencing buyer decisions.
Visuals Created
Interactive Excel Dashboard
Bar Chart representing average income by gender, filtered by purchase status.
Line Graphs showing purchase counts by commute distance and by age bracket.
Slicers for real-time filtering by Region and Marital Status.
KPI Cards (Key Performance Indicators) to highlight buyer percentages.
Learning Experience
This project was completed as part of Alex Freberg’s (Alex The Analyst) YouTube Bootcamp, a beginner-friendly and insightful resource that helped me understand the fundamentals of data cleaning, dashboard creation, and storytelling through data.
This analysis provides valuable insights into consumer behavior and highlights key demographic factors that influence bike purchasing decisions. Businesses in the cycling or sporting goods industry could use these findings to tailor marketing campaigns and product offerings to high-potential customer segments.
This Excel dashboard project helped me strengthen my foundational data skills and taught me how to extract meaningful insights from everyday business scenarios. It’s a great example of how Excel remains a powerful tool for entry-level data analysis.
The project was divided into two major phases: Data Cleaning and Exploratory Data Analysis (EDA).
In this stage, I prepared the dataset for analysis by performing the following key steps:
Duplicate Removal: Used ROW_NUMBER() with Common Table Expressions (CTEs) to identify and remove duplicate records based on fields like company, industry, and date (the pattern is sketched after these steps).
Standardization: Ensured consistency in text-based fields (e.g., Company, Industry) to avoid fragmentation during grouping.
Null & Blank Values: Identified and addressed nulls across critical columns, deciding when to retain or exclude them for clean analysis.
Schema Refinement: Created a staging table (layoffs_staging2) that served as a clean and structured base for the analysis.
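The actual cleaning was written in SQL. The runnable sqlite3 sketch below shows the ROW_NUMBER()-over-CTE dedup pattern on a toy table (window functions need SQLite 3.25+); the table and column names are simplified assumptions, not the project's exact schema.

```python
# Sketch of the ROW_NUMBER() + CTE deduplication pattern, run via sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE layoffs_staging (
    company TEXT, industry TEXT, total_laid_off INTEGER, date TEXT)""")
con.executemany(
    "INSERT INTO layoffs_staging VALUES (?, ?, ?, ?)",
    [("Acme", "Tech", 100, "2023-01-05"),
     ("Acme", "Tech", 100, "2023-01-05"),   # exact duplicate
     ("Beta", "Retail", 50, "2023-02-10")],
)

# Number each copy within a (company, industry, total_laid_off, date)
# group; anything with row_num > 1 is a duplicate, so keep row_num = 1.
clean = con.execute("""
    WITH duplicate_cte AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY company, industry, total_laid_off, date
            ORDER BY company) AS row_num
        FROM layoffs_staging
    )
    SELECT company, industry, total_laid_off, date
    FROM duplicate_cte
    WHERE row_num = 1
""").fetchall()
print(clean)
```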
With the cleaned data, I explored trends, patterns, and key insights using SQL techniques such as grouping, filtering, window functions, and ranking.
Key insights include:
100% Layoffs: Discovered companies that laid off their entire workforce — mostly early-stage startups.
Total Layoffs by Category: Aggregated layoffs by year, industry, country, and location to understand the broader impact.
Company Rankings: Used DENSE_RANK() to identify the top three companies with the highest layoffs per year.
Rolling Trends: Implemented a rolling total using window functions to visualize layoffs over time, month by month (both query patterns are sketched below).
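Again as a runnable sketch (sqlite3, with a simplified schema and invented sample rows), here is the shape of the ranking and rolling-total queries described above.

```python
# Sketch of the DENSE_RANK() ranking and rolling-total window queries.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE layoffs (
    company TEXT, year INTEGER, month TEXT, laid_off INTEGER)""")
con.executemany("INSERT INTO layoffs VALUES (?, ?, ?, ?)", [
    ("Acme", 2022, "2022-01", 100), ("Beta", 2022, "2022-03", 300),
    ("Gamma", 2022, "2022-07", 200), ("Acme", 2023, "2023-02", 400),
])

# Top companies per year by total layoffs, via DENSE_RANK().
print(con.execute("""
    WITH totals AS (
        SELECT company, year, SUM(laid_off) AS total
        FROM layoffs GROUP BY company, year
    )
    SELECT * FROM (
        SELECT *, DENSE_RANK() OVER (
            PARTITION BY year ORDER BY total DESC) AS rnk
        FROM totals
    ) WHERE rnk <= 3
""").fetchall())

# Rolling (cumulative) total of layoffs by month.
print(con.execute("""
    SELECT month, SUM(laid_off) OVER (ORDER BY month) AS rolling_total
    FROM (SELECT month, SUM(laid_off) AS laid_off
          FROM layoffs GROUP BY month)
""").fetchall())
```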
This project strengthened my understanding of SQL — from cleaning raw datasets to deriving actionable insights. It highlighted how clean data and the right queries can uncover valuable stories behind numbers.
Project Title: Course Enrollment Query
Database Name: student_course_db
Tool Used: MySQL Workbench
Project Type: SQL Practice & Relational Database Design
I created the following tables:
students — stores student information (e.g., name, email)
courses — stores course details (e.g., course name, course code)
enrollments — a junction table linking students and courses with enrollment dates
I wrote a query that combines the courses and enrollments tables using a LEFT JOIN, then groups the results by course to count how many students are enrolled in each one (sketched below).
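The project itself ran in MySQL Workbench; this sqlite3 sketch reproduces the same LEFT JOIN + GROUP BY pattern on a toy version of the schema, so courses with zero enrollments still appear with a count of 0. Column names are assumptions.

```python
# Sketch of the enrollment-count query: LEFT JOIN keeps every course,
# and COUNT(e.student_id) ignores the NULLs from unenrolled courses.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, course_name TEXT);
    CREATE TABLE enrollments (
        student_id INTEGER, course_id INTEGER, enrollment_date TEXT);
    INSERT INTO courses VALUES
        (1, 'SQL Basics'), (2, 'Python 101'), (3, 'Statistics');
    INSERT INTO enrollments VALUES
        (10, 1, '2024-01-05'), (11, 1, '2024-01-06'), (12, 2, '2024-02-01');
""")

for row in con.execute("""
    SELECT c.course_name, COUNT(e.student_id) AS enrolled_students
    FROM courses c
    LEFT JOIN enrollments e ON e.course_id = c.course_id
    GROUP BY c.course_name
    ORDER BY enrolled_students DESC
"""):
    print(row)
```

Using an INNER JOIN here would silently drop courses with no enrollments, which is exactly the low-enrollment signal the LEFT JOIN makes visible.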
To summarize the number of students in each course
To practice how to use SQL joins and grouping
To demonstrate the ability to extract insights from relational databases
Insights:
I learned how to join multiple tables in SQL to pull related data together
Using aggregate functions like COUNT() and GROUP BY helped me understand how data summaries are built
I saw firsthand how missing or low enrollment in courses can be highlighted using LEFT JOIN
Key Learnings:
How to create queries that return useful reports from databases
Understanding relationships between tables (e.g., one course to many enrollments)
Practicing how to group and count data to reveal trends
Final Note:
This was one of my favorite beginner SQL tasks because it gave me a hands-on way to explore real-world data relationships and practice reporting, which is a critical skill for aspiring data analysts like me.
This project is a beginner-level SQL database design and implementation inspired by Giraffe Academy’s SQL tutorial on freeCodeCamp. It simulates a company system with multiple relational tables:
Tables created: employee, branch, client, works_with, branch_supplier
Key concepts practiced: data types, primary and foreign keys, constraints, and INSERT operations
Challenges faced: fixing syntax and constraint errors, resolving foreign key conflicts
Outcome: Successfully created, connected, and queried all tables (a minimal version of the key pattern is sketched below). This hands-on project helped solidify my understanding of how real-world databases are structured.
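As a minimal, runnable illustration of the primary/foreign key pattern, here is a sqlite3 sketch reduced to two of the five tables with simplified columns; the tutorial's real schema is richer, so treat the names here as assumptions.

```python
# Two-table sketch of the PK/FK relationships from the project.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE branch (
        branch_id INTEGER PRIMARY KEY,
        branch_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        branch_id INTEGER,
        FOREIGN KEY (branch_id) REFERENCES branch(branch_id) ON DELETE SET NULL
    );
    INSERT INTO branch VALUES (1, 'Scranton');
    INSERT INTO employee VALUES (100, 'Jan', 1);
""")

# Inserting an employee pointing at a nonexistent branch raises the same
# kind of foreign key conflict the project had to resolve.
try:
    con.execute("INSERT INTO employee VALUES (101, 'Michael', 99)")
except sqlite3.IntegrityError as e:
    print("FK conflict:", e)
```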
This is part of my ongoing journey in learning SQL and preparing for a future in data analytics.
Background
This analysis spans three months of historical stock market data for a wide range of publicly traded companies. We’ll look into key patterns across price movements, volatility, and trading volume—essential factors that can reveal underlying market dynamics, point to investment opportunities, and inform us about potential future price trends.
Goals
Price Trend Analysis: Track the price movement of various stocks over the three-month period by analyzing open, high, low, and close prices.
Volatility Measurement: Measure the volatility of each stock by evaluating the daily range between high and low prices throughout the period (see the sketch after these goals).
Volume Analysis: Assess trading volumes to gauge market interest in each stock.
Pattern Identification: Discover recurring trends, such as consistent upward or downward movements, to inform predictions.
Stock Comparison: Compare the performance of different stock symbols to identify which companies excelled over the three months.
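As a hedged sketch of how these measures can be computed, here is a pandas version that uses the daily high-low range as a simple volatility proxy; the OHLCV column names and sample values are assumptions, not the project's data.

```python
# Sketch of the volatility, volume, and trend measures on toy OHLCV data.
import pandas as pd

stocks = pd.DataFrame({
    "symbol": ["C", "C", "CLF", "CLF"],
    "open":   [40.0, 41.0, 15.0, 16.0],
    "high":   [42.0, 43.5, 16.2, 17.1],
    "low":    [39.5, 40.2, 14.8, 15.6],
    "close":  [41.0, 42.8, 16.0, 17.0],
    "volume": [5_000_000, 6_200_000, 1_200_000, 1_500_000],
})

# Daily high-low range as a simple volatility proxy.
stocks["daily_range"] = stocks["high"] - stocks["low"]

summary = stocks.groupby("symbol").agg(
    avg_range=("daily_range", "mean"),   # average volatility
    total_volume=("volume", "sum"),      # market interest
    price_change=("close", lambda s: s.iloc[-1] - s.iloc[0]),  # trend direction
)
print(summary)
```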
Key Insights
The analysis uncovered several noteworthy insights:
Total Stock Trades: Over the three months, we analyzed 29,440 trades across all stocks. This broad scope provides a solid base for recognizing trends and making strategic decisions.
Top-Traded Stock: The stock with the highest trade volume during this period was Citigroup Inc., with a total of 359,141,957 shares traded. This massive trade volume reflects high market interest and potential liquidity, making Citigroup an important stock to monitor for future price movements and market sentiment.
The Alpha Stock: Our analysis identifies Cleveland-Cliffs Inc. as the "Alpha stock" of this period. This means that Cleveland-Cliffs showed standout performance based on criteria such as consistent returns, strong growth trends, or other favorable indicators that set it apart from the rest. This stock’s strength suggests it could offer solid investment potential moving forward.
Recommendations
Based on these findings, I suggest a few strategies to consider:
Focus on Trending Stocks: Stocks with consistent upward trends over the three months may continue to offer growth opportunities.
Risk Management: For stocks with high volatility, consider risk management techniques such as diversification or pairing them with more stable assets.
Monitor High-Volume Stocks: Stocks with significant volume, such as Citigroup Inc., often precede notable price movements, providing trading opportunities.
Portfolio Diversification: Including stocks like Cleveland-Cliffs Inc., which have demonstrated reliable performance, alongside other less-correlated stocks, can strengthen portfolio resilience.