This project involves analyzing Airbnb listings in Paris to uncover trends and insights. The analysis includes profiling and cleaning the data, calculating key metrics, preparing the data for visualization, and creating informative visualizations. The project aims to identify the most expensive neighborhoods, understand accommodation pricing patterns, and observe the impact of regulations on the Airbnb market over time.
The project is divided into three main objectives, each with specific tasks aimed at understanding and visualizing Airbnb listings data for Paris:
The first objective focuses on reading the Airbnb listings data, converting necessary columns to appropriate data types, filtering the data to include only Paris listings, and performing basic quality assurance (QA). The tasks include:
Importing and opening the Listings.csv file.
Casting date columns to a datetime format.
Filtering the data to include only listings in Paris and selecting relevant columns: host_since, neighbourhood, city, accommodates, and price.
Checking for missing values and calculating basic profiling metrics such as minimum, maximum, and average values for numeric fields.
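The loading and QA steps above can be sketched with pandas as follows. The tiny inline CSV is a stand-in for the real Listings.csv; the column names come from the task description, and the sample values are illustrative only:

```python
import io
import pandas as pd

# A tiny stand-in for Listings.csv (the real file has many more rows and columns)
csv_data = io.StringIO(
    "host_since,neighbourhood,city,accommodates,price\n"
    "2014-03-01,Elysee,Paris,2,120\n"
    "2016-07-15,Louvre,Paris,4,250\n"
    "2015-01-10,Soho,London,2,90\n"
)
listings = pd.read_csv(csv_data)

# Cast the date column to a datetime format
listings["host_since"] = pd.to_datetime(listings["host_since"])

# Keep only Paris listings and the relevant columns
cols = ["host_since", "neighbourhood", "city", "accommodates", "price"]
paris_listings = listings.loc[listings["city"] == "Paris", cols]

# Basic QA: missing values plus min/max/mean for the numeric fields
missing = paris_listings.isna().sum()
profile = paris_listings[["accommodates", "price"]].agg(["min", "max", "mean"])
```

With the real file, `io.StringIO` would simply be replaced by the path to Listings.csv.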
The second objective involves preparing the data for visualization by aggregating and manipulating the listings data. The tasks include:
Creating a table grouped by neighbourhood that calculates the mean listing price and sorts the results from lowest to highest.
Filtering data for the most expensive neighborhood and creating a table grouped by the accommodates column to calculate the mean price for each accommodation size.
Grouping data by the host_since year to calculate the average price and count the number of new hosts, forming a table called paris_listings_over_time.
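These three aggregations can be sketched as follows, using a small stand-in frame for the cleaned Paris data (neighbourhood names and prices are illustrative, not real results):

```python
import pandas as pd

# Small illustrative frame standing in for the cleaned Paris listings
paris_listings = pd.DataFrame({
    "host_since": pd.to_datetime(["2014-03-01", "2015-06-20", "2015-09-01", "2016-07-15"]),
    "neighbourhood": ["Elysee", "Louvre", "Elysee", "Louvre"],
    "accommodates": [2, 4, 4, 2],
    "price": [120, 250, 300, 180],
})

# Mean price per neighbourhood, sorted from low to high
price_by_neighbourhood = (
    paris_listings.groupby("neighbourhood")["price"].mean().sort_values()
)

# Mean price by accommodation size in the most expensive neighbourhood
top_neighbourhood = price_by_neighbourhood.index[-1]
price_by_accommodates = (
    paris_listings[paris_listings["neighbourhood"] == top_neighbourhood]
    .groupby("accommodates")["price"].mean()
)

# Average price and count of new hosts per host_since year
paris_listings_over_time = (
    paris_listings.groupby(paris_listings["host_since"].dt.year)
    .agg(avg_price=("price", "mean"), new_hosts=("host_since", "count"))
)
```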
The final objective is to build visualizations that provide insights into the Airbnb market in Paris. The tasks include:
Creating a horizontal bar chart of the average price by neighborhood in Paris.
Creating a horizontal bar chart of the average price by accommodates in Paris' most expensive neighborhood.
Creating two line charts: one showing the count of new hosts over time and another showing the average price over time.
Based on the findings, providing insights about the impact of the 2015 regulations on new hosts and prices.
Creating a dual-axis line chart to show both new hosts and average price over time as a bonus task.
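The bonus dual-axis chart can be sketched with matplotlib's `twinx()`, which adds a second y-axis sharing the same x-axis. The yearly figures below are illustrative placeholders, not results from the actual data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative yearly summary (stand-in for paris_listings_over_time)
paris_listings_over_time = pd.DataFrame(
    {"new_hosts": [150, 420, 380, 260], "avg_price": [95.0, 102.5, 110.0, 118.0]},
    index=[2013, 2014, 2015, 2016],
)

fig, ax1 = plt.subplots()
ax1.plot(paris_listings_over_time.index, paris_listings_over_time["new_hosts"],
         color="tab:blue")
ax1.set_ylabel("New hosts", color="tab:blue")
ax1.set_xlabel("Year")

# Second y-axis for average price on the same x-axis
ax2 = ax1.twinx()
ax2.plot(paris_listings_over_time.index, paris_listings_over_time["avg_price"],
         color="tab:red")
ax2.set_ylabel("Average price", color="tab:red")

ax1.set_title("New Hosts and Average Price Over Time")
fig.savefig("new_hosts_vs_price.png")
```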
The final step involves answering a key project question: "Which neighborhood in Paris has the highest average Airbnb listing price?" Based on the analysis, I identify the neighborhood with the highest average price, offering valuable insights for potential Airbnb hosts and renters.
In this project, I analyze simulated social media tweet data to uncover insights into user engagement and category popularity. I focus on data extraction, cleaning, analysis, and visualization, ultimately helping a marketing firm refine its social media strategies.
As a data analyst at a marketing firm dedicated to social media brand promotion, I utilize Python to extract, clean, and analyze tweets across various categories, including health, family, and food. By generating visualizations from this data, I aim to provide valuable insights to enhance our clients' social media performance. This approach enables the firm to deliver timely tweets within budget, fostering faster and more efficient results.
Import Required Libraries: I will import essential libraries for data manipulation, numerical computations, and visualization.
Generate Random Data: I'll create a simulated dataset of tweets, incorporating random dates, categories, and numbers of likes.
Clean and Analyze Data: I will focus on processing the data to ensure its integrity and preparing it for in-depth analysis.
Visualize Data: I will generate visualizations to explore trends and patterns in user engagement across different categories.
Draw Conclusions: Based on the analysis, I will provide actionable insights that help optimize social media strategies.
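The generation, cleaning, and analysis steps above can be sketched as follows. The category list, date range, and like-count bounds are illustrative assumptions, not the exact parameters used in the project:

```python
import random
import pandas as pd

random.seed(42)  # make the simulation reproducible

categories = ["health", "family", "food", "travel", "fitness"]  # assumed list
n = 500

# Simulated tweets: random dates, categories, and like counts
tweets = pd.DataFrame({
    "date": pd.to_datetime("2023-01-01")
            + pd.to_timedelta([random.randint(0, 364) for _ in range(n)], unit="D"),
    "category": [random.choice(categories) for _ in range(n)],
    "likes": [random.randint(0, 1000) for _ in range(n)],
})

# Clean: drop duplicate rows and any missing values
tweets = tweets.drop_duplicates().dropna()

# Analyze: average likes per category, sorted from most to least engaging
avg_likes = tweets.groupby("category")["likes"].mean().sort_values(ascending=False)
```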
Results
Through my analysis of the simulated tweet data, I discovered several key insights:
Category Popularity: Topics such as "Food" and "Travel" garnered significantly higher average likes, suggesting these categories engage users more effectively.
User Engagement Trends: My visualizations revealed distinct trends in user engagement at different times of the year, identifying peak periods for various categories.
Actionable Recommendations: The data analysis supports the development of targeted strategies focusing on high-engagement categories, optimizing posting schedules, and tailoring content to align with audience preferences.
https://github.com/willohblack/Social_Media-Data_Analysis.git
This project aimed to analyze a dataset containing movie ratings, genres, and user ratings. By cleaning the data, performing fundamental statistical analysis, and visualizing key insights, I uncovered trends in movie ratings across different genres. The project uses the MovieLens dataset (small version), which contains thousands of user ratings and detailed movie information.
Throughout the project, I utilized several vital skills necessary for data analysis:
Reading CSV files using Pandas: Efficiently loading and managing structured data.
Data Cleaning: Handling missing values and removing duplicates for more accurate analysis.
Basic Statistical Analysis: Calculating averages, counts, and other statistics.
Data Visualization using Matplotlib: Creating compelling visual insights like bar charts and histograms.
Python Programming: Applying core Python concepts such as functions and loops for data processing.
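A minimal sketch of the cleaning and statistics steps, using tiny stand-in frames for the MovieLens movies and ratings files (the real dataset has thousands of rows):

```python
import pandas as pd

# Illustrative stand-ins for the MovieLens movies.csv and ratings.csv files
movies = pd.DataFrame({
    "movieId": [1, 2, 2],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Jumanji (1995)"],
    "genres": ["Animation|Comedy", "Adventure", "Adventure"],
})
ratings = pd.DataFrame({
    "movieId": [1, 1, 2, 2],
    "rating": [4.0, 5.0, 3.0, None],
})

# Clean: drop duplicate movie rows and ratings with missing values
movies = movies.drop_duplicates(subset="movieId")
ratings = ratings.dropna(subset=["rating"])

# Basic statistics: average rating and rating count per movie
stats = ratings.groupby("movieId")["rating"].agg(["mean", "count"]).reset_index()
summary = movies.merge(stats, on="movieId")
```

With the real files, the two inline frames would be replaced by `pd.read_csv(...)` calls on the data folder.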
The project is well-organized to maintain clarity and modularity, ensuring each component is easily understandable and reusable:
```
movie_ratings_analysis/
│
├── data/                     # Folder for raw and cleaned data
│   └── movies.csv            # Dataset file
│
├── notebooks/                # Jupyter notebooks (optional)
│   └── analysis.ipynb        # Exploratory analysis notebook
│
├── src/                      # Source code folder
│   ├── __init__.py
│   ├── data_loader.py        # Load and clean data
│   ├── analysis.py           # Perform analysis
│   └── visualize.py          # Generate visualizations
│
├── tests/                    # Unit tests
│   └── test_data_loader.py   # Tests for data loader functions
│
├── requirements.txt          # Dependencies list
└── README.md                 # Project documentation
```
Python Version: I used Python 3.10+ for this project, ensuring compatibility with the latest language features.
Libraries:
Pandas: For data manipulation and analysis.
Matplotlib: For visualizing insights from the data.
pytest: For testing the functionality of core components.
I managed the project's dependencies through a requirements.txt file, ensuring easy setup. To install the necessary dependencies, run:
```bash
pip install -r requirements.txt
```
I wrote unit tests using pytest to ensure that the core functions, like loading data and cleaning it, work as expected. Testing plays a crucial role in ensuring the robustness of the project.
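One such test might look like the sketch below. Here `load_and_clean` is a hypothetical stand-in for the loader logic; the actual function names and signatures in `src/data_loader.py` may differ:

```python
# tests/test_data_loader.py (illustrative; real function names may differ)
import pandas as pd


def load_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the project's loader: drop duplicates and missing ratings."""
    return df.drop_duplicates().dropna(subset=["rating"])


def test_load_and_clean_removes_bad_rows():
    raw = pd.DataFrame({
        "movieId": [1, 1, 2],
        "rating": [4.0, 4.0, None],
    })
    cleaned = load_and_clean(raw)
    assert len(cleaned) == 1              # duplicate and missing-rating rows removed
    assert cleaned["rating"].notna().all()
```

Running `pytest` from the project root discovers and executes any `test_*` functions in the tests folder.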
The project could be extended by adding more complex insights, such as:
Finding the most common genres or analyzing trends in ratings over time.
Enhancing the visualizations with more advanced charts such as pie charts or scatter plots.
Refining the README.md file to detail the challenges faced during the analysis, the solutions implemented, and key findings.
The analysis produced two main visualizations:
Bar Chart: Showing the average ratings by genre.
Histogram: Displaying the distribution of movie ratings.
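These two charts can be sketched with Matplotlib as follows; the genre averages and rating values below are illustrative placeholders, not results from the actual dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-genre averages and raw ratings (stand-ins for the real results)
avg_by_genre = pd.Series({"Comedy": 3.4, "Drama": 3.8, "Action": 3.1})
ratings = pd.Series([4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 3.5, 4.0])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: average rating by genre
ax1.bar(avg_by_genre.index, avg_by_genre.values)
ax1.set_title("Average Rating by Genre")
ax1.set_ylabel("Average rating")

# Histogram: distribution of all ratings
ax2.hist(ratings, bins=10, range=(0.5, 5.0), edgecolor="black")
ax2.set_title("Distribution of Movie Ratings")
ax2.set_xlabel("Rating")

fig.tight_layout()
fig.savefig("movie_ratings_charts.png")
```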
This project demonstrates essential skills in Python programming, data manipulation, visualization, and testing best practices. Combining these techniques allowed me to gain actionable insights from the MovieLens dataset and create meaningful visualizations. This project is valuable to my portfolio and highlights my ability to handle real-world datasets.