This project analyzes Airbnb listings in New York City using data visualization and exploratory data analysis to understand pricing, availability, and review patterns.
The dataset used in this project contains information about Airbnb listings in New York City in 2019. It is publicly available on Kaggle and includes fields such as listing name, host information, neighborhood, room type, price, number of reviews, and availability throughout the year. I chose this dataset because it provides interesting insights into how pricing, reviews, and availability vary across neighborhoods and room types. The dataset was created by Denis Gomonov, a Data Engineer at Comcast.
Each row in the dataset represents a single Airbnb listing. The raw dataset contains 48,895 rows and 16 columns, providing a large amount of information about short-term rental properties in New York City.
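As a minimal sketch of how the data might be loaded, the snippet below reads a CSV with pandas and checks its shape. It assumes pandas and a local copy of the Kaggle file saved as `AB_NYC_2019.csv` (both are assumptions; the file name follows the dataset name used in the conclusion). A tiny hypothetical in-memory CSV stands in for the real file so the example runs on its own, and `load_listings` is an illustrative helper name.

```python
import io

import pandas as pd


def load_listings(source):
    """Read an Airbnb listings CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical stand-in for the real file, with only a subset of its columns.
# On the full data the call would be: df = load_listings("AB_NYC_2019.csv")
sample_csv = io.StringIO(
    "id,name,neighbourhood_group,room_type,price\n"
    "1,Cozy room,Brooklyn,Private room,80\n"
    "2,Midtown loft,Manhattan,Entire home/apt,225\n"
)
df = load_listings(sample_csv)

n_rows, n_cols = df.shape          # one row per listing
ids_unique = df["id"].is_unique    # each listing should appear only once
print(n_rows, n_cols, ids_unique)
```

On the full file, this document reports 48,895 rows and 16 columns.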
Data Assessment
The dataset is a tabular DataFrame with 48,870 rows and 16 columns after cleaning (originally 48,895 rows before removing illogical entries). Each row represents a unique Airbnb listing.
The data is recorded at the individual listing level. Each row describes a single Airbnb property, including its ID, name, host information, location, room type, price, review metrics, and availability.
This dataset contains Airbnb listings in New York City, covering all five boroughs. It includes 48,870 listings after cleaning, with details about location (neighbourhood_group, neighbourhood), room type, price, minimum nights, number of reviews, review activity, host information, and yearly availability.
The last_review column ranges from 2011-03-28 to 2019-07-08, indicating that the dataset spans multiple years of Airbnb activity, with most review activity concentrated in 2019.
The minimum_nights column gives the required length of stay for each listing. After listings requiring more than 365 nights were removed, its values range from 1 to 365 nights.
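The date range and stay-length bounds quoted above can be computed along these lines. This is a pandas sketch on a few hypothetical rows, not the project's own code; on the real data, `df` would be the loaded listings table.

```python
import pandas as pd

# Hypothetical sample rows using two of the dataset's column names.
df = pd.DataFrame({
    "last_review": ["2011-03-28", "2019-07-08", None, "2018-12-01"],
    "minimum_nights": [1, 3, 30, 365],
})

# Parse review dates; missing values become NaT, which min()/max() skip.
review_dates = pd.to_datetime(df["last_review"], errors="coerce")
first_review, latest_review = review_dates.min(), review_dates.max()

# Bounds of the required stay length.
nights_min, nights_max = df["minimum_nights"].min(), df["minimum_nights"].max()
print(first_review.date(), latest_review.date(), nights_min, nights_max)
```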
Initially, the dataset contained missing values in several columns:
name: 15 missing values → filled with "Unknown"
host_name: 21 missing values → filled with "Unknown"
last_review: 10,043 missing values → filled with "No reviews"
reviews_per_month: 10,043 missing values → filled with 0
Listings with price = 0 were also removed because they are not realistic for Airbnb rentals, as were listings whose minimum_nights exceeded 365.
After these steps, the dataset contains no missing values, and all columns have valid entries. Missing review information is expected for listings that have never been reviewed, so filling these gaps with "No reviews" and 0 keeps their meaning intact.
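The fill rules listed above map directly onto a single pandas `fillna` call with a per-column dictionary. The rows below are hypothetical, and pandas itself is an assumption (the text refers to a DataFrame but does not name the library).

```python
import pandas as pd

# Hypothetical rows containing the same kinds of gaps described above.
df = pd.DataFrame({
    "name": ["Sunny loft", None, "Quiet room"],
    "host_name": [None, "Ana", "Ben"],
    "last_review": ["2019-06-01", None, "2018-11-20"],
    "reviews_per_month": [1.2, None, 0.4],
})

# Placeholder values from the text: labels for text columns, 0 for the review rate.
fill_values = {
    "name": "Unknown",
    "host_name": "Unknown",
    "last_review": "No reviews",
    "reviews_per_month": 0,
}
df = df.fillna(value=fill_values)

remaining_missing = int(df.isna().sum().sum())  # total missing cells after filling
print(remaining_missing)
```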
The dataset represents Airbnb activity between March 2011 and July 2019. The last_review column shows the most recent review for each listing, indicating that the dataset reflects listing activity up to early July 2019.
Listings without reviews may be newly added or rarely booked. Overall, the dataset provides a historical snapshot of Airbnb activity over time, with most listings and review activity concentrated closer to 2019.
Faithfulness of the Data
The dataset appears to capture Airbnb listings in NYC reasonably well. Cleaning steps such as removing listings with a price of $0 or a minimum_nights value above 365 improved its faithfulness by eliminating entries that do not reflect realistic rental conditions.
During exploration, a small number of listings were found with very low prices (below $30), including a few entire apartments, and some listings had very high prices (up to $10,000). The low-price listings are unusually cheap for NYC and may represent data entry errors, promotions, or unusual rental arrangements, while the high-price listings likely reflect luxury accommodations. Both extremes are rare and were kept in the dataset but noted as potential outliers. Overall, the distributions of price, room types, neighborhood groups, and availability appear logical, suggesting the dataset generally reflects the reality of the Airbnb market in NYC for the covered period.
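One way to keep these extremes while still marking them is to add a flag column instead of dropping rows. This is a hedged pandas sketch on hypothetical rows that uses the $30 threshold from the text; the column name `low_price_flag` is illustrative.

```python
import pandas as pd

# Hypothetical listings, including one very cheap entire apartment and one $10,000 listing.
df = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Shared room"],
    "price": [25, 70, 10000, 40],
})

LOW_PRICE = 30  # threshold taken from the text ("below $30")

# Flag rather than drop, so the rows stay in the data but can be filtered later.
df["low_price_flag"] = df["price"] < LOW_PRICE
low_entire = df[df["low_price_flag"] & (df["room_type"] == "Entire home/apt")]

print(int(df["low_price_flag"].sum()), len(low_entire), df["price"].max())
```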
Before analysis, the dataset was cleaned to ensure it accurately represents realistic Airbnb listings.
Several columns contained missing values:
name and host_name were filled with "Unknown"
last_review was filled with "No reviews" for listings without reviews
reviews_per_month was filled with 0 for listings without review activity
Some entries were removed because they did not represent realistic rental conditions:
Listings with price = 0
Listings where minimum_nights exceeded 365 nights
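The two removal rules above can be expressed as one boolean mask. A minimal pandas sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: one free listing and one requiring a 500-night stay.
df = pd.DataFrame({
    "price": [0, 150, 89, 200],
    "minimum_nights": [2, 1, 500, 365],
})

# Keep listings with a positive price and a minimum stay of at most one year.
realistic = (df["price"] > 0) & (df["minimum_nights"] <= 365)
removed = int((~realistic).sum())
df = df[realistic].reset_index(drop=True)

print(removed, len(df))  # 2 2
```

Applied to the full data, the row counts reported in this document imply 48,895 minus 48,870 = 25 listings removed by these rules.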
The last_review column was converted from text format to datetime to allow proper date analysis.
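A datetime column cannot also store the text placeholder "No reviews", so the fill and the conversion need to be reconciled. One possible approach, sketched here as an assumption rather than the project's actual order of steps, converts with `errors="coerce"` (which turns the placeholder into a missing timestamp, `NaT`) and keeps review status in a separate boolean column, given the hypothetical name `has_review`.

```python
import pandas as pd

# Hypothetical rows after the text fill: one listing carries the "No reviews" placeholder.
df = pd.DataFrame({
    "last_review": ["2019-07-08", "No reviews", "2015-05-21"],
    "number_of_reviews": [12, 0, 3],
})

# Strings that cannot be parsed, such as "No reviews", become NaT instead of raising.
df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")

# Hypothetical helper column that keeps the "never reviewed" information explicit.
df["has_review"] = df["last_review"].notna()

# Sanity check: listings with a review date should be exactly those with reviews.
consistent = bool((df["has_review"] == (df["number_of_reviews"] > 0)).all())
print(df["last_review"].dtype, consistent)
```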
After these cleaning steps, the dataset contains 48,870 listings, with no missing values and properly formatted data. The cleaned dataset is now suitable for visualization and analysis.
Single Variable Distribution Plots
The majority of listings are Entire home/apt, followed by Private room and Shared room. This shows that most NYC Airbnb listings offer a full home or apartment rather than a room in a shared space.
Most listings are in Manhattan and Brooklyn, with far fewer in Queens, the Bronx, and Staten Island. This reflects the popularity and density of Airbnb activity in the city’s main boroughs.
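The counts behind these two bar charts can be reproduced with pandas `value_counts`; passing `normalize=True` gives shares instead of raw counts. The rows below are hypothetical.

```python
import pandas as pd

# Hypothetical listings with the two categorical columns used in the plots.
df = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Shared room", "Private room", "Entire home/apt"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan",
                            "Queens", "Brooklyn", "Bronx"],
})

# Share of listings per room type, largest first.
room_share = df["room_type"].value_counts(normalize=True)
# Number of listings per borough, largest first.
borough_counts = df["neighbourhood_group"].value_counts()

print(room_share.round(2))
print(borough_counts)
```

With matplotlib installed, `room_share.plot(kind="bar")` would draw the corresponding chart.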
The distribution of price is highly right-skewed, indicating that most listings are at lower price points, with a few very expensive outliers. The majority of listings fall below $200. This suggests that the market has a large number of affordable options and a smaller segment of luxury or high-end properties.
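The right skew described above can be checked numerically: a positive sample skewness and a mean well above the median both point to a long right tail. The prices below are hypothetical.

```python
import pandas as pd

# Hypothetical nightly prices with one luxury outlier.
prices = pd.Series([45, 60, 75, 80, 95, 120, 150, 180, 350, 2000], name="price")

skewness = prices.skew()                  # > 0 indicates a long right tail
median, mean = prices.median(), prices.mean()
share_below_200 = (prices < 200).mean()   # fraction of listings under $200

print(round(skewness, 2), median, mean, share_below_200)
```

Because a few expensive listings stretch the axis, plotting price on a log scale (or clipping at a high percentile) is a common way to make the bulk of the distribution visible; this is a suggestion, not a step the project describes.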
The 'minimum_nights' distribution is also heavily right-skewed, with a significant concentration of listings requiring a minimum stay of 1 to 5 nights. This implies that most hosts prefer short-term stays, likely catering to tourists or short business trips, while longer minimum stays are less common.
The 'number_of_reviews' column shows a distribution where a large number of listings have very few reviews (including many with 0 reviews). There's a long tail indicating some listings have received a very high number of reviews, suggesting high popularity or longer operational periods for those properties.
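The shares mentioned for minimum_nights and number_of_reviews can be computed as means of boolean masks. A pandas sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings: mostly short minimum stays, several with zero reviews.
df = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 1, 5, 30, 2, 7],
    "number_of_reviews": [0, 4, 0, 120, 9, 0, 1, 35],
})

# Fraction of listings whose minimum stay is 1 to 5 nights (both ends inclusive).
short_stay_share = df["minimum_nights"].between(1, 5).mean()
# Fraction of listings that have never been reviewed.
no_review_share = (df["number_of_reviews"] == 0).mean()
# Long tail: the 90th percentile sits far below the maximum.
p90 = df["number_of_reviews"].quantile(0.9)
max_reviews = df["number_of_reviews"].max()

print(short_stay_share, no_review_share, p90, max_reviews)
```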
Multiple Variable Plots
The scatter plot shows a dense cluster of listings with low prices and low numbers of reviews, indicating a large number of typical, perhaps newer or less popular, rentals.
As the number of reviews increases, there doesn't appear to be a strong linear correlation with price. Both low-priced and high-priced listings can have many reviews, suggesting that review count is more an indicator of popularity or longevity rather than a direct driver of higher prices.
There are some high-priced listings with very few reviews, which could be luxury accommodations or new listings. Conversely, many affordable listings have accumulated a large number of reviews, implying good value or high demand.
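The lack of a strong linear relationship can be quantified with a correlation coefficient. This sketch computes Pearson's r and Spearman's rank correlation (implemented as Pearson's r on ranks, so SciPy is not needed) on hypothetical values. Because price is heavily right-skewed, the rank-based measure is less sensitive to the few very expensive listings. The numbers are made up, so the sign of the result says nothing about the real data.

```python
import pandas as pd

# Hypothetical price and review counts for eight listings.
df = pd.DataFrame({
    "price": [50, 65, 80, 120, 150, 300, 450, 900],
    "number_of_reviews": [210, 15, 90, 0, 45, 3, 60, 1],
})

# Pearson measures linear association and is pulled around by extreme prices.
pearson = df["price"].corr(df["number_of_reviews"])
# Spearman: Pearson correlation of the ranks (ties would receive average ranks).
spearman = df["price"].rank().corr(df["number_of_reviews"].rank())

print(round(pearson, 2), round(spearman, 2))
```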
This bar chart shows the average price of Airbnb listings across the five boroughs (neighbourhood_group) for each room type (Entire home/apt, Private room, Shared room). From the plot, we can see that entire homes/apartments are generally more expensive than private or shared rooms, and that Manhattan consistently has the highest average prices across all room types.
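The values behind this chart amount to a borough-by-room-type table of mean prices. A pandas sketch using `pivot_table` on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings across three boroughs and three room types.
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Queens", "Manhattan"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Shared room", "Entire home/apt"],
    "price": [250, 110, 160, 70, 45, 210],
})

# Mean price per borough (rows) and room type (columns).
# Combinations with no listings come out as NaN.
avg_price = df.pivot_table(index="neighbourhood_group", columns="room_type",
                           values="price", aggfunc="mean")
print(avg_price)
```

Given the right-skewed price distribution, `aggfunc="median"` is an alternative that is less affected by luxury outliers. With matplotlib installed, `avg_price.plot(kind="bar")` draws grouped bars similar to the chart described.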
Conclusion
This project explored Airbnb listings in New York City using the AB_NYC_2019 dataset. After cleaning the data and removing illogical values, several patterns became clear. Most listings are located in Manhattan and Brooklyn, and the majority of properties are entire homes or apartments. Prices vary widely, but most listings fall within a moderate price range, while a small number of luxury listings create higher outliers.
The visualizations also show relationships between price, reviews, room type, and location. For example, entire homes tend to be more expensive than private or shared rooms, and listings in Manhattan generally have higher prices than other boroughs.
Overall, the dataset provides a useful snapshot of the Airbnb market in New York City around 2019 and highlights how strongly location and room type are associated with listing prices, while review counts, a rough proxy for demand, show no strong relationship with price.
GitHub Repository Link: