Analysis Methodology:
Data Collection and Preparation:
Sourced the dataset from Kaggle, a reputable platform for datasets and data science projects.
Utilized Python libraries such as pandas, seaborn, and matplotlib for data manipulation and visualization.
Conducted meticulous data cleaning procedures to ensure data integrity and reliability of analysis, including:
Handling missing values through imputation or removal.
Identifying and eliminating duplicate records to prevent redundancy.
Standardizing the data by removing unnecessary columns, adjusting data types, and formatting values for consistency.
Exploratory Data Analysis (EDA):
Analyzed trends in game sales and releases over time, identifying patterns and fluctuations in the gaming industry.
Explored global sales across various attributes such as Genre, Platform, Region, Publisher, and Developers to discern market dynamics and preferences.
Investigated the market share of top Publishers through analysis of their global sales trends over the years.
Examined the distribution of user and critic scores, calculating average scores over time, and exploring correlations between them to gauge the perception of games among users and critics.
Conducted region-wise analysis of sales for different attributes like Genre and Platform, unveiling customer preferences and regional market trends.
Checking for Missing Values:
vg.isnull().sum(): This code snippet allows us to identify and quantify missing values in our dataset, providing insight into potential data quality issues that need to be addressed.
Identifying Rows with Missing Values:
vg[vg.isnull().any(axis=1)]: By filtering the dataset to identify rows with missing values, we can investigate and address instances of incomplete data, ensuring the integrity of our analysis.
Removing Rows with Missing Values:
vg.dropna(): Removing rows with missing values leaves a complete dataset for analysis. Note that dropna() returns a new DataFrame, so the result must be assigned back (for example, vg = vg.dropna()) for the change to persist.
Setting Index:
vg = vg.set_index('index'): Setting a meaningful index enhances the organization and accessibility of our dataset, facilitating smoother data retrieval and analysis.
Converting Data Types:
vg['User_Score'] = vg['User_Score'].astype('float64'): This code ensures consistency in our dataset by converting the 'User_Score' column to a numerical format. It's essential for enabling accurate numerical operations and analysis.
Date Manipulation:
vg['Year_of_Release'] = pd.to_datetime(vg['Year_of_Release'], format='%Y'): By converting the 'Year_of_Release' column to datetime format, we standardize date representation, facilitating time-based analysis and ensuring data uniformity. Because only the year is known, every converted value carries the same placeholder month and day (January 1), so the column is reformatted in the next step.
vg['Year_of_Release'] = vg['Year_of_Release'].dt.strftime('%Y'): This code further refines the 'Year_of_Release' column, keeping only the year component for improved readability and analysis. The resulting values are strings such as '1996', which is why later filters compare against quoted years.
Data Filtering:
Filtering rows based on specific years of interest: vg1[vg1['Year_of_Release'].isin(['1996','1994','1985','1992','1988'])]:
By filtering the dataset, we focus our analysis on specific years, enabling deeper insights into trends and patterns within our data.
Identifying and Removing Duplicate Rows:
Identifying duplicate rows: vg1.duplicated().sum(): Counting duplicate rows shows how many redundant entries the dataset contains, so they can be addressed before analysis.
Removing duplicate rows: vg1 = vg1.drop_duplicates(): By eliminating duplicate records, we ensure that each entry in our dataset is unique. This step prevents repeated games from being counted more than once in sales totals and release counts.
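The cleaning steps above can be combined into a single pass. The following is a minimal sketch on a few made-up rows, not necessarily the code used in the project. The helper name clean_games, the 'Name' column, and the sample values are assumptions for illustration; only 'User_Score' and 'Year_of_Release' are column names taken from the text. The year is cast to an integer string before parsing, which is an extra step assumed here because the sample column becomes float once it contains a missing value.

```python
import pandas as pd

# Made-up sample rows for illustration only; not taken from the Kaggle dataset.
raw = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game B", "Game C"],
    "Year_of_Release": [1996, 1994, 1994, None],
    "User_Score": ["8.5", "7.0", "7.0", "6.1"],
})


def clean_games(vg: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the cleaning steps in order."""
    vg = vg.dropna()  # dropna() returns a new frame, so reassign it
    vg = vg.assign(
        User_Score=vg["User_Score"].astype("float64"),
        # Parse the year, then keep only the year as a string such as '1996'.
        Year_of_Release=pd.to_datetime(
            vg["Year_of_Release"].astype(int).astype(str), format="%Y"
        ).dt.strftime("%Y"),
    )
    return vg.drop_duplicates().reset_index(drop=True)


print(raw.isnull().sum())  # missing values per column
vg1 = clean_games(raw)
print(vg1)
```

Wrapping the steps in one function keeps the order explicit: missing rows are dropped before type conversion, and duplicates are removed last, after values have been standardized.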
Here I visualize the number of game releases and global sales using a line plot. The orange line represents the trend in the number of game releases each year, while the green line depicts the trend in global sales. The plot helps identify any correlations or patterns between game releases and sales over time, providing valuable insights into the gaming industry's dynamics.
Code -
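A minimal sketch of this plot, assuming a 'Name' column identifies each game and a 'Global_Sales' column holds sales in millions (both column names are assumptions, as are the sample rows). It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year_of_Release": ["2000", "2000", "2001", "2002"],
    "Global_Sales": [1.5, 0.5, 3.0, 2.0],  # assumed column name, in millions
})

# One row per year: number of releases and summed global sales.
yearly = vg1.groupby("Year_of_Release").agg(
    releases=("Name", "count"),
    global_sales=("Global_Sales", "sum"),
)

years = yearly.index.tolist()
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(years, yearly["releases"], color="orange", label="Number of releases")
ax.plot(years, yearly["global_sales"], color="green", label="Global sales (millions)")
ax.set_xlabel("Year of release")
ax.legend()
```

Call plt.show() to display the figure when running outside a notebook.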
Here I calculate the total global sales for each genre and visualize the results using a horizontal bar plot. The length of each bar represents the total sales in millions, while the genre categories are displayed on the y-axis. This visualization helps identify the most popular genres in terms of global sales, providing insights into consumer preferences within the gaming market.
Code -
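A minimal sketch of the genre totals, assuming the columns are named 'Genre' and 'Global_Sales' (assumed names) and using made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Global_Sales": [2.0, 1.5, 1.0, 0.2],  # assumed column name, in millions
})

# Sum sales per genre; ascending order puts the largest bar at the top of a barh plot.
genre_sales = vg1.groupby("Genre")["Global_Sales"].sum().sort_values()

ax = genre_sales.plot(kind="barh", figsize=(8, 5))
ax.set_xlabel("Global sales (millions)")
ax.set_ylabel("Genre")
```

The same pattern applies to any other categorical column, such as the platform, by changing the column passed to groupby.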
Here I analyze global sales by platform in the gaming industry. This code calculates the total global sales for each gaming platform and visualizes the results using a horizontal bar plot. The length of each bar represents the total sales in millions, while the platform categories are displayed on the y-axis. This visualization provides insights into the popularity and performance of different gaming platforms in terms of global sales.
Code -
This code analyzes global sales trends over the years for specific gaming platforms, including PC, PS4, and Xbox (XB). It filters the dataset to include only the selected platforms and then visualizes the sales data using a line plot. Each line represents a platform, with markers indicating sales data points for each year. This visualization enables the comparison of sales trends across different platforms over time, offering insights into platform-specific market dynamics.
Code -
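A minimal sketch of the platform comparison, assuming columns named 'Platform' and 'Global_Sales' (assumed names) and made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Platform": ["PC", "PS4", "XB", "PC", "Wii", "PS4"],
    "Year_of_Release": ["2013", "2013", "2013", "2014", "2014", "2014"],
    "Global_Sales": [1.0, 2.0, 0.5, 1.5, 3.0, 2.5],  # assumed column name
})

# Keep only the platforms of interest.
selected = ["PC", "PS4", "XB"]
subset = vg1[vg1["Platform"].isin(selected)]

# Rows are years, columns are platforms, values are summed sales.
trend = subset.pivot_table(
    index="Year_of_Release",
    columns="Platform",
    values="Global_Sales",
    aggfunc="sum",
)

# One line per platform, with a marker at each yearly data point.
ax = trend.plot(marker="o", figsize=(10, 5))
ax.set_ylabel("Global sales (millions)")
```

Years in which a platform has no releases appear as gaps in its line rather than as zeros, which avoids implying sales data that does not exist.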
This code analyzes how total sales are distributed across regions in the gaming industry. It calculates the total sales for each region, including North America, Europe, Japan, and the rest of the world, and visualizes the results using a horizontal bar plot. The length of each bar represents the total sales in millions, while the region categories are displayed on the y-axis. This visualization provides insights into the distribution of gaming sales across various regions, highlighting key markets in terms of total sales.
Code -
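A minimal sketch of the regional totals, assuming the regional columns are named 'NA_Sales', 'EU_Sales', 'JP_Sales', and 'Other_Sales' (assumed names) and using made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; column names are assumptions.
vg1 = pd.DataFrame({
    "NA_Sales": [1.0, 2.0],
    "EU_Sales": [0.5, 1.0],
    "JP_Sales": [0.25, 0.25],
    "Other_Sales": [0.125, 0.125],
})

region_columns = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Column-wise sum gives one total per region.
region_totals = vg1[region_columns].sum().sort_values()

ax = region_totals.plot(kind="barh", figsize=(8, 4))
ax.set_xlabel("Total sales (millions)")
ax.set_ylabel("Region")
```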
This code analyzes the market share trends of top publishers in the gaming industry over time. It groups the dataset by the year of release and publisher, calculates the total sales for each combination, and then computes the market share of each publisher relative to total sales for each year. The analysis focuses on the top 10 publishers based on total sales. Finally, it visualizes the market share trends over time using a line plot, with each line representing a publisher's market share, allowing for insights into the competitive landscape and publishers' performance over the years.
Code -
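A minimal sketch of the market share calculation, assuming columns named 'Publisher' and 'Global_Sales' (assumed names) and made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Year_of_Release": ["2000", "2000", "2001", "2001"],
    "Publisher": ["Pub A", "Pub B", "Pub A", "Pub B"],
    "Global_Sales": [3.0, 1.0, 1.0, 1.0],  # assumed column name
})

# Total sales per (year, publisher), then each publisher's share of its year.
sales = vg1.groupby(["Year_of_Release", "Publisher"])["Global_Sales"].sum()
yearly_total = sales.groupby(level="Year_of_Release").transform("sum")
share = sales / yearly_total * 100

# Restrict to the top 10 publishers by total sales across all years.
top = vg1.groupby("Publisher")["Global_Sales"].sum().nlargest(10).index
share_by_year = share.unstack("Publisher")[top]

ax = share_by_year.plot(figsize=(10, 5))
ax.set_ylabel("Market share (%)")
```

The share is computed against all publishers' sales in a year before filtering to the top 10, so the plotted lines do not necessarily sum to 100%.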
This code analyzes publisher sales in the gaming industry by identifying the top 20 publishers based on global sales. It calculates the total global sales for each publisher, selects the 20 publishers with the highest sales, and visualizes the results using a horizontal bar plot. The length of each bar represents the total sales in millions, while the publisher names are displayed on the y-axis. This visualization offers insights into the performance of publishers in terms of global sales, highlighting key players in the gaming market.
Code -
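A minimal sketch of the top-publisher ranking, assuming columns named 'Publisher' and 'Global_Sales' (assumed names) and made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Publisher": ["Pub A", "Pub B", "Pub A", "Pub C", "Pub B"],
    "Global_Sales": [3.0, 2.0, 2.0, 1.0, 1.0],  # assumed column name
})

# Total sales per publisher, keeping at most the 20 largest.
top_publishers = vg1.groupby("Publisher")["Global_Sales"].sum().nlargest(20)

# Reverse the order so the best seller is drawn at the top of the chart.
ax = top_publishers.sort_values().plot(kind="barh", figsize=(8, 6))
ax.set_xlabel("Global sales (millions)")
ax.set_ylabel("Publisher")
```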
This code helps to visualize the sales trends of video games over time in different regions (North America, Europe, Japan, and other regions) and identify patterns and fluctuations in different markets.
Code -
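A minimal sketch of the regional trend plot, assuming regional columns named 'NA_Sales', 'EU_Sales', 'JP_Sales', and 'Other_Sales' (assumed names) and made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; column names are assumptions.
vg1 = pd.DataFrame({
    "Year_of_Release": ["2000", "2000", "2001"],
    "NA_Sales": [1.0, 2.0, 1.5],
    "EU_Sales": [0.5, 1.0, 0.5],
    "JP_Sales": [0.25, 0.25, 0.5],
    "Other_Sales": [0.125, 0.125, 0.25],
})

region_columns = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# One row per year, one column per region.
regional_trend = vg1.groupby("Year_of_Release")[region_columns].sum()

ax = regional_trend.plot(figsize=(10, 5))
ax.set_xlabel("Year of release")
ax.set_ylabel("Sales (millions)")
```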
This code identifies and visualizes the most profitable video game genres based on total global sales.
Code -
This code helps to track and visualize the genre with the highest global sales each year, showing trends and shifts in genre popularity over time.
Code -
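A minimal sketch of finding the top-selling genre per year, assuming columns named 'Genre' and 'Global_Sales' (assumed names) and made-up rows. It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Year_of_Release": ["2000", "2000", "2000", "2001", "2001"],
    "Genre": ["Action", "Sports", "Action", "Shooter", "Action"],
    "Global_Sales": [2.0, 1.0, 0.5, 3.0, 1.0],  # assumed column name
})

# Total sales for every (year, genre) pair.
yearly_genre = (
    vg1.groupby(["Year_of_Release", "Genre"])["Global_Sales"].sum().reset_index()
)

# For each year, keep the row whose sales are the largest.
best_idx = yearly_genre.groupby("Year_of_Release")["Global_Sales"].idxmax()
top_rows = yearly_genre.loc[best_idx]

# Label each bar with both the year and the winning genre.
top_rows = top_rows.assign(
    label=top_rows["Year_of_Release"] + " (" + top_rows["Genre"] + ")"
)
ax = top_rows.plot(kind="bar", x="label", y="Global_Sales", legend=False)
ax.set_ylabel("Global sales (millions)")
```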
This code visualizes the sales distribution of video games by platform in various regions (North America, Europe, Japan, and the rest of the world), highlighting the most popular platforms in each region.
Code - **For Europe, Japan, and the rest of the world, the code is the same, with the y-axis column replaced accordingly.**
This code helps to understand the distribution of games across different genres, identify the most prolific publishers, and determine the most active developers in the gaming industry.
Number of Games Released by Genre, Publisher, and Developer.
Code - **For Publisher and Developer, the code is the same, with the x-axis column replaced accordingly.**
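A minimal sketch of counting releases per genre, assuming a column named 'Genre' (assumed name) and made-up rows. It uses value_counts with a pandas bar plot, which may differ from the plotting call used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle", "Action", "Sports"],
})

# Number of games per genre, most frequent first.
genre_counts = vg1["Genre"].value_counts()

ax = genre_counts.plot(kind="bar", figsize=(8, 4))
ax.set_xlabel("Genre")
ax.set_ylabel("Number of games")
```

For columns with many distinct values, such as publishers or developers, appending .head(20) to value_counts() limits the chart to the 20 most frequent entries.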
The code below visualizes the trend of video game sales over time across different regions.
Code -
The code below visualizes the frequency distribution of user scores and critic scores for video games.
Code -
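A minimal sketch of the two score histograms on made-up rows, assuming user scores run from 0 to 10 and critic scores from 0 to 100 (the scales implied by the normalization step in this write-up). It is not necessarily the code used in the project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "User_Score": [8.5, 7.0, 7.5, 6.1, 9.0],
    "Critic_Score": [85, 70, 72, 60, 91],
})

fig, (ax_user, ax_critic) = plt.subplots(1, 2, figsize=(12, 4))

# Ten equal-width bins across each score's assumed range.
user_counts, _, _ = ax_user.hist(vg1["User_Score"], bins=10, range=(0, 10))
critic_counts, _, _ = ax_critic.hist(vg1["Critic_Score"], bins=10, range=(0, 100))

ax_user.set_title("User scores")
ax_critic.set_title("Critic scores")
```

Using the same number of bins over each full scale makes the two shapes directly comparable even though the raw scales differ.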
The code below calculates and visualizes the average user and critic scores over the years, providing insights into trends and changes in game evaluations over time. It also checks for any correlation between the two.
We normalize the 'User_Score' values to match the scale of 'Critic_Score' for better comparison and correlation analysis:
vg1['Normalized_User_Score'] = vg1['User_Score'] * 10
Scatter Plot is used to visualize the relationship between Critic_Score and Normalized_User_Score, providing insights into the correlation between user and critic evaluations.
Code -
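A minimal sketch of the score comparison on made-up rows, covering the normalization, the yearly averages, the correlation, and the scatter plot. It is not necessarily the code used in the project, and the Pearson correlation computed by Series.corr is an assumption about which correlation measure was used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only.
vg1 = pd.DataFrame({
    "Year_of_Release": ["2000", "2000", "2001", "2001"],
    "User_Score": [8.0, 6.0, 7.0, 9.0],
    "Critic_Score": [82.0, 58.0, 75.0, 85.0],
})

# Put user scores on the same 0-100 scale as critic scores.
vg1["Normalized_User_Score"] = vg1["User_Score"] * 10

# Average of each score per release year.
score_columns = ["Critic_Score", "Normalized_User_Score"]
yearly_means = vg1.groupby("Year_of_Release")[score_columns].mean()

# Pearson correlation between the two scores (Series.corr default).
correlation = vg1["Critic_Score"].corr(vg1["Normalized_User_Score"])
print(f"Correlation: {correlation:.2f}")

ax = vg1.plot(kind="scatter", x="Critic_Score", y="Normalized_User_Score")
```

Multiplying by a constant does not change the correlation itself; the normalization matters for plotting both averages on one axis and for reading the scatter plot against a common scale.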
Distribution of Games by Genre:
Action and Sports genres dominate the market with the highest number of game releases.
Adventure and Puzzle genres have the lowest number of game releases, indicating niche markets.
Top Publishers in the Gaming Industry:
Electronic Arts (EA), Nintendo, and Ubisoft are the most prolific publishers, releasing the highest number of games.
The top 20 publishers account for the majority of game releases.
Leading Game Developers:
Activision and Electronic Arts (EA) are the most active developers, indicating their strong presence in game development.
The top 20 developers account for a substantial portion of game releases.
Sales Trends Over Time by Region:
North America and Europe show a consistent sales trend over the years, with North America leading in sales.
Japan has fluctuating sales trends, often peaking in certain years.
Global Sales by Genre:
Action, Shooter, and Sports genres generate the highest global sales, indicating their popularity and profitability.
Less common genres like Puzzle and Adventure have lower global sales.
Top-Selling Genre Each Year:
The top-selling genre varies each year, reflecting changing player preferences and market trends.
Action and Shooter genres frequently appear as top sellers.
Regional Sales by Platform:
Different platforms dominate sales in different regions. For example, Nintendo platforms perform exceptionally well in Japan.
Sony PlayStation and Microsoft Xbox platforms have strong sales in North America and Europe.
Frequency Distribution of User and Critic Scores:
Both user and critic scores are approximately normally distributed, with most scores clustering around the mid-range.
There are slight differences in score distribution, indicating variations in user and critic evaluations.
Normalized Scores for Comparison:
User scores are normalized to match the scale of critic scores, facilitating direct comparison and analysis.
Correlation Between User and Critic Scores:
A scatter plot shows a moderate positive correlation between user and critic scores, indicating that higher critic scores generally align with higher user scores.
Average Scores Over Time:
Mean user and critic scores over the years show slight variations, reflecting changing standards and expectations in game quality.
Our analysis unveils significant shifts in video game sales from 2000 to 2008, reflecting changing consumer preferences and market dynamics.
Examination of genre dominance, top publishers' market shares, and platform preferences provides valuable insights into their influence on sales and critical/user ratings.
Regional variations across North America, Europe, Japan, and other regions highlight diverse gaming cultures and market landscapes, offering insights into sales and popularity disparities.
The moderate positive correlation between critic and user scores indicates that the two groups broadly agree on game quality; the relationship between ratings and sales was not examined directly.
Our data-driven insights equip stakeholders with actionable recommendations to optimize game development strategies and capitalize on sales potential.
This project underscores the importance of data-driven decision-making in navigating the gaming industry's evolving landscape effectively.