The Titanic dataset contains information about 418 passengers aboard the Titanic, including their demographic details, socio-economic status, and survival outcomes. It includes columns such as PassengerId, Survived (indicating whether the passenger survived or not), Pclass (passenger class), Name, Sex, Age, SibSp (siblings/spouses aboard), Parch (parents/children aboard), Ticket, Fare, Cabin, and Embarked (boarding port). The dataset is often used for analysis of survival rates based on factors like class, gender, age, and fare, revealing insights into how societal and economic factors influenced the chances of survival during the Titanic disaster.
Original Dataset: Titanic dataset
Project Overview:
This project focuses on analyzing the Titanic disaster dataset from Kaggle to uncover critical insights into the survival rates of passengers based on factors like class, gender, age, and fare. The dataset offers a comprehensive view of the passengers aboard the Titanic, including their socio-economic status, family size, and survival outcome. The goal of this project is to collect data from an interesting source, populate a Google Spreadsheet, and perform statistical analysis using Python and Pandas.
Goal:
The goal of this activity is to analyze a dataset (Titanic dataset) consisting of at least 100 rows and 5 columns, and to generate meaningful insights from this data. By doing so, we aim to understand how various factors contributed to passenger survival during the Titanic disaster.
Dataset Description:
The Titanic dataset contains information about 418 passengers, with 12 columns, detailing their demographics, socio-economic status, ticket information, and survival outcome.
Columns and Their Descriptions:
PassengerId: A unique identifier for each passenger (acts as an index).
Survived: A binary column indicating survival status (1 = survived, 0 = did not survive).
Pclass: The class of the ticket purchased (1st, 2nd, or 3rd class).
Name: The full name of the passenger, including titles such as Mr., Mrs., etc.
Sex: The gender of the passenger (male or female).
Age: The age of the passenger. Missing values indicate unknown age.
SibSp: Number of siblings or spouses aboard the Titanic.
Parch: Number of parents or children aboard the Titanic.
Ticket: The ticket number of the passenger.
Fare: The amount paid for the ticket, which varies with passenger class.
Cabin: The cabin number assigned to the passenger. Many values are missing.
Embarked: The port where the passenger boarded the ship (C = Cherbourg, Q = Queenstown, S = Southampton).
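A minimal sketch of how a file with this column layout could be loaded and checked for missing values with pandas. The four rows in SAMPLE_CSV are made up for illustration and are not real passenger records; with the actual file, the io.StringIO wrapper would be replaced by the path to the CSV.

```python
import io

import pandas as pd

# Tiny made-up sample in the same 12-column layout; not real passenger records.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,34.5,0,0,A100,7.83,,Q
2,1,3,"Roe, Mrs. Jane",female,47,1,0,A101,7.0,,S
3,0,2,"Poe, Mr. Sam",male,,0,0,A102,9.69,,Q
4,1,1,"Moe, Miss. Ann",female,22,0,1,A103,82.27,B45,C
"""

# Empty fields (such as a missing Age or Cabin) are parsed as NaN.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = df.shape
missing_per_column = df.isna().sum()

print(f"{n_rows} rows x {n_cols} columns")
print(missing_per_column[missing_per_column > 0])
```

Running the same check on the full dataset shows which columns (Age and Cabin, according to the descriptions above) need attention before analysis.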
Observations from Statistical Visualizations:
Survival Distribution: A pie chart of survival outcomes shows that a significant portion of passengers did not survive. The chart alone does not explain why; the roles of class, gender, and age are examined in the other charts.
Gender and Survival: A stacked bar chart of survival by gender shows that women had a significantly higher chance of survival than men, consistent with the priority given to women and children during the evacuation.
Class Distribution: A pie chart depicting the distribution of passengers by class reveals that most passengers were in third class, with a much smaller percentage in first class.
Age and Fare Distribution: A scatterplot of age against fare suggests that older passengers tended to pay higher fares, often corresponding to first-class tickets. First-class passengers also had a higher survival rate.
Correlation Analysis: A heatmap shows that Passenger Class and Fare are correlated (higher-class tickets cost more), while the correlation between Age and Survival is relatively weak.
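The numbers behind charts like these can be computed directly in pandas before any plotting. The sketch below uses eight made-up rows (not real passenger records) and assumes the column names listed in the dataset description; the plotting step itself is left out.

```python
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
df = pd.DataFrame({
    "Survived": [0, 1, 0, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 2, 3, 1, 1, 2],
    "Sex":      ["male", "female", "male", "female",
                 "male", "male", "female", "male"],
    "Age":      [22.0, 38.0, 30.0, 27.0, 19.0, 54.0, 45.0, 33.0],
    "Fare":     [7.25, 71.28, 8.05, 13.0, 7.9, 51.86, 80.0, 10.5],
})

# Shares behind the two pie charts (survival outcome and passenger class).
survival_share = df["Survived"].value_counts(normalize=True)
class_share = df["Pclass"].value_counts(normalize=True).sort_index()

# Counts behind the stacked bar chart of gender vs. survival.
gender_counts = pd.crosstab(df["Sex"], df["Survived"])

# Pairwise Pearson correlations behind the heatmap.
corr = df[["Survived", "Pclass", "Age", "Fare"]].corr()

print(survival_share)
print(class_share)
print(gender_counts)
print(corr.round(2))
```

Because Pclass is coded 1 for first class and 3 for third, a link between higher class and higher fare appears as a negative coefficient in the correlation matrix.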
Conclusions:
The analysis of the Titanic dataset uncovers several interesting trends about the social and economic factors influencing survival during the disaster. The dataset provides insight into how class, gender, and family size related to a passenger's chances of survival, and how embarkation port related to passenger class.
These findings reflect the broader societal dynamics of the time, where wealth, gender, and family composition had a substantial impact on the outcomes of the tragedy.
Key Learnings:
Data Cleaning and Preparation: Handling missing data (e.g., missing age values) and encoding categorical variables (like gender and embarkation port) so they can be used in the analysis.
Exploratory Data Analysis (EDA): Understanding the importance of visualizing data to uncover patterns and trends that might not be immediately obvious from raw data.
Statistical Significance: Recognizing that survival was strongly associated with factors such as class and gender, which played a critical role in who survived the disaster.
This project illustrates the importance of statistics in drawing meaningful conclusions from data and how it can be applied to real-world scenarios like the Titanic disaster.
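The data cleaning step mentioned in the key learnings could look roughly like the sketch below. Median imputation for Age, the IsFemale flag, and one-hot encoding of Embarked are common choices assumed here for illustration, not necessarily the ones used in this project. The rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
df = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [20.0, None, 40.0, 30.0],
    "Embarked": ["S", "C", "Q", "S"],
})

clean = df.copy()

# Fill unknown ages with the median of the known ages (an assumed strategy).
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

# Encode gender as a 0/1 flag; IsFemale is a hypothetical column name.
clean["IsFemale"] = (clean["Sex"] == "female").astype(int)

# Expand the embarkation port into one indicator column per port.
clean = pd.get_dummies(clean, columns=["Embarked"], prefix="Embarked")

print(clean)
```

The median is less sensitive to a few very old or very young passengers than the mean, which is why it is often preferred for a skewed column like Age.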
Survival by Class and Gender:
First-class passengers had a significantly higher survival rate compared to second- and third-class passengers. This pattern highlights how social and economic status influenced the likelihood of survival during the Titanic disaster.
Women and children had a notably higher survival rate, reflecting the societal norms of prioritizing women and children during life-saving efforts, as evidenced by the gender-based survival analysis.
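Survival rates by class and gender like the ones described above can be obtained with a groupby on the Survived column, since the mean of a 0/1 column is the share of survivors. A minimal sketch on made-up rows (not real passenger records):

```python
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male",
                 "female", "male", "male", "female"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column = fraction of passengers who survived.
rate_by_class = df.groupby("Pclass")["Survived"].mean()
rate_by_sex = df.groupby("Sex")["Survived"].mean()

# Two-way table: rows are classes, columns are genders.
rate_table = df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)

print(rate_by_class)
print(rate_by_sex)
print(rate_table)
```

The two-way table helps separate the effect of class from the effect of gender, which the two one-way summaries cannot do on their own.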
Impact of Family Size on Survival:
Passengers traveling with larger families (high SibSp and Parch values) generally had lower survival rates compared to those traveling alone or with just one or two family members. The complexity of managing larger groups might have decreased their chances during the chaotic evacuation.
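One way to examine family size is to combine SibSp and Parch into a single count and compare survival rates across size groups. In the sketch below, the FamilySize and FamilyGroup columns and the group boundaries (alone, 2 to 4, 5 or more) are assumptions made for illustration, and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
df = pd.DataFrame({
    "SibSp":    [0, 1, 0, 4, 1, 0, 3, 0],
    "Parch":    [0, 0, 0, 2, 1, 0, 2, 0],
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
})

# Relatives aboard plus the passenger themselves (hypothetical feature).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Bucket into groups; the boundaries are an illustrative choice.
# Bins are right-inclusive: (0, 1], (1, 4], (4, inf).
df["FamilyGroup"] = pd.cut(
    df["FamilySize"],
    bins=[0, 1, 4, float("inf")],
    labels=["alone", "small (2-4)", "large (5+)"],
)

rate_by_group = df.groupby("FamilyGroup", observed=True)["Survived"].mean()
print(rate_by_group)
```

Grouping matters here because very large families are rare, so per-size rates for individual values of FamilySize would rest on only a handful of passengers.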
Embarkation Port and Class Distribution:
Passengers who boarded at Cherbourg (C) were more likely to be in 1st class, while passengers who boarded at Queenstown (Q) were more likely to be in 3rd class. This geographic pattern suggests a demographic difference based on the embarkation port, which correlates with the social stratification of passengers.
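The class mix per embarkation port can be tabulated with a row-normalized cross-tabulation, so that each port's row sums to 1 and ports with different passenger counts are comparable. A minimal sketch on made-up rows (not real passenger records):

```python
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
df = pd.DataFrame({
    "Embarked": ["C", "C", "C", "Q", "Q", "S", "S", "S", "S"],
    "Pclass":   [1, 1, 3, 3, 3, 1, 2, 3, 3],
})

# normalize="index" divides each row by its total, giving the share of
# each class among the passengers who boarded at that port.
class_mix = pd.crosstab(df["Embarked"], df["Pclass"], normalize="index")

print(class_mix.round(2))
```

Reading across a row gives the share of 1st, 2nd, and 3rd class passengers for that port.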
This project involves analyzing the Titanic dataset to uncover insights into survival rates based on factors such as class, gender, age, fare, and family size. The dataset comprises information about 418 passengers, including socio-economic details and survival outcomes. Key findings reveal that first-class passengers and women and children had significantly higher survival rates, reflecting societal norms and economic disparities of the time. Larger family sizes negatively impacted survival chances, and embarkation ports showed distinct class distributions, with Cherbourg passengers often in first class and Queenstown passengers in third class. The project emphasizes data cleaning, exploratory data analysis, and the importance of visualizations in deriving meaningful conclusions.