Python is a general-purpose programming language known for its simple, readable syntax. It is used for a wide range of tasks, from web development and data science to machine learning and automation.
Data Creation:
Creates a pandas DataFrame, a two-dimensional labelled data structure, with the columns "Name", "Age", and "Sex".
import pandas as pd
df = pd.DataFrame({
"Name": ["Braund, Mr. Owen Harris","Allen, Mr. William Henry","Bonnell, Miss. Elizabeth"],
"Age": [22, 35, 58],
"Sex": ["male", "male", "female"]
})
df
Column Access:
Accesses the "Age" column from the DataFrame df. Returns a Pandas Series object containing the age values for each row.
df["Age"]
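As a small sketch of how the bracket form affects the result (the double-bracket variant is an addition here, not part of the original walkthrough): a single column label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Same small DataFrame as in the Data Creation step
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

age_series = df["Age"]     # single label -> pandas Series
age_frame = df[["Age"]]    # list of labels -> one-column DataFrame

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_series.max())           # 58
```

The Series form is usually what you want for computations on one column; the one-column DataFrame keeps the tabular shape.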
Create a Series:
Creates a Pandas Series named "Age" containing the values 22, 35, and 58.
ages = pd.Series([22, 35, 58], name="Age")
ages
Load the Data:
Loads the "train.csv" file into a pandas DataFrame named titanic and displays its first 5 rows.
titanic = pd.read_csv("train.csv")
titanic.head()
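If train.csv is not at hand, the same two calls can be tried on an in-memory CSV. The sketch below is only a stand-in: the name sample, the reduced column set, and the three rows (reused from the Data Creation step) are assumptions for illustration and do not reflect the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for train.csv: three rows reused from the Data Creation example
csv_text = io.StringIO(
    'Name,Age,Sex\n'
    '"Braund, Mr. Owen Harris",22,male\n'
    '"Allen, Mr. William Henry",35,male\n'
    '"Bonnell, Miss. Elizabeth",58,female\n'
)

sample = pd.read_csv(csv_text)  # read_csv accepts a path or any file-like object
print(sample.head(2))           # head(n) returns the first n rows (default n=5)
print(sample.shape)             # (3, 3)
```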
dtypes
The dtypes attribute of a pandas DataFrame lists the data type of each column, here for the titanic DataFrame. It is an attribute, not a method, so it is written without parentheses.
titanic.dtypes
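A minimal sketch of what dtypes returns, using the small df from the Data Creation step rather than the Titanic file: the result is itself a Series indexed by column name, so the type of one column can be looked up by label.

```python
import pandas as pd

# Same small DataFrame as in the Data Creation step
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

col_types = df.dtypes       # Series: index = column names, values = dtypes
print(col_types)
print(col_types["Age"])     # int64

# select_dtypes filters columns using the same type information
numeric_cols = df.select_dtypes("number").columns.tolist()
print(numeric_cols)         # ['Age']
```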
info()
titanic.info() prints a technical summary of the titanic DataFrame.
1. It shows the number of rows (as the index range) and the number of columns.
2. It shows the data type of each column (e.g., integer, float, or object for strings).
3. It shows the non-null count of each column; a count lower than the number of rows means the column has missing values, which points to potential areas for data cleaning.
titanic.info()
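The non-null counts that info() prints can also be computed directly. The sketch below uses a small DataFrame named demo with one age deliberately left missing (hypothetical data, not the Titanic file) and derives the number of missing values per column from count().

```python
import pandas as pd

# Hypothetical data: the second age is deliberately left missing
demo = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
    "Age": [22, None, 58],
    "Sex": ["male", "male", "female"],
})

demo.info()                     # prints the summary and returns None

non_null = demo.count()         # non-null values per column, as listed by info()
missing = len(demo) - non_null  # number of rows minus non-null count = missing values
print(missing)
```

Here only "Age" ends up with a missing count of 1, which matches what demo.isna().sum() reports.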
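# Dropping Columns in the Dataframe
df = titanic.copy()  # from here on, df is the Titanic data (the 3-row demo df has none of these columns)
dfx = df.copy(deep=True)
dfx = dfx.drop(['PassengerId', 'Ticket', 'Name'], axis=1)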
dfx.head()
# Replacing Values/Names in a Column:
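df1 = dfx.copy(deep=True)
# Assign the result back; calling replace(..., inplace=True) on a selected column is unreliable in recent pandas
df1["Survived"] = df1["Survived"].replace({0: "Died", 1: "Saved"})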
df1.head(3)
# Drop Rows
df1 = dfx.drop(labels=[1,3,5,7],axis=0)
df1.head()
# List Column Names
df.columns.tolist()
# Missing Value check Method 1
print('Method 1:')
df.isnull().sum()/len(df)*100
# Missing Value check Method 2
print('Method 2:')
var1 = [col for col in df.columns if df[col].isnull().sum() != 0]
print(df[var1].isnull().sum())
# Missing Value check Method 3
print('Method 3:')
import matplotlib.pyplot as plt  # needed for plt.show()
import missingno as msno
msno.matrix(df)
plt.show()
# Find the Null Rows in a Particular Feature
df[df['Embarked'].isnull()]
# Find Rows with missing Values
sample_incomplete_rows = df[df.isnull().any(axis=1)].head()
sample_incomplete_rows
# Describe dataset
df.describe()
# Aggregate Function
df[['Age','Fare','Pclass']].agg(['sum','max','mean','std','skew','kurt'])
# value_counts
df['Embarked'].value_counts().to_frame()
# Count
df[['Age','Embarked','Sex']].count()
#Shuffling the data
df2 = df.sample(frac=1,random_state=3)
df2.head()
# Correlation of Data
import matplotlib.pyplot as plt  # needed for the plt.* calls below
import seaborn as sns
corr = df.select_dtypes('number').corr()
display(corr)
sns.heatmap(corr, annot=True, cmap='viridis')
plt.xlabel('Features')
plt.ylabel('Features')
plt.title('Correlation Heatmap')
plt.show()
# Keep Only Rows Where 'Cabin' Is Not Null
df1 = df1[df1['Cabin'].notna()]
df1.head()
# Dropna Method
df1 = df.dropna()
df1.head()
# Fillna (ffill) Method
df1 = df1.ffill()  # fillna(method="ffill") is deprecated in recent pandas
df1.head()
# Fill Null Values by Mean Value
df1["Age"] = df1["Age"].fillna(df1["Age"].mean())
df1.head()
# Fill Null Values by Desired Value
df1['Embarked'] = df1['Embarked'].fillna('Q')  # fill with the value 'Q', not with a boolean mask
df1.head(5)
# Find All Rows That Still Have Null Values (after dropping 'Cabin')
df1 = df1.drop('Cabin',axis =1)
sample_incomplete_rows = df1[df1.isnull().any(axis=1)]
display(sample_incomplete_rows.shape)
sample_incomplete_rows.head()
# Find Selected Values
titanic_Fare500 = df1.loc[df1['Fare'] > 500, ['Name', 'Embarked']]
display(titanic_Fare500.shape)
titanic_Fare500
# Select Rows by Two Conditions (male passengers older than 50)
titanic_age_selection = df[(df["Sex"] == "male") & (df["Age"] > 50.00)]
display(titanic_age_selection.shape)
titanic_age_selection.head()
# Select particular value in a feature
titanic_Pclass = df1[df1["Pclass"].isin([1, 2])]
display(titanic_Pclass.shape)
titanic_Pclass.head()
# Select with Multiple Conditions
titanic_Pclass = df1[(df1["Pclass"] == 1) & (df1["Sex"] == 'female') & (df1["Age"] > 50)]
display(titanic_Pclass.shape)
titanic_Pclass.head()
# Sort_Values
df1 = df.copy()
df1.sort_values(by='Age', ascending=False)[['Name', 'Ticket', 'Survived', 'Pclass', 'Age']].head()