Python Pandas

Python Language

Python is a versatile and user-friendly programming language that has gained immense popularity due to its simplicity, readability, and wide range of applications. It's often referred to as a "general-purpose" language because it can be used for various tasks, from web development and data science to machine learning and automation.

Data Creation:

Creates a Pandas DataFrame, a two-dimensional labelled data structure, with the columns "Name", "Age", and "Sex".

import pandas as pd

df = pd.DataFrame({

"Name": ["Braund, Mr. Owen Harris","Allen, Mr. William Henry","Bonnell, Miss. Elizabeth"],

"Age": [22, 35, 58],

"Sex": ["male", "male", "female"]

})

Column Access:

Accesses the "Age" column from the DataFrame df. Returns a Pandas Series object containing the age values for each row.

df["Age"]
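As a small illustration of the difference between a Series and a DataFrame, the sketch below uses a shortened copy of the toy data. The variable names (people, age_series, age_frame) are chosen here for illustration only.

```python
import pandas as pd

# Shortened copy of the toy data above (illustrative names).
people = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
})

age_series = people["Age"]    # single label -> Series
age_frame = people[["Age"]]   # list of labels -> one-column DataFrame

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_series.max())           # 58
```

Single brackets with one label give a Series; passing a list of labels keeps the result two-dimensional.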

Create a Series:

Creates a Pandas Series named "Age" containing the values 22, 35, and 58.

ages = pd.Series([22, 35, 58], name="Age")

ages

Load the Data:

Loads the "train.csv" file into a pandas DataFrame named titanic and displays its first 5 rows.

titanic = pd.read_csv("train.csv")

titanic.head()

dtypes

The dtypes attribute of a Pandas DataFrame returns a Series giving the data type of each column, here for the titanic DataFrame. It is an attribute, not a method, so it takes no parentheses.

titanic.dtypes
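Since train.csv may not be at hand, here is a minimal sketch of the same idea on a made-up three-row frame. The data and the names sample, col_types, and numeric_cols are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data, not the real Titanic file.
sample = pd.DataFrame({
    "Name": ["Braund", "Allen", "Bonnell"],
    "Age": [22, 35, 58],
    "Fare": [7.25, 8.05, 26.55],
})

col_types = sample.dtypes  # Series mapping column label -> dtype
print(col_types)

# select_dtypes filters columns by their type.
numeric_cols = sample.select_dtypes("number").columns.tolist()
print(numeric_cols)  # ['Age', 'Fare']
```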

info()

The call titanic.info() prints a concise summary of the DataFrame:

1. The number of rows (index entries) and columns.

2. The data type of each column (e.g., int64, float64, object).

3. The non-null count for each column. Any column whose count is below the total number of rows contains missing values and may need data cleaning.

titanic.info()
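The figures that info() prints can also be obtained as values, which is handy when you want to use them in later code. The sketch below assumes a made-up frame with two missing entries; the names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing Age and one missing Embarked.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 58.0],
    "Embarked": ["S", "C", None],
    "Fare": [7.25, 71.28, 26.55],
})

n_rows, n_cols = sample.shape   # total rows and columns
non_null = sample.count()       # non-null entries per column
missing = sample.isna().sum()   # missing entries per column

print(n_rows, n_cols)  # 3 3
print(non_null)
print(missing)
```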

# Dropping Columns in the Dataframe

df = titanic.copy()  # from here on, df holds the Titanic data loaded above

dfx = df.copy(deep=True)

dfx = dfx.drop(['PassengerId','Ticket','Name'],axis = 1)

dfx.head()

# Replacing Values/Names in a Column:

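df1 = dfx.copy(deep=True)

# Assign the result back; calling replace(..., inplace=True) on a selected column is unreliable in recent pandas

df1["Survived"] = df1["Survived"].replace({0: "Died", 1: "Saved"})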

df1.head(3)

# Drop Rows

df1 = dfx.drop(labels=[1,3,5,7],axis=0)

df1.head()

df.columns.tolist()

# Missing Value check Method 1

print('Method 1:')

df.isnull().sum()/len(df)*100

# Missing Value check Method 2

print('Method 2:')

var1 = [col for col in df.columns if df[col].isnull().sum() != 0]

print(df[var1].isnull().sum())
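To see what the first two methods compute, here is a small worked sketch on made-up data where the expected percentages are easy to verify by hand. The frame and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: Age is missing in 2 of 4 rows, Embarked in 1 of 4.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, np.nan, 35.0],
    "Embarked": ["S", None, "Q", "S"],
    "Fare": [7.25, 8.05, 26.55, 13.0],
})

# Percentage of missing values per column (the mean of a boolean mask is a fraction).
pct_missing = sample.isna().mean() * 100
print(pct_missing)

# Only the columns that contain at least one missing value.
cols_with_missing = sample.columns[sample.isna().any()].tolist()
print(cols_with_missing)  # ['Age', 'Embarked']
```

Here isna().mean() * 100 gives the same percentages as isnull().sum() / len(df) * 100.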

# Missing Value check Method 3

print('Method 3:')

import matplotlib.pyplot as plt

import missingno as msno

msno.matrix(df)

plt.show()

# Find the Null Rows in a Particular Feature

df[df['Embarked'].isnull()]

# Find Rows with missing Values

sample_incomplete_rows = df[df.isnull().any(axis=1)].head()

sample_incomplete_rows

# Describe dataset

df.describe()

# Aggregate Function

df[['Age','Fare','Pclass']].agg(['sum','max','mean','std','skew','kurt'])
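A minimal sketch of how agg shapes its result, using made-up numbers small enough to check by hand. The names sample, summary, and per_col are illustrative.

```python
import pandas as pd

# Made-up numeric sample.
sample = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Fare": [10.0, 20.0, 60.0],
})

# A list of function names -> DataFrame with one row per function.
summary = sample.agg(["sum", "max", "mean"])
print(summary)

# A dict -> a different function per column, returned as a Series.
per_col = sample.agg({"Age": "mean", "Fare": "max"})
print(per_col)
```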

# value_counts

df['Embarked'].value_counts().to_frame()

# Count

df[['Age','Embarked','Sex']].count()

#Shuffling the data

df2 = df.sample(frac=1,random_state=3)

df2.head()

# Correlation of Data

import matplotlib.pyplot as plt

import seaborn as sns

corr = df.select_dtypes('number').corr()

display(corr)

sns.heatmap(corr, annot=True, cmap='viridis')

plt.xlabel('Features')

plt.ylabel('Features')

plt.title('Correlation Heatmap')

plt.show()

# Keep Only Rows Where Cabin Is Not Null

df1 = df1[df1['Cabin'].notna()]

df1.head()

# Dropna Method

df1 = df.dropna()

df1.head()

# Fillna (ffill) Method

# Start again from df: after dropna() there is nothing left to fill, and fillna(method="ffill") is deprecated

df1 = df.ffill()

df1.head()
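Forward fill copies the last seen value downward, so a gap at the very top stays empty. The sketch below shows this on made-up data; the names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps, including one in the first row.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, np.nan, 35.0],
    "Embarked": [None, "C", None, "S"],
})

filled = sample.ffill()  # each gap takes the previous non-null value in its column
print(filled)

# The first Embarked value has nothing above it, so it is still missing.
print(filled["Embarked"].isna().sum())  # 1
```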

# Fill Null Values by Mean Value

df1["Age"] = df1["Age"].fillna(df1["Age"].mean())

df1.head()

# Fill Null Values by Desired Value

df1['Embarked'] = df1['Embarked'].fillna('Q')

df1.head(5)
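The two fills above can be checked on a tiny made-up frame. As an assumption beyond the original text, the sketch fills the categorical column with its most frequent value (the mode) rather than a hard-coded 'Q'; either approach uses the same fillna call.

```python
import numpy as np
import pandas as pd

# Made-up sample with one gap per column.
sample = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "Embarked": ["S", None, "S"],
})

filled = sample.copy()

# Numeric column: fill with the mean of the known values (20 and 40 -> 30).
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Categorical column: fill with the most frequent value (an illustrative choice).
most_common = filled["Embarked"].mode()[0]
filled["Embarked"] = filled["Embarked"].fillna(most_common)

print(filled)
```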

# Find Rows That Still Contain Null Values (after dropping Cabin)

df1 = df1.drop('Cabin',axis =1)

sample_incomplete_rows = df1[df1.isnull().any(axis=1)]

display(sample_incomplete_rows.shape)

sample_incomplete_rows.head()

# Find Selected Values

titanic_Fare500 = df1[df1['Fare'] > 500][['Name','Embarked']]

display(titanic_Fare500.shape)

titanic_Fare500

# Find Value in a Range

titanic_age_selection = df[(df["Sex"] == "male") & (df["Age"] > 50.00)]

display(titanic_age_selection.shape)

titanic_age_selection.head()

# Select particular value in a feature

titanic_Pclass = df1[df1["Pclass"].isin([1, 2])]

display(titanic_Pclass.shape)

titanic_Pclass.head()

# Select with Multiple Conditions

titanic_Pclass = df1[(df1["Pclass"] == 1) & (df1["Sex"] == 'female') & (df1["Age"] > 50 ) ]

display(titanic_Pclass.shape)

titanic_Pclass.head()
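The selections above combine boolean masks with &. A small sketch on made-up passengers shows the masks by name, alongside an equivalent query() call. The data and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up passengers, not the real Titanic file.
passengers = pd.DataFrame({
    "Pclass": [1, 1, 3, 2],
    "Sex": ["female", "male", "female", "female"],
    "Age": [58.0, 54.0, 26.0, 55.0],
})

# Each condition is a boolean Series; & requires all of them to be True.
in_class_1_or_2 = passengers["Pclass"].isin([1, 2])
is_female = passengers["Sex"] == "female"
over_50 = passengers["Age"] > 50

selected = passengers[in_class_1_or_2 & is_female & over_50]
print(selected)

# The same filter written as a query string.
selected_q = passengers.query("Pclass in [1, 2] and Sex == 'female' and Age > 50")
print(selected_q)
```

Each comparison needs its own parentheses when written inline, because & binds more tightly than == and >.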

# Sort_Values

df1 = df.copy()

df1.sort_values(by = 'Age' , ascending = False)[['Name','Ticket','Survived','Pclass', 'Age' ]].head()
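A brief sketch of how sort_values treats missing values and multiple keys, on made-up data (names are illustrative). By default, rows with a missing sort key are placed last.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing Age.
passengers = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Pclass": [3, 1, 1, 2],
    "Age": [22.0, 38.0, np.nan, 35.0],
})

# Oldest first; the row with missing Age goes to the end.
by_age = passengers.sort_values(by="Age", ascending=False)
print(by_age["Name"].tolist())  # ['B', 'D', 'A', 'C']

# Sort by class ascending, then by age descending within each class.
by_class_age = passengers.sort_values(by=["Pclass", "Age"], ascending=[True, False])
print(by_class_age["Name"].tolist())  # ['B', 'C', 'D', 'A']
```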
