Welcome to Foundation of Data Science Laboratory
Let's create a Python program that handles missing values in a DataFrame using a hypothetical stock dataset. We will:
1. Create a DataFrame with missing values.
2. Handle missing values by:
   - Dropping rows with missing values.
   - Filling missing values with a specific value.
   - Filling missing values with the mean of the column.
Here's the complete Python program:
import pandas as pd
import numpy as np
# Create a DataFrame representing stock prices with some missing values
data = {
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2]
}
# Create the DataFrame
df = pd.DataFrame(data)
print("Original DataFrame with Missing Values:")
print(df)
# 1. Dropping rows with missing values
df_dropped = df.dropna()
print("\nDataFrame after Dropping Rows with Missing Values:")
print(df_dropped)
# 2. Filling missing values with a specific value (e.g., 0)
df_filled_with_zero = df.fillna(0)
print("\nDataFrame after Filling Missing Values with 0:")
print(df_filled_with_zero)
# 3. Filling missing values with the mean of each column
df_filled_with_mean = df.copy() # Create a copy to preserve the original data
df_filled_with_mean['Price'] = df['Price'].fillna(df['Price'].mean())
df_filled_with_mean['Volume'] = df['Volume'].fillna(df['Volume'].mean())
df_filled_with_mean['Change (%)'] = df['Change (%)'].fillna(df['Change (%)'].mean())
print("\nDataFrame after Filling Missing Values with Mean:")
print(df_filled_with_mean)
Output:
1. Original DataFrame with Missing Values:
  Stock   Price   Volume  Change (%)
0  AAPL   150.0  10000.0         1.5
1  GOOG     NaN  15000.0        -2.3
2  TSLA   750.0      NaN         NaN
3  AMZN  3300.0  20000.0         0.5
4  MSFT     NaN  25000.0        -1.2
2. DataFrame After Dropping Rows with Missing Values:
  Stock   Price   Volume  Change (%)
0  AAPL   150.0  10000.0         1.5
3  AMZN  3300.0  20000.0         0.5
3. DataFrame After Filling Missing Values with 0:
  Stock   Price   Volume  Change (%)
0  AAPL   150.0  10000.0         1.5
1  GOOG     0.0  15000.0        -2.3
2  TSLA   750.0      0.0         0.0
3  AMZN  3300.0  20000.0         0.5
4  MSFT     0.0  25000.0        -1.2
4. DataFrame After Filling Missing Values with the Mean:
  Stock   Price   Volume  Change (%)
0  AAPL   150.0  10000.0       1.500
1  GOOG  1400.0  15000.0      -2.300
2  TSLA   750.0  17500.0      -0.375
3  AMZN  3300.0  20000.0       0.500
4  MSFT  1400.0  25000.0      -1.200
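Missing values are replaced with the mean of their respective columns. pandas' mean() ignores NaN by default, so each mean is computed only from the values that are present:
Price mean: (150 + 750 + 3300) / 3 = 4200 / 3 = 1400.0
Volume mean: (10000 + 15000 + 20000 + 25000) / 4 = 70000 / 4 = 17500.0
Change (%) mean: (1.5 - 2.3 + 0.5 - 1.2) / 4 = -1.5 / 4 = -0.375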
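The same numbers can be reproduced in code. This minimal sketch rebuilds the stock DataFrame and divides each column's sum() by its count(); both skip NaN, so the ratio should match the built-in mean():

```python
import numpy as np
import pandas as pd

# Rebuild the same stock data used in the program above
df = pd.DataFrame({
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2],
})

numeric = df[['Price', 'Volume', 'Change (%)']]

# sum() skips NaN and count() counts only non-missing cells,
# so their ratio is the mean of the values that are present
manual_mean = numeric.sum() / numeric.count()

print(manual_mean)
print(numeric.mean())  # pandas' built-in mean should agree
```

Comparing with a tolerance (rather than exact equality) is the safer habit here, because floating-point sums such as 1.5 - 2.3 + 0.5 - 1.2 may not come out to exactly -1.5.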
Create a DataFrame:
The DataFrame represents stock prices (Price), trading volume (Volume), and daily percentage change (Change (%)) for various stocks.
Some values are set as np.nan to simulate missing values.
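Before choosing how to handle the gaps, it usually helps to see where they are. A minimal sketch, rebuilding the same DataFrame, that counts missing cells per column with isna() and lists the affected rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2],
})

# isna() marks every np.nan cell as True; sum() then counts the Trues per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# any(axis=1) is True for rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Here GOOG, TSLA and MSFT (rows 1, 2 and 4) each have at least one NaN, which is why only rows 0 and 3 survive dropna().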
Dropping Rows with Missing Values:
df.dropna() drops every row that contains at least one missing value (its default is how='any'). The surviving rows keep their original index labels (0 and 3).
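dropna() also accepts parameters that make it less aggressive than the default. A minimal sketch on the same data showing subset, thresh and axis (all standard dropna parameters):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2],
})

# Default: drop a row if ANY of its values is missing
dropped_any = df.dropna()

# Only consider the Price column when deciding which rows to drop
dropped_price = df.dropna(subset=['Price'])

# Keep rows that have at least 3 non-missing values
dropped_thresh = df.dropna(thresh=3)

# Drop columns (axis=1) that contain any missing value instead of rows
dropped_cols = df.dropna(axis=1)

print(dropped_any, dropped_price, dropped_thresh, dropped_cols, sep='\n\n')
```

With thresh=3 only TSLA is removed, since it is the only row with just two non-missing values; with axis=1 only the Stock column survives, because every other column has at least one NaN.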
Filling Missing Values with a Specific Value:
df.fillna(0) replaces all missing values with 0.
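Filling every column with the same constant is not always appropriate. fillna() also accepts a dict of column name to fill value; a minimal sketch, where filling only Price and Volume with 0 is an illustrative choice, not part of the original program:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2],
})

# Columns listed in the dict are filled; columns not listed keep their NaN
filled = df.fillna({'Price': 0, 'Volume': 0})
print(filled)

# fillna returns a new DataFrame by default, so df still has its NaN values
print(df.isna().sum())
```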
Filling Missing Values with the Mean:
For each column, missing values are filled with the column’s mean using df['column'].fillna(df['column'].mean()).
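The three per-column statements in the program can also be collapsed into one call: compute all numeric means at once and pass the resulting Series to fillna(), which matches it to columns by label. A minimal sketch, assuming numeric_only=True is needed because the Stock column holds strings (recent pandas versions raise a TypeError when asked to average text):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stock': ['AAPL', 'GOOG', 'TSLA', 'AMZN', 'MSFT'],
    'Price': [150, np.nan, 750, 3300, np.nan],
    'Volume': [10000, 15000, np.nan, 20000, 25000],
    'Change (%)': [1.5, -2.3, np.nan, 0.5, -1.2],
})

# Means of the numeric columns only; 'Stock' is skipped
column_means = df.mean(numeric_only=True)

# A Series passed to fillna is aligned on column labels,
# so each column is filled with its own mean
df_filled_with_mean = df.fillna(column_means)
print(df_filled_with_mean)
```

This scales to any number of numeric columns without repeating the column names, and should produce the same table as the per-column version.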
The program will display four versions of the DataFrame:
Original DataFrame with missing values.
DataFrame after dropping rows with missing values (only rows without any NaN values remain).
DataFrame after filling missing values with 0.
DataFrame after filling missing values with the mean value of each column.