Overview:
For the binary classification task on the YouTube dataset, I have chosen a straightforward feedforward neural network architecture. This design comprises an input layer, two hidden layers, and an output layer. The input layer has one neuron per feature in the dataset, while the hidden layers contain 64 and 32 neurons respectively, each followed by batch normalization. The output layer is configured with a single neuron to accommodate binary classification (like/dislike). Activation functions are ReLU for the hidden layers and Sigmoid for the output layer, giving the network the capacity to capture non-linear relationships between the features and the target.
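To make the data flow concrete, here is a minimal NumPy sketch of the forward pass for a network of this shape. The weights are random placeholders for illustration only, not trained values, and the feature count of 5 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions: 5 input features, 64- and 32-unit hidden layers, 1 output.
W1, b1 = rng.normal(scale=0.1, size=(5, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)        # first hidden layer, ReLU
    h2 = relu(h1 @ W2 + b2)       # second hidden layer, ReLU
    return sigmoid(h2 @ W3 + b3)  # output layer, sigmoid -> probability

p = forward(rng.normal(size=(3, 5)))  # batch of 3 toy examples
```

The sigmoid squashes each output to (0, 1), which is why it can be read as the probability of the positive (high-likes) class.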
Data Preparation:
The binary classification task on the YouTube dataset involves predicting whether a video will receive high or low 'Likes.' The 'Likes' column is converted to binary labels: 1 for values above the median, 0 otherwise. The dataset is split 70/30, with 70% used for training the model and 30% reserved for testing its performance, so that the model is evaluated on data it has never seen. The training set is used to optimize the model's parameters, while the testing set serves as an independent benchmark of predictive accuracy on unfamiliar instances.
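The labeling and split logic can be sketched on toy data (the column names and values here are illustrative, not taken from the real dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Views': [100, 250, 400, 800, 1200, 50],
    'Likes': [10, 30, 45, 90, 150, 5],
})

# Binarize the target: 1 if strictly above the median, else 0.
median_likes = df['Likes'].median()  # 37.5 for this toy column
df['Likes'] = (df['Likes'] > median_likes).astype(int)

# 70/30 split; random_state fixes the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    df[['Views']], df['Likes'], test_size=0.3, random_state=42
)
```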
Image of the dataset:
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
# Load your YouTube dataset
df = pd.read_csv('youtube_data_final_1_1.csv')
# Link to "youtube_data_final_1_1.csv" file : https://drive.google.com/file/d/15pKK_1ftSTwLWlLo06nHcDd6pR0xKHIw/view?usp=drive_link
# Assuming 'Likes' is your target column
median_likes = df['Likes'].median()  # compute the median once, not per row
df['Likes'] = (df['Likes'] > median_likes).astype(int)
# Drop non-numeric and unnecessary columns
non_numeric_columns = df.select_dtypes(exclude=['float64', 'int64']).columns
df = df.drop(columns=non_numeric_columns)
# Split the data into Features (X) and Target variable (y)
X = df.drop('Likes', axis=1)
y = df['Likes']
# Split the data into Training and Testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Feature Scaling (fit on the training set only, so test statistics don't leak into training)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Build the Neural Network model with improved configurations
model = Sequential([
Dense(64, activation='relu', input_dim=X_train.shape[1]),
BatchNormalization(), # Add BatchNormalization
Dense(32, activation='relu'),
BatchNormalization(),
Dense(1, activation='sigmoid')
])
optimizer = Adam(learning_rate=0.001) # Adjust learning rate
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# Early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# Train the model with more epochs
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), callbacks=[early_stopping])
# Evaluate the model on the test set
y_pred = (model.predict(X_test) > 0.5).astype("int32")
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
# Plot training history
plt.plot(history.history['accuracy'], label='train_accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.title('Training History')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Explanation:
Here's a breakdown of the provided code:
1. Import Libraries:
- Imports necessary libraries for data manipulation, machine learning, and visualization.
2. Load Dataset:
- Reads a dataset from a CSV file ('youtube_data_final_1_1.csv').
- Converts the 'Likes' column into binary labels (1 or 0) based on the median value.
3. Data Preprocessing:
- Drops non-numeric and unnecessary columns from the dataset.
4. Split Data:
- Separates data into features (`X`) and the target variable (`y`).
- Scales the features using `StandardScaler` and splits the data into training and testing sets.
5. Build Neural Network Model:
- Constructs a Sequential neural network model with:
- First hidden layer with 64 neurons and ReLU activation (the input size is set to the number of features).
- Batch Normalization layer.
- Second hidden layer with 32 neurons and ReLU activation.
- Batch Normalization layer.
- Output layer with 1 neuron and sigmoid activation for binary classification.
6. Compile Model:
- Compiles the model using the Adam optimizer, binary cross-entropy loss, and accuracy as the metric.
7. Training:
- Implements early stopping to prevent overfitting during training.
- Trains the model for up to 50 epochs (subject to early stopping) using a batch size of 32.
8. Evaluate Model:
- Predicts on the test set and converts probabilities to binary predictions.
- Calculates and prints the accuracy of the model.
9. Confusion Matrix Visualization:
- Creates a confusion matrix to visually evaluate the model's performance on the test set.
- Displays the confusion matrix using a heatmap.
10. Training History Visualization:
- Plots and displays the training history, showing accuracy changes over epochs for both training and validation sets.
Results:
Output Explanation:
Data Preprocessing:
Binary labels are assigned to the 'Likes' column, distinguishing likes above the median (1) from those below or equal to it (0).
Non-numeric and unnecessary columns are removed, retaining only numeric features.
Data Splitting:
The dataset is divided into features (X) and the target variable (y).
Standard scaling is applied to normalize the feature values, and the dataset is partitioned into training (70%) and testing (30%) sets.
Model Compilation:
The Adam optimizer with a learning rate of 0.001 is used, and binary cross-entropy serves as the loss function.
Training and Early Stopping:
The model trains for up to 50 epochs with a batch size of 32.
Early stopping is incorporated with a patience of 5 epochs to mitigate overfitting, ensuring restoration to the best weights.
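The patience mechanism can be illustrated without TensorFlow; the following is a simplified re-implementation of the stopping rule, not Keras's actual code:

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the index of the last epoch run before stopping."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch       # no improvement for `patience` epochs
    return len(val_losses) - 1     # ran to completion

# Best loss at epoch 2; five non-improving epochs trigger the stop at epoch 7.
losses = [0.9, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75, 0.76, 0.77]
stop = early_stop_epoch(losses)
```

With restore_best_weights=True, Keras additionally rolls the model back to the weights from the best epoch (epoch 2 in this toy trace).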
Model Evaluation:
The model is assessed on the test set.
Predictions are binarized at 0.5 to convert probabilities into binary labels (0 or 1).
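Thresholding at 0.5 maps the sigmoid outputs to hard labels; a quick illustration with made-up probabilities:

```python
import numpy as np

probs = np.array([0.91, 0.12, 0.50, 0.73])
preds = (probs > 0.5).astype("int32")  # strictly greater, so 0.50 maps to 0
```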
Accuracy Calculation:
Model accuracy is computed using scikit-learn's accuracy_score by comparing predicted labels (y_pred) with actual labels (y_test).
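accuracy_score is simply the fraction of matching labels; a toy example:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0]
y_hat = [0, 1, 0, 0]
acc = accuracy_score(y_true, y_hat)  # 3 of 4 labels agree
```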
Confusion Matrix:
A confusion matrix is generated, offering detailed insights into True Positives, True Negatives, False Positives, and False Negatives.
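For binary labels, scikit-learn lays the counts out as [[TN, FP], [FN, TP]], so the four quantities can be unpacked with ravel() (toy labels for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 1, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
```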
Visualizations:
A heatmap illustrating the confusion matrix provides a visual depiction of the model's performance.
Training history is visualized, indicating accuracy trends on both the training and validation sets throughout epochs.
Conclusion:
The obtained accuracy and confusion matrix provide valuable insights into the model's efficacy in binary classification based on the provided features. Visualizations aid in comprehending the training process and potential overfitting.
Architectural Diagram:
Input Layer:
The input layer has neurons equal to the number of features in your dataset. The number of features is determined by the shape of X_train, and you have used input_dim=X_train.shape[1] in the first Dense layer.
Activation function: None (implicitly linear activation when not specified).
Hidden Layers:
First Hidden Layer:
Number of neurons: 64.
Activation function: ReLU.
Batch Normalization Layer after the first hidden layer.
Second Hidden Layer:
Number of neurons: 32.
Activation function: ReLU.
Batch Normalization Layer after the second hidden layer.
Output Layer:
Number of neurons: 1 (because it's a binary classification problem).
Activation function: Sigmoid (since it's a binary classification problem).
Optimization Algorithm:
Adam optimizer with a learning rate of 0.001.
Loss Function:
Binary Crossentropy (suitable for binary classification tasks).
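Binary cross-entropy for predicted probabilities p and labels y is -mean(y*log(p) + (1-y)*log(1-p)); a worked toy example (values chosen for illustration):

```python
import numpy as np

y = np.array([1.0, 0.0])
p = np.array([0.9, 0.2])

# -[ln(0.9) + ln(0.8)] / 2 ≈ 0.1643: low loss, since both predictions
# land on the correct side with reasonable confidence.
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```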
Training Configuration:
Training for up to 50 epochs (subject to early stopping).
Batch size: 32.
Early stopping is implemented to monitor validation loss, with patience set to 5 epochs.
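As a sanity check, the parameter counts implied by this architecture can be worked out by hand. A Dense layer has in*out weights plus out biases; BatchNormalization has 2 trainable parameters per feature (gamma, beta) and 2 non-trainable ones (moving mean, moving variance). The 10-feature input below is purely illustrative; the real count comes from X_train.shape[1]:

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

n_features = 10  # illustrative assumption
trainable = (
    dense_params(n_features, 64)  # 704
    + 2 * 64                      # BatchNorm gamma/beta
    + dense_params(64, 32)        # 2080
    + 2 * 32
    + dense_params(32, 1)         # 33
)
non_trainable = 2 * 64 + 2 * 32   # BatchNorm moving statistics
```

These totals should match what model.summary() reports when run with the same input dimension.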
This architecture can be represented as follows:
Conclusion:
Key Learnings and Anticipated Outcomes:
Model Accuracy Assessment:
Evaluating the model's accuracy on the test set serves as a quantitative gauge for its proficiency in categorizing YouTube videos into high and low likes, based on the chosen set of features.
Feature Significance Understanding:
Delving into the model architecture and training dynamics offers some insight into which features matter. That said, raw weights in a multi-layer network are only a rough proxy for feature relevance; techniques such as permutation importance give a more reliable picture.
Mitigation of Overfitting:
The incorporation of BatchNormalization and early stopping reflects a preemptive strategy to forestall overfitting. This ensures the model's ability to generalize effectively to new, unseen data.
Discerning Training Trends:
Analyzing the training history graph aids in pinpointing epochs where the model's performance on the validation set reaches a plateau or experiences deterioration. This analysis informs strategies for refining the model training process.
Patterns in Confusion Matrix:
Scrutinizing the confusion matrix reveals potential patterns in misclassifications, such as discernible trends in false positives or false negatives. This information guides further fine-tuning of the model.
Future Predictive Capabilities:
Armed with the trained model and validated performance metrics, the next step involves applying the model to novel, unseen data to predict the likelihood of a YouTube video garnering high or low likes based on its features.
Optimization Avenues Exploration:
The analysis prompts considerations for further optimization endeavors, including experimenting with additional layers, nodes, or alternative activation functions to elevate the model's efficacy.
Applicability to Analogous Datasets:
The assimilated knowledge extends its relevance to analogous datasets within the realm of YouTube video analytics. Similar models can be constructed for diverse prediction tasks associated with user engagement.