Based on a hand-labelled toxicity data set containing 1000 comments crawled from YouTube videos about the 2014 Ferguson unrest. In addition to toxicity, the data set contains labels for multiple subclassifications of toxicity, which form a hierarchical structure; each comment can have several of these labels assigned.
Data Source : https://www.kaggle.com/datasets/reihanenamdari/youtube-toxicity-data
Analysis Summary
Train/Test Data Split : 80:20
Non-toxic (false) class: evaluated 634 of the 647 samples (98% recall), with 431 of those 634 identified correctly (68% accuracy).
Toxic (true) class: evaluated 95 of the 353 samples (27% recall), with 86 of those 95 identified correctly (91% accuracy).
647 + 353 = 1000 samples in total.
*Metrics reported on the 200-sample test set.
Observation : approx. 25% of samples are false negatives (toxic comments classified as non-toxic); needs further refinement.
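The class-wise figures above can be re-derived from the bracketed counts (a quick sanity check; the variable names are illustrative):

```python
# Re-deriving the class-wise figures from the bracketed counts above.
false_total, false_eval, false_correct = 647, 634, 431   # non-toxic class
true_total, true_eval, true_correct = 353, 95, 86        # toxic class

false_recall = false_eval / false_total    # 634/647 ≈ 0.98
false_acc = false_correct / false_eval     # 431/634 ≈ 0.68
true_recall = true_eval / true_total       # 95/353  ≈ 0.27
true_acc = true_correct / true_eval        # 86/95   ≈ 0.91

print(f"{false_recall:.2f} {false_acc:.2f} {true_recall:.2f} {true_acc:.2f}")
# → 0.98 0.68 0.27 0.91
```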
Resources :
from sklearn.feature_extraction.text import TfidfVectorizer
NLTK :: Natural Language Toolkit : https://www.nltk.org
import re
from nltk.corpus import stopwords
import nltk
nltk.download('stopwords')
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from flask import Flask, request, jsonify
import joblib
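The resources above can be assembled into a minimal end-to-end training sketch. The toy texts and labels below are invented stand-ins for the real Kaggle CSV, and the cleaning function is one plausible preprocessing choice, not necessarily the notebook's exact steps:

```python
import re

import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Fall back to a tiny manual list if the NLTK corpus cannot be downloaded.
try:
    nltk.download('stopwords', quiet=True)
    stop_words = set(stopwords.words('english'))
except LookupError:
    stop_words = {'the', 'a', 'and', 'is', 'are', 'you', 'this'}

def clean_text(text):
    # Lowercase, keep letters only, drop stop words.
    text = re.sub(r'[^a-zA-Z\s]', ' ', text.lower())
    return ' '.join(w for w in text.split() if w not in stop_words)

# Invented toy data standing in for the 1000-comment data set.
texts = ["you are awful and stupid", "great video thanks",
         "I hate these idiots", "nice analysis very helpful",
         "shut up moron", "interesting point well made",
         "total garbage channel", "love this content"] * 5
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 5

cleaned = [clean_text(t) for t in texts]
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.2, random_state=42)  # 80:20 split

tfidf_vectorizer = TfidfVectorizer()
X_train_vec = tfidf_vectorizer.fit_transform(X_train)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)

preds = model.predict(tfidf_vectorizer.transform(X_test))
print(classification_report(y_test, preds))
```

The real notebook would load the Kaggle CSV in place of the toy lists; everything downstream of the cleaned texts stays the same.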
Deployment
# Save the model and vectorizer
joblib.dump(model, "model.pkl")
joblib.dump(tfidf_vectorizer, "vectorizer.pkl")
# Load the model in the API
model = joblib.load("model.pkl")
vectorizer = joblib.load("vectorizer.pkl")
API Testing via Flask Framework
app = Flask(__name__)

@app.route('/')
def home():
    return "Hello, Flask in Jupyter!"

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    review = data['review']
    print(review)
    review_vector = vectorizer.transform([review]).toarray()
    prediction = model.predict(review_vector)
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True, use_reloader=False)
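The /predict route can also be exercised in-process with Flask's test client, without starting the server or loading the real pickles. The stub model and vectorizer below are placeholders just for illustration (the stub always predicts 1):

```python
import numpy as np
from flask import Flask, jsonify, request
from scipy.sparse import csr_matrix

app = Flask(__name__)

class StubModel:
    """Stand-in for the pickled LogisticRegression; always predicts 1."""
    def predict(self, X):
        return np.ones(X.shape[0], dtype=int)

class StubVectorizer:
    """Stand-in for the pickled TfidfVectorizer."""
    def transform(self, texts):
        return csr_matrix(np.zeros((len(texts), 3)))

model, vectorizer = StubModel(), StubVectorizer()

@app.route('/predict', methods=['POST'])
def predict():
    review = request.json['review']
    review_vector = vectorizer.transform([review]).toarray()
    prediction = model.predict(review_vector)
    return jsonify({'prediction': int(prediction[0])})

# Post to the route in-process instead of calling app.run().
with app.test_client() as client:
    resp = client.post('/predict', json={'review': 'this video is terrible'})
    print(resp.get_json())  # → {'prediction': 1}
```

Against the running server, the same request would be a POST to http://127.0.0.1:5000/predict (Flask's default port) with a JSON body containing the 'review' key.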