TfidfVectorizer converts a raw collection of text documents into a matrix of TF-IDF scores that represent the importance of each word. Each word is scored by how often it appears in a document (term frequency), down-weighted by how many documents it appears in across the corpus (inverse document frequency), so words that are common everywhere contribute little.
ngram_range = the range of n-gram sizes (word sequences of length n) to extract (ex. "good puppy" is a 2-gram; ngram_range=(1, 2) keeps both the single words "good" and "puppy" and the pair "good puppy")
max_features = the maximum number of unique features (terms) to keep, ordered by frequency across the corpus; everything beyond the cutoff is treated as not important and dropped
BernoulliNB (BNB) implements Naive Bayes training and classification for data where every feature, however many there are, is assumed to be a binary-valued variable (present or absent). It applies Bayes' theorem with the naive assumption that the features are independent given the class.
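A small sketch of BernoulliNB on hypothetical binary word-presence features (the matrix and labels below are invented):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical features: each column marks a word as present (1) or absent (0)
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0])  # class labels

# binarize=0.0 by default, so non-binary inputs (e.g. counts) are thresholded
clf = BernoulliNB()
clf.fit(X, y)
print(clf.predict(X))  # predicted class (0 or 1) for each row
```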
SVC maximizes the margin between two classes that are separated by a hyperplane, i.e. the distance from the hyperplane to the nearest training points on each side. Anything that falls on either side of the hyperplane is labelled as class 1 or class 2.
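A minimal sketch with a linear kernel, so the decision boundary is a hyperplane; the 2-D points below are a made-up linearly separable example:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable 2-D data: one cluster near the origin,
# one cluster near (3.5, 3.5)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")  # linear kernel: boundary is a straight hyperplane
clf.fit(X, y)

# Points on either side of the hyperplane receive label 0 or 1
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))
```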
Logistic regression performs binary classification by passing a linear combination of the features through a sigmoid curve, producing a probability between 0 and 1 that is thresholded (typically at 0.5) to label data as 0 or 1.
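A minimal sketch showing the sigmoid output and the thresholded labels; the 1-D data below is a hypothetical example where small values are class 0 and large values class 1:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D data: small values labelled 0, large values labelled 1
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba passes the linear score through the sigmoid, giving
# class probabilities; predict thresholds at 0.5 to return 0 or 1
print(clf.predict_proba([[5.0]]))
print(clf.predict([[0.5], [9.5]]))
```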