This project is a web-based application that classifies text messages as spam or not spam ("ham"). It applies natural language processing (NLP) and machine learning techniques to predict which of the two classes a given message belongs to. The application features a simple, responsive user interface that accepts input in real time and returns immediate feedback.
The primary objective of this project is to demonstrate the use of text preprocessing, feature extraction, and supervised machine learning to solve a practical classification problem. The system aims to provide fast and accurate predictions in an accessible format through a deployed web interface.
Programming Language: Python
Libraries and Frameworks: Scikit-learn, NLTK, Pandas, NumPy
Machine Learning Model: Multinomial Naive Bayes
Text Representation: TF-IDF Vectorization
Web Framework: Streamlit
Deployment Platform: Render
The dataset used consists of labeled SMS messages categorized as "spam" or "ham" (not spam). Preprocessing steps included converting text to lowercase, removing punctuation and stopwords, applying stemming using NLTK, and transforming the cleaned text into numerical features using TF-IDF vectorization.
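The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name preprocess and the short STOPWORDS set are assumptions made so the example runs without downloading NLTK corpora, and the project itself may well use NLTK's full English stopword list and tokenizer.

```python
import string

from nltk.stem import PorterStemmer

# Assumption: a tiny hand-picked stopword set, used here only so the sketch
# runs without downloading NLTK corpus data.
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "you", "your",
    "have", "of", "and", "for", "in", "on",
}

stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, and stem each token."""
    text = text.lower()
    # Remove every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and discard stopwords.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    # Reduce each remaining token to its Porter stem.
    return " ".join(stemmer.stem(tok) for tok in tokens)


print(preprocess("You are WINNING a free prize!!!"))
```

The cleaned string returned by preprocess is what would then be passed to the TF-IDF vectorizer.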
After preprocessing, the dataset was split into training and testing subsets. A Multinomial Naive Bayes classifier was trained on the TF-IDF features. The model achieved over 95 percent accuracy on the test data and showed strong performance across precision, recall, and F1-score metrics.
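A training and evaluation run along these lines might look like the sketch below. The handful of messages is invented purely for illustration and is not the project's dataset, so the scores it prints say nothing about the accuracy reported above. The 75/25 split, the random seed, and the use of a scikit-learn pipeline are likewise assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; the real project uses a labeled SMS dataset.
messages = [
    "win a free prize now", "claim your free cash reward",
    "urgent call this number to win", "free entry in a weekly contest",
    "you have won a lottery claim now", "cheap loans apply today free",
    "are we still meeting for lunch", "see you at home tonight",
    "can you call me later", "thanks for the help yesterday",
    "i will be late for the meeting", "happy birthday have a great day",
]
labels = ["spam"] * 6 + ["ham"] * 6

# Hold out a quarter of the messages, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training messages only, then trains the classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score on the held-out messages.
print(classification_report(y_test, y_pred, zero_division=0))
```

Wrapping the vectorizer and classifier in one pipeline is a design choice made here so that the TF-IDF vocabulary is learned from the training subset alone and the same transformation is applied automatically at prediction time.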
The frontend was developed with Streamlit, which lets users interact with the model through a simple web page. Users enter a message, and the system returns a classification indicating whether it is spam or not. The app is deployed on Render and accessible online.
This project demonstrates a practical application of natural language processing and supervised learning. It highlights the importance of data cleaning, feature engineering, model selection, and user-focused deployment in building a functional machine learning solution.
GitHub Link: https://github.com/vishwaspw/Spam_Classifier