Course Projects
Explainable Movie Recommender
Developed a transparent and explainable movie recommendation system in Python. The tool shows which genres the user typically watches and explains each recommendation in natural language.
Libraries Used: numpy, scipy, pandas, nltk
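As a rough illustration of how a genre profile can drive a natural-language explanation, the sketch below counts genre tags in a watch history and phrases a reason for one recommendation. The data layout and the function names `genre_profile` and `explain` are hypothetical and not taken from the project itself.

```python
from collections import Counter

def genre_profile(watched):
    """Return the share of each genre tag across the movies a user has watched."""
    counts = Counter(g for movie in watched for g in movie["genres"])
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}

def explain(candidate, profile):
    """Build a one-sentence explanation for recommending `candidate`."""
    # Genres the candidate shares with the user's history, most-watched first.
    shared = sorted(
        (g for g in candidate["genres"] if g in profile),
        key=lambda g: profile[g],
        reverse=True,
    )
    if not shared:
        return f"'{candidate['title']}' is outside the genres you usually watch."
    top = shared[0]
    return (
        f"'{candidate['title']}' is recommended because {profile[top]:.0%} "
        f"of the genre tags in your history are {top}."
    )

# Toy watch history (made-up data for illustration only).
history = [
    {"title": "Alien", "genres": ["Sci-Fi", "Horror"]},
    {"title": "Blade Runner", "genres": ["Sci-Fi", "Thriller"]},
]
profile = genre_profile(history)
message = explain({"title": "Arrival", "genres": ["Sci-Fi", "Drama"]}, profile)
print(message)
```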
Image Classification
Developed a simple image classification pipeline from scratch on the CIFAR-10 dataset, based on k-Nearest Neighbor, SVM/Softmax classifiers, and a two-layer neural network.
Implemented a two-layer fully connected network and a three-layer CNN from scratch, and experimented with dropout and batch normalization on the CIFAR-10 dataset. The model achieves around 50% accuracy.
Implemented a DNN architecture similar to ResNet in PyTorch; the model achieves around 72% accuracy on the CIFAR-10 dataset.
Libraries Used: numpy, scipy, matplotlib, PyTorch
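The k-Nearest Neighbor step of a pipeline like the one above can be sketched in a few lines of numpy. This is a minimal sketch on tiny synthetic points standing in for flattened images, not the project's code; the function name `knn_predict` and the choice of `k=3` are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    # Squared Euclidean distances via (a - b)^2 = a^2 - 2ab + b^2, fully vectorized.
    d2 = (
        np.sum(X_test ** 2, axis=1, keepdims=True)
        - 2.0 * X_test @ X_train.T
        + np.sum(X_train ** 2, axis=1)
    )
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training rows
    votes = y_train[nearest]                 # neighbour labels, shape (n_test, k)
    # Majority vote per test row; ties go to the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])

# Tiny synthetic stand-in for flattened images (not CIFAR-10 itself).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.1], [4.9, 5.0]])
predictions = knn_predict(X_train, y_train, X_test, k=3)
print(predictions)
```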
House Price Prediction
Entry in the Kaggle competition to predict the final price of each home from 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa.
My submission ranked in the top 6% on Kaggle, using advanced regression techniques such as ensembling StackedRegressor, XGBoost, and LightGBM.
Libraries Used: Pandas, NumPy, Matplotlib, scikit-learn
Real or Not? NLP with Disaster Tweets
Entry in the Kaggle competition to build a machine learning model that predicts which Tweets are about real disasters and which ones aren't.
Performed exploratory data analysis and preprocessing, generated word clouds of common words in real and non-real disaster tweets, and built a DNN model with a bidirectional GRU that reaches 71.437% accuracy.
WebCrawler
A web crawler built in Python that traverses the web from a user-specified root URL using the Iterative Deepening Search (IDS) algorithm.
The program saves each URL's HTML to a file and runs a character unigram feature extractor on those files.
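The crawl strategy and feature extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the link graph is a small in-memory dictionary instead of real HTTP fetches, and the names `depth_limited_crawl`, `iterative_deepening_crawl`, `char_unigrams`, and `get_links` are hypothetical. In a real crawler, `get_links` would download a page and parse its anchors.

```python
from collections import Counter

def depth_limited_crawl(url, get_links, limit, best_budget):
    """Depth-first visit of `url` with `limit` link hops left to spend."""
    # Skip a page already expanded with at least as much remaining budget.
    if best_budget.get(url, -1) >= limit:
        return
    best_budget[url] = limit
    if limit == 0:
        return
    for link in get_links(url):
        depth_limited_crawl(link, get_links, limit - 1, best_budget)

def iterative_deepening_crawl(root, get_links, max_depth):
    """Run depth-limited crawls with limits 0, 1, ..., max_depth.

    Returns URLs in order of first discovery, so shallower pages come first.
    """
    discovered = []
    for limit in range(max_depth + 1):
        best_budget = {}
        depth_limited_crawl(root, get_links, limit, best_budget)
        for url in best_budget:
            if url not in discovered:
                discovered.append(url)
    return discovered

def char_unigrams(text):
    """Count how often each single character occurs in `text`."""
    return Counter(text)

# Toy link graph standing in for the web (made-up data).
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"], "e": []}
pages = iterative_deepening_crawl("a", lambda u: LINKS.get(u, []), max_depth=2)
features = char_unigrams("<p>hello</p>")
print(pages)
print(features["l"])
```

Recording the remaining depth budget per page, instead of a plain visited set, keeps the depth-limited pass from discarding a page that is later reached along a shorter path.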
Topic Modeling with LDA
Developed an unsupervised LDA model to detect the topics of news articles.
Libraries Used: nltk, gensim, pyLDAvis
News Category Prediction
Used logistic regression to predict news category (e.g., Entertainment, Games) with 87% accuracy.
Libraries Used: numpy, pandas, sklearn
cpmFS
Designed and implemented a simple file system called cpmFS that lets users list directory entries and rename, copy, and delete files, and that provides code to open, close, read, and write files.
pWordCount
pWordCount is a pipe-based word count tool in which two processes cooperate through Unix pipes to count the words in a text file.
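One way two processes can cooperate over Unix pipes to count words is sketched below in Python. The division of work (the parent streams the file, the child counts and replies) and the function name `pipe_word_count` are assumptions for illustration, not a description of the original tool. The sketch relies on `os.fork`, so it runs on Unix-like systems only.

```python
import os
import tempfile

def pipe_word_count(path):
    """Count words in `path` using two processes joined by two Unix pipes."""
    to_child_r, to_child_w = os.pipe()    # parent -> child: file contents
    to_parent_r, to_parent_w = os.pipe()  # child -> parent: the word count

    pid = os.fork()
    if pid == 0:
        # Child: read the text until EOF, count words, send the count back.
        try:
            os.close(to_child_w)   # otherwise the child never sees EOF
            os.close(to_parent_r)
            with os.fdopen(to_child_r, "rb") as incoming:
                count = len(incoming.read().split())
            os.write(to_parent_w, str(count).encode())
        finally:
            os._exit(0)  # exit immediately, skipping the parent's code path

    # Parent: stream the file to the child, then wait for the answer.
    os.close(to_child_r)
    os.close(to_parent_w)
    with open(path, "rb") as src, os.fdopen(to_child_w, "wb") as out:
        for chunk in iter(lambda: src.read(4096), b""):
            out.write(chunk)
    # Leaving the block closes the write end, which signals EOF to the child.
    with os.fdopen(to_parent_r, "rb") as result:
        count = int(result.read())
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return count

# Demo on a temporary file (made-up text).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over the lazy dog\n")
total = pipe_word_count(tmp.name)
os.remove(tmp.name)
print(total)
```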