Course Projects

Explainable Movie Recommender

  • Developed a transparent, explainable movie recommendation system in Python. The tool shows which genres the user typically watches and explains each recommendation in natural language.

  • Libraries used: numpy, scipy, pandas, nltk

[Project URL]
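
The idea of explaining a recommendation through the user's genre history can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the dictionary-based data layout, and the wording of the explanation are all assumptions made for illustration, and it uses only the standard library.

```python
from collections import Counter


def genre_profile(watched):
    """Count how often each genre appears in the user's watch history."""
    counts = Counter()
    for genres in watched.values():
        counts.update(genres)
    return counts


def recommend(watched, catalog, top_n=1):
    """Rank unseen titles by genre overlap and attach a plain-language reason.

    `watched` and `catalog` are hypothetical dicts mapping title -> list of genres.
    """
    profile = genre_profile(watched)
    scored = []
    for title, genres in catalog.items():
        if title in watched:
            continue  # never recommend something the user has already seen
        shared = [g for g in genres if profile[g] > 0]
        score = sum(profile[g] for g in shared)
        if score > 0:
            # Explain the pick with the shared genre the user watches most.
            best = max(shared, key=lambda g: profile[g])
            reason = f"Recommended '{title}' because you often watch {best} movies."
            scored.append((score, title, reason))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(title, reason) for _, title, reason in scored[:top_n]]


watched = {"Alien": ["Sci-Fi", "Horror"], "Blade Runner": ["Sci-Fi"]}
catalog = {
    "Arrival": ["Sci-Fi", "Drama"],
    "Notting Hill": ["Romance"],
    "Alien": ["Sci-Fi", "Horror"],
}
print(recommend(watched, catalog))
```

Keeping the genre counts separate from the ranking step means the same profile can be shown to the user ("what you typically watch") and reused to justify each pick.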

Image Classification

  • Built a simple image classification pipeline from scratch on the CIFAR-10 dataset using k-nearest neighbor, SVM, and softmax classifiers and a two-layer neural network.

  • Implemented a two-layer fully connected network and a three-layer CNN from scratch and experimented with dropout and batch normalization on the CIFAR-10 dataset. The model achieves around 50% accuracy.

  • Implemented a ResNet-like deep network in PyTorch that achieves around 72% accuracy on the CIFAR-10 dataset.

  • Libraries used: numpy, scipy, matplotlib, PyTorch

[Project URL]
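
As a rough illustration of the first classifier in this pipeline, the following is a minimal k-nearest neighbor sketch. It is an assumption-laden toy, not the project's implementation: it uses the standard library instead of numpy, tiny 2-D points instead of CIFAR-10 images, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (L2 distance)."""
    # Pair every training point's distance to the query with its label.
    neighbours = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    # Vote among the k closest labels.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.2, 0.1)))
print(knn_predict(train_x, train_y, (5.5, 5.5)))
```

On real image data each training point would be a flattened pixel vector, and the distance computation would normally be vectorized rather than looped.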

House Price Prediction

  • Entry for a Kaggle competition that challenges participants to predict the final price of each home from 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa.

  • My submission ranked in the top 6% on Kaggle, using advanced regression techniques such as ensembling StackedRegressor, XGBoost, and LightGBM to predict the final price.

  • Libraries used: Pandas, NumPy, Matplotlib, scikit-learn

[Project URL]

Real or Not? NLP with Disaster Tweets

  • Entry for a Kaggle competition that challenges participants to build a machine learning model that predicts which tweets are about real disasters and which ones aren’t.

  • Performed exploratory data analysis and preprocessing, generated word clouds of common words in real and non-real disaster tweets, and built a DNN model with a bidirectional GRU that reaches an accuracy of 71.437%.

[Project URL]

WebCrawler

  • A web crawler built in Python that traverses the web associated with a user-specified root URL using the Iterative Deepening Search (IDS) algorithm.

  • The program saves each URL's HTML to a file and runs a character unigram feature extractor on those files.

[Project URL]
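
The crawl strategy and the feature extractor can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the project's code: the web is replaced by a hypothetical in-memory link graph (`links`) and page store (`pages`) so that no network or file access is needed, and the function names are made up for the example.

```python
from collections import Counter


def ids_crawl(links, root, max_depth):
    """Iterative deepening: repeat a depth-limited DFS with limits 0..max_depth.

    Returns URLs in the order they were first discovered.
    """
    discovered = []
    seen = set()

    def depth_limited(url, limit, path):
        if url not in seen:
            seen.add(url)
            discovered.append(url)
        if limit == 0:
            return
        for nxt in links.get(url, []):
            if nxt not in path:  # do not loop back along the current path
                depth_limited(nxt, limit - 1, path | {nxt})

    for depth in range(max_depth + 1):
        depth_limited(root, depth, {root})
    return discovered


def char_unigrams(html):
    """Character unigram features: how often each character occurs."""
    return Counter(html)


# Hypothetical link graph and page contents standing in for real HTTP fetches.
links = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": []}
pages = {"a": "<p>aa</p>", "b": "<p>b</p>", "c": "<p>c</p>", "d": "<p>d</p>"}

order = ids_crawl(links, "a", max_depth=2)
features = {url: char_unigrams(pages[url]) for url in order}
print(order)
print(features["a"]["a"])
```

Because each iteration restarts from the root with a larger depth limit, pages are discovered in order of their distance from the root, at the cost of revisiting shallow pages.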

Topic Modeling with LDA

  • Developed an unsupervised LDA model to detect the topics of news articles.

  • Libraries used: nltk, gensim, pyLDAvis

[Project URL]

News Category Prediction

  • Used logistic regression to predict news categories (e.g., Entertainment, Games) with an accuracy of 87%.

  • Libraries used: numpy, pandas, sklearn

[Project URL]

cpmFS

  • Designed and implemented a simple file system called cpmFS that lets users list directory entries and rename, copy, and delete files, and that includes code to open, close, read, and write files.

[Project URL]

pWordCount

  • pWordCount is a pipe-based word count tool in which two processes cooperate through Unix pipes to count the words in a text file.

[Project URL]
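
The two-process design can be sketched with `os.pipe` and `os.fork`. This is a minimal sketch and not the original tool: the language (Python), the function name `count_words_via_pipes`, and the split of work (parent sends the text, child counts and replies) are assumptions for illustration. It runs only on Unix-like systems.

```python
import os


def count_words_via_pipes(text):
    """Send `text` to a child process over one pipe and read the word count back over another."""
    to_child_r, to_child_w = os.pipe()    # parent -> child: the text
    to_parent_r, to_parent_w = os.pipe()  # child -> parent: the count

    pid = os.fork()
    if pid == 0:
        # Child: read everything until EOF, count words, send the result back.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            with os.fdopen(to_child_r, "rb") as src:
                data = src.read()
            with os.fdopen(to_parent_w, "wb") as dst:
                dst.write(str(len(data.split())).encode("ascii"))
        finally:
            os._exit(0)  # leave immediately without running parent cleanup code

    # Parent: close the ends it does not use, send the text, then wait for the count.
    os.close(to_child_r)
    os.close(to_parent_w)
    with os.fdopen(to_child_w, "wb") as dst:
        dst.write(text.encode("utf-8"))
    with os.fdopen(to_parent_r, "rb") as src:
        count = int(src.read().decode("ascii"))
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return count


print(count_words_via_pipes("the quick brown fox"))
```

Each process closes the pipe ends it does not use. Otherwise the reader would never see end-of-file and both processes would block.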