Machine Learning Final Projects

Machine Learning 2021 Spring

Xinyu Li, Li Yi, Harry Lee

A Lightweight Deep Sequential Model for Continuous Chord and Melody Generation (paper)

With the rapid progress of deep learning in recent years, generative models have given machines unprecedented artistic creativity at the level of grids (e.g. images) and simple sequences (e.g. words and sentences). However, music generation remains a challenging field for machines. In this paper, we focus on music composition, specifically chord and melody generation, which is a fundamental part of songwriting. By analyzing the symbolic representation of melody and chords as time series, deep sequential models should grasp the inner connections between melody notes and chords. We propose a lightweight deep sequential model that continues a composition from given chords and melody. By using masking techniques, we can regularize the generation process, controlling the rhythm and deciding where the melody notes should change or rest.
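One way such masking could work is sketched below. It assumes, hypothetically, that the model scores a small vocabulary at each time step in which index 0 means "hold the previous note" and index 1 means "rest"; the token layout and function names are illustrative and not taken from the paper.

```python
import math

HOLD, REST = 0, 1  # hypothetical special tokens; pitch tokens start at index 2


def masked_softmax(logits, allowed):
    """Softmax over logits with disallowed positions forced to probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def rhythm_mask(vocab_size, step):
    """Build a mask for one time step: 'free', 'hold', or 'rest'."""
    if step == "free":
        return [True] * vocab_size
    forced = HOLD if step == "hold" else REST
    return [i == forced for i in range(vocab_size)]


logits = [0.1, 0.3, 2.0, 1.0]
probs_free = masked_softmax(logits, rhythm_mask(4, "free"))
probs_held = masked_softmax(logits, rhythm_mask(4, "hold"))
probs_rest = masked_softmax(logits, rhythm_mask(4, "rest"))
```

On "hold" and "rest" steps all probability mass lands on the forced token, so the rhythm is fixed by the mask while the model still chooses pitches on "free" steps.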

Jiasheng Ni, Muqing Yang, Ruochen Miao

An Image Captioning Model with CNN and RNN (paper)

In our project, we apply deep learning to design an image captioning model. It captures features from input images and then generates a reasonable caption describing the content of the image. We notice that many existing image captioning models do not use an attention mechanism and are flawed in their performance, so we add attention to our model. After researching, we build an encoder-decoder model with attention for image captioning. We use a CNN as the encoder and try two pre-trained models: ResNet50 and VGG19. We then use an RNN as the decoder and compare LSTM and GRU, each with global attention, local attention, or no attention. We apply different optimizers to compare training speed and model accuracy. After model evaluation, we choose the model composed of a pre-trained VGG19 CNN encoder and a one-layer LSTM RNN decoder with local attention. This model can effectively identify the elements in the image and generate captions. As a demonstration, we build a web application that lets users upload an image and shows the caption generated by our model.

Gracie Zhou, Qianyu Zhu, Sihan Liu

Automatic Music Transcription with LSTM and RNN (paper)

With the aim of classifying songs by the emotions they can inspire, this project trains different machine learning models on different sets of features. We stack traditional models, including SVM and random forests, and train them on the shallow features provided by Spotify’s API, yielding an average precision of 0.76, recall of 0.77, and F1-score of 0.76 on a three-class classification task. This comparatively low accuracy can be attributed to the limited size of the dataset and the inherent errors in the labels. In comparison, the more complicated CLDNN model (CNN + LSTM + DNN) achieves an accuracy of 90.56% in binary classification.
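For readers unfamiliar with averaged precision, recall, and F1 in a multi-class setting, the sketch below computes the macro average (an unweighted mean over classes). Whether the reported figures are macro or weighted averages is not stated, so macro averaging is an assumption here, and the emotion labels are made up.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))


# Hypothetical labels for a three-class emotion task.
y_true = ["happy", "sad", "calm", "happy", "sad", "calm"]
y_pred = ["happy", "sad", "happy", "happy", "calm", "calm"]
precision, recall, f1 = macro_average(y_true, y_pred)
```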

Jingru Fan, Haochen Hu, Xujun Lian

Chatbot Using LSTM & CNN (paper)

Recurrent neural networks (RNNs) are widely used in natural language processing and have promoted the development of machine translation, voice assistants, and other artificial intelligence technologies. In this paper, we design a chatbot based on an encoder-decoder Seq2Seq model with Gated Recurrent Units (GRU). In addition, we include a bi-directional GRU and an attention mechanism to improve model performance. We also use techniques such as teacher forcing, gradient clipping, and the Adam optimizer to speed up training. Finally, our model is able to hold basic conversations with humans and generate understandable answers to questions.
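Gradient clipping, mentioned above, can be illustrated in a few lines. This is a framework-free sketch of clipping by global norm; the function name and the toy gradient values are illustrative and do not come from the project's code.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Toy gradients for two parameter groups; the joint norm is 5.0.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Rescaling all gradients by the same factor keeps the update direction unchanged while bounding its size, which is why the technique helps with the exploding gradients that recurrent models are prone to.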

Tang Sheng, Jingyi Zhao

Colorization of Monochrome Pictures (paper)

This project tackles the issue of restoring monochromatized images with plausible colorization. We train a CNN (convolutional neural network) that predicts the color of each pixel from its grayscale value in the CIE Lab color space. CNNs perform well on computer vision tasks, and our trained model is capable of coloring landscape pictures plausibly.

Mengjie Shen, Yajie Xiao, Yuanhao Shen

Comparing Models for Image Text Extraction in Automatic Chinese Car Plate Recognition System (paper)

This project aims to build an end-to-end Chinese car-plate detection and recognition system using deep neural networks. The project is divided into two parts: car-plate detection and character recognition. In the first stage of the system, we customize the YOLOv5 object detection model to locate the bounding box coordinates and extract the sub-image containing the car plate. We then pre-process the sub-image by rescaling it, converting it to grayscale, and drawing the contours of each character on the car plate. The main contribution of this project is that we use a modified version of the LeNet-5 CNN to recognize the Chinese characters and apply pytesseract to recognize the digits and letters.

Crystal Wen

Cultural Collective Consciousness by a Country: A Spotify Case Study (paper)

While many forms of entertainment become more integrated into culture and politics, the success of short-form content, seemingly random or ungrounded, generally remains difficult to understand. The real-world significance of popular music is much greater than it often gets credit for, as it can resonate deeply and shape perceptions. Analyzing how audio features translate to national popularity can uncover some of the mechanisms behind what different audiences are searching for from music. This paper explores possible algorithms to those ends.

Haozhen Guo, Siyuan Yu, Yufeng Bai

Facial Expression Recognition under COVID-19 Era (paper)

This project aims to recognize facial expressions during the COVID-19 pandemic, the “mask era”, in which the lower half of the face is covered. As data preprocessing, we applied self-designed functions to paste masks onto all sample images. We then used a CNN as the main model, tuning its hyperparameters and model complexity. Finally, we evaluated our model, predicted facial expressions from collected images, and explored the model's limitations and future development.

Leyi Guo

Liver Cancer Molecular Clustering (paper)

The goal of this project is to cluster liver cancer cases based on the genes of patients. Liver cancer molecular clustering is an interesting topic because, based on the clustering, we can analyze which genes and proteins contribute most to the clusters, guiding our study of targeted medicine for liver cancer. The challenge is to process the gene dataset so that the clustering results are more closely aligned with medical criteria. After tuning the K-means clustering model on a labeled breast cancer dataset, we chose to apply t-SNE as a feature transformation, which brings the clustering closer to the way doctors classify cases. Our final model successfully divides the dataset of liver cancer samples into three clusters and identifies the most significant genes.

Yuxuan Li, Xinran Wang, Shuhan Yuan

Garbage Sorting: Image Classification with CNN and Data Augmentation (paper)

This project applies Convolutional Neural Networks (CNN) to the classification of images of household garbage. The processed dataset for this project contains 7200 household garbage photos uniformly distributed across 12 categories. The project first examines the performance of VGG19, ResNet18 and ResNet50 with default model settings and finds that ResNet50 outperforms the other two models. With further training data augmentation and dropout rate tuning, the ResNet50 model with augmented data and a dropout rate of 0.5 obtains a 93.25% testing accuracy. A further analysis of error cases is also carried out, using a confusion matrix to find the model’s frequently misclassified categories. Finally, the model’s practical capability is tested on images of garbage taken manually in real life.
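The error analysis described above can be sketched as follows: count every (true, predicted) pair and rank the off-diagonal entries of the confusion matrix. The category names and predictions are invented for illustration and are not the project's data.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Confusion matrix stored sparsely as a Counter keyed by (true, predicted)."""
    return Counter(zip(y_true, y_pred))


def top_confusions(y_true, y_pred, k=3):
    """The k most frequent misclassifications, i.e. off-diagonal entries."""
    matrix = confusion_counts(y_true, y_pred)
    errors = Counter({pair: n for pair, n in matrix.items() if pair[0] != pair[1]})
    return errors.most_common(k)


# Hypothetical garbage categories and predictions.
y_true = ["glass", "glass", "glass", "paper", "paper", "plastic"]
y_pred = ["plastic", "plastic", "glass", "paper", "glass", "plastic"]
worst = top_confusions(y_true, y_pred, k=1)
```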

Zecheng Wang, Kaan Tekin, Junyan Feng

Handwriting Recognition (Characters & Numbers) Modeling with SVM and Decision Tree (paper)

Optical Character Recognition (OCR) has become a very important topic in the age of digitization. Handwriting recognition is among the most challenging OCR problems, involving an almost infinite amount of variability and uncertainty in the formatting and combination of handwritten texts. This project aims to implement a handwriting recognition system that takes several input sentences while being trained only on individual characters, using fast, highly cost-efficient classical machine learning algorithms and basic CNN architectures. Great emphasis has been put on reducing the dimensionality of the raw dataset while retaining as much information as possible. Our machine learning algorithms involve end-to-end classical models (Support Vector Machine, Decision Tree, Random Forest) and Convolutional Neural Networks with fewer than 70,000 parameters. Choosing the best algorithm in terms of consistency and test accuracy, we have also designed a prediction pipeline that places a non-machine-learning segmentation method before the machine learning classification to cope with consecutive inputs in a single image. Among all the attempts, CNN feature extraction with SVM classification, the CNN-only architecture, and Random Forest are comparable in accuracy and consistency (87% to 89%). Given that the input is correctly segmented using the Histogram Projection method, the results produced by the prediction pipeline are highly consistent with the model’s test accuracy when dealing with characters taken from the dataset, compared with those we have written ourselves.
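A minimal sketch of histogram-projection segmentation is given below, assuming a binarized image stored as rows of 0/1 pixels where 1 marks ink. It splits characters wherever the vertical projection (the per-column ink count) drops to zero; the project's actual segmentation code may differ.

```python
def segment_columns(image):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(image[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[c] for row in image) for c in range(width)]
    segments, start = [], None
    for c, ink in enumerate(profile):
        if ink and start is None:
            start = c  # a character begins
        elif not ink and start is not None:
            segments.append((start, c))  # an empty column ends the character
            start = None
    if start is not None:
        segments.append((start, width))  # character touches the right edge
    return segments


# Toy 2x6 binary image with two "characters" separated by an empty column.
image = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
]
segments = segment_columns(image)
```

Each returned range can then be cropped and passed to the character classifier. Touching or overlapping characters leave no empty column between them, which is one way incorrect segmentation can arise.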

Wenbin Qi, Xinhao Liu

Mask Compatible Facial Verification (paper)

The “new normal” of COVID-19 makes wearing a face mask a daily routine. This causes a big problem for widely used facial verification systems, because a number of them do not perform well when recognizing faces with masks on. This adds much extra work to people’s daily lives, especially for frequently used systems like the iPhone’s Face ID, since users need to take off their masks to be identified. The difficulty of this problem mainly lies in recognizing a person’s identity using only the part of the face that is not covered by the mask. We address this problem mainly by introducing center loss and CBAM to a traditional convolutional neural network. Though the accuracy of our model is not very high, when we apply it to real-time videos it performs well in distinguishing different identities.
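As a rough illustration of center loss, the sketch below penalizes the squared distance between each feature vector and the center of its identity class. Averaging over the batch and the factor of one half are assumptions made here for the example, as are the toy features; in training, the centers would also be updated, which this sketch omits.

```python
def center_loss(features, labels, centers):
    """Half the mean squared distance from each feature to its class center."""
    total = 0.0
    for feature, label in zip(features, labels):
        center = centers[label]
        total += sum((f - c) ** 2 for f, c in zip(feature, center))
    return 0.5 * total / len(features)


# Toy 2-D embeddings for two identities (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 1.0]}
loss = center_loss(features, labels, centers)
```

Minimizing this term alongside the usual classification loss pulls embeddings of the same identity together, which is useful when only part of the face is visible.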

Penghao Weng, Dennis Hu

NBA Team Winning Rate and Season Record Prediction (paper)

Statistics-driven performance analysis has been gaining ground in all major sports leagues. In particular, the National Basketball Association (NBA) is at the forefront of the transition to data-driven approaches. While various statistical models for predicting the MVP candidate and championship odds have been developed, single-game win prediction has remained rather under-explored. Here we report our efforts to predict single-game outcomes with accuracy comparable to preexisting machine learning models. After formatting the raw dataset with aggregate functions that capture recent performance, we evaluated five different models to generate our best guess at the W/L result. Since our model takes into account the lineups of each team, it also offers quantitative insights into NBA teams’ roster management.

Anran Wang, Tian Jin, Zhenming Wang

People's Daily Micro-blog Simulator: Applications of Machine Learning Models to Assist Editors to Evaluate Posts (paper)

Weibo is one of the largest social platforms, where tens of millions of users publish and exchange information every day. Among the many official accounts, People’s Daily is one of the most influential. It is therefore worth looking into the text corpus of People’s Daily’s Weibo account and exploring the relationship between each post and its popularity on Weibo, measured by likes, comments and forwards. We used several machine learning models to address our research problem, from linear regression to neural networks. In general, different models tend to give quite different predictions of the target variables.

Jiahe Tian, Angelina Shen, Luisa Wang

Predicting the Price of Second-hand Cars in China (paper)

In 2020, the Chinese second-hand car market reached over 14,000,000 transactions. Traditionally, the prices of second-hand cars are set manually. In this paper, we introduce machine learning algorithms to predict the price of a second-hand car given features such as body type and travel distance. Several machine learning algorithms, including linear regression, gradient boosting decision trees, random forest and neural networks, are employed in our work. The performance of the algorithms is evaluated based on the precision of the prediction and quantified by the mean absolute error.
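For reference, the mean absolute error is the average of the absolute differences between predicted and actual prices. The short sketch below computes it on made-up numbers; the price values and their unit are purely illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices in units of 10,000 CNY.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
mae = mean_absolute_error(actual, predicted)
```

Because the error is expressed in the same unit as the price, it can be read directly as the typical amount by which a prediction misses.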

Yuxuan Wang, Jiachen Zhang

Project Proposal (COVID-19) (paper)

This project aims to predict the severity of patients’ potential adverse reaction after COVID-19 vaccination. We collected data from a government system, preprocessed the features and assigned two label sets to each observation for binary classification and multiclass classification. Six models including Logistic Regression, SVM, LDA, KNN and Boosting Tree were applied. The models achieved similar accuracy scores around 76% for binary classification and 72% for multi-class classification, with Histogram Gradient Boosting delivering the best performance. Moreover, the coefficients of variables followed the same order of significance across all models, with vaccine type, vaccine manufacturer, patients’ age and sex ranking the most important.

Boyan Xu, Jialiang Zhong, Zhengyuan Liu

RNN-based Stock Price Predictor (paper)

In this study, we implement machine learning methods to predict stock price movement (a regression task) and the fluctuation signal (a classification task). To best fit the sequential format of the input data, we choose RNN and LSTM as our models for both the classification and regression tasks. We adopt simple linear regression and ridge regression as benchmarks for the regression task, and logistic regression and SVM as benchmarks for the classification task. We find that neither model significantly outperforms random guessing on the classification task, but our models’ performance on the price regression task is satisfactory.

Star Chen, Muyang Xu, Chengyu Zhang

Semantic Segmentation on Aerial Drone Image Using U-net (paper)

Our project addresses the growing automation of aerial drone piloting and image capturing. Because current automation is not yet mature and manual operation is still needed, we run semantic segmentation on drone-captured aerial images to automate and refine object detection in drone operation. This project first uses U-Net and then advances to Mobile-Unet as the primary machine learning approach to the problem. Our model yields satisfactory results through the use of a high-performance model (in both computational time and accuracy), fine-tuning, and self-created testing datasets. The full implementation (based on PyTorch) and the trained networks are available in the repository; please refer to its README.md file for details.

Zihan Shao, Yuqian Sun

Sentiment Labeling of Movie Reviews (paper)

Sentiment has long been seen as a unique characteristic that only humans hold. But with the development of technology, it is becoming possible for computers to predict the sentiment contained in human language. In this project, we build several binary classification models that analyse short movie comments to predict whether they deliver a positive or negative sentiment, which allows qualitative judgments on audiences’ reception of a movie. Building the models involves a data preprocessing stage, followed by word embedding methods such as Word2Vec and TF-IDF and classifiers such as SVM, SVC, Random Forest, Naive Bayes, and FCNN. The predicted sentiments were evaluated by accuracy. The results show that the TF-IDF embedding followed by a simple FCNN outperforms all other models with 91% accuracy. In future, more effort should be devoted to a better word embedding method or a trainable word embedding method.
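To make the TF-IDF step concrete, here is a small sketch using the plain textbook weighting, tf × log(N / df), with whitespace tokenization. Libraries typically apply smoothed variants, so the exact numbers would differ from what the project computed; the two example reviews are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


reviews = ["great movie", "boring movie"]  # hypothetical short comments
vectors = tfidf(reviews)
```

A word that appears in every review, such as "movie" here, receives weight zero, so the classifier's input is dominated by the words that distinguish one review from another.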

Wes Wang, Yilun Kuang, Youqing Liang

Photorealistic Cyberpunk Scene Stylization Based on Neural Style Transfer (paper)

Many modern NLP systems perform bidirectional encoding of the input sentences to integrate contextual information. It remains unclear whether bidirectional inputs perform similarly to bidirectional models. This study investigates the role of bidirectional inputs by training a forward LSTM with unidirectional and bidirectional input sentences from the CoLA, spam email, and COVID sentiment analysis datasets. By eliminating potential confounders, we establish that bidirectional inputs are causal factors in the improvement of classification performance for LSTM on certain tasks. We also present a qualitative analysis of the misclassified examples in the context of unidirectional and bidirectional inputs.

Thomas Hillenmeyer

Determining Cryptocurrency Trends with Twitter (paper)

(Not found) Common images and videos primarily focus on people. Indeed, about 35% of pixels in movies and YouTube videos as well as about 25% of pixels in photographs belong to people (Laptev, 2013). Therefore, person detection in videos as well as photographs is a key problem for computer vision and object detection. While face detection has reached maturity, detecting people under the full variation of camera viewpoints, human poses, lighting conditions and occlusions is still a difficult challenge. Surveillance video data in particular involves diverse angles and poses, which makes it valuable to analyze, and worthwhile information can be extracted from it. In our school, all the classrooms have limits on the number of people allowed at the same time, and they are equipped with video surveillance cameras. However, nothing yet detects when the occupancy limit is exceeded. In this paper, I apply the prevailing YOLO model, trained on customized data, to conduct real-time detection of the head count in videos.

Dianjing Fan, Yukai Yang

Email Response Suggestions (paper)

(Not found)

Yang Zhao

Emotion Analysis of Weibo (paper)

(Not found)

Eric Zhang, Wenhan Yang

Emotion Recognition Based on Different Machine Learning Models (paper)

(Not found)

Yuelong Li, Olivia Wang, Yuting Wang

From Liszt to Justin Bieber: An Audio-based Music Genre Classifier (paper)

(Not found)

Linhao Hu, Tingzhao Fang, Rui Sun

Happiness Data Analysis and Level Prediction (paper)

(Not found)

Diane Gu, Zining Wang

Hum to Search (paper)

(Not found)

Xinyue Liu, Yuejiao Qiu

Image Transformation (paper)

(Not found)

Chengyang Song, Jiawei Zhang, Zihang Xia

Predict Stock Market Using Machine Learning (paper)

(Not found)

Yanyu Chen, Hongyi Zheng, Zihan Zhang

Predicting Peak Bloom Date of Cherry Trees with Classification Model and Neural Network (paper)

(Not found)

Yiqiu Luo, Zerui Ji

Prediction of Stock Market Price Direction through Machine Learning Algorithm (paper)

(Not found)