Projects

This page describes smaller projects that I came up with on my own (or with other students) or completed for courses. For research projects, see Research and Work Experience.

Advanced HTTP Proxy


November - December 2022

Matt Zhou and I built a proxy that supports standard HTTP GET and CONNECT communication over Transmission Control Protocol (TCP) for multiple clients. Clients that communicate with the proxy are serviced sequentially. The proxy maintains a finite-capacity RAM cache of recently requested resources. It also maintains a set of banned hostnames and rejects any request that a client sends to those hosts. Another feature of our proxy is byte rate limiting.
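As an illustration only (this is not the code in the private repository), a finite-capacity cache and a banned-host check could be sketched as follows. The least-recently-used eviction policy, the freshness window, and every name here (ResourceCache, BANNED_HOSTS, is_banned) are assumptions made for the example.

```python
from collections import OrderedDict
import time


class ResourceCache:
    """Finite-capacity cache of response bodies keyed by URL.

    Assumption: the least recently used entry is evicted first and an
    entry counts as fresh for max_age_seconds after it is stored.
    """

    def __init__(self, capacity, max_age_seconds=3600.0):
        self.capacity = capacity
        self.max_age = max_age_seconds
        self._entries = OrderedDict()  # url -> (stored_at, body)

    def put(self, url, body, now=None):
        now = time.monotonic() if now is None else now
        self._entries[url] = (now, body)
        self._entries.move_to_end(url)  # mark as most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def get(self, url, now=None):
        """Return the cached body, or None if it is missing or stale."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if now - stored_at > self.max_age:
            del self._entries[url]  # stale entries are discarded
            return None
        self._entries.move_to_end(url)
        return body


# Hypothetical ban list; a real proxy would load this from configuration.
BANNED_HOSTS = {"blocked.example"}


def is_banned(host):
    """Return True if requests to this host should be rejected."""
    return host.lower() in BANNED_HOSTS
```

Passing now explicitly keeps the sketch easy to test; a real proxy would rely on the clock and on the freshness rules in the HTTP response headers.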

The proxy also improves performance by reusing client and server TCP connections. In particular, a client socket is kept open so that a client can send multiple requests to the proxy over the same socket. The proxy services one request at a time and handles later requests only after returning the responses to earlier ones. The proxy also keeps a finite number of idle server sockets open. These sockets can be used by different clients that request the same host and port. However, this capability is not supported for HTTPS communication. The proxy also modifies requested HyperText Markup Language (HTML) pages by coloring a link green if the linked resource is fresh in the cache.

The proxy has a separate mode that supports HTTPS communication over TCP. Most of the HTTPS functionality is implemented through the OpenSSL library. In HTTPS mode, the proxy establishes one secure SSL session with the client and another with the target server. The proxy achieves this “man in the middle” status through dynamic certificate generation. HTTPS mode helps provide extra security and protection for the client, and it also allows the proxy to decrypt and examine the HTTPS traffic passing through it. With this, the proxy can support caching, rate limiting, filtering, and other functionality for HTTPS traffic. Note that in HTTPS mode, server TCP connections are closed after a client finishes its communication; that is, we do not reuse server TCP connections in HTTPS mode. In addition, the proxy supports CONNECT tunneling in the default HTTP mode. The client can use this mode simply by sending a CONNECT request to the proxy.

This project is currently in a private GitLab repository. Please contact me for more information.

Extracting Binary Trees from Tree Drawings


November - December 2022

For our project, Ryan Polhemus and I built a model for constructing the binary trees that correspond to drawings of trees in images. One application of such a model would be to automatically grade questions that ask the respondent to draw a binary tree. For example, one could ask the respondent to draw an example of a binary search tree. The model would read in the image the student submits and produce a unique mathematical representation of the tree. This representation could be passed to another program that would determine if the tree is indeed a binary search tree and assign the student their score. We hope that software such as this could one day be integrated into existing educational software such as Gradescope to expand its autograding functionality for computer science courses. See this GitHub repository for more information.
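The downstream grading step described above could look roughly like the sketch below. The Node class is a hypothetical stand-in for whatever representation the model produces, and the check assumes integer keys with no duplicates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree representation; not the project's actual format."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key lies strictly between low and high.

    Each recursive call narrows the allowed range, so a key that is
    misplaced deep in a subtree is still caught.
    """
    if node is None:
        return True
    if low is not None and node.value <= low:
        return False
    if high is not None and node.value >= high:
        return False
    return (is_bst(node.left, low, node.value)
            and is_bst(node.right, node.value, high))
```

Checking only each node against its direct children would miss trees such as a 6 placed in the left subtree of a 5, which is why the allowed range is passed down.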

GitHub Scraping


October 2022 - Present

The script github_scraping.py can be used for scraping GitHub repositories based on a certain query keyword. This is useful when collecting potential solutions to assignments in a course. The script allows one to scroll through the files that match a particular search query and decide whether or not the file is considered a match for a course assignment. I worked on this project alongside Eitan Joseph. See this GitHub repository for more information.

MOSS Wrapper


September 2022 - Present

moss.py is designed for COMP 15. It enables the use of MOSS with the Tufts Computer Science Department's provide submission and grading frameworks, offers a more user-friendly interface, and supports downloading MOSS results in order to maintain a record of plagiarism cases. To see the software, please see this GitLab repository.

DALPy


March 2021 - Present

DALPy is a Python module for learning data structures and algorithms. It is based on Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. This library was made specifically for administering and grading assignments related to data structures and algorithms in computer science. The software was used in the spring 2022 semester at Brandeis University in COSI 21A Data Structures and the Fundamentals of Computing. To see the software, please view the project's PyPI and GitHub. To read about the project and its motivations, please read this article in the Brandeis Justice newspaper. This project was formerly known as Cormen-Lib.

Wordle Solver


July - August 2022

This project examines different ways of solving the Wordle game. The project has two components. The first is the WordleSolver which tracks which words are eligible based on the feedback from previous guesses. The second component is the ranking scheme. The WordleSolver uses the ranking scheme to pick the next guess from the eligible words. This implementation allows for the easy construction of a variety of ranking schemes. For more about the project, please view this GitHub repository.
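The two components described above can be illustrated with a minimal sketch. It is not the code in the repository: the function names, the 'g'/'y'/'-' feedback encoding, and the letter-frequency ranking scheme are all assumptions chosen for the example.

```python
from collections import Counter


def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, '-' gray."""
    result = ["-"] * len(guess)
    remaining = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: award yellows while unmatched copies of a letter remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_eligible(words, guess, observed):
    """Keep the words that would have produced the observed feedback."""
    return [w for w in words if feedback(guess, w) == observed]


def letter_frequency_rank(word, eligible):
    """Example ranking scheme: favor letters common among eligible words."""
    counts = Counter(ch for w in eligible for ch in set(w))
    return sum(counts[ch] for ch in set(word))


def next_guess(eligible, rank=letter_frequency_rank):
    """Pick the eligible word with the highest score under the scheme."""
    return max(eligible, key=lambda w: rank(w, eligible))
```

Because next_guess accepts any function of (word, eligible), swapping in a different ranking scheme only requires passing a different callable.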

NER on Harry Potter Novels

April - May 2022

For my final project for COSI 217B, my group performed Named Entity Recognition on text from the Harry Potter novels. We created an ontology, annotated the seven Harry Potter novels, and tested how well a CRF-based NER model performed at recognizing named entities in the Harry Potter universe. 

Parameter Prediction for the SIR Model

April - May 2022

To make a prediction with a model, we generally know some information from the past and want to use the model to predict some information about the future. To use the SIR model directly, both the initial conditions and the model parameters need to be known, which is unrealistic. For my final project for MATH 162A, my group analyzed how well various algorithms optimize the parameters of the SIR model given some prior information on the progression of the disease.
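To make the fitting problem concrete, here is a minimal sketch, not one of the algorithms from the project. It assumes the standard normalized SIR equations (dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I), assumes the initial conditions are known, and uses a plain grid search as a stand-in optimizer. All function names are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=20):
    """Integrate the SIR equations with forward Euler.

    Returns the infected fraction at the start and at the end of each day.
    """
    s, i, r = s0, i0, r0
    dt = 1.0 / steps_per_day
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        infected.append(i)
    return infected


def squared_error(predicted, observed):
    """Sum of squared differences between two equally long series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_search(observed, s0, i0, r0, betas, gammas):
    """Return the (beta, gamma) pair whose simulation best fits the data."""
    days = len(observed) - 1
    best = None
    for beta in betas:
        for gamma in gammas:
            predicted = simulate_sir(beta, gamma, s0, i0, r0, days)
            error = squared_error(predicted, observed)
            if best is None or error < best[0]:
                best = (error, beta, gamma)
    return best[1], best[2]
```

For example, generating data with simulate_sir(0.3, 0.1, 0.99, 0.01, 0.0, 30) and searching over a grid that contains those values recovers them, because the error for that pair is exactly zero.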

Website Rating Prediction

November - December 2021

The goal of my group's MATH 122A final project was to design a model that predicts the ratings users would give to products based on their histories of website engagement. The model predicted the ratings that 4,500 users would give to 75 different products. We ended up using an ensemble technique, training a separate model for each user based on their website history.

Battleships: Python Tutorial and Project

July - August 2021

I made a programming project that has a student implement a simple two-player battleships game in Python. Along with the project instructions and starter kit, there is a Python tutorial for someone new to programming that walks through the Python topics necessary to complete the project. To read about it, please see the GitHub repository.

Analyzing Count Five Test for Equal Variances

April - May 2021

For my final project for MATH 36B, my group analyzed the performance of the Count Five Test for equal variances. Here you will find our implementation of the test, the results of the simulations we conducted, and links to the paper and the Google Colaboratory notebook where the simulations were run.
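For readers unfamiliar with the test, the sketch below shows the usual formulation as I understand it; it is not the implementation linked here. Each sample is centered at its own mean, and equal variances are rejected when either sample has at least five absolute deviations larger than every absolute deviation in the other sample. The function name and the assumption of equal sample sizes are mine.

```python
def count_five(x, y, threshold=5):
    """Return True if the Count Five rule rejects equal variances.

    Assumes two samples of equal size, each centered at its own mean.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [abs(v - mean_x) for v in x]
    dev_y = [abs(v - mean_y) for v in y]
    # Count the points in each sample that are more extreme than
    # the most extreme point in the other sample.
    extremes_x = sum(d > max(dev_y) for d in dev_x)
    extremes_y = sum(d > max(dev_x) for d in dev_y)
    return max(extremes_x, extremes_y) >= threshold
```

The appeal of the rule is that it needs no tables or distributional calculations, only counting.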


Kaggle Riiid! Answer Correctness Prediction

December 2020

For my COSI 123 fall 2020 term project, I worked with two of my classmates, Eitan Joseph and Caroline Wang, on the Riiid! Answer Correctness Prediction Kaggle competition. In the competition, we worked to predict the probability that a student answers a question correctly based on their exam history up to that question. Over the course of the project, we learned a great deal about applying machine learning techniques to time series prediction, which involved learning about feature leakage.

Our team name is CCE, and we scored 0.764 on the private test set, as can be seen on the leaderboard.

SIR Model Plot Generator

December 2020

The purpose of this notebook is to allow for qualitative analysis of the Kermack-McKendrick SIR Model for epidemics as part of my final project for MATH 37A: Differential Equations at Brandeis University. The user enters 4 parameters that control the disease spread and then sees two kinds of plots (a vector field and 3 numerical time plots) so that they can qualitatively observe the impact of the epidemic.


Standard I/O Homework Utility

October 2019 - January 2020

During the semester that I was a TA for Advanced Programming Techniques at Brandeis University, I realized that many of the course's programming assignments used console-based input and output (through Scanner and System.out in Java). This made grading the assignments somewhat tedious, as TAs had to try various keyboard inputs and then compare the output using an online diff checker. To help solve this issue, I decided to abstract and standardize console testing into a single file that future TAs only need to import into their projects when writing JUnit tests for assignments. That way, the work of editing system input/output streams is already handled and the TAs can focus on developing test cases for the assignment.


I plan to use this project in future courses that I TA at Brandeis because it abstracts a portion of the tests away and makes the creation of test cases easier, even if other courses' assignments may not be as reliant on console input/output.


As of now, the project is in a private GitHub repository.

Music Tablature Project

May - July 2019

The primary purpose of this project was to provide a program that takes an ASCII text file holding a bass guitar tab and converts it into sheet music. The sheet music is stored in an HTML file and can be displayed in a browser. For more information, please visit the linked GitHub repository and look at the README.md file.
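As a rough illustration of the first step of such a conversion, the sketch below reads fret numbers from an ASCII tab and maps them to note names. It is not the project's code. It assumes a four-string bass in standard tuning (E1, A1, D2, G2), lines of the form "G|---5---|", and it ignores rhythm entirely. All names are hypothetical.

```python
# MIDI numbers of the open strings in standard bass tuning (assumed).
OPEN_STRING_MIDI = {"E": 28, "A": 33, "D": 38, "G": 43}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(midi):
    """Convert a MIDI note number to a name such as 'A1'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def parse_tab(tab_text):
    """Return the note names in a tab, ordered by column position."""
    events = []  # (column, note name)
    for line in tab_text.strip().splitlines():
        name, _, body = line.strip().partition("|")
        open_midi = OPEN_STRING_MIDI[name.strip().upper()]
        col = 0
        while col < len(body):
            if body[col].isdigit():
                end = col
                while end < len(body) and body[end].isdigit():
                    end += 1  # consume multi-digit frets such as 12
                fret = int(body[col:end])
                events.append((col, midi_to_name(open_midi + fret)))
                col = end
            else:
                col += 1
    events.sort(key=lambda event: event[0])
    return [note for _, note in events]
```

Each fret raises the open string by one semitone, so the fifth fret of the A string gives D2. A full converter would also need note durations, which a plain ASCII tab usually encodes only loosely through spacing.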