This page describes smaller projects that I completed on my own (outside of school), in courses (sometimes in groups), and during my time working as a teaching assistant. For research projects, see Research and Work Experience.
There are additional projects on my GitHub that range from educational resources (e.g. useful code snippets, materials from online courses), to small experiments investigating different programming languages, to full tools and frameworks that have had users (e.g. grading, plagiarism detection).
Table of Contents
Grading Administration Utility
Enhanced Wrapper for MOSS
GitHub Scraping Tool
Differential Privacy Analysis of Kaggle Dataset
Advanced HTTP Proxy
Extracting Binary Trees from Tree Drawings
DALPy
Wordle Solver
NER on Harry Potter Novels
Parameter Prediction for SIR Model
Website Ratings Prediction
Battleships: Python Tutorial and Project
Analyzing Count Five Test for Equal Variances
Kaggle Riiid! Answer Correctness Prediction
SIR Model Plot Generator
Standard I/O Homework Utility
Music Tablature Project
August 2023 - May 2024, Tufts University
This is a command line version of a previous, web-based Tufts Computer Science Department grading utility known as hitme. This utility allows TAs to be randomly assigned students, TFs (senior TAs) to supervise grading progress, and TAs to perform backups of students' submissions from Gradescope onto Tufts EECS servers. The software is available at this GitHub repository and on PyPI. This tool is used by Tufts CS 15 course staff.
September 2022 - May 2024, Tufts University
moss.py is designed for CS 15 and enables better use of the Measure of Software Similarity (MOSS) plagiarism detection software. It offers integration with the Tufts Computer Science Department "provide" and "grade" frameworks, provides a more user-friendly interface, enables downloading of MOSS results, and improves on Gradescope's version, which allows only a single assignment's worth of submissions. The software is available at this GitHub repository and on PyPI. This tool is used by Tufts CS 15 course staff.
October 2022 - May 2024, Tufts University
The script github_scraping.py can be used for scraping GitHub repositories based on a certain keyword. This is useful when collecting potential solutions to assignments in a course. The script allows one to scroll through the files that match a particular search query and decide whether or not the file is considered a match for a course assignment. See this GitHub repository or PyPI for more information. This tool is used by Tufts CS 15 course staff.
February - April 2024, Tufts University
I worked on a group project that looked into the Kaggle Open University Learning Analytics Dataset. We conducted exploratory data analysis to discover trends in how student performance varied across regions, ages, and number of interactions with online resources. The primary purpose of this project was to implement differential privacy to protect student privacy. The code is available on this GitHub repository.
November - December 2022, Tufts University
Matt Zhou and I built a proxy that supports standard HTTP GET and CONNECT communication over Transmission Control Protocol (TCP) for multiple clients. Each client who communicates with the proxy is serviced sequentially. The proxy maintains a finite-capacity in-memory (RAM) cache of recently requested resources. The proxy also maintains a set of banned hostnames and rejects any request that a client sends to those hosts. Another feature of our proxy is byte rate limiting.
The proxy also makes some additional performance optimizations by reusing client and server TCP connections. In particular, a client socket is kept open and a client can send multiple requests to the proxy over the same socket. The proxy will service one request at a time and service future requests only after returning the responses for earlier requests. The proxy will also keep a finite number of server sockets open if they are not in use. These sockets can be used by different clients who are interested in the same host and port. However, this capability is not supported for HTTPS communication. The proxy will also modify any requested HyperText Markup Language (HTML) pages by coloring any links green if the linked resource is fresh in the cache.
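The finite-capacity cache and the freshness check described above can be sketched roughly as follows. This is only an illustration in Python, not the proxy's actual code: the class name `ResourceCache`, the least-recently-used eviction policy, and the fixed `max_age_seconds` freshness rule are all assumptions made for the example.

```python
import time
from collections import OrderedDict


class ResourceCache:
    """Hypothetical finite-capacity cache with a simple freshness rule."""

    def __init__(self, capacity, max_age_seconds, clock=time.monotonic):
        self._capacity = capacity
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the example can be tested
        self._entries = OrderedDict()  # url -> (body, time stored)

    def put(self, url, body):
        # Store the resource and mark it as the most recently used entry.
        self._entries[url] = (body, self._clock())
        self._entries.move_to_end(url)
        # Assumed policy: evict least recently used entries when over capacity.
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def is_fresh(self, url):
        # An entry is fresh if it exists and is no older than the maximum age.
        entry = self._entries.get(url)
        return entry is not None and self._clock() - entry[1] <= self._max_age

    def get(self, url):
        # Stale or missing entries are dropped and reported as a miss.
        if not self.is_fresh(url):
            self._entries.pop(url, None)
            return None
        self._entries.move_to_end(url)
        return self._entries[url][0]
```

A check like `is_fresh` is the kind of test that could decide whether a link in a returned HTML page is colored green.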
The proxy has a separate mode that supports HTTPS communications over TCP. Most of the HTTPS functionality is implemented through the OpenSSL library. In HTTPS mode, the proxy establishes a secure SSL session with the client and another secure SSL session with the target server. The proxy achieves this “man in the middle” status through dynamic certificate generation. The HTTPS mode helps provide extra security and protection for the client, and it also allows the proxy to decrypt and examine the HTTPS traffic going through the proxy. With this, the proxy is able to support caching, rate limiting, filtering, and other functionality with HTTPS traffic. Note that in HTTPS mode, server TCP connections are closed after a client finishes its communication; that is, we do not reuse server TCP connections in HTTPS mode. In addition, the proxy also supports CONNECT tunneling in the default HTTP mode. The client can utilize this mode simply by sending a CONNECT request to the proxy.
This project is currently in a private GitLab repository. Please contact me for more information.
November - December 2022, Tufts University
Ryan Polhemus and I built a model for constructing the binary trees that correspond to drawings of trees in images. One application of such a model would be to automatically grade questions that ask the respondent to draw a binary tree. For example, one could ask the respondent to draw an example of a binary search tree. The model would read in the image the student submits and produce a unique mathematical representation of the tree. This representation could be passed to another program that would determine if the tree is indeed a binary search tree and assign the student their score. We believe that software like this could be integrated into existing educational software (e.g. Gradescope) to expand its autograding functionality for computer science courses. See this GitHub repository for more information.
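As an illustration of the downstream grading step, here is a minimal Python sketch of a program that checks whether an extracted tree is a binary search tree. The `Node` class is a hypothetical stand-in for the representation the model produces, and the check assumes distinct integer keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree representation: a key plus optional children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Left subtree keys must be below node.key, right subtree keys above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)
```

Passing the bounds down the recursion catches trees where a node is correctly ordered relative to its parent but violates an ancestor further up.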
July - August 2022
This project examines different ways of solving the Wordle game. The project has two components. The first is the WordleSolver which tracks which words are eligible based on the feedback from previous guesses. The second component is the ranking scheme. The WordleSolver uses the ranking scheme to pick the next guess from the eligible words. This implementation allows for the easy construction of a variety of ranking schemes. For more about the project, view this GitHub repository.
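The split between eligibility tracking and ranking can be sketched in Python as below. This is not the code from the repository: `SketchSolver`, `feedback`, and `letter_frequency_rank` are hypothetical names, and the letter-frequency ranking is just one example of a scheme that could be plugged in.

```python
from collections import Counter


def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, '-' gray."""
    result = ["-"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


class SketchSolver:
    """Tracks eligible words and delegates guess selection to a ranking function."""

    def __init__(self, words, rank):
        self.eligible = list(words)
        self.rank = rank  # callable: (word, eligible_words) -> score

    def next_guess(self):
        return max(self.eligible, key=lambda w: self.rank(w, self.eligible))

    def update(self, guess, observed):
        # Keep only words that would have produced the observed feedback.
        self.eligible = [w for w in self.eligible if feedback(guess, w) == observed]


def letter_frequency_rank(word, eligible):
    """Example ranking scheme: favor words whose letters appear in many eligible words."""
    counts = Counter(ch for w in eligible for ch in set(w))
    return sum(counts[ch] for ch in set(word))
```

Because the solver only calls `rank(word, eligible)`, trying a different ranking scheme means passing a different function.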
March 2021 - May 2022, Brandeis University
DALPy is a Python module for learning data structures and algorithms. It is based on the textbook Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. This library was made specifically for administering and grading assignments related to data structures and algorithms in computer science. The software was used in the spring 2022 semester at Brandeis University in COSI 21A Data Structures and the Fundamentals of Computing. To see the software, view the project's PyPI and GitHub. To read about the project and its motivations, read this article in the Brandeis Justice newspaper. This project was formerly known as Cormen-Lib.
April - May 2022, Brandeis University
For my final project for COSI 217B, my group performed Named Entity Recognition on text from the Harry Potter novels. We created an ontology, annotated the seven Harry Potter novels, and tested how well a CRF-based NER model performed at recognizing named entities in the Harry Potter universe. The work for this project is in a private repository.
April - May 2022, Brandeis University
To make a prediction with a model, we generally know some information from the past and want to use the model to predict some information for the future. To use the SIR model directly, both the initial conditions and model parameters need to be known, which is unrealistic. For my final project for MATH 162A, my group analyzed how well various algorithms optimize the parameters of the SIR model given some prior information on the progression of the disease. The work for this project is in a private repository.
November - December 2021, Brandeis University
The goal of my group's MATH 122A final project was to design a model that predicts the ratings that users would give to products based on their histories of website engagement. The model predicted the ratings 4,500 users would give to 75 different products. We ended up using an ensemble technique of training a unique model for each user based on their website history. The work for this project is in a private repository.
July - August 2021
I made a programming project that has a student implement a simple two-player battleships game in Python. Along with the project instructions and starter kit, there is a Python tutorial for someone new to programming that walks through the Python topics necessary to complete the project. To read about it, please see the GitHub repository.
April - May 2021, Brandeis University
For my final project for MATH 36B, my group analyzed the performance of the Count Five Test for equal variances. Here you will find our implementation of the test, the results of our simulations, and links to the paper and the Google Colaboratory notebook where the simulations were conducted.
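For readers unfamiliar with the test, here is a minimal Python sketch of the Count Five rule as it is commonly described: for each sample, count the absolute deviations from the sample mean that exceed the largest absolute deviation in the other sample, and reject equal variances if either count reaches five. This is not our project's implementation; the function name `count_five` and the return format are assumptions for illustration.

```python
def count_five(x, y):
    """Return (count_x, count_y, reject) for two samples of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Absolute deviations of each observation from its own sample mean.
    dev_x = [abs(v - mean_x) for v in x]
    dev_y = [abs(v - mean_y) for v in y]
    # Count the deviations in each sample that exceed the other sample's maximum.
    count_x = sum(d > max(dev_y) for d in dev_x)
    count_y = sum(d > max(dev_x) for d in dev_y)
    # Reject the hypothesis of equal variances if either count is at least five.
    return count_x, count_y, max(count_x, count_y) >= 5
```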
December 2020, Brandeis University
For COSI 123, I worked with Eitan Joseph and Caroline Wang on the Riiid! Answer Correctness Prediction Kaggle competition. In the competition, we worked to predict the probability of a student answering a question correctly based on their exam history up to answering the question. Over the course of the project, we learned a great deal about applying machine learning techniques to time series prediction, which involved learning about feature leakage.
Our team name is CCE, and we scored 0.764 on the private test set, as can be seen on the leaderboard.
December 2020, Brandeis University
The purpose of this notebook is to allow for qualitative analysis of the Kermack-McKendrick SIR Model for epidemics. This was done as part of my final project for MATH 37A. The user is able to enter four parameters to control the disease spread and then will see two plots (a vector field and three numerical time curves) so that they can qualitatively observe the impact of the epidemic.
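For context, the time curves come from numerically solving the SIR equations dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI. Below is a minimal Python sketch using forward Euler steps. It is not the notebook's code: the function name `simulate_sir`, the choice of parameters (`beta`, `gamma`, and initial fractions `s0` and `i0`), and the Euler method itself are assumptions for illustration, with S, I, and R treated as fractions of the population.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.01, steps=10000):
    """Integrate the SIR equations with forward Euler; return (t, S, I, R) tuples."""
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt  # flow from S to I in this step
        new_recoveries = gamma * i * dt     # flow from I to R in this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Example: transmission rate 0.5, recovery rate 0.1, 1% initially infected.
curves = simulate_sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01)
```

Plotting the S, I, and R columns of `curves` against time gives three time curves of the kind described above.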
October 2019 - January 2020, Brandeis University
During the semester that I was a TA for COSI 12B, I realized that many of the course's programming assignments used console-based input and output (through Scanner and System.out in Java). This made grading the assignments somewhat tedious, as TAs had to try various keyboard inputs and then compare the output using an online diff checker. To help solve this issue, I decided to abstract and standardize console testing into a single file that future TAs can import into their projects when writing JUnit tests for assignments. That way, the work of editing system input/output streams is already handled and the TAs can focus on developing test cases for the assignment.
The project is in a private GitHub repository.
May - July 2019
The primary purpose of this project is to provide a program that takes an ASCII text file holding a bass guitar tab and converts it into sheet music. The sheet music is stored in an HTML file and can be displayed in a browser. For more information, please visit the linked GitHub repository and look at the README.md file.