Computational Ethics of Natural Language Processing (Data Delineation For Foundation Models):
Teammate(s): Paymon Haddad, Sheriff Issaka, and Preetham Pangaluru
Description: Based on the Trust No Bot presentation given earlier in the course, we advocate for a data delineation framework that sorts all data used in foundation model training into three categories: (1) no training, (2) internalized training, and (3) general training, with category 1 containing the most sensitive data and category 3 the least sensitive. We argue that any data used to train or fine-tune machine learning systems must first be assigned to one of these three categories in accordance with the terms of the regulations (which we define below in detail). In this perspective piece, we (1) motivate the need for our proposed policy, (2) discuss the existing policy landscape and how our policy fits into it, (3) provide a detailed account of the policy itself (what determines how data is binned into each category), and (4) discuss recommendations for enforcing it.
Large-scale Machine Learning (FedLandscape: Exploring Distributional Shifts, Data Heterogeneity, and Spurious Correlations in a Federated Learning Setup):
Teammate(s): Abdolrahim Arjomand, Mrinal Anand, and Nilay Naharas
Description: The decentralized architecture of federated learning presents significant challenges in machine learning, particularly concerning data heterogeneity and model robustness. This paper outlines a series of experiments aimed at addressing these challenges through innovative approaches. The first experiment introduces the Fed-Energy mechanism to detect distributional shifts and identify out-of-distribution samples, enhancing the model's ability to adapt to changing data environments. The second experiment implements scaled weights to mitigate the impact of heterogeneous data, ensuring a balanced contribution from all devices involved. The third experiment emphasizes improving fairness among participating devices, thereby reducing biases introduced by varying data sizes and distributions. Finally, the fourth experiment focuses on minimizing the effects of spurious correlations and irrelevant characteristics in federated learning landscapes. These experiments collectively aim to mitigate the adverse impacts associated with decentralized learning, ultimately enhancing the effectiveness and reliability of federated learning systems in real-world applications.
Language(s): PyTorch
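The Fed-Energy mechanism itself is not detailed on this page. As a rough illustration only, the sketch below assumes it builds on the standard energy score for out-of-distribution detection, E(x) = -T * logsumexp(logits / T), where a higher energy suggests an out-of-distribution sample. The function names and the threshold are hypothetical, and the example uses plain Python rather than the project's PyTorch code.

```python
import math


def energy_score(logits, temperature=1.0):
    """Return -T * logsumexp(logits / T); higher values suggest OOD input."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def is_out_of_distribution(logits, threshold):
    """Flag a sample whose energy exceeds a (hypothetical) threshold."""
    return energy_score(logits) > threshold


# Hypothetical logits: one confident prediction, one nearly flat prediction.
confident = [10.0, 0.0, 0.0]
uncertain = [0.1, 0.0, 0.1]

if __name__ == "__main__":
    for name, logits in [("confident", confident), ("uncertain", uncertain)]:
        print(name, round(energy_score(logits), 4))
```

In practice the threshold would be chosen on held-out in-distribution data; the value used for any given client is an assumption outside the scope of this sketch.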
Artificial Life for Computer Graphics and Vision (Controllable Humanoid and Ant using Reinforcement Learning with Diverse Rewards):
Teammate(s): Benet Oriol Sabat, Avalon Vinella, Mohsen Fayyaz, and Chae Yeon Seo
Description: In this project, we explore the application of reinforcement learning (RL) to train controllable humanoid and ant agents that can switch between different movement modes and styles. Using the Gymnasium simulation environment and the Soft Actor-Critic algorithm (SAC), we developed models capable of executing a variety of locomotive tasks with diverse rewards. The trained agents can adapt to real-time user inputs, offering a flexible and robust simulation for autonomous locomotion.
Language(s): PyTorch
Foundation Models for Autonomous Agents (OccFlowNet: Optical Flow Estimation leveraging 3D Occupancy Prediction):
Teammate(s): Wenlong Yi and Sraavya Pradeep
Description: The development of deep neural networks has significantly improved the accuracy of optical flow estimation, benefiting applications such as video analysis, autonomous driving, and navigation. However, existing methods do not incorporate 3D reconstruction of the surrounding environment into this estimation. In this project, we introduce OccFlowNet, an approach that combines OccNeRF and FlowNet so that 3D occupancy prediction becomes a key component of optical flow estimation. Our evaluations on the nuScenes mini dataset show that OccFlowNet outperforms FlowNet.
Language(s): PyTorch
Advanced Data Mining (Leveraging Large Language Models and Topic Modeling for Toxicity Classification):
Teammate(s): Christina Chance, Claire Huang, Margaret Capetz, and Elizabeth Eyeson
Description: Content moderation and toxicity classification are critical tasks with significant social implications. However, studies have shown that major classification models tend to magnify or reduce biases and may overlook or disadvantage certain marginalized groups. Researchers suggest that annotator positionality influences the gold-standard labels, so models trained on those labels propagate the annotators' biases. To further investigate the impact of annotator positionality, we fine-tune BERTweet and HateBERT on the dataset while using topic modeling strategies for content moderation. The results indicate that fine-tuning the models on specific topics notably improves their F1 score compared with the predictions of other prominent classification models such as GPT-4, PerspectiveAPI, and RewireAPI. These findings further suggest that state-of-the-art large language models have significant limitations in detecting and interpreting text toxicity compared with earlier methods.
Language(s): PyTorch
P. S. Course assignments are available at this link.
Computer Animation (Pool Game):
Teammate(s): Benet Oriol Sabat, Avalon Vinella, and Mohsen Fayyaz
Description: In this project, we present a comprehensive simulation of a pool game wherein users are empowered to manipulate shot parameters such as angle and velocity. The system employs various techniques, including articulated body dynamics and inverse kinematics, to accurately replicate real-world shot execution. Furthermore, our simulation integrates advanced physics dynamics, encompassing aspects such as ball movement, rotation, and collision behavior. Notably, when a ball successfully lands in a pocket, it triggers a visually captivating explosion effect driven by meticulously designed velocity fields. Throughout this report, we examine each aspect of our simulation, providing detailed insights into its design and implementation.
Language(s): JavaScript
P. S. Course assignments are available at this link.
Real-time Systems (Real-time display of multiple time zones):
Teammate(s): Sara Khosravi and Matina Mehdizadeh
Description: In this project, four threads each display a different time zone. The threads have different priorities and are scheduled using FIFO.
Language(s): Java
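The scheduling idea can be sketched in a few lines. This is a simulation in Python, not the project's Java code, and it assumes a FIFO policy in the usual real-time sense: the highest-priority task runs first, and tasks of equal priority run in arrival order. The zone names, fixed UTC offsets (which ignore daylight saving time), and priorities are all hypothetical.

```python
import heapq
from datetime import datetime, timedelta, timezone

# Hypothetical tasks: (zone name, UTC offset in hours, priority).
# Fixed offsets keep the sketch dependency-free; they ignore DST.
TASKS = [
    ("Tehran", 3.5, 2),
    ("Los Angeles", -8, 1),
    ("London", 0, 2),
    ("Tokyo", 9, 3),
]


def fifo_order(tasks):
    """Order tasks by priority (higher first), FIFO among equal priorities."""
    heap = [(-prio, seq, name, offset)
            for seq, (name, offset, prio) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name, offset = heapq.heappop(heap)
        order.append((name, offset))
    return order


def format_time(now_utc, offset_hours):
    """Render a UTC instant as HH:MM:SS in a fixed-offset zone."""
    local = now_utc.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.strftime("%H:%M:%S")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for name, offset in fifo_order(TASKS):
        print(f"{name}: {format_time(now, offset)}")
```

The arrival sequence number serves as the tie-breaker in the heap, which is what gives equal-priority tasks their first-in, first-out order.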
Principles of Image Processing:
Description: This course had multiple projects that are available on GitHub. I selected two of my favorites to discuss here.
1. Active Contours: The active contour model, also called snakes, is a framework in computer vision for delineating an object's outline from a possibly noisy 2D image. Snakes are widely used in applications such as object tracking, shape recognition, segmentation, edge detection, and stereo matching.
2. Morphing: Morphing is a special effect in motion pictures and animations that changes (or morphs) one image or shape into another through a seamless transition.
Language(s): Python (NumPy and OpenCV)
Systems Analysis and Design (Fuel App):
Teammate(s): Matina Mehdizadeh, Sara Khosravi, and Sepehr Amini Afshar
Description: In this project, we developed a website for fuel delivery. The customer requests a delivery, and a delivery person brings the fuel to the requested destination.
Language(s): Python, Django, CSS, HTML, and JavaScript
Bioinformatics (Acute Myeloid Leukemia Microarray Analysis):
Description: For this project, I used the GSE48558 dataset. I preprocessed the data, applied dimensionality reduction, and selected genes with high and low expression in AML. I then analyzed the associated pathways and gene ontologies.
Language(s): R
Machine Learning:
Teammate(s): Reza Amini Majd and Alireza Shaterian
Description: This project consisted of two phases. Phase 1 was about determining whether a patient needs to be kept in the ICU. Phase 2 was about sarcasm detection on Twitter data.
Language(s): Python (scikit-learn, NumPy, pandas, and PyTorch)
Phase 1: Notebook
Phase 2: Notebook 1 - Notebook 2
Numerical Computation (Paper implementation - NLP):
Teammate(s): Sepehr Amini Afshar
Description: In this project, we implemented the method described in "Application of Doc2vec and Stochastic Gradient Descent algorithms for Text Categorization".
Language(s): Python (NumPy, scikit-learn, and NLTK)
Embedded Systems (Smart Door Lock using QR Code):
Teammate(s): Seyede Saba Hashemi and Parham Saremi
Description: In this project, we designed and developed a smart door lock that authenticates users at the entrance. The door opens when the user scans the QR code shown on an OLED display with the mobile app that we developed.
Language(s): C (Arduino)
Computer Architecture (Processor):
Teammate(s): Zahra Yousefi Jamarani, Kimia Noorbakhsh, and Tarlan Bahadori
Description: In this project, we implemented a processor that works with both integer and floating-point inputs. We used pipelining to increase its efficiency.
Tool(s): Quartus
Compiler (Decaf Compiler):
Teammate(s): Masih Eskandar and Farzam Zohdi Nasab
Description: A compiler for the Decaf programming language, developed as a course project for the Compilers course. Decaf is a strongly-typed, object-oriented language with support for inheritance and encapsulation. By design, it has many similarities with C/C++/Java, so you should find it fairly easy to pick up. But it is not an exact match to any of those languages. The feature set has been trimmed down and simplified to keep the programming projects manageable.
Language(s): Python and Assembly
Computer Structure and Language (CFG for PTX):
Teammate(s): Mohammad Farahani
Description: In this project, we built control flow graphs for code written in the PTX assembly language.
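To illustrate the idea, here is a minimal sketch of control flow graph construction, not the project's implementation. It assumes a heavily simplified PTX-like input with one instruction per line, labels written as `NAME:`, branches written as `bra NAME;` (optionally predicated with `@%p`), and `ret;` or `exit;` ending a path. The helper names `parse` and `build_cfg` are hypothetical.

```python
def parse(line):
    """Classify one simplified instruction as (kind, branch_target)."""
    text = line.rstrip(";").split()
    predicated = text[0].startswith("@")
    if predicated:
        text = text[1:]
    op = text[0].split(".")[0]  # drop suffixes such as .uni or .u32
    if op == "bra":
        return ("cond_branch" if predicated else "branch", text[1])
    if op in ("ret", "exit"):
        return ("exit", None)
    return ("other", None)


def build_cfg(lines):
    """Split lines into basic blocks and return (blocks, edges)."""
    blocks, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and current:  # a label starts a new block
            blocks.append(current)
            current = []
        current.append(line)
        if parse(line)[0] != "other":  # a branch or exit ends the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    label_to_block = {b[0][:-1]: i for i, b in enumerate(blocks)
                      if b[0].endswith(":")}
    edges = {i: [] for i in range(len(blocks))}
    for i, block in enumerate(blocks):
        kind, target = parse(block[-1])
        if kind in ("branch", "cond_branch"):
            edges[i].append(label_to_block[target])
        if kind in ("cond_branch", "other") and i + 1 < len(blocks):
            edges[i].append(i + 1)  # fall through to the next block
    return blocks, edges


# Hypothetical PTX-like snippet: a loop that counts to 10.
PTX = """
mov.u32 %r1, 0;
LOOP:
add.u32 %r1, %r1, 1;
setp.lt.u32 %p1, %r1, 10;
@%p1 bra LOOP;
ret;
""".splitlines()

if __name__ == "__main__":
    blocks, edges = build_cfg(PTX)
    for i, block in enumerate(blocks):
        print(i, block, "->", edges[i])
```

Real PTX has directives, multi-line statements, and indirect branches that this sketch does not handle.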
Advanced Programming (Duelyst):
Teammate(s): Zahra Yousefi Jamarani and Seyede Saba Hashemi
Description: Duelyst is a free, fair, competitive tactics game. Inspired by it, we developed a modified version as our course project. This project was awarded Best Project in Advanced Programming for the fall 2018 semester.
Language(s): Java
Fundamental Programming (Alter Tank):
Description: Tank Trouble is an online tank game where you drive in a maze and shoot missiles at your enemies. Alter Tank is a modified version of this game. In this project, I developed a similar game with some differences in visualizations and agents' abilities.
Language(s): C
Open-weight Soccer Robot:
Teammate(s): Nazanin Yousefian and Anahit Nassaj Yazdi
Supervisor: Faramarz Daemi
Description: In this robotic competition league, 2-on-2 teams of autonomous mobile robots play in a highly dynamic environment, tracking a special light-emitting ball in an enclosed, landmarked field.
Language(s): C (Code Vision)