Content: Explore the world of deep learning, from LLMs to image diffusion models. Our project group covers fundamentals and state-of-the-art technologies, including neural networks, CNNs, transformers, and diffusion models. You will learn about key components of deep learning from our lessons and do fun deep learning projects. See the MLDS website for more details and join the MLDS Discord for updates.
Prerequisites: Basic knowledge of Python or programming is recommended to do the projects.
Group Type: Project-based
Mentors: Justin Sasek, Harshal Bharatia
Curriculum
Intro to Machine Learning and Neural Networks (3 weeks)
Intro to Computer Vision (CNN) (2 weeks)
Intro to Transformers (3 weeks)
Miscellaneous - Diffusion, VAE, RL, etc. (3 weeks)
Time/Location: Every Monday at 5 pm. Our first meeting will be on 2/3 in GDC 4.304.
Content: Hi! In this reading group, my goal is to help mentees gain a solid, broad understanding of key concepts in NLP (the field that ChatGPT came out of) while developing the ability to understand machine learning papers and identify their strengths and weaknesses. I like to customize the papers and format based on the mentees' interests and background, but if you have any interest in this stuff I hope you'll find it a useful experience! (I'll buy everyone pizza at the end)
Prerequisites: Ideally a good understanding of how to write and understand code. An ML or NLP background could also be helpful (Recommended)
Group Type: Reading-based
Level: Beginner or Advanced
Mentors: Prasann Singhal
When2Meet: https://www.when2meet.com/?28645276-Z5wyO
Content: Reinforcement Learning has seen many recent advances, from powering OpenAI's o1 model to achieving superhuman performance in many perfect- and imperfect-information games. If you want to learn more about how this paradigm works and build projects, this is the DiRP for you!
Prerequisites: Some basic knowledge about the RL formalism, and some Python experience (Recommended)
Group Type: Reading and Project group hybrid
Level: Beginner
Mentors: Sarthak Dayal
When2Meet: https://www.when2meet.com/?28645288-1IeN8
Content: It's pretty hard to get AI models to act the way we intended, and as models get more powerful, this can be really dangerous. The field of AI Alignment and Safety is dedicated to reducing the risks, and some research includes understanding what models are thinking about, evaluating different methods of oversight, or steering and controlling behaviors. We think this is one of the most important problems of our time.
Prerequisites: You've read AI safety papers/LessWrong or have other AI-related experience (Recommended)
Group Type: Reading-based
Level: Advanced
Mentors: John Dunbar
When2Meet: https://www.when2meet.com/?28645293-KnHqI
Content: We will go over the new trends in natural language reasoning with LRLMs (Large Reasoning Language Models) like o1. To best understand these systems, we will read about the fundamentals that led to this type of training and "reasoning", such as CoT and inference-time compute.
Prerequisites: Basic understanding of language models (words -> tokens -> transformer -> probabilities over next token) - Should know some basics but not specifics
Group Type: Reading-based
Level: Beginner
Mentors: Zayne Sprague
Curriculum: https://drive.google.com/file/d/1SUdEE0FmES2QsnzgmhJ2nZ0SZcB1OnAx/view?usp=sharing
When2Meet: https://www.when2meet.com/?28645315-flymc
Content: Let's try to understand how LLMs like ChatGPT work: what exact math operations are done to generate text? This will be most useful if you later want to research improvements to parts of the Transformer architecture or compare new architectures; it is less useful if you just want to use LLMs. We will read relatively few papers since the content takes time to understand.
Prerequisites: matrix multiplication, algebra, have used ChatGPT at least once (recommended)
Group Type: Reading-based
Level: Beginner
Mentors: Rishi Astra
When2Meet: https://www.when2meet.com/?28645321-4tdBp
Content: This group explores the basics of computer vision and generative models (including autoencoders, GANs, as well as hybrid models such as VAE-GANs). We will start with fundamental computer vision concepts such as image representation, filtering, convolutions, etc. and work our way up to state-of-the-art models in use today. This will be a reading-based group and will function more as a collaborative study group than a class, especially later in the semester.
Prerequisites: No strict prerequisites, but familiarity with high-level machine learning concepts might be useful, plus some basic knowledge of linear algebra (e.g., matrix multiplication) and stats (again, not strict; if required we can spend time reviewing these or I will recommend resources to get up to speed). (recommended)
Group Type: Reading-based
Level: Beginner
Mentors: Sanika Nandpure
Curriculum: https://docs.google.com/document/d/18TEHXget6_Ez1xREaT2ghIFFf_AotcDgLO5Px6AApkI/edit?usp=sharing
When2Meet: https://www.when2meet.com/?28647997-DyxQm
Content: Interested in designing and training language models? Modern language models depend on massive training sets. Can we do better? Here we can explore whether it is possible to train language models with more modest resources. Can we get a "12-year-old LM" with a reduced vocabulary? We can start by replicating and extending existing papers, and try strategies like new tokenization schemes and curriculum learning.
Prerequisites: Experience with Python, an interest in NLP and reading papers, and experience with Hugging Face, PyTorch, etc. would be helpful but are not required.
Group Type: Project-based
Level: Advanced
Mentors: Juan Diego Rodriguez
Curriculum: https://docs.google.com/document/d/117foioXfzX0So_mgZmdw9_ZxZZy61cYghahyNOKHDuw/edit?tab=t.0
When2Meet: https://www.when2meet.com/?28668910-edIMz
Content: Reasoning methods in large language models (LLMs) are an exciting new area of scaling that yields incredible performance. I am interested in applying RL-based "reasoning" methods to large transformers used in physics applications as opposed to LLMs. In general, modern LLM inference methods have yet to break into the domain of large transformers operating on continuous-valued inputs.
Prerequisites: linear algebra (must have), neural networks, basic probability, coding
Group Type: Reading-based
Level: Advanced
Mentors: Jeffrey Lai
When2Meet: https://www.when2meet.com/?28672854-X62Pp
Content: We'll be exploring a variety of topics around systems/architecture, guided by your input. If you want to learn more about or do a project related to anything in the world of low-level computing, this is the group for you.
Prerequisites: An interest in low-level computer science
Group Type: Project Group
Level: Beginner
Mentors: Noah Klayman
When2Meet: https://www.when2meet.com/?28645331-eamjZ
Content: If you are interested in advanced topics in architecture/systems, this is the group for you! In the past we've covered things like modern file systems, alternative virtual memory implementations, circuit design, and more.
Prerequisites: Taken Arch + OS (required)
Group Type: Reading Group
Level: Advanced
Mentors: Noah Klayman
When2Meet: https://www.when2meet.com/?28645340-SN7fF
Content: In my Distributed Systems group, we'll look in depth at large-scale technologies. We'll explore key research papers, engage in meaningful discussions, and learn together about topics like scalability, consensus algorithms, and building reliable systems.
Prerequisites: Preferably taken OS (Recommended)
Group Type: Reading-based
Level: Advanced
Mentors: Nathan Berry
Curriculum: https://docs.google.com/document/u/0/d/1fyzDEg4WZ4kr3PWI2CefF8J6GpDAuXtc2XBg8G9DE7M/mobilebasic
When2Meet: https://www.when2meet.com/?28645361-yeBoF
Content: How safe are modern computer systems? Come find out how to break computers and think like an attacker!
Prerequisites: Taken 429/439 (Recommended)
Group Type: Reading-based
Level: Beginner
Mentors: Guilherme Amaral
Curriculum:
Memory Errors: The Past, the Present, and the Future by Cavallaro et al.
When2Meet: https://www.when2meet.com/?28645377-W7eGw
Content: This DiRP will focus on developing a fast solver for the Poisson equation in 2D using the Nonuniform Fast Fourier Transform (NUFFT). We will start by implementing a finite difference solver on a uniform grid, which we will accelerate using the Fast Fourier Transform, and then extend to non-uniform computational domains. Participants will gain hands-on experience with numerical methods while deepening their understanding of the Fourier transform and differential equations.
Prerequisites: Linear Algebra (Required), Python, Git, Differential Equations (Recommended)
Group Type: Project-based with relevant papers assigned as readings
Level: Advanced
Mentors: Gabriel Kosmacher
Curriculum: https://drive.google.com/file/d/1Le3cDpoMD80UbQtlEGjVTMZSlTqUdqna/view?usp=sharing
When2Meet: https://www.when2meet.com/?28645381-9vTih
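As a rough illustration of the first stage described in the Poisson solver group above (a finite difference solver on a uniform grid, accelerated with the FFT), here is a minimal sketch. It assumes periodic boundary conditions, the sign convention Δu = f, a square n-by-n grid with n a power of two, and a zero-mean right-hand side. None of these choices come from the group's curriculum, and all function names are hypothetical. The NUFFT extension to non-uniform domains is not covered.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(a, inverse=False):
    """2D transform: FFT along each row, then along each column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def solve_poisson_periodic(f, h):
    """Solve the 5-point finite difference Laplacian system on a periodic grid.

    Assumption: f has zero mean, so the zero-frequency mode can be set to 0.
    Returns the zero-mean solution u as a list of lists.
    """
    n = len(f)
    f_hat = fft2(f)
    u_hat = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                continue  # constant mode: the solution is fixed to zero mean
            # Eigenvalue of the 5-point Laplacian for Fourier mode (i, j)
            lam = (2 * math.cos(2 * math.pi * i / n) - 2
                   + 2 * math.cos(2 * math.pi * j / n) - 2) / h ** 2
            u_hat[i][j] = f_hat[i][j] / lam
    u = fft2(u_hat, inverse=True)
    # The inverse transform above is unnormalized, so divide by n^2
    return [[v.real / n ** 2 for v in row] for row in u]
```

The key design point is that the periodic finite difference Laplacian is diagonal in the Fourier basis, so the linear solve reduces to one division per frequency.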
Content: A metric space consists of a set of points and distances between pairs of points; common examples include Euclidean space and the shortest-path distance on an unweighted graph. Many algorithms only perform correctly on inputs that are members of some well-understood metric space like Euclidean space, so the first step in many algorithms is to map an arbitrary input metric to some other well-understood space without “messing up” the distances too much. There is a wide variety of possible target spaces, definitions of distortion (aka how messed up the distances are), and other variations of this problem, so while it is an old problem there is a lot of recent interesting work on the topic.
Prerequisites: Strong understanding of undergrad algorithms and discrete math (Required)
Group Type: Either Reading or Project based
Level: Advanced
Mentors: Kristin Sheridan
Curriculum: https://drive.google.com/file/d/1LwPhco0vAERt1ecfPhxRdI7yXqzKUTAe/view?usp=sharing
When2Meet: https://www.when2meet.com/?28645388-JIuJq
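To make the idea of distortion in the metric embedding group above concrete, here is a small sketch. It uses one common definition (worst-case expansion times worst-case contraction over all pairs); the description notes that other definitions exist, and this choice, the example graph, and all names below are assumptions for illustration only. The example maps the shortest-path metric of a 4-cycle onto the corners of a unit square in the Euclidean plane.

```python
import math
from itertools import combinations

def distortion(points, d_source, d_target):
    """Worst-case expansion times worst-case contraction over all pairs."""
    pairs = list(combinations(points, 2))
    expansion = max(d_target(x, y) / d_source(x, y) for x, y in pairs)
    contraction = max(d_source(x, y) / d_target(x, y) for x, y in pairs)
    return expansion * contraction

# Source metric: shortest-path distance on an unweighted 4-cycle 0-1-2-3-0
def cycle_distance(i, j):
    gap = abs(i - j)
    return min(gap, 4 - gap)

# Hypothetical embedding: each vertex goes to a corner of the unit square
embedding = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

def euclidean_distance(i, j):
    return math.dist(embedding[i], embedding[j])

cycle_distortion = distortion(range(4), cycle_distance, euclidean_distance)
```

Adjacent vertices keep distance 1, while opposite vertices shrink from 2 to the diagonal length √2, so this embedding has distortion √2 under the definition used here.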
Content: Visualizing results of simulations of complex physical phenomena is a challenging and often overlooked area. There is a need to visualize fields such as displacement and temperature on higher-order (quadratic, cubic, etc.) B-spline meshes, a capability not currently offered by open-source tools such as ParaView, a popular visualization GUI. This group will focus on addressing this need by developing proof-of-concept tools for higher-order B-spline mesh visualization and later launching an open-source tool for this purpose.
Prerequisites: Completion of calculus sequence, M340L (Required), preferably M427L and M427J. Experience with C++ (recommended)
Group Type: Project-based
Level: Beginner
Mentors: Kenneth Meyer
When2Meet: https://www.when2meet.com/?28645398-nq6we
Content: Data processing frameworks (e.g., Apache Spark and Flink) provide built-in support for user-defined aggregation functions (UDAFs). However, for these frameworks to support efficient UDAF execution, the function needs to satisfy a homomorphism property. In this project, we will explore and develop Ink, a program synthesis-based tool that verifies this property, enabling efficient execution in PySpark pandas UDAF for data science.
Prerequisites: CS311H or CS311: Discrete Math; programming in Rust (required)
Group Type: Project-based
Level: Advanced
Mentors: Ziteng Wang
When2Meet: https://www.when2meet.com/?28645408-Lxe4u
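The homomorphism property mentioned in the UDAF group above can be illustrated with a toy example. Under one common reading (an assumption here, not a statement about how Ink or Spark define it), an aggregation is a homomorphism when the result for a concatenated input can be computed by merging the results for its parts, which is what lets a framework aggregate each partition independently. The sketch below expresses a mean this way; every function name is hypothetical and none is part of the Spark, Flink, or Ink APIs.

```python
from functools import reduce

def accumulate(values):
    """Per-partition partial state: (running sum, element count)."""
    return (sum(values), len(values))

def merge(a, b):
    """Combine two partial states; associative, so partitions merge in any grouping."""
    return (a[0] + b[0], a[1] + b[1])

def finish(state):
    """Turn the final merged state into the aggregate value."""
    total, count = state
    return total / count

def mean_partitioned(partitions):
    """Aggregate each partition on its own, then merge the partial states."""
    return finish(reduce(merge, (accumulate(p) for p in partitions)))

data = [1, 2, 3, 4, 5, 6]
partitions = [[1, 2], [3, 4, 5], [6]]
direct = finish(accumulate(data))
split = mean_partitioned(partitions)
```

A naive mean that averages per-partition averages would not satisfy this property when partitions differ in size, which is why the partial state carries the count alongside the sum.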
Content: As programmers, we interact with programming languages every day, but we don't have good methods for writing programs that come with provable guarantees. Program synthesis and verification allow us, respectively, to automatically generate correct code and to check whether our code satisfies the properties we want. This group will cover material from CS 345H and CS 393P, in case you are interested in those courses, which weren't offered this year.
Prerequisites: some familiarity with proofs and induction (recommended)
Group Type: Reading-based
Level: Beginner
Mentors: Aaryan Prakash
Curriculum: We have some flexibility to choose topics that work for the mentees, but I was planning on mostly focusing on paper reading in program synthesis and verification with one paper a week. Depending on how familiar people are with PL, I was also thinking about an introduction to various semantics that were taught in Dr. Bornholt's class previously (maybe a week or two).
When2Meet: https://www.when2meet.com/?28645415-hX7Om
Content: Everyone has used Copilots and LLMs to generate code, and once we start using them often it is not hard to find subtle bugs that go unnoticed because the LLM misses some corner cases. The code might work on the test cases we proposed, but it may still be incorrect. Can we make sure that the code generated by AI is provably correct? There are tools called Interactive Theorem Provers built for mechanically checking proofs written in a formal language; how can we leverage these to prove the correctness of code generated by AI? This problem is at the heart of automating code generation for big software projects via AI. There are some interesting things to learn, and we can collaborate to solve some real-world problems.
Prerequisites: Knowledge of Python programming. Basic understanding of LLMs and how to use them. Having used LLMs (e.g., VS Code Copilot) to write code is a plus. Knowledge of Interactive Theorem Provers is a plus. (recommended)
Group Type: It will be a reading group for the first half and then a research project towards the end which we plan to submit to an AI/ML conference.
Level: Beginner
Mentors: Amitayush Thakur
When2Meet: https://www.when2meet.com/?28645426-z4yIo
Content: Join us for a weekly roundtable where we dive into the latest tech news. We will cover anything that seems interesting, whether it be JS, Rust, Zig, projects, drama, or technical deep dives; whatever floats our boat. Come join us, and bring topics of discussion! It's party time.
Prerequisites: Any range of experience, and a willingness to learn about and discuss more intricate topics in addition to generalities.
Group Type: Project and Reading-based
Level: Advanced
Mentors: Elie Soloveichik
When2Meet: https://www.when2meet.com/?28645437-MZVfS
Content: Cardiovascular modeling is a very interesting area of research where mathematics, computer science, and domain knowledge become tightly intertwined. The pipeline to go from raw data to a polished simulation requires many pieces that must fit together in the correct way. Depending on the group's interest, we can target different areas of the pipeline such as: using computer vision on raw data to generate a mesh, creating neural networks that can predict how a heart will beat, or improving the visualization of these simulations.
Prerequisites: Completed Calculus Sequence (Required), Python would be beneficial
Group Type: Project-based
Level: Advanced
Mentors: Benjamin Thomas
Curriculum: https://docs.google.com/document/d/11ZvoOFWCIfXCcLaMOEJC-U8ELFzSbxiJtWRBALFSi24/edit
When2Meet: https://www.when2meet.com/?28645441-g9g57
Content: The goal is to provide tutorials so students learn how to use the basic Linux tools that modern software engineers rely on to diagnose real problems efficiently. Students will become familiar with Linux, Docker, Kubernetes, Measuring for Research, Version Control, and CI/CD. The final project will be to deploy the KernMLOps tool on Arch Linux on a bare-metal machine.
Prerequisites: None.
Group Type: Project-based
Level: Beginner
Mentors: Aryan Khatri
When2Meet: https://www.when2meet.com/?28647921-gzuHm