Supercomputing on your desktop:

Programming the next generation of inexpensive massively parallel hardware using CUDA.

Can someone get a PhD in Computational Sciences in 2 years using inexpensive massively parallel hardware and CUDA?

Faculty: Prof. Steven G. Johnson
Instructor: Nicolas Pinto
TAs: Nicolas Poilvert, Justin Riley


Lectures: Mon, Wed, Fri, Jan 9-30, 10am-12pm
Hands-on: Mon, Wed, Fri, Jan 7-30, from 2pm
Free Lab Hours: Tues, Thurs, Jan 8-29, from 2pm

Pre-register on WebSIS and/or attend first class.
Limited to 30 participants (for credit)
Listeners allowed, space permitting
Prereq: familiarity with programming
Level: U/G, 3 units, graded A-F, can be repeated for credit


Can someone get a PhD in Computational Sciences in 2 years using inexpensive massively parallel hardware and CUDA?

With generous contributions of $70,000 from NVIDIA, the Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS), students in this IAP course will have the opportunity to find out.

The goal of this class is to provide students with extensive hands-on experience in parallel programming on modern commodity graphics hardware. Programming this inexpensive massively parallel hardware requires specific knowledge of parallel programming concepts and models (threading, communication, and memory), which we'll cover in the course. Hands-on training with high-end hardware will give students the experience needed to design and develop a final project of their choice (preferably one that will significantly help them in their research). The only requirement is that the project involve a very demanding application, such as an intensive mathematics- or physics-based simulation or other data-intensive computation.
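To give a flavor of the threading model mentioned above, here is a minimal, illustrative CUDA vector-addition sketch (not course material): one thread computes one output element, and the host launches enough thread blocks to cover the whole array. The names (`vecAdd`, buffer variables, sizes) are ours, chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element -- the basic CUDA threading model.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail block
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float*)malloc(bytes);
    float *hb = (float*)malloc(bytes);
    float *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host -> device copies
    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);   // 1.0 + 2.0 = 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The block/thread decomposition, the explicit host/device memory transfers, and the guard against the partially filled last block are exactly the kinds of details the course covers in depth.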

On the hardware side, we'll provide students with up to 30 theoretical teraflops (and a high-end MacBook Pro per project).

Targeted audience: 

Undergraduate and graduate students from all departments with a need to drastically speed up their computationally intensive research.

Ultimately, we would love to have a broad impact on the MIT community by introducing supercomputing on inexpensive commodity hardware. Our goal is to help researchers at any level take advantage of a disruptive technology that holds great promise for drastically accelerating the computational sciences and thus generating new, experimentally testable hypotheses.


Nicolas Pinto, Nicolas Poilvert, Justin Riley

Lecture topics:

- General introduction
- Parallel Programming Concepts and Design Patterns
- Theoretical Analysis of Parallel Algorithms (Amdahl's law, Memory Hierarchy Models, Communication Models, etc.)
- GPU Computing History
- Introduction to CUDA
- CUDA Hardware
- CUDA Programming
- CUDA Threading Model
- CUDA Communication and Memory Model
- CUDA Performance Monitoring and Optimizations
- Tricks of the Trade
- Case Studies: CUDA Applications
- Case Studies: CUDA Application Performance Insights
- Guest speakers from academic institutions (MIT, Harvard, etc.)
- Guest speakers from NVIDIA
- The Future

Recitation/Lab topics:

- Development environment set-up
- Software engineering
- Tricks of the Trade
- Tools of the Trade
- Developing / Debugging
- Profiling / Optimizing
- Project development

Some Assignments:

1. simple matrix multiplication
2. optimized matrix multiplication contest
3. theoretical analysis of simple algorithms
4-5. optimized {image convolution, prefix scan, parallel sort, SVD, etc.} contest
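As a rough sketch of what assignment 1 might look like, here is a naive CUDA matrix-multiplication kernel: each thread computes one element of C = A * B for square N x N matrices stored in row-major order. All names and launch parameters are illustrative, not the actual assignment handout; the optimization contest (assignment 2) is about beating this baseline, e.g. with shared-memory tiling.

```cuda
// Naive matrix multiplication: one thread per output element C[row][col].
// A, B, C are N x N matrices in row-major device memory.
__global__ void matMulNaive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];  // dot product of row and column
        C[row * N + col] = sum;
    }
}

// Illustrative launch from host code, assuming dA, dB, dC already on the device:
//   dim3 threads(16, 16);                                  // 256 threads per block
//   dim3 blocks((N + 15) / 16, (N + 15) / 16);             // cover the whole matrix
//   matMulNaive<<<blocks, threads>>>(dA, dB, dC, N);
```

Every thread here re-reads the same rows and columns from global memory, which is exactly the kind of inefficiency the performance-optimization lectures teach you to eliminate.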