"Please note that open source code with documentation was an important class requirement ;-)" - Nicolas

Pedestrian Detection in High-Definition Video Frames

Gerald Dalley, Geza Kovacs, Krista Ehinger, Jim Sukha

{presentation} {report} {code}

In problem domains such as space planning, emergency response, and physical security, it is important to know where people are and what is happening within a region of space.  One approach to this problem is to use one or more video cameras to monitor a scene and detect the people in it.  A common way of doing so is to build a detector that checks for the presence of a person at each pixel location.  The detector of Dalal and Triggs, based on histograms of spatial image gradients, yields good results.  Unfortunately, it takes on the order of a minute to process a single video frame.  The most time-consuming portions of the algorithm are convolutions, dot products, and memory permutations.  We have ported key portions of the algorithm to the GPU in an effort to significantly improve the runtime performance.
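To make the "histograms of spatial image gradients" and "dot products" mentioned above concrete, here is a minimal CPU sketch in plain Python. It is not the project's GPU code: the function names `cell_histogram` and `linear_score` are hypothetical, and the hard nearest-bin voting is a simplification chosen for brevity (the published detector interpolates votes and normalizes over blocks).

```python
import math

def cell_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of one image cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered [-1, 0, 1] derivative filters (a tiny convolution).
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            hist[b] += mag  # hard vote into the nearest-lower bin
    return hist

def linear_score(descriptor, weights, bias=0.0):
    """Linear classifier score: one dot product per detection window."""
    return sum(d * w for d, w in zip(descriptor, weights)) + bias

# A horizontal intensity ramp has purely horizontal gradients (bin 0).
ramp = [[float(x) for x in range(5)] for _ in range(5)]
hist = cell_histogram(ramp)
```

Every pixel's gradient and every window's dot product is independent of the others, which is why these steps are natural candidates for a GPU port.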

Biologically Inspired Object Recognition

Sharat Chikkerur

{presentation} {report} {code}

Visual processing in the human brain is roughly organized into two parallel streams. The ventral stream is responsible for object recognition and identification (the "What?" stream), and the dorsal stream is responsible for motion and position estimation (the "Where?" stream). CBCL has been working on a computational model of visual processing in the ventral stream for several years. Recently, a model for the dorsal stream has also been under development. The algorithm can perform on par with state-of-the-art computer vision algorithms. However, practical applications of the model have been limited by the computational complexity of the algorithm. In this project, we wish to use the parallel capabilities of the GPU to make the object-recognition algorithms applicable to engineering problems. The reference MATLAB code takes about 14 seconds on a 256x256 image. A naive CUDA implementation takes 0.8 sec for the same image. Using coalesced memory access and loop unrolling, the time has been reduced to 0.3 sec. This corresponds to a speedup of ~50x.

H.264 Motion Estimation in CUDA

Alex Rothberg, Lawrence Chan, Jae Lee, Paul Weaver

{presentation} {report} {website} {code}

Motion estimation is currently the most computationally intensive portion of the H.264 encoding process.  Previous attempts to parallelize this algorithm in CUDA showed sizable speedups but sacrificed quality by ignoring the motion vector prediction (MVp).  We demonstrate the viability of a hierarchical (pyramid) motion estimation algorithm in CUDA.  This solution addresses the MVp problem while still taking advantage of the CUDA framework.
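As an illustration of the hierarchical (pyramid) idea described above, the following plain-Python sketch finds a motion vector by searching a small window on a downsampled image first and then refining the result at full resolution. It is a sketch under stated assumptions, not the project's CUDA implementation: the function names, the 2x2 averaging, the SAD cost, and the fixed search radius are all illustrative choices.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref; inf if out of bounds."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + size > w or by + dy + size > h:
        return float("inf")
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def search(cur, ref, bx, by, size, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around `center`."""
    best = None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

def hierarchical_mv(cur, ref, bx, by, size, levels=2, radius=2):
    """Coarse-to-fine motion vector for the block at (bx, by)."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):
        c, r = pyramid[level]
        scale = 2 ** level
        mv = search(c, r, bx // scale, by // scale,
                    max(size // scale, 1), mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)  # rescale for the next finer level
    return mv

# Synthetic check: cur is ref shifted by (4, 2); borders are zero-filled.
ref = [[float((x + 16 * y) ** 2) for x in range(16)] for y in range(16)]
cur = [[ref[y + 2][x + 4] if y + 2 < 16 and x + 4 < 16 else 0.0
        for x in range(16)] for y in range(16)]
mv = hierarchical_mv(cur, ref, bx=4, by=4, size=4)
```

The coarse level covers a displacement of 4 pixels with a radius-2 search, so each level only needs a small window, and every block at a given level can be searched independently.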

Lipid Bilayers Simulation

Claudio A. Andreoni, Reid Van Lehn

{presentation} {report} {code}

The project provides an environment for simulating biopolymer dynamics. Specifically, it allows users to create a three-dimensional space filled with biomolecules that move under interactions such as FENE, WCA, and others, and to track the state of the system at any given time.
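For readers unfamiliar with the two interactions named above, here is a small plain-Python sketch of their standard textbook forms: the FENE bond potential, U(r) = -0.5 k R0^2 ln(1 - (r/R0)^2), and the WCA potential, a Lennard-Jones potential truncated at its minimum and shifted to zero. The function names and default parameter values are illustrative assumptions and are not taken from the project.

```python
import math

def fene_energy(r, k=30.0, r0=1.5):
    """FENE bond energy for bond length r >= 0.
    The bond cannot stretch to r0, so the energy diverges there."""
    if r >= r0:
        return float("inf")
    return -0.5 * k * r0 ** 2 * math.log(1.0 - (r / r0) ** 2)

def wca_energy(r, epsilon=1.0, sigma=1.0):
    """WCA energy for separation r > 0: purely repulsive, and zero at
    and beyond the cutoff 2^(1/6) * sigma."""
    r_cut = 2.0 ** (1.0 / 6.0) * sigma
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6) + epsilon

# Energy of one bonded pair of beads at separation 0.97 (reduced units).
pair_energy = fene_energy(0.97) + wca_energy(0.97)
```

In a bead-spring polymer model, bonded neighbors typically feel both terms while non-bonded pairs feel only the WCA repulsion, which is one common way to combine them.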

Particle Interaction Simulation

Kyle Peter Fritz, Sergio Herrero

{presentation} {report} {webpage} {code}

Simple particles can model many different physical systems, from small particulates moving through a fluid to large planetary objects flying through space.  We have built a platform for the realistic simulation of these particles.  Using our platform, we can apply various forces (including, but not limited to, gravity, velocity-proportional damping, and interparticle collisions) to create an increasingly accurate model.  We expect to add different sorts of particles and additional forces to develop a more dramatic simulation.  To make this simulation computationally feasible, we make use of the CUDA parallel architecture to compute the forces on each particle and update their positions many times each second.  The simulation uses OpenGL to visualize the particle motion.
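The per-particle force computation and position update described above can be sketched in plain Python as follows. This is a minimal sketch, assuming only gravity and velocity-proportional damping with a semi-implicit Euler step; collisions are omitted, and the function name `step` and all parameter values are hypothetical rather than taken from the project.

```python
def step(positions, velocities, masses, dt,
         gravity=(0.0, -9.81, 0.0), damping=0.5):
    """Advance every particle by one time step.
    Force on each particle: m * g - damping * v."""
    new_positions, new_velocities = [], []
    for p, v, m in zip(positions, velocities, masses):
        force = [m * g - damping * vi for g, vi in zip(gravity, v)]
        # Update velocity first, then move with the new velocity.
        v_next = [vi + (f / m) * dt for vi, f in zip(v, force)]
        p_next = [pi + vn * dt for pi, vn in zip(p, v_next)]
        new_positions.append(p_next)
        new_velocities.append(v_next)
    return new_positions, new_velocities

# One particle dropped from rest; damping drives it to terminal velocity.
pos, vel = [[0.0, 100.0, 0.0]], [[0.0, 0.0, 0.0]]
for _ in range(5000):
    pos, vel = step(pos, vel, masses=[1.0], dt=0.01)
```

Without collisions, the loop body touches only one particle's own state, so on a GPU each iteration could map to one thread. Interparticle collisions break that independence and need a neighbor search, which this sketch leaves out.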

GPU Optimized Regression Analysis

Jan Balewski, Cory Li

{presentation} {report} {code}

Least-squares regression on huge datasets is found in a wide variety of fields, from computational biology to financial market modeling.  We present optimized kernel code on the GPU for computation of a least-squares regression using the conjugate gradient method in hopes of accelerating many of these applications.
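As a point of reference for the method named above, here is a minimal CPU sketch in plain Python of conjugate gradient applied to the normal equations A^T A x = A^T b, which is one standard way to solve a least-squares problem with this method. Whether the project uses this exact formulation is an assumption; the function names are hypothetical, and dense lists of lists stand in for the GPU data structures.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Compute A x for a dense row-major matrix."""
    return [dot(row, x) for row in A]

def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def cg_least_squares(A, b, tol=1e-12, max_iter=100):
    """Minimize ||A x - b||^2 by conjugate gradient on A^T A x = A^T b."""
    x = [0.0] * len(A[0])
    r = rmatvec(A, b)            # normal-equation residual at x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        Ap = rmatvec(A, matvec(A, p))   # (A^T A) p as two products
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Fit y = c0 + c1 * t to four points that lie exactly on y = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
coeffs = cg_least_squares(A, b)
```

Each iteration is dominated by two matrix-vector products and a few dot products, all of which are data-parallel. Note that working with A^T A squares the condition number, so poorly conditioned problems may converge slowly with this formulation.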

Proposals (January 2009)

6.963 CUDA @ MIT / Projects