In this assignment, we will try to optimize the Viola-Jones Face Detection algorithm on GPUs.

  • You may do this assignment individually or together with one other student (groups of at most two people).
  • If you are in a group, you and the other student in your group only need to submit one report and one code package. However, the oral exam is individual.
  • CUDA implementations of Viola-Jones are available on the web. You may study them for learning purposes, but you must still write your own code for this assignment.

Programming guidelines:
  • Make a timing breakdown of the program. Perform optimizations on the parts that dominate the execution time.
  • Identify the bottleneck of each dominating part and apply optimization techniques accordingly. Do not choose optimization techniques at random; for example, loop unrolling helps little if the code is severely memory bound.
  • Keep a record of each intermediate step. Re-do the timing breakdown after each optimization step. Some once-dominating parts may become less critical after the optimizations. Then you should move on to other parts.
  • Try to analyze the program. Performance analysis tools exist, but a pen-and-paper exercise is sometimes good enough. For example, if the code is severely bandwidth bound, you can find this out quickly by estimating the computation-to-memory ratio of the algorithm and plotting it on the Roofline model.
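
The pen-and-paper Roofline estimate mentioned above can be written down in a few lines of host code. The following is a minimal sketch, not a prescribed method: the struct names, function names and fields are hypothetical, operations are counted generically (Viola-Jones is mostly integer work), and the device limits must be taken from the data sheet of the GPU you actually use.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of a kernel, filled in from a pen-and-paper count.
struct KernelEstimate {
    double ops;    // arithmetic operations performed by the kernel
    double bytes;  // bytes moved to and from off-chip memory
};

// Hypothetical device limits; take the real values from the GPU data sheet.
struct DeviceLimits {
    double peak_gops;      // peak compute throughput in Gop/s
    double peak_gbytes_s;  // peak memory bandwidth in GB/s
};

// Computation-to-memory ratio (arithmetic intensity) in operations per byte.
double arithmetic_intensity(const KernelEstimate& k) {
    assert(k.bytes > 0.0);
    return k.ops / k.bytes;
}

// Roofline bound in Gop/s: the lower of the compute roof and the bandwidth roof.
double attainable_gops(const KernelEstimate& k, const DeviceLimits& d) {
    return std::min(d.peak_gops, arithmetic_intensity(k) * d.peak_gbytes_s);
}

// True when the bandwidth roof lies below the compute roof for this kernel.
bool is_bandwidth_bound(const KernelEstimate& k, const DeviceLimits& d) {
    return arithmetic_intensity(k) * d.peak_gbytes_s < d.peak_gops;
}
```

As a rough illustration, a rectangle sum on an integral image reads four values and performs three additions or subtractions. Assuming 4-byte values and no reuse from caches or shared memory, that is 3 operations per 16 bytes, or about 0.19 operations per byte, which places such a kernel far to the left on the Roofline plot of most GPUs.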

Report guidelines:
  • Do not just present the results; explain them as well. What did you expect? Do the results match your expectations? Why or why not?
  • Explain why certain optimizations are performed. Are you optimizing the dominating parts? Are the chosen optimization techniques solving the bottleneck?
  • If you do not manage to optimize the dominating parts, explain the reason. It is important to understand the fundamental limits.
  • Explain results in a concise and clear way. Tables and figures may help avoid verbose text. Yet data in tables and figures should be clearly explained.
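
One way to state expectations in numbers is to derive them from the timing breakdown. The sketch below is only an illustration under assumptions: the stage names, struct and function names are hypothetical, and the measured times must come from your own runs. It computes the share of each stage in the total run time and applies Amdahl's law to estimate the overall speedup when one stage is accelerated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of a timing breakdown; the stage names are up to you.
struct StageTime {
    std::string name;
    double millis;  // measured time of this stage in milliseconds
};

// Total run time over all stages.
double total_millis(const std::vector<StageTime>& stages) {
    double total = 0.0;
    for (const StageTime& s : stages) total += s.millis;
    return total;
}

// Fraction of the total run time spent in stage `index` (between 0 and 1).
double fraction_of_total(const std::vector<StageTime>& stages, std::size_t index) {
    const double total = total_millis(stages);
    assert(index < stages.size() && total > 0.0);
    return stages[index].millis / total;
}

// Amdahl's law: overall speedup when a part that takes `fraction` of the
// run time becomes `local_speedup` times faster and the rest is unchanged.
double amdahl_speedup(double fraction, double local_speedup) {
    assert(fraction >= 0.0 && fraction <= 1.0 && local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

For very large local speedups the result approaches 1 / (1 - fraction). A stage that takes 80% of the run time therefore caps the overall speedup at 5x, however well it is optimized, which is one way to reason about the fundamental limits mentioned above.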

Submission guidelines:
  • A report of at most 8 pages in PDF format.
  • A zip file containing the source code. Try to keep the zip file small (avoid binary files and unnecessary image files).
  • Send the report and the zip file via email to Dongrui She (d.she _at_ and Zhenyu Ye ( _at_