Welcome to my page on architecture-specific optimization for modern processors.

As multicore architectures overtake single-core architectures in current and future computer systems, applications must switch to parallel algorithms to achieve higher performance. Years of research have yielded many ways to parallelize applications, such as functional decomposition and data partitioning. However, we have found that exploiting parallelism at the algorithmic level alone is not sufficient to achieve the best performance. We must also take into account the characteristics of the underlying platform architecture, such as core architecture, SIMD width, and bandwidth, to achieve optimal application performance. We have been engaged in platform-specific optimization for many years, and this tutorial presents our architecture-specific optimization guides for modern CPUs and GPUs. We use industry examples to illustrate how specific optimization techniques improved application performance.

Characteristics of Modern Processors:
  • Multiple cores per socket
  • Short vector (SIMD) computation
  • Cache or local storage to capture data reuse and reduce average access latency
  • High compute-to-bandwidth ratio (typically 0.2 bytes of bandwidth per double-precision (DP) flop)
  • Custom accelerators that provide efficient computation for specific functions (e.g. video decoding, encryption)
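
To make the bytes-per-flop figure concrete, here is a minimal sketch in C of a roofline-style estimate. The roofline bound is a standard back-of-the-envelope model brought in here for illustration; it is not taken from the tutorial material. The function names and the machine numbers in the note below (500 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical, chosen only so that the ratio comes out to 0.2 bytes per flop.

```c
#include <assert.h>

/* Machine balance: bytes of bandwidth available per DP flop.
 * Inputs are hypothetical machine parameters in GB/s and GFLOP/s. */
double machine_balance(double bandwidth_gbs, double peak_gflops) {
    assert(peak_gflops > 0.0);
    return bandwidth_gbs / peak_gflops;
}

/* Kernel demand: bytes the kernel moves to or from memory per flop. */
double kernel_bytes_per_flop(double bytes_moved, double flops) {
    assert(flops > 0.0);
    return bytes_moved / flops;
}

/* Roofline-style upper bound on attainable GFLOP/s: the kernel is limited
 * either by peak compute or by how fast bandwidth can feed it. */
double attainable_gflops(double peak_gflops, double bandwidth_gbs,
                         double bytes_per_flop) {
    assert(bytes_per_flop > 0.0);
    double bandwidth_bound = bandwidth_gbs / bytes_per_flop;
    return bandwidth_bound < peak_gflops ? bandwidth_bound : peak_gflops;
}
```

For example, a DAXPY update y[i] = a*x[i] + y[i] moves 24 bytes (two 8-byte loads and one 8-byte store) for 2 flops, or 12 bytes per flop. On the hypothetical machine above, attainable_gflops(500.0, 100.0, 12.0) gives about 8.3 GFLOP/s, under 2% of peak, which is why capturing data reuse in cache or local storage matters so much.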

Optimization Principles:
  • Keep all ALUs busy with minimal idle time.
  • Work from the inside out: start within a single core, extend to the whole chip, then to a system of multiple chips, and finally to a whole cluster.
  • Iterate between algorithm and data structure design, implementation, and analysis.

  • Follow the "Algorithm - Data Structure - Implementation - Analysis" cycle
  • Optimize for scalar first, then vector, then multi-core
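
The scalar, vector, multi-core progression can be sketched with a dot product in C. This is an illustrative sketch under stated assumptions, not code from the tutorials: the function names and the CHUNK size are hypothetical, the "vector" step relies on the compiler auto-vectorizing four independent accumulators rather than on explicit SIMD intrinsics, and the multi-core step uses OpenMP only if the compiler enables it (otherwise the loop simply runs serially).

```c
#include <assert.h>
#include <stddef.h>

/* Step 1, scalar baseline: a single accumulator forms one serial
 * dependency chain, so each add waits for the previous one. */
double dot_scalar(const double *x, const double *y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Step 2, vector-friendly: four independent accumulators break the
 * dependency chain and map naturally onto SIMD lanes. Note that this
 * reorders the additions, so rounding may differ from the scalar version. */
double dot_vector_friendly(const double *restrict x,
                           const double *restrict y, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* remainder elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}

/* Step 3, multi-core: split the arrays into chunks and run the tuned
 * single-core kernel on each chunk. CHUNK is a hypothetical tuning knob. */
enum { CHUNK = 1024 };

double dot_multicore(const double *x, const double *y, size_t n) {
    long nchunks = (long)((n + CHUNK - 1) / CHUNK);
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long c = 0; c < nchunks; c++) {
        size_t begin = (size_t)c * CHUNK;
        size_t len = (n - begin < CHUNK) ? (n - begin) : CHUNK;
        sum += dot_vector_friendly(x + begin, y + begin, len);
    }
    return sum;
}
```

Each step reuses the one before it, which is the "inside out" idea in miniature: the multi-core version does not introduce a new inner loop, it only distributes the already-tuned single-core kernel. Timing each version against the scalar baseline is the "analysis" step of the cycle.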

Past Tutorials:
Over time, I have offered a number of tutorials on architecture-specific optimization for modern processors at different venues. Materials will be posted later.

2014 Intel Developer Forum (San Francisco)

2013 Supercomputing

2013 Intel Developer Forum (San Francisco)

HPCA tutorial  

HiPC tutorial


  • Tutorial Offering: 1st tutorial to be offered at ICS'10 - Epochal Tsukuba, Tsukuba, Japan
    Posted Apr 16, 2010, 4:46 AM by Victor Lee