Fast Convolution Algorithms

Takeaways

  1. cuBLAS and cuDNN are very fast; if an operation exists in them, it is probably faster than an equivalent one you write yourself.
  2. Convolutions can be treated as matrix multiplications, and with a few tricks, the memory footprint isn't too large.
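  3. The Winograd algorithm trades multiplies for adds: it uses fewer multiplies and more adds, because adds are cheaper. For example, F(2x2, 3x3) needs 16 multiplies per 2x2 output tile instead of the 36 a direct computation uses.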
  4. The Winograd algorithm works best (saves the most time) on small kernels like 3x3. Many modern networks use 3x3 kernels in part because they are fast.
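
Takeaways 2 and 3 can be illustrated with a minimal pure-Python sketch. It is not how cuDNN implements either idea; the function names (im2col_conv2d, direct_corr1d, winograd_f23) are made up for illustration, and it assumes a single channel, stride 1, no padding, and the cross-correlation convention common in neural networks (the kernel is not flipped). The im2col version materializes the full patch matrix, so it does not include the memory-saving tricks mentioned above.

```python
def im2col_conv2d(image, kernel):
    """Valid, stride-1 2D correlation expressed as a matrix-vector product."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    # im2col: one row per output pixel, holding the flattened input patch
    # that sits under the kernel at that position.
    cols = [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(out_h)
        for j in range(out_w)
    ]
    flat_kernel = [v for row in kernel for v in row]
    # The convolution is now (out_h*out_w, kh*kw) times a (kh*kw,) vector.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def direct_corr1d(d, g):
    """Reference 1D correlation: len(g) multiplies per output."""
    n_out = len(d) - len(g) + 1
    return [sum(d[i + k] * g[k] for k in range(len(g))) for i in range(n_out)]


def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap filter from 4 inputs.

    Uses 4 multiplies between data and filter terms instead of the 6
    needed by direct_corr1d, at the cost of extra additions.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed once.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Data transform (adds only), followed by the 4 multiplies.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform (adds only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    print(im2col_conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))
    print(direct_corr1d([1, 2, 3, 4], [1, 0, -1]))
    print(winograd_f23([1, 2, 3, 4], [1, 0, -1]))
```

The 2D case F(2x2, 3x3) nests this 1D transform along rows and columns, which is where 4 x 4 = 16 multiplies per 2x2 output tile comes from, compared with 2 x 2 x 9 = 36 for the direct method.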