Fast Convolution Algorithms
- cuBLAS and cuDNN are heavily optimized; if the operation you need exists in them, it is probably faster than an equivalent you write yourself.
- Convolutions can be computed as matrix multiplications by unrolling input patches into a matrix, and with a few tricks the memory footprint isn't too large.
- The Winograd algorithm trades multiplies for adds: it uses fewer multiplies and more adds, which pays off because adds are cheaper.
- The Winograd algorithm saves the most time on small kernels such as 3x3. Many modern networks use 3x3 kernels in part because they are fast.
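
The two ideas above (convolution as a matrix multiply, and Winograd's trade of multiplies for adds) can be sketched in NumPy. This is a minimal illustration, not how cuDNN implements either one. The function names `conv2d_im2col` and `winograd_f23` are made up for this example. It assumes a single-channel input, stride 1, no padding, and the cross-correlation convention that neural-network "convolutions" normally use. The im2col version builds the full patch matrix, so it does not include the memory-saving tricks mentioned above.

```python
import numpy as np

def conv2d_im2col(image, kernel):
    """Valid 2D cross-correlation computed as a single matrix-vector product."""
    H, W = image.shape
    kh, kw = kernel.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    # Each row of `cols` is one flattened kh x kw patch of the image.
    # This naive layout stores every pixel up to kh*kw times.
    cols = np.empty((out_h * out_w, kh * kw), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = image[i:i + kh, j:j + kw].ravel()
    # One matmul replaces the sliding-window loop over multiply-accumulates.
    return (cols @ kernel.ravel()).reshape(out_h, out_w)

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter from four inputs.

    The direct method needs 6 multiplies; this needs 4 multiplies between
    data and filter terms, at the cost of extra adds.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms depend only on g, so they can be precomputed once
    # per filter and reused for every input tile.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # The four data-by-filter multiplies.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Outputs are recovered with adds and subtracts only.
    return np.array([m1 + m2 + m3, m2 - m3 - m4])
```

For a quick check, `winograd_f23(d, g)` should match `[d[0]*g[0] + d[1]*g[1] + d[2]*g[2], d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]` up to floating-point rounding. The 2D 3x3 case nests the same transform along both axes, which is where the savings for 3x3 kernels come from.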