Last update: 19DEC2017
References:
[1] https://www.cs.cornell.edu/~bindel/class/cs5220-s10/slides/lec03.pdf, p.21
Compiler flags
-O3: Aggressive optimization
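-march=core2: Generate code for a specific architecture (here Core 2); implies -mtune=core2
-ftree-vectorize: Automatic vectorization of loops using SIMD instructions such as SSE (supposedly; already enabled by -O3 in GCC)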
-funroll-loops: Loop unrolling
-ffast-math: Unsafe floating-point optimizations (may break strict IEEE 754 behavior, e.g. by reordering operations)
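Why reordering is unsafe: floating-point addition is not associative, so regrouping a sum can change its value. A minimal sketch in C follows; the function names sum_left and sum_right are made up for this note, and volatile is used only to pin each intermediate result so the grouping stays as written.

```c
#include <stdio.h>

/* Hypothetical helper: computes (a + b) + c.
 * volatile forces the intermediate to be stored as a double,
 * so the grouping written here is the grouping that runs. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Hypothetical helper: computes a + (b + c), same values, other grouping. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Prints both groupings so the difference is visible. */
void show_groupings(double a, double b, double c)
{
    printf("(a + b) + c = %g\n", sum_left(a, b, c));
    printf("a + (b + c) = %g\n", sum_right(a, b, c));
}
```

With a = 1e20, b = -1e20, c = 1.0 the first grouping gives 1 and the second gives 0, because 1.0 is lost when added to -1e20. Without -ffast-math the compiler has to keep the order written in the source; with it, the compiler may pick either.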
Several principles
- Sometimes recomputing is faster than saving!
- Preload values into local variables
- Avoid branches inside the inner loop
- Use local variables to expose independent computations
- Function calculation or table of precomputed values?
- Several (independent) passes over a data structure or one combined pass?
- Dense matrix vs sparse matrix?
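A sketch of three of the principles above (preloading into locals, avoiding branches in the inner loop, independent local accumulators), applied to summing the positive entries of an array. The struct vec type and both function names are hypothetical and exist only for this illustration. Whether the second version is actually faster depends on compiler and CPU and has to be measured.

```c
#include <stddef.h>

/* Hypothetical example type: an array with its length stored alongside. */
struct vec {
    const double *data;
    size_t n;
};

/* Baseline: reads v->n and v->data through the pointer on every
 * iteration, has an if in the inner loop, and uses one accumulator,
 * so every addition waits for the previous one. */
double sum_positive_naive(const struct vec *v)
{
    double s = 0.0;
    for (size_t i = 0; i < v->n; i++) {
        if (v->data[i] > 0.0)
            s += v->data[i];
    }
    return s;
}

/* Tuned variant of the same loop. */
double sum_positive_tuned(const struct vec *v)
{
    /* Preload: copy the fields into locals once, outside the loop. */
    const double *x = v->data;
    const size_t n = v->n;

    /* Two accumulators form two independent dependency chains. */
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        /* The select form is often compiled without a jump,
         * though that is up to the compiler. */
        s0 += (x[i] > 0.0) ? x[i] : 0.0;
        s1 += (x[i + 1] > 0.0) ? x[i + 1] : 0.0;
    }
    /* Leftover element when n is odd. */
    if (i < n)
        s0 += (x[i] > 0.0) ? x[i] : 0.0;

    return s0 + s1;
}
```

For the input {1, -2, 3, -4, 5} both functions return 9. In general the two versions add in a different order, so their results can differ in the last bits for non-trivial floating-point data.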