Fast Transforms and Matrix Multiplications
Layered-Givens Transforms: Tunable-Complexity, High-Performance Approximation of Optimal Non-Separable Transforms
A layered-Givens transform (LGT) is a tunable-complexity linear orthonormal transform designed to provide high-fidelity approximations of general transforms.
A general (N x N) transform/matrix may have great application performance but would incur N^2 complexity.
LGTs allow decreasing the complexity to NM, where M is the number of LGT layers. (As M increases, LGTs get closer and closer to the general transforms they approximate, at the expense of increased complexity.) The point, of course, is to generate high-fidelity designs with M as small as possible.
The input x gets transformed to c by going through a sequence of LGT layers.
Each LGT layer is composed of data shuffling permutations and butterflies.
One can change the butterflies to general non-orthogonal matrices, or even to arbitrary nonlinear two-terminal elements, to accelerate general matrix multiplication and deep nets.
An LGT layer.
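To make the layer structure concrete, here is a minimal numpy sketch of applying M such layers. This is a hypothetical helper, not the paper's code; it assumes each butterfly is a 2x2 Givens rotation acting on consecutive elements after the shuffle.

```python
import numpy as np

def lgt_apply(x, perms, angles):
    """Apply M LGT layers to a length-N vector x. perms[m] is a
    length-N permutation and angles[m] holds N/2 rotation angles;
    each layer shuffles x and rotates consecutive pairs, so cost
    is O(N) per layer and O(NM) overall (vs. N^2 for a general
    matrix)."""
    x = np.asarray(x, dtype=float)
    for perm, theta in zip(perms, angles):
        x = x[perm]                      # data-shuffling permutation
        c, s = np.cos(theta), np.sin(theta)
        a, b = x[0::2], x[1::2]          # butterfly input pairs
        x = np.empty_like(x)
        x[0::2] = c * a - s * b          # 2x2 Givens rotations
        x[1::2] = s * a + c * b
    return x
```

Since every layer is a permutation followed by plane rotations, each layer, and hence the whole LGT, is orthonormal.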
The LGT design uses powerful graph-theory-based methods to solve a nonlinear, highly combinatorial optimization problem. The solution to this problem yields the optimal permutation and butterfly parameters of each layer.
We have designed fixed-point LGTs that improve upon existing data compression transforms within HEVC, achieving below-separable-transform complexity while delivering close to non-separable-transform compression performance.
Separable vs. LGT approximation of a general transform. The LGT is tuned to 0.75 x separable computational complexity.
Here is how LGTs do in approximating eight directional transforms, each aligned with a different direction. Approximation performance increases with the number of layers. All shown LGTs have below-separable-transform complexity. For discussion of the RCT and separable results, please see further below.
IEEE International Conference on Image Processing, Conference Best Paper Award, 2017.
Source code to play with coming soon.
Row-Column Transforms: Low-Complexity Approximation of Optimal Non-Separable Transforms
Row-column transforms (RCTs) work on image patches (and can be extended to volumes). The goal is similar to that of LGTs: provide high-fidelity, low-complexity approximations of general, non-separable patch/block transforms.
Let's change notation and say 2D patches/blocks are NxN (in the LGT discussion above we in effect set N^2 -> N). For an NxN patch, a general non-separable transform incurs complexity N^4, a separable transform 2N^3, and a DCT-like fast transform 2N^2 log N.
RCTs by design have separable-transform complexity, i.e., 2N^3. Since N^4 / (2N^3) = N/2, they are hence ~16x, 8x, 4x, and 2x faster than non-separable transforms for block sizes of 32, 16, 8, and 4.
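To sanity-check those numbers, a quick back-of-the-envelope computation using the complexity formulas above:

```python
# Multiply counts per N x N block, per the complexity formulas above.
import math

for N in (4, 8, 16, 32):
    nonsep = N**4                    # general non-separable transform
    sep = 2 * N**3                   # separable (and RCT) complexity
    fast = 2 * N**2 * math.log2(N)   # DCT-like fast transform
    print(f"N={N:2d}: non-separable={nonsep:7d}  separable/RCT={sep:6d}  "
          f"fast={fast:7.0f}  speedup vs non-separable={nonsep / sep:.0f}x")
```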
Convnet acceleration: Suppose you are running a convnet layer on NxN patches with F filters. If F is comparable to N, or if you are already running separable filters to keep things fast, then you should consider switching to RCTs. (For a potentially bigger bang for the buck, LGTs.)
RCTs are easy to understand: Separable transforms run a 1D transform over each row of the block, then run another 1D transform (usually the same as the row transform) over each column of the block. RCTs use a different transform for each row and for each column. They look like this:
An RCT applied on the patch X to yield coefficients Y. A different 1D row transform is applied to each row, followed by a different column transform for each column. The final result is permuted/shuffled with permutation P to obtain the coefficients.
RCTs are parameterized by N row kernels (each an N x N matrix), N column kernels, and the permutation P. The magic of the design is the incorporation of the permutation P in the design algorithm.
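In code, an RCT application could look like the following minimal numpy sketch. The names and the exact permutation convention are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rct_apply(X, row_kernels, col_kernels, perm):
    """Apply an RCT to an N x N patch X: a different 1D transform
    per row, then a different 1D transform per column, then the
    permutation P over the N^2 coefficients. Cost: 2N^3 multiplies."""
    N = X.shape[0]
    Z = np.stack([row_kernels[i] @ X[i, :] for i in range(N)])          # per-row transforms
    Y = np.stack([col_kernels[j] @ Z[:, j] for j in range(N)], axis=1)  # per-column transforms
    return Y.reshape(-1)[perm].reshape(N, N)                            # shuffle with P
```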
The way to think about the role of the permutation is as follows:
Suppose you generate an SOT or a PCA basis for NxN patches. Try approximating it with an RCT. Now try again, but reorder the basis vectors randomly. The results and the approximation fidelity will be very different. Conclusion: it pays to optimize that reordering permutation.
Similar observations hold if you are designing based on some correlation/compaction metric, i.e., when you start not with a target basis but with a target metric. Typically, in the n-th iteration of metric minimization you will find yourself minimizing ||A_n - RCT_n||, which basically reduces to the previous bullet.
The design algorithm (please see the paper) incorporates the permutation in its optimization. Not doing so will yield sub-par results. (One can use the same algorithm to design separable transforms that will do better than separable design methods that do not incorporate the permutation.)
The "Basis approximation - SNR (dB)" figure above illustrates how RCTs compare to improved separable designs (themselves designed in a permutation aware fashion). RCTs do significantly better. (LGTs do better than RCTs at even lower complexity.) Here is the picture for the vertical orientation where RCT does as well as the LGT.
RCT and improved separable approximation of a vertically aligned non-separable transform. Both RCT and separable designs have separable (2N^3) complexity.
Recommendations:
If your general transformation is too slow, try RCTs/LGTs instead.
If you are using separable designs, switch to RCTs with little impact on your current code/algorithm flow.
If you must keep using separable designs, see if a redesign using the RCT algorithm will benefit you (please see the paper; it is the same algorithm, but with all row kernels constrained to be identical and likewise all column kernels). It likely will.
IEEE International Conference on Image Processing, Best Student Paper Runner-up Award, 2016.
Complexity-Regularized Matrix Factorization
Assuming matrix multiplication complexity is proportional to the number of nonzero elements in the matrix (see sparse matrix multiplication algorithms), this work factorizes a given matrix so as to minimize the total number of nonzero elements in its factors. Sparsity comes about in a natural way, as a measure of complexity in multiply-adds.
One way of finding a sparse factorization is to carry out a low-rank approximation of A; a lot of work in vision and signal processing does that. Interestingly, it is quite well established that there are large and important classes of matrices for which low-rank approximations are very, very suboptimal.
Here is the flow of this work: factor A into B and C (A = BC) while penalizing the total number of nonzero entries in both factors. The minimization in terms of B and C is not straightforward, but one can come up with algorithms that do it well.
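As a hedged stand-in for the paper's design procedure, here is a toy alternating scheme that captures the idea: fit A = BC in the least-squares sense while hard-thresholding small entries so the nonzero count, and hence the multiply-add cost, stays low. The threshold-based pruning is an illustrative simplification, not the paper's algorithm:

```python
import numpy as np

def sparse_factorize(A, r, thresh, iters=100):
    """Toy alternating scheme: fit A ~= B @ C by least squares,
    hard-thresholding small entries of each factor so the total
    nonzero count (a proxy for multiply-add cost) stays low."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    B = rng.standard_normal((m, r))
    C = rng.standard_normal((r, n))
    for _ in range(iters):
        C = np.linalg.lstsq(B, A, rcond=None)[0]          # fit C with B fixed
        C[np.abs(C) < thresh] = 0.0                       # prune small entries
        B = np.linalg.lstsq(C.T, A.T, rcond=None)[0].T    # fit B with C fixed
        B[np.abs(B) < thresh] = 0.0
    return B, C
```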
Pattern matching: We built a pattern-matching prototype where the columns of A are different pattern vectors and the objective is to very quickly find the column that best matches the target pattern x. Imagine taking the source image and extracting patches from it at every shift. Populate the columns of A with these patches, lexicographically converted into vectors. Repeat with distorted versions of the source, adding more columns to A. Now apply A to x, with x being patches extracted at every shift from the target. The actual process is a bit different and a lot faster, but this conceptual construction illustrates the main idea.
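The conceptual construction is easy to write down; the following hypothetical numpy helper builds A from all patches at every shift, with a note on how the factored form speeds up matching:

```python
import numpy as np

def build_pattern_matrix(img, p):
    """Columns of A: every p x p patch of img, at every shift,
    lexicographically vectorized (the conceptual construction above)."""
    H, W = img.shape
    cols = [img[i:i + p, j:j + p].reshape(-1)
            for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.stack(cols, axis=1)

# With a sparse factorization A ~= B @ C, the match scores A.T @ x are
# computed as C.T @ (B.T @ x); the cost is proportional to the total
# number of nonzeros in B and C, and the best column is scores.argmax().
```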
So what do the features look like in this type of pattern matching? In the A = BC model, one can view the columns of B as features, or feature primitives, that are put together by the columns of C to form patterns. Let's look at a contrived but illustrative example:
Features to match a white square (all rotations and +/- 2-pixel translations) at two different complexity levels.
The top set can be used to put together all target squares, but with somewhat rounded corners, etc. The bottom features provide better descriptions and improve matching accuracy. When one adds further robustness to the matching problem (e.g., scale variations in addition to rotations/translations), the primitives and the complexity needed for accurate matching change accordingly.
Missing-data prediction: The A = BC model, where both B and C are solved for/learnt from A using the above sparsity construct, is a surprisingly powerful way to predict data. Take a signal with missing data and extract patches from it at every shift (in a translation-invariant or "convolutional" fashion). Form the columns of A. Here is the rest of the algorithm:
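The full flow is in the paper; as a heavily hedged sketch of one plausible instantiation (mask-aware alternating factorization, with missing entries read off B @ C; the helper and its parameters are illustrative assumptions):

```python
import numpy as np

def predict_missing(A, mask, r, thresh, iters=50):
    """Hedged sketch, not necessarily the paper's exact flow: learn a
    sparse A ~= B @ C using only the observed entries (mask is True
    where data is known), then read predictions for the missing
    entries off the product B @ C."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    B = rng.standard_normal((m, r))
    C = rng.standard_normal((r, n))
    for _ in range(iters):
        X = np.where(mask, A, B @ C)                      # impute missing entries
        C = np.linalg.lstsq(B, X, rcond=None)[0]          # fit C with B fixed
        C[np.abs(C) < thresh] = 0.0                       # sparsity prune
        B = np.linalg.lstsq(C.T, X.T, rcond=None)[0].T    # fit B with C fixed
        B[np.abs(B) < thresh] = 0.0
    return np.where(mask, A, B @ C)
```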
Sparsity is not an arbitrary regularization term: if data is predictable in the regression context then it is sparse in some domain, i.e., by attempting to predict missing pixels we are assuming the data is sparse (or predictable). Note that, other than sparsity and the patch size, the algorithm learns everything about the data/signal/image from a single observation; no training is required. Similar to nonlinear-approximation-based recovery, the flow of this algorithm can be written as a convnet. Here are a few results:
Patents:
M. Koo and O. G. Guleryuz, “Layered Givens Transforms (LGT) Design Methods and Apparatus,” filed, February 2017. Assigned to LG Electronics.
B. Li, O. G. Guleryuz, and A. Vosoughi, “Layered Givens Transforms (LGT) for Data Compression and Analysis,” filed, August 2016. Assigned to LG Electronics.
J. Ehmann and O. G. Guleryuz, “Jointly Optimal Oblique Transforms,” filed, August 2016. Assigned to LG Electronics.
O. G. Guleryuz, H. Egilmez, and J. Ehmann, “Methods to design low-complexity, non-isotropic transforms,” filed, August 2016. Assigned to LG Electronics, Inc.
J. Zujovic and O. G. Guleryuz, “Complexity Regularized Pattern Representation, Search, and Compression,” issued, November 2012. Assigned to NTT DoCoMo, Inc. Patent no: 8,311,334.
Papers:
B. Li, O. G. Guleryuz, J. Ehmann, and A. Vosoughi, “Layered-Givens Transforms: Tunable-Complexity, High Performance Approximation of Optimal Non-Separable Transforms,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2017), Beijing, China, Sept. 2017. [ICIP 2017 Best Paper Award]
H. E. Egilmez, O. G. Guleryuz, J. Ehmann, and S. Yea, “Row-Column Transforms: Low-Complexity Approximation of Optimal Non-Separable Transforms,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2016), Phoenix, AZ, Sept. 2016. [ICIP 2016 Best Student Paper Award Runner-Up]
J. Zujovic and O. G. Guleryuz, “Complexity Regularized Pattern Matching,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2009), Cairo, Egypt, Nov. 2009.