GOFMM (geometry-oblivious fast multipole method) is a novel method that creates a hierarchical low-rank approximation, or "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in O(N log N) or even O(N) time, where N is the matrix size; compression requires O(N log N) storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by requiring only the ability to sample matrix entries: neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries were generated is needed, hence the term "geometry-oblivious." We also introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results for a variety of matrices on the Intel Knights Landing and Haswell architectures and on the NVIDIA Pascal architecture.
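To give a flavor of why the matvec is fast, below is a minimal HODLR-style sketch in C++. This is an illustrative simplification, not GOFMM's actual data structures or API (GOFMM builds a more general FMM-style representation with near/far interaction lists); all names here are hypothetical. Each off-diagonal block is stored in low-rank form, so applying it costs O(nr) rather than O(n^2), and applying the whole tree costs O(N r log N).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node of a hierarchical (HODLR-style) low-rank representation.
// Leaves store small dense diagonal blocks; each internal node stores rank-r
// factors of its two off-diagonal blocks. Since K is symmetric, K21 = K12^T,
// i.e. U21 = V12 and V21 = U12; both pairs are kept here only for clarity.
struct HNode {
  int n = 0, r = 0;                    // block size, off-diagonal rank
  HNode *left = nullptr, *right = nullptr;
  std::vector<double> D;               // leaf: dense n-by-n block, row-major
  std::vector<double> U12, V12;        // K12 ~= U12 * V12^T (n1-by-r, n2-by-r)
  std::vector<double> U21, V21;        // K21 ~= U21 * V21^T (n2-by-r, n1-by-r)
};

// y += A * x for a row-major m-by-k matrix A.
static void gemv_acc(int m, int k, const double *A, const double *x, double *y) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < k; ++j) y[i] += A[i * k + j] * x[j];
}

// y += K * x on this node's index range (caller zeroes y once up front).
// Each off-diagonal block costs O(n * r) instead of O(n^2), which is where
// the O(N r log N) matvec complexity comes from.
void hmatvec(const HNode *node, const double *x, double *y) {
  if (!node->left) {                   // leaf: dense diagonal block
    gemv_acc(node->n, node->n, node->D.data(), x, y);
    return;
  }
  const int n1 = node->left->n, n2 = node->right->n, r = node->r;
  hmatvec(node->left, x, y);           // recurse on diagonal blocks
  hmatvec(node->right, x + n1, y + n1);
  std::vector<double> t(r);            // y1 += U12 * (V12^T * x2)
  for (int j = 0; j < r; ++j)
    for (int i = 0; i < n2; ++i) t[j] += node->V12[i * r + j] * x[n1 + i];
  gemv_acc(n1, r, node->U12.data(), t.data(), y);
  std::fill(t.begin(), t.end(), 0.0);  // y2 += U21 * (V21^T * x1)
  for (int j = 0; j < r; ++j)
    for (int i = 0; i < n1; ++i) t[j] += node->V21[i * r + j] * x[i];
  gemv_acc(n2, r, node->U21.data(), t.data(), y + n1);
}
```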
GOFMM is distributed as part of HMLP at https://github.com/ChenhanYu/hmlp.
[pdf] Geometry-Oblivious FMM for Compressing Dense SPD Matrices, in Proceedings of SC'17.
HMLP is a portable framework, built on BLIS, that provides high-performance, memory-efficient matrix-matrix multiplication (GEMM)-like operations and their extensions. It generalizes C = AB to allow pre- and post-processing of A, B, and C with arbitrary types. This extra computation can be hidden in the memory-packing stage (performed in transit), or can even reduce memory latency by working in a matrix-free fashion.
For example, a Gaussian kernel matrix can be formed by fusing a per-entry transformation into the GEMM itself, rather than computing C = AB first and transforming it in a second pass.
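The sketch below illustrates this fused pattern with plain loops (illustrative only: HMLP itself fuses such epilogues into BLIS-style packed micro-kernels, and `gaussian_kernel_gemm` and its arguments are hypothetical names). The inner loop is an ordinary GEMM inner product; the epilogue turns each inner product into a Gaussian kernel entry using precomputed squared norms, so the plain product is never materialized.

```cpp
#include <cmath>

// Fused GEMM sketch: C(i,j) = exp(-(|a_i|^2 - 2 a_i.b_j + |b_j|^2) / (2 h^2)),
// i.e. the Gaussian kernel of the squared distance between points a_i and b_j.
void gaussian_kernel_gemm(int m, int n, int d,
                          const double *A,   // m-by-d, row-major points
                          const double *B,   // n-by-d, row-major points
                          const double *A2,  // |a_i|^2, length m (precomputed)
                          const double *B2,  // |b_j|^2, length n (precomputed)
                          double h,          // kernel bandwidth
                          double *C) {       // m-by-n output, row-major
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double dot = 0.0;                      // the usual GEMM inner product
      for (int k = 0; k < d; ++k) dot += A[i * d + k] * B[j * d + k];
      double dist2 = A2[i] - 2.0 * dot + B2[j];
      C[i * n + j] = std::exp(-dist2 / (2.0 * h * h));  // fused epilogue
    }
  }
}
```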
Currently, HMLP has implementations for Intel x86_64, Knights Landing, ARM, and NVIDIA GPUs; support for other architectures may follow.
https://github.com/ChenhanYu/hmlp
[pdf] Performance Optimization for the K-Nearest Neighbors Kernel on x86 Architectures, in Proceedings of SC'15.
Multifrontal methods are efficient and widely used to solve large-scale sparse linear systems. They transform the computations on a large sparse system into operations on many smaller dense sub-matrices called frontal matrices (fronts). References [1, 2] show how these methods can be parallelized efficiently on modern heterogeneous CPU-GPU architectures.
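As a concrete illustration (not the code from [1, 2]), the dense kernel inside each front is a partial factorization: the fully summed variables are eliminated, leaving a Schur complement that is assembled into the parent front. A minimal unpivoted sketch, assuming a row-major n-by-n front whose first ns variables are fully summed (real solvers pivot and use blocked BLAS-3 updates):

```cpp
#include <vector>

// Partial LU on a frontal matrix F: eliminate the first ns (fully summed)
// variables and leave the Schur complement in the trailing block.
void partial_factor(int n, int ns, std::vector<double> &F) {  // row-major
  for (int k = 0; k < ns; ++k) {
    for (int i = k + 1; i < n; ++i) {
      F[i * n + k] /= F[k * n + k];          // column of L
      for (int j = k + 1; j < n; ++j)        // rank-1 trailing update
        F[i * n + j] -= F[i * n + k] * F[k * n + j];
    }
  }
  // F now holds the L and U factors in its first ns rows/columns and the
  // Schur complement (the "update matrix") in F[ns.., ns..], which is
  // passed up the assembly tree to the parent front.
}
```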
[1] Adaptive Performance Models and Workload Distribution Algorithms for Optimizing Hybrid CPU-GPU Multifrontal Solver, Computers & Mathematics with Applications (CAMWA), 2014
[2] A CPU-GPU Hybrid Approach for the Unsymmetric Multifrontal Method, Parallel Computing, 2011
This is a side project that demonstrates the capability of GOFMM. A real-time video stream is processed with OpenCV to extract the gesture contour (left figure). Each 32-by-32 image is further reduced to an 81-dimensional HOG feature. The classifier is a Gaussian-kernel softmax regression trained with full gradient descent. The input size is 23K, and the 23K-by-23K kernel matrix is approximated by GOFMM. This allows O(N) gradient descent in each iteration, because the MATVEC takes only O(N) work in GOFMM. Training completes within 10 minutes on my MacBook Pro.
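A minimal sketch of the training loop follows (hypothetical names throughout; the matvec callback `Kmul` stands in for GOFMM's compressed O(N) kernel evaluation). With scores F = KW, row-wise softmax P, and one-hot labels Y, the full-batch gradient is K(P - Y)/N because K is symmetric, so each iteration needs only C matvecs with K.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

using Matvec = std::function<void(const double *x, double *y)>;  // y = K * x

// Full-batch gradient descent for kernel softmax regression, where every
// product with the N-by-N kernel matrix K goes through the fast matvec.
void train(int N, int C, const Matvec &Kmul,
           const std::vector<int> &labels,     // labels[i] in [0, C)
           double lr, int iters,
           std::vector<double> &W) {           // N-by-C weights, row-major
  std::vector<double> F(N * C), col(N), Kcol(N);
  for (int it = 0; it < iters; ++it) {
    for (int c = 0; c < C; ++c) {              // F = K * W, column by column
      for (int i = 0; i < N; ++i) col[i] = W[i * C + c];
      Kmul(col.data(), Kcol.data());
      for (int i = 0; i < N; ++i) F[i * C + c] = Kcol[i];
    }
    for (int i = 0; i < N; ++i) {              // F <- softmax(F) - Y, in place
      double mx = F[i * C];
      for (int c = 1; c < C; ++c) mx = std::max(mx, F[i * C + c]);
      double sum = 0.0;
      for (int c = 0; c < C; ++c) sum += (F[i * C + c] = std::exp(F[i * C + c] - mx));
      for (int c = 0; c < C; ++c) F[i * C + c] /= sum;
      F[i * C + labels[i]] -= 1.0;
    }
    for (int c = 0; c < C; ++c) {              // W -= lr * K * (P - Y) / N
      for (int i = 0; i < N; ++i) col[i] = F[i * C + c];
      Kmul(col.data(), Kcol.data());
      for (int i = 0; i < N; ++i) W[i * C + c] -= lr * Kcol[i] / N;
    }
  }
}
```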
Neuroevolution for collision-free robotic-arm trajectory planning, with MATLAB animations of the final trajectories. The figures show how the robotic arm evolves to reach the target while avoiding two and three ball-shaped obstacles.
The two demos are in GIF format. If they stop moving, just refresh the page to reload them.
Particle swarm optimization (PSO) is a stochastic optimization method, typically used to find global optima of non-convex or non-differentiable high-dimensional problems. The algorithm mimics the behavior of swarming creatures, which makes it naturally suited to finding multiple targets and to parallel computing. The figure shows how PSO is used to minimize the energy of a Lennard-Jones particle cluster and find its stable configuration.
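A minimal sketch of the idea, assuming unit Lennard-Jones parameters and standard PSO coefficients (all names and constants are illustrative, not the code behind the figure): each particle's velocity is pulled toward its own best position and the swarm's global best, with fresh random weights per coordinate.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Lennard-Jones energy of a cluster; x packs np 3-D coordinates
// (sigma = epsilon = 1 for illustration).
double lj_energy(const std::vector<double> &x) {
  int np = static_cast<int>(x.size()) / 3;
  double e = 0.0;
  for (int i = 0; i < np; ++i)
    for (int j = i + 1; j < np; ++j) {
      double r2 = 0.0;
      for (int k = 0; k < 3; ++k) {
        double d = x[3 * i + k] - x[3 * j + k];
        r2 += d * d;
      }
      double s6 = 1.0 / (r2 * r2 * r2);  // (sigma/r)^6
      e += 4.0 * (s6 * s6 - s6);
    }
  return e;
}

// Basic PSO: returns the best configuration found for an np-particle cluster.
std::vector<double> pso(int np, int swarm, int iters, unsigned seed = 0) {
  const int dim = 3 * np;
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> u01(0.0, 1.0), box(-1.5, 1.5);
  std::vector<std::vector<double>> x(swarm, std::vector<double>(dim)),
      v(swarm, std::vector<double>(dim, 0.0)), pb;
  std::vector<double> pbe(swarm);                 // personal-best energies
  for (auto &xi : x) for (auto &c : xi) c = box(rng);
  pb = x;
  for (int s = 0; s < swarm; ++s) pbe[s] = lj_energy(x[s]);
  int gb = 0;                                     // index of the global best
  for (int s = 1; s < swarm; ++s) if (pbe[s] < pbe[gb]) gb = s;
  const double w = 0.72, c1 = 1.49, c2 = 1.49;    // common PSO coefficients
  for (int it = 0; it < iters; ++it) {
    for (int s = 0; s < swarm; ++s) {
      for (int k = 0; k < dim; ++k) {             // velocity/position update
        v[s][k] = w * v[s][k]
                + c1 * u01(rng) * (pb[s][k] - x[s][k])
                + c2 * u01(rng) * (pb[gb][k] - x[s][k]);
        x[s][k] += v[s][k];
      }
      double e = lj_energy(x[s]);
      if (e < pbe[s]) { pbe[s] = e; pb[s] = x[s]; }
      if (pbe[s] < pbe[gb]) gb = s;
    }
  }
  return pb[gb];
}
```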
Site designed by Chenhan D. Yu