HPC: High Performance Computing / Scientific Computing
We collaborate closely with the 'High Performance Computing and Applications' group at the University of Almeria on the development and evaluation of High Performance Computing (HPC) techniques to accelerate computationally demanding problems in three-dimensional electron microscopy. We also collaborate on novel, fast approaches to major operations in scientific computing, such as the sparse matrix-vector product (SpMV). In these works, we devise solutions for execution on state-of-the-art HPC platforms (supercomputers, GPUs, standard multicore computers) and make use of different parallel paradigms and strategies (MPI, shared memory, GPU computing, vectorization, single-core code optimization, and hybrid computing techniques).
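As a flavour of the kind of operations and parallel strategies involved, the sketch below shows a sparse matrix-vector product in the common CSR format, parallelized across rows with OpenMP threads. It is only a minimal illustration of SpMV under shared memory; it is not taken from any of the codes developed in this collaboration, and all identifiers are hypothetical.

```cpp
// Minimal CSR sparse matrix-vector product y = A*x, parallelized with
// OpenMP across rows. Illustrative only; array names are hypothetical.
#include <vector>
#include <cstdio>

struct CsrMatrix {
    int n_rows;
    std::vector<int>    row_ptr;  // size n_rows + 1
    std::vector<int>    col_idx;  // column index of each nonzero
    std::vector<double> values;   // value of each nonzero
};

void spmv_csr(const CsrMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A.n_rows; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.values[k] * x[A.col_idx[k]];
        y[i] = sum;  // each row is independent, so no synchronization is needed
    }
}

int main() {
    // 3x3 example: [[4,1,0],[0,3,0],[2,0,5]]
    CsrMatrix A{3, {0, 2, 3, 5}, {0, 1, 1, 0, 2}, {4, 1, 3, 2, 5}};
    std::vector<double> x{1, 2, 3}, y(3);
    spmv_csr(A, x, y);
    std::printf("%g %g %g\n", y[0], y[1], y[2]);  // expected: 6 6 17
}
```

The publications listed further below address the same operation with GPU-oriented storage formats such as ELLPACK-R and the ELLR-T kernel.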
People involved in this project (present and past):
JI Agulleiro
JR Bilbao-Castro
I Garcia
EM Garzon
JA Martinez
A Martinez-Sanchez
JJ Moreno
F Vazquez
Modern computing architectures.
(Top-left) Modern computers ship with several multicore chips (typically 2–4) configured to share a centralized memory. Each multicore chip contains several computing cores (2–6) sharing a cache memory (typically the third level, L3). Internally, each core contains two more cache levels (L1 and L2, not shown in this figure).
(Top-right) Cluster of multicore computers. Each node has m processors sharing a single centralized memory, and the nodes are connected through an interconnection network. Most current supercomputers are also based on this architectural model. In this case, the so-called distributed-shared memory (DSM) architecture may be available, whereby there is a virtually unique memory system but memory access is non-uniform (NUMA), depending on the physical location of the data.
(Bottom-left) Graphics Processing Units (GPUs) are composed of several Streaming Multiprocessors (SMs) (e.g. 30 and 16 in the second and third generation of NVIDIA GPUs, respectively). Each SM is made up of a number of cores (8 and 32, respectively) that share a register file and a local memory. All the SMs share the global device memory. In the third generation, a hierarchy of cache memory is provided; in particular, an L2 cache level sits between the SMs and the device memory.
(Bottom-right) Hybrid CPU+GPU computing on a computer equipped with multicore processors and multiple GPUs. The system keeps a pool of tasks to do. A number of threads mapped to CPU cores (denoted by C-threads) run concurrently in the system, together with specific threads (denoted by G-threads) in charge of the tasks to be computed on the GPUs. The tasks are asynchronously dispatched to the threads on demand. In the figure, the allocation of tasks to threads is color-coded. Note that the G-threads request tasks more often than the C-threads because GPUs carry out the calculations faster than a single CPU core; moreover, faster GPUs are assigned work more frequently than more modest ones.
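The on-demand dispatching described for the bottom-right panel can be sketched with a shared task counter from which both kinds of threads pull work. The following minimal C++ illustration replaces the actual CPU and GPU reconstruction routines with placeholders (process_slice_cpu and process_slice_gpu are hypothetical names); it is not code from the hybrid systems published below, and only shows why faster workers naturally end up taking more tasks.

```cpp
// Hybrid CPU+GPU on-demand dispatching: C-threads and G-threads pull task
// indices from a shared pool; faster workers naturally take more tasks.
// Illustrative sketch only; the GPU work is mocked with a placeholder.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> next_task{0};   // shared pool: tasks 0..n_tasks-1
const int n_tasks = 100;

void process_slice_cpu(int t) { /* reconstruct slice t on a CPU core (placeholder) */ }
void process_slice_gpu(int t) { /* reconstruct slice t on a GPU (placeholder) */ }

void worker(bool is_gpu) {
    // Each thread repeatedly requests the next unprocessed task until the
    // pool is exhausted; no static partitioning of the work is imposed.
    for (int t = next_task.fetch_add(1); t < n_tasks; t = next_task.fetch_add(1)) {
        if (is_gpu) process_slice_gpu(t);
        else        process_slice_cpu(t);
    }
}

int main() {
    std::vector<std::thread> pool;
    const int n_cpu_threads = 4;   // C-threads, one per spare CPU core
    const int n_gpu_threads = 2;   // G-threads, one per GPU in the system
    for (int i = 0; i < n_cpu_threads; ++i) pool.emplace_back(worker, false);
    for (int i = 0; i < n_gpu_threads; ++i) pool.emplace_back(worker, true);
    for (auto& th : pool) th.join();
    std::printf("all %d tasks dispatched\n", n_tasks);
}
```

Because tasks are claimed on demand rather than statically partitioned, the workload balances itself across heterogeneous devices.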
Relevant publications:
Reviews
Hybrid Computing (GPU+CPU)
Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction.
JI Agulleiro, F Vazquez, EM Garzon, JJ Fernandez.
Ultramicroscopy 115:109-114, 2012. [Software for developers]
Tomographic Reconstruction on standard computers (code optimization, vectorization, threads)
Tomo3D 2.0 – exploitation of advanced vector extensions (AVX) for 3D reconstruction.
JI Agulleiro, JJ Fernandez.
Journal of Structural Biology 189:147-152, 2015. [Software]
Evaluation of a multicore-optimized implementation for tomographic reconstruction.
JI Agulleiro, JJ Fernandez.
PLoS ONE 7(11): e48261, 2012. [PDF] [Software]
Fast tomographic reconstruction on multicore computers.
JI Agulleiro, JJ Fernandez.
Bioinformatics 27:582-583, 2011. [Software]
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.
JI Agulleiro, EM Garzon, I Garcia, JJ Fernandez.
Journal of Structural Biology 170:570-575, 2010. [Software]
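To make the idea of SIMD vectorization concrete, the sketch below performs a generic weighted accumulation (out[i] += w*in[i]) eight single-precision values at a time with AVX intrinsics, a basic accumulation pattern of the kind found in voxel updates. It is an illustration of the technique only, assuming an AVX-capable CPU, and is not code from Tomo3D or the other packages listed above.

```cpp
// Generic SIMD accumulation: out[i] += w * in[i], eight floats per AVX op.
// Illustration of vectorization only; not taken from any released code.
#include <immintrin.h>
#include <cstdio>

void axpy_avx(float* out, const float* in, float w, int n) {
    __m256 vw = _mm256_set1_ps(w);               // broadcast the weight
    int i = 0;
    for (; i + 8 <= n; i += 8) {                  // vectorized body: 8 floats/iter
        __m256 vin  = _mm256_loadu_ps(in + i);
        __m256 vout = _mm256_loadu_ps(out + i);
        vout = _mm256_add_ps(vout, _mm256_mul_ps(vin, vw));
        _mm256_storeu_ps(out + i, vout);
    }
    for (; i < n; ++i) out[i] += w * in[i];       // scalar remainder
}

int main() {
    float in[10], out[10];
    for (int i = 0; i < 10; ++i) { in[i] = float(i); out[i] = 1.0f; }
    axpy_avx(out, in, 0.5f, 10);
    std::printf("%g %g %g\n", out[0], out[5], out[9]);  // expected: 1 3.5 5.5
}
```

Such a loop has to be compiled with AVX support enabled (e.g. -mavx).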
Tomographic Reconstruction with GPU computing
A matrix approach to tomographic reconstruction and its implementation on GPUs.
F Vazquez, EM Garzon, JJ Fernandez.
Journal of Structural Biology 170:146-151, 2010.
Tomographic Reconstruction with MPI
Efficient parallel implementation of iterative reconstruction algorithms for electron tomography.
JJ Fernandez, D Gordon, R Gordon.
Journal of Parallel and Distributed Computing 68(5):626-640, 2008.
HPC in Noise Filtering
TomoEED: fast edge-enhancing denoising of tomographic volumes.
JJ Moreno, A Martinez-Sanchez, JA Martinez, EM Garzon, JJ Fernandez.
Bioinformatics 34:3776-3778, 2018. [Software]
Three-dimensional feature-preserving noise reduction for real-time electron tomography.
JJ Fernandez, JA Martinez.
Digital Signal Processing 20:1162-1172, 2010. [Software]
High performance noise reduction for biomedical multidimensional data.
S Tabik, EM Garzon, I Garcia, JJ Fernandez.
Digital Signal Processing 17:724-736, 2007. [Software]
HPC for 3D reconstruction in Single-Particle Electron Microscopy
Exploiting desktop supercomputing for 3D electron microscopy reconstructions using ART with blobs.
JR Bilbao-Castro, R Marabini, COS Sorzano, I Garcia, JM Carazo, JJ Fernandez.
Journal of Structural Biology 165:19-26, 2009.
Parameter optimization in 3D reconstruction on a large scale grid.
JR Bilbao-Castro, A Merino, I Garcia, JM Carazo, JJ Fernandez.
Parallelization of reconstruction algorithms in three-dimensional electron microscopy.
JR Bilbao-Castro, JM Carazo, I Garcia, JJ Fernandez.
HPC for Sparse Matrix Vector Product
Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach.
F Vazquez, JJ Fernandez, EM Garzon.
Parallel Computing 38:408-420, 2012. [Software]
A new approach for sparse matrix vector product on NVIDIA GPUs.
F Vazquez, JJ Fernandez, EM Garzon.
Concurrency and Computation: Practice and Experience 23:815-826, 2011. [Software]
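The ELLR-T kernel referred to above builds on the ELLPACK-R storage scheme: nonzeros padded to the maximum row length and stored column-major for coalesced access, together with an array holding the actual number of nonzeros per row. The sketch below shows that layout with a plain CPU loop so the data structure is easy to follow; the published kernels compute each row with one or several GPU threads, and all identifiers here are illustrative.

```cpp
// ELLPACK-R storage: nonzeros padded to the maximum row length and stored
// column-major (a_ell), plus rl[] with the true length of each row.
// CPU sketch of the per-row product; the published ELLR-T kernels compute
// each row with one or several GPU threads instead. Names are illustrative.
#include <vector>
#include <cstdio>

struct EllMatrix {
    int n_rows, max_nnz_per_row;
    std::vector<double> a_ell;   // size n_rows * max_nnz_per_row, column-major
    std::vector<int>    j_ell;   // column indices, same layout as a_ell
    std::vector<int>    rl;      // actual number of nonzeros in each row
};

void spmv_ellr(const EllMatrix& A, const std::vector<double>& x,
               std::vector<double>& y) {
    for (int i = 0; i < A.n_rows; ++i) {
        double sum = 0.0;
        // Thanks to rl[i], rows shorter than max_nnz_per_row skip the padding.
        for (int k = 0; k < A.rl[i]; ++k) {
            int idx = k * A.n_rows + i;          // column-major access
            sum += A.a_ell[idx] * x[A.j_ell[idx]];
        }
        y[i] = sum;
    }
}

int main() {
    // Same 3x3 example matrix [[4,1,0],[0,3,0],[2,0,5]], at most 2 nonzeros/row.
    EllMatrix A{3, 2,
                {4, 3, 2,  1, 0, 5},     // column-major values (0 = padding)
                {0, 1, 0,  1, 0, 2},     // column-major column indices
                {2, 1, 2}};
    std::vector<double> x{1, 2, 3}, y(3);
    spmv_ellr(A, x, y);
    std::printf("%g %g %g\n", y[0], y[1], y[2]);  // expected: 6 6 17
}
```

The row-length array avoids the wasted work that plain ELLPACK spends on padding when row lengths are irregular.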