HPC: High Performance Computing / Scientific Computing

We collaborate closely with the 'High Performance Computing and Applications' group at the University of Almeria on the development and evaluation of High Performance Computing (HPC) techniques to accelerate computationally demanding problems in Three-Dimensional Electron Microscopy. We also collaborate on novel, fast approaches to major operations in scientific computing, such as the sparse matrix-vector product (SpMV). In these works, we devise solutions for execution on state-of-the-art HPC platforms (supercomputers, GPUs, standard multicore computers), making use of different parallel paradigms and strategies (MPI, shared memory, GPU computing, vectorization, single-core code optimization, and hybrid computing techniques).
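To give a concrete flavor of these kernels, the sketch below shows a basic SpMV (y = A*x) in C++ over the standard CSR storage format, with the row loop parallelized by an OpenMP pragma (compile with -fopenmp). This is only a minimal illustrative baseline under assumed names (CsrMatrix, spmv), not code from our software; the GPU formats studied in the publications below (e.g. ELLR-T) are considerably more elaborate.

    // Minimal SpMV sketch (y = A*x) over the standard CSR format.
    // Hypothetical names (CsrMatrix, spmv); illustrative baseline only,
    // not the ELLR-T format used in the group's GPU work.
    #include <cstdio>
    #include <vector>

    struct CsrMatrix {
        int rows;
        std::vector<int> row_ptr;   // size rows+1: extent of each row
        std::vector<int> col_idx;   // column index of each nonzero
        std::vector<double> vals;   // value of each nonzero
    };

    void spmv(const CsrMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
        #pragma omp parallel for schedule(static)  // one row per iteration
        for (int i = 0; i < A.rows; ++i) {
            double sum = 0.0;
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                sum += A.vals[k] * x[A.col_idx[k]];
            y[i] = sum;
        }
    }

    int main() {
        // 3x3 example: [[4,0,1],[0,3,0],[2,0,5]]
        CsrMatrix A{3, {0, 2, 3, 5}, {0, 2, 1, 0, 2}, {4, 1, 3, 2, 5}};
        std::vector<double> x{1, 2, 3}, y(3);
        spmv(A, x, y);
        for (double v : y) std::printf("%g\n", v);  // prints 7, 6, 17
        return 0;
    }

Parallelizing over rows is the natural shared-memory decomposition for SpMV: rows are independent, so the threads need no synchronization when writing y.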

People involved in this project (present and past):

JI Agulleiro

JR Bilbao-Castro

I Garcia

EM Garzon

JA Martinez

A Martinez-Sanchez

JJ Moreno

F Vazquez

Figure: Modern computing architectures.

(Top-left) Modern computers ship with several multicore chips (typically 2–4) configured to share a centralized memory. Each multicore chip contains several computing cores (2–6) sharing a cache memory (typically the third level, L3). Internally, each core contains two more cache levels (L1 and L2, not shown in the figure).

(Top-right) Cluster of multicore computers. Each node has m processors sharing a single centralized memory, and the nodes are connected through an interconnection network. Most current supercomputers are also based on this architectural model. In this case, the so-called distributed-shared memory (DSM) architecture may be available, whereby there is a virtually unique memory system but memory access is non-uniform (NUMA), depending on the physical location of the data.

(Bottom-left) Graphics Processing Units (GPUs) are composed of several Streaming Multiprocessors (SMs) (e.g. 30 and 16 in the second and third generations of NVIDIA GPUs, respectively). Each SM is made up of a number of cores (8 and 32, respectively) that share a register file and a local memory. All the SMs share the global device memory. The third generation adds a cache hierarchy; in particular, an L2 cache sits between the SMs and the device memory.

(Bottom-right) Hybrid CPU+GPU computing on a computer equipped with multicore processors and multiple GPUs. The system keeps a pool of pending tasks. A number of threads mapped to CPU cores (C-threads) run concurrently in the system, together with specific threads (G-threads) in charge of the tasks to be computed on the GPUs. Tasks are dispatched asynchronously to the threads on demand; in the figure, the allocation of tasks to threads is color-coded. Note that the G-threads request tasks more often than the C-threads because GPUs complete the calculations faster than a single CPU core. Moreover, faster GPUs are assigned work more frequently than slower GPUs.
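The on-demand dispatching in the bottom-right panel can be captured with a shared atomic counter acting as the task pool: each thread claims the next task index as soon as it finishes its current one, so faster workers (such as G-threads backed by fast GPUs) automatically take a larger share of the work. The C++ sketch below simulates this scheme with CPU threads only, under hypothetical names (worker, next_task, kNumTasks); in the real system, a G-thread would launch a GPU kernel for each claimed task.

    // Sketch of the on-demand task pool: workers atomically claim the next
    // task index, so faster workers naturally process more tasks.
    // Hypothetical names throughout; a real G-thread would launch a GPU
    // kernel for each task instead of computing it on the CPU.
    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    std::atomic<int> next_task{0};  // the shared pool: next unclaimed task
    constexpr int kNumTasks = 16;   // e.g. slabs of the volume to process

    void worker(int id) {
        for (;;) {
            int t = next_task.fetch_add(1);  // claim a task atomically
            if (t >= kNumTasks) break;       // pool exhausted
            // ... compute task t here (CPU kernel, or GPU launch) ...
            std::printf("thread %d processed task %d\n", id, t);
        }
    }

    int main() {
        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i) pool.emplace_back(worker, i);
        for (auto& th : pool) th.join();
        return 0;
    }

Because tasks are pulled rather than pushed, no static partitioning or performance model is needed; the scheme self-balances across heterogeneous CPU cores and GPUs.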

Relevant publications:

Computational methods for electron tomography.

JJ Fernandez

Micron 43:1010-1030, 2012.   

High performance computing in structural determination by electron cryomicroscopy.

JJ Fernandez.

Journal of Structural Biology 164:1-6, 2008.  

Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction.

JI Agulleiro, F Vazquez, EM Garzon, JJ Fernandez.

Ultramicroscopy 115:109-114, 2012. [Software for developers]

Tomo3D 2.0: exploitation of advanced vector extensions (AVX) for 3D reconstruction.

JI Agulleiro, JJ Fernandez.

Journal of Structural Biology 189:147-152, 2015. [Software]

Evaluation of a multicore-optimized implementation for tomographic reconstruction.

JI Agulleiro, JJ Fernandez.

PLoS ONE 7(11):e48261, 2012. [PDF] [Software]

Fast tomographic reconstruction on multicore computers.

JI Agulleiro, JJ Fernandez.

Bioinformatics 27:582-583, 2011. [Software]

Vectorization with SIMD extensions speeds up reconstruction in electron tomography.

JI Agulleiro, EM Garzon, I Garcia, JJ Fernandez.

Journal of Structural Biology 170:570-575, 2010. [Software]

A matrix approach to tomographic reconstruction and its implementation on GPUs.

F Vazquez, EM Garzon, JJ Fernandez.

Journal of Structural Biology 170:146-151, 2010.

Efficient parallel implementation of iterative reconstruction algorithms for electron tomography.

JJ Fernandez, D Gordon, R Gordon.

Journal of Parallel and Distributed Computing 68(5):626-640, 2008.

Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing.

JJ Fernandez, JM Carazo, I Garcia.

Journal of Parallel and Distributed Computing 64(2):285-300, 2004.

TomoEED: fast edge-enhancing denoising of tomographic volumes.

JJ Moreno, A Martinez-Sanchez, JA Martinez, EM Garzon, JJ Fernandez.

Bioinformatics 34:3776-3778, 2018. [Software]

Three-dimensional feature-preserving noise reduction for real-time electron tomography.

JJ Fernandez, JA Martinez.

Digital Signal Processing 20:1162-1172, 2010. [Software]

High performance noise reduction for biomedical multidimensional data.

S Tabik, EM Garzon, I Garcia, JJ Fernandez.

Digital Signal Processing 17:724-736, 2007. [Software]

Exploiting desktop supercomputing for 3D electron microscopy reconstructions using ART with blobs.

JR Bilbao-Castro, R Marabini, COS Sorzano, I Garcia, JM Carazo, JJ Fernandez.

Journal of Structural Biology 165:19-26, 2009.

Parameter optimization in 3D reconstruction on a large scale grid.

JR Bilbao-Castro, A Merino, I Garcia, JM Carazo, JJ Fernandez.

Parallel Computing 33:250-263, 2007.

Parallelization of reconstruction algorithms in three-dimensional electron microscopy.

JR Bilbao-Castro, JM Carazo, I Garcia, JJ Fernandez.

Applied Mathematical Modelling 30(8):688-701, 2006.

Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach.

F Vazquez, JJ Fernandez, EM Garzon.

Parallel Computing 38:408-420, 2012. [Software]

A new approach for sparse matrix vector product on NVIDIA GPUs.

F Vazquez, JJ Fernandez, EM Garzon.

Concurrency and Computation: Practice and Experience 23:815-826, 2011. [Software]