MetaHipMer is the first high-quality, end-to-end de novo metagenome assembler designed for extreme-scale data sets on distributed-memory HPC systems. It is based on an earlier single-genome version, HipMer, which itself was based on the single-node Meraculous assembler. MetaHipMer is a PGAS application; following a major rewrite of the code (sometimes called MHM2), its main software dependencies are the UPC++ programming system and the underlying GASNet-EX communication layer. GPU acceleration is also supported, with a dependency on CUDA or HIP. MetaHipMer's high performance rests on several novel algorithmic advances that leverage the efficiency and programmability of UPC++'s one-sided communication and remote procedure calls (RPCs), including optimized high-frequency k-mer analysis, communication-avoiding de Bruijn graph traversal, advanced I/O optimization, and extensive parallelization across the numerous and complex application phases.
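As a rough illustration of the k-mer analysis and de Bruijn graph traversal phases named above, the following Python sketch counts k-mers, keeps only "solid" k-mers seen at least min_count times, and extends a contig from a seed k-mer while the graph path is unambiguous. It is a single-process toy under stated assumptions, not MetaHipMer's implementation: it ignores reverse complements and sequencing-error models, and all function names (count_kmers, solid_kmers, walk_contig) are hypothetical. In MetaHipMer these structures are distributed across processes with UPC++ rather than held in one dictionary.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times (a crude error filter)."""
    return {km for km, c in counts.items() if c >= min_count}

def extensions(kmer, solid, forward=True):
    """Solid k-mers overlapping `kmer` by k-1 bases on the right (or left)."""
    if forward:
        cands = [kmer[1:] + b for b in "ACGT"]
    else:
        cands = [b + kmer[:-1] for b in "ACGT"]
    return [c for c in cands if c in solid]

def walk_contig(seed, solid):
    """Extend `seed` to the right while each step is unambiguous."""
    contig, current, visited = seed, seed, {seed}
    while True:
        nxt = extensions(current, solid, forward=True)
        # Stop at a dead end (0 successors), a fork (>1), or a cycle.
        if len(nxt) != 1 or nxt[0] in visited:
            break
        # Stop if the successor also has another predecessor (a merge).
        if len(extensions(nxt[0], solid, forward=False)) != 1:
            break
        current = nxt[0]
        visited.add(current)
        contig += current[-1]
    return contig

if __name__ == "__main__":
    counts = count_kmers(["ATGGCGTAC", "GCGTACCTT", "TACCTTAG"], 4)
    print(walk_contig("ATGG", solid_kmers(counts, 1)))  # ATGGCGTACCTTAG
```

Raising min_count to 2 drops the k-mers seen only once, so a walk started at "GCGT" stops early and yields "GCGTACCTT". This is the kind of trade-off between filtering errors and losing low-coverage sequence that k-mer analysis has to manage.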
The primary authors of MHM2 are Steven Hofmeyr, Rob Egan, and Muaaz Awan. The authors of the original MetaHipMer and HipMer are Evangelos Georganas, Steven Hofmeyr, Aydin Buluc, Rob Egan, and Eugene Goltsman. Leonid Oliker and Kathy Yelick have provided direction and advice throughout the development process. The original Meraculous was developed by Jarrod Chapman, Isaac Ho, Eugene Goltsman, and Daniel Rokhsar.
The latest release of MetaHipMer (version 2.1.0.2, released January 2023) is now available here.
For more information about MetaHipMer, contact Steven Hofmeyr.
HipMCL is a high-performance parallel algorithm for large-scale network clustering. HipMCL parallelizes the popular Markov Cluster (MCL) algorithm, which has been shown to be one of the most successful and widely used algorithms for network clustering. MCL is based on random walks and was initially designed to detect families in protein-protein interaction networks. Despite MCL's efficiency and multithreading support, scalability remains a bottleneck: it cannot process networks with several hundred million nodes and billions of edges in an affordable running time. HipMCL overcomes these challenges with massively parallel algorithms for all components of MCL. HipMCL can be up to 1000 times faster than the original MCL without any information loss. It can cluster a network of ~75 million nodes and ~68 billion edges in ~2.4 hours using ~2000 nodes of the Cori supercomputer at NERSC. HipMCL is written in C++ and uses the standard OpenMP and MPI libraries for shared- and distributed-memory parallelization.
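The random-walk formulation of MCL alternates two operations on a column-stochastic matrix: expansion (squaring the matrix, i.e. taking two random-walk steps) and inflation (raising entries to a power and renormalizing columns), with tiny entries pruned to keep the matrix sparse. The following is a minimal dense, single-process Python sketch of that loop, meant only to show what each component computes. It is not HipMCL's code; HipMCL's contribution is performing these steps in parallel at scale. The function names and default parameters (inflation=2.0, prune=1e-5) are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (a column-stochastic transition matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m

def matmul(a, b):
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adj, inflation=2.0, prune=1e-5, max_iter=100, tol=1e-9):
    n = len(adj)
    # Add self-loops, then normalize columns into random-walk probabilities.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)                                   # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        normalize_columns(m)
        m = [[x if x >= prune else 0.0 for x in row] for row in m]  # pruning
        normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j])
               for i in range(n) for j in range(n)) < tol:
            break
    # Rows that keep probability mass act as attractors; each defines a cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > 0) for row in m}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)
```

On two disconnected triangles (nodes 0-2 and 3-5) the loop converges immediately and returns [[0, 1, 2], [3, 4, 5]]. Dense lists keep the sketch short; at HipMCL's scale the matrices must be sparse and distributed, which is why pruning matters.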
Primary authors are Ariful Azad and Aydin Buluc, in collaboration with Georgios Pavlopoulos (JGI), Nikos Kyrpides (JGI) and Christos Ouzounis (CERTH).
The first release of HipMCL (1.0.0) is now available for download from Bitbucket.
For more information about HipMCL, contact Ariful Azad.
MerBench is a set of microbenchmarks originally developed for analyzing the performance of the primary communication patterns implemented in HipMer, an extreme-scale de novo genome assembler. Much of HipMer's high performance comes from leveraging the one-sided communication capabilities of Unified Parallel C (UPC) for asynchronous Alltoall and Alltoallv communication. These benchmarks distill those essential communication patterns and parameters (e.g., message size) for cross-architecture and cross-application network performance analysis.
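The data movement in an Alltoallv exchange can be illustrated without any communication library. In the Python sketch below, each simulated rank buckets its k-mers by a deterministic hash owner and builds the send counts and displacements (an exclusive prefix sum of the counts) that an Alltoallv-style call expects; the exchange itself is emulated by copying slices between in-memory buffers. The CRC32 owner rule and all function names are illustrative assumptions. This is not MerBench code; MerBench measures the real UPC communication.

```python
import zlib

def owner(kmer: str, nranks: int) -> int:
    """Deterministic owner rank of a k-mer (CRC32 hash modulo rank count)."""
    return zlib.crc32(kmer.encode()) % nranks

def pack_for_alltoallv(items, nranks):
    """Bucket items by destination; return send buffer, counts, displacements."""
    buckets = [[] for _ in range(nranks)]
    for item in items:
        buckets[owner(item, nranks)].append(item)
    counts = [len(b) for b in buckets]
    displs = [0] * nranks
    for r in range(1, nranks):
        displs[r] = displs[r - 1] + counts[r - 1]  # exclusive prefix sum
    sendbuf = [item for b in buckets for item in b]
    return sendbuf, counts, displs

def simulate_alltoallv(per_rank_items):
    """Rank dst receives sendbuf[displs[dst]:displs[dst] + counts[dst]]
    from every sender, concatenated in sender-rank order."""
    nranks = len(per_rank_items)
    packed = [pack_for_alltoallv(items, nranks) for items in per_rank_items]
    recv = [[] for _ in range(nranks)]
    for sendbuf, counts, displs in packed:
        for dst in range(nranks):
            recv[dst].extend(sendbuf[displs[dst]:displs[dst] + counts[dst]])
    return recv
```

Because the owner function is deterministic, identical k-mers from different ranks (for example "ACGT" held by both rank 0 and rank 1) always meet on the same destination rank, which is what lets a distributed k-mer counter aggregate them locally after the exchange.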
The primary authors are Evangelos Georganas, Rob Egan, and Marquita Ellis. Evangelos Georganas developed the original version of the microbenchmarks for analyzing HipMer. Rob Egan contributed a number of extensions for usability and cross-platform portability.
The microbenchmarks are available as part of HipMer on SourceForge, and as a standalone release.
For more information about MerBench, contact Marquita Ellis.
Metamer is a workflow tool that takes in multiple next-generation sequencing metagenome samples, calculates pairwise distances between them based on their k-mer content, and clusters the samples based on those distances. The framework will help provide structure to available metagenome samples, which will be essential for generating a metagenome-based database for characterizing metagenomes.
The primary authors are Migun Shakya and Patrick Chain.
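The distance-then-cluster workflow described above can be sketched in a few lines of Python. The source does not say which distance or clustering method Metamer uses, so this sketch assumes Jaccard distance between sample k-mer sets and single-linkage grouping at a fixed threshold (implemented with union-find). Both choices, and every name in the code, are hypothetical.

```python
from itertools import combinations

def kmer_set(reads, k):
    """Distinct k-mers across all reads of one sample."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def pairwise_distances(samples, k):
    """Distance for every unordered pair of sample names."""
    sets = {name: kmer_set(reads, k) for name, reads in samples.items()}
    return {(x, y): jaccard_distance(sets[x], sets[y])
            for x, y in combinations(sorted(sets), 2)}

def single_linkage(names, dist, threshold):
    """Merge samples connected by any pair at distance <= threshold."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (x, y), d in dist.items():
        if d <= threshold:
            parent[find(x)] = find(y)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For samples s1 = ["ACGTACGT"], s2 = ["ACGTACGA"], and s3 = ["TTTTGGGG"] with k = 4, s1 and s2 share 4 of their 5 distinct 4-mers (distance 0.2) and are grouped at threshold 0.5, while s3 shares none and stays alone.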
This is primarily a Python code.
For more information about Metamer, contact Migun Shakya.
diBELLA 2D is a distributed-memory version of BELLA based on MPI and the CombBLAS library. diBELLA 2D uses BELLA's overlap detection methodology and completes it by adding a transitive reduction step performed through algebraic operations. Future work includes a repeat resolution step and a scaffolding step to obtain an end-to-end distributed-memory long-read assembler.
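Transitive reduction removes an edge u -> w from an overlap graph whenever a path u -> v -> w already implies it. Expressed algebraically, one such pass is a Boolean sparse matrix product A*A followed by masking A with the complement of the result. The Python sketch below shows that idea on a small directed graph stored as adjacency sets. It is a simplified assumption, not diBELLA 2D's algorithm: real overlap graphs also carry read orientation and overlap lengths, which a faithful reduction must respect, and the function names are hypothetical.

```python
def two_hop(adj):
    """Boolean sparse product A*A: nodes reachable in exactly two steps."""
    return {u: set().union(*(adj.get(v, set()) for v in nbrs))
            for u, nbrs in adj.items()}

def transitive_reduction_2hop(adj):
    """Keep edge u->w only if no path u->v->w implies it (A AND NOT A*A)."""
    a2 = two_hop(adj)
    return {u: nbrs - a2[u] for u, nbrs in adj.items()}
```

For the graph {"a": {"b", "c"}, "b": {"c", "d"}, "c": {"d"}, "d": set()}, the pass drops a -> c and b -> d and leaves the chain a -> b -> c -> d.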
The primary authors are Giulia Guidi, Aydin Buluc, Saliya Ekanayake, and Oguz Selvitopi.
The first release is available at https://github.com/PASSIONLab/diBELLA.2D (master branch).
For more information about diBELLA 2D, contact Giulia Guidi.
BELLA is a computationally efficient and highly accurate long-read to long-read aligner and overlapper. BELLA is written in C++ and is currently implemented for shared memory on a single node using OpenMP.
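One way to find candidate overlaps between long reads, and the general idea behind sparse-matrix overlappers, is to build a read-by-k-mer matrix A and compute A*A^T, whose (i, j) entry counts the k-mers shared by reads i and j. The Python sketch below computes the upper triangle of that product column by column and keeps pairs that meet a threshold. It is an illustrative assumption rather than BELLA's code: it ignores reverse complements, does not select reliable k-mers, and skips the alignment step that would confirm each candidate. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def read_kmer_columns(reads, k):
    """Sparse read-by-k-mer matrix A stored by column: k-mer -> set of read ids."""
    cols = defaultdict(set)
    for rid, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            cols[seq[i:i + k]].add(rid)
    return cols

def candidate_overlaps(reads, k, min_shared):
    """Upper triangle of A * A^T, keeping pairs with >= min_shared k-mers."""
    shared = defaultdict(int)
    for rids in read_kmer_columns(reads, k).values():
        # Each k-mer column adds 1 to every pair of reads that contains it.
        for i, j in combinations(sorted(rids), 2):
            shared[(i, j)] += 1
    return {pair: c for pair, c in shared.items() if c >= min_shared}
```

With reads ["ACGTTGCA", "GTTGCATT", "CCCCCCCC"], k = 4, and min_shared = 2, reads 0 and 1 share GTTG, TTGC, and TGCA, so the result is {(0, 1): 3}; read 2 overlaps nothing.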
The primary authors are Giulia Guidi and Aydin Buluc.
The first release is available at https://github.com/PASSIONLab/BELLA (master branch).
For more information about BELLA, contact Giulia Guidi.
PASTIS is a fully distributed pipeline for large-scale protein similarity search. PASTIS constructs similarity graphs from large collections of protein sequences, which in turn can be used by a graph clustering algorithm to accurately discover protein families. A major novelty of PASTIS is its use of distributed sparse matrices as its underlying data structure. Not only are the sequences and their k-mers stored as sparse matrices, but so are the substitute k-mers that are critical for controlling sensitivity and specificity during sequence overlapping. PASTIS extensively hides communication and exploits the symmetry of the similarity matrix to achieve load balance. PASTIS has been demonstrated to scale to 2025 nodes (137,700 cores), and its accuracy is on par with the state of the art.
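Substitute k-mers extend exact k-mer matching so that sequences whose k-mers differ only by similar amino acids can still be linked. One way to express this in matrix terms: if A is the sequence-by-k-mer matrix and S marks which k-mers count as substitutes of one another, candidate scores are the entries of A*S*A^T, and when S is symmetric the result is symmetric too, so only its upper triangle needs computing. The Python sketch below illustrates this under stated assumptions: the SIMILAR table is a hypothetical stand-in for a real substitution scoring matrix, only single-position substitutions are generated, and none of the names come from PASTIS.

```python
from collections import defaultdict

# Hypothetical, symmetric toy table of "similar" amino acids.
SIMILAR = {"I": "LV", "L": "IV", "V": "IL", "K": "R", "R": "K", "D": "E", "E": "D"}

def substitutes(kmer):
    """Row of S: the k-mer itself plus every single allowed substitution."""
    out = {kmer}
    for i, aa in enumerate(kmer):
        for alt in SIMILAR.get(aa, ""):
            out.add(kmer[:i] + alt + kmer[i + 1:])
    return out

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity_graph(seqs, k, min_score):
    """Upper triangle of A*S*A^T, keeping pairs with score >= min_score."""
    owners = defaultdict(list)  # column view of A: k-mer -> sequence ids
    for sid, s in enumerate(seqs):
        for km in kmers(s, k):
            owners[km].append(sid)
    scores = defaultdict(int)
    for i, s in enumerate(seqs):
        for km in kmers(s, k):
            for sub in substitutes(km):      # row i of A*S
                for j in owners.get(sub, ()):
                    if j > i:                # S symmetric: upper triangle only
                        scores[(i, j)] += 1
    return {p: c for p, c in scores.items() if c >= min_score}
```

For "MKVLD" and "MRVLE" with k = 3, no 3-mer matches exactly, yet each of the first sequence's three k-mers has a substitute in the second (MRV, RVL, VLE), so similarity_graph(["MKVLD", "MRVLE", "WWWWW"], 3, 1) returns {(0, 1): 3}.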
The primary authors are Oguz Selvitopi, Saliya Ekanayake, and Aydin Buluc.
The first release is available at https://github.com/PASSIONLab/PASTIS.
For more information about PASTIS, contact Oguz Selvitopi.