Publications

2024


Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota, Tiffany Oliver, Neha Varghese, Simon Roux, Frederik Schulz, Marcel Huntemann, Alicia Clum, Brian Foster, Bryce Foster, Robert Riley, Kurt LaButti, Robert Egan, Patrick Hajek, Supratim Mukherjee, Galina Ovchinnikova, T_B_K Reddy, Sara Calhoun, Richard D Hayes, Robin R Rohwer, Zhichao Zhou, Chris Daum, Alex Copeland, I-Min A Chen, Natalia N Ivanova, Nikos C Kyrpides, Nigel J Mouncey, Tijana Glavina Del Rio, Igor V Grigoriev, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, Karthik Anantharaman, Katherine D McMahon, Tanja Woyke, Emiley A Eloe-Fadrosh, Scientific Data 11 (1), Nature Publishing Group UK, p. 966, September 4, 2024.


Exabiome: Advancing Microbial Science through Exascale Computing, Steven Hofmeyr, Aydin Buluç, Robert Riley, Rob Egan, Oguz Selvitopi, Leonid Oliker, Katherine Yelick, Migun Shakya, Brett Youtsey, Ariful Azad, IEEE Computing in Science & Engineering, May 22, 2024. 


SIMCoV-GPU: Accelerating an Agent-Based Model for Exascale, Kirtus Leyba, Steven Hofmeyr, Stephanie Forrest, Judy Cannon, Melanie Moses, Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, June 3, 2024, pp. 233-333.

GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes, Richard Lettich, Robert Egan, Robert Riley, Zhong Wang, Andrew Tritt, Leonid Oliker, Katherine Yelick, Aydın Buluç, bioRxiv, February 8, 2024. (preprint)

2023

Space efficient sequence alignment for SRAM-based computing: X-drop on the Graphcore IPU, Luk Burchard, Max Xiaohang Zhao, Johannes Langguth, Aydın Buluç, Giulia Guidi, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1-16, November 12, 2023.


Terabase-Scale Coassembly of a Tropical Soil Microbiome, Robert Riley, Robert M Bowers, Antonio Pedro Camargo, Ashley Campbell, Rob Egan, Emiley A Eloe-Fadrosh, Brian Foster, Steven Hofmeyr, Marcel Huntemann, Matthew Kellom, Jeffrey A Kimbrel, Leonid Oliker, Katherine Yelick, Jennifer Pett-Ridge, Asaf Salamov, Neha J Varghese, Alicia Clum, Microbiology Spectrum, June 13, 2023, pages e00200-23, American Society for Microbiology. 

Designing Efficient SIMD Kernels for High Performance Sequence Alignment, Doru Thom Popovici, Muaaz Gul Awan, Giulia Guidi, Rob Egan, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 15, 2023, IEEE, pages 167-176.

Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale K-mer Analysis, Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey, SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23), May 31-June 2, 2023, Seattle, WA, pages 213-224.

High-Performance Filters for GPUs, Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey, Principles and Practice of Parallel Programming (PPoPP 2023), February/March 2023, Montreal, Canada, arXiv preprint arXiv:2212.09005, 2022.


2022

Extreme-scale many-against-many protein similarity search, Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz G Awan, Georgios A Pavlopoulos, Ariful Azad, Nikos Kyrpides, Leonid Oliker, Katherine Yelick, Aydin Buluç, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Pages 1-12, November 13, 2022.  Gordon Bell Finalist.

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly, Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydın Buluç, 51st International Conference on Parallel Processing (ICPP ’22), August/September 2022, arXiv preprint arXiv:2207.04350, 2022.

Parallel algorithms for masked sparse matrix-matrix products, Srđan Milaković, Oguz Selvitopi, Israt Nisa, Zoran Budimlić, Aydin Buluç, Proceedings of the 51st International Conference on Parallel Processing, August 8, 2022.

Critical Assessment of Metagenome Interpretation: the second round of challenges, Fernando Meyer, Adrian Fritz, Zhi-Luo Deng, David Koslicki, Till Robin Lesker, Alexey Gurevich, Gary Robertson, Mohammed Alser, Dmitry Antipov, Francesco Beghini, Denis Bertrand, Jaqueline J. Brito, C. Titus Brown, Jan Buchmann, Aydin Buluç, Bo Chen, Rayan Chikhi, Philip T. L. C. Clausen, Alexandru Cristian, Piotr Wojciech Dabrowski, Aaron E. Darling, Rob Egan, Eleazar Eskin, Evangelos Georganas, Eugene Goltsman, Melissa A. Gray, Lars Hestbjerg Hansen, Steven Hofmeyr, Pingqin Huang, Luiz Irber, Huijue Jia, Tue Sparholt Jørgensen, Silas D. Kieser, Terje Klemetsen, Axel Kola, Mikhail Kolmogorov, Anton Korobeynikov, Jason Kwan, Nathan LaPierre, Claire Lemaitre, Chenhao Li, Antoine Limasset, Fabio Malcher-Miranda, Serghei Mangul, Vanessa R. Marcelino, Camille Marchet, Pierre Marijon, Dmitry Meleshko, Daniel R. Mende, Alessio Milanese, Niranjan Nagarajan, Jakob Nissen, Sergey Nurk, Leonid Oliker, Lucas Paoli, Pierre Peterlongo, Vitor C. Piro, Jacob S. Porter, Simon Rasmussen, Evan R. Rees, Knut Reinert, Bernhard Renard, Espen Mikal Robertsen, Gail L. Rosen, Hans-Joachim Ruscheweyh, Varuni Sarwal, Nicola Segata, Enrico Seiler, Lizhen Shi, Fengzhu Sun, Shinichi Sunagawa, Søren Johannes Sørensen, Ashleigh Thomas, Chengxuan Tong, Mirko Trajkovski, Julien Tremblay, Gherman Uritskiy, Riccardo Vicedomini, Zhengyang Wang, Ziye Wang, Zhong Wang, Andrew Warren, Nils Peder Willassen, Katherine Yelick, Ronghui You, Georg Zeller, Zhengqiao Zhao, Shanfeng Zhu, Jie Zhu, Ruben Garrido-Oter, Petra Gastmeier, Stephane Hacquard, Susanne Häußler, Ariane Khaledi, Friederike Maechler, Fantin Mesny, Simona Radutoiu, Paul Schulze-Lefert, Nathiana Smit, Till Strowig, Andreas Bremges, Alexander Sczyrba & Alice Carolyn McHardy, Nature Methods 19 (4), 429-440, April 8, 2022. https://doi.org/10.1038/s41592-022-01431-4

2021

Accelerating Large-Scale Genome Assembly with GPUs, Muaaz Awan, Jack Deslippe, Steven Hofmeyr, Rob Egan, Aydın Buluç, Leonid Oliker, Katherine Yelick.  ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), November, 2021.  Best paper finalist.

EXAGRAPH: Graph and combinatorial methods for enabling exascale applications, Seher Acer, Ariful Azad, Erik G Boman, Aydın Buluç, Karen D Devine, SM Ferdous, Nitin Gawande, Sayan Ghosh, Mahantesh Halappanavar, Ananth Kalyanaraman, Arif Khan, Marco Minutoli, Alex Pothen, Sivasankaran Rajamanickam, Oguz Selvitopi, Nathan R Tallent, Antonino Tumeo, The International Journal of High Performance Computing Applications, September 30, 2021.


Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems, Ariful Azad, Oguz Selvitopi, MT Hussain, John R Gilbert, Aydin Buluç, IEEE Transactions on Parallel and Distributed Systems 33 (4), 989-1000, 2021.


Introduction to GraphBLAS 2.0, Benjamin Brock, Aydin Buluç, Timothy G Mattson, Scott McMillan, Jose E Moreira, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Pages 253-262, 2021.


Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs, Prashant Pandey, Brian Wheatman, Helen Xu, Aydin Buluc, Proceedings of the 2021 International Conference on Management of Data, pages 1372-1385, 2021.


Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication, Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine Yelick, Aydin Buluç, Proceedings of the ACM International Conference on Supercomputing, pages 431-442, 2021.


Communication-avoiding and memory-constrained sparse matrix-matrix multiplication at extreme scale, MT Hussain, O Selvitopi, A Buluç, A Azad, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021.


Scaling Generalized N-Body Problems, A Case Study from Genomics, Marquita Ellis, Aydın Buluç, Katherine Yelick.  50th International Conference on Parallel Processing, Lemont, IL (virtual format), pages 1-9, August 2021. https://doi.org/10.1145/3472456.3472517 


Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly, Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluç. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2021.


BELLA: Berkeley efficient long-read to long-read aligner and overlapper, Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç. SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), Society for Industrial and Applied Mathematics, Pages 123-134, 2021.


Distributed-Memory k-mer Counting on GPUs, Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine Yelick. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), 2021.


10 Years Later: Cloud Computing is Closing the Performance Gap, Giulia Guidi, Marquita Ellis, Aydin Buluç, Katherine Yelick, David Culler. Hot Topics in Cloud Computing Performance (HotCloudPerf 2021), France (virtual conference), 2021. arXiv preprint arXiv:2011.00656.


PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction, Nicolas Swenson, Aditi S Krishnapriyan, Aydın Buluç, Dmitriy Morozov, Katherine Yelick, NeurIPS Workshop, 2021.  arXiv preprint arXiv:2010.16027.

2020

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly, Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluc, arXiv:2010.10055, 2020.

Reducing Communication in Graph Neural Network Training, Alok Tripathy, Katherine Yelick, Aydin Buluc, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), November 2020. arXiv preprint arXiv:2005.03300, 2020. To appear.

Distributed many-to-many protein sequence alignment using sparse matrices, Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios Pavlopoulos, Ariful Azad, and Aydin Buluç, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), November 2020. arXiv preprint arXiv:2009.14467, 2020. To appear.

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale, Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, arXiv preprint arXiv:2010.08526, October 2020.

ADEPT: a domain independent sequence alignment strategy for GPU architectures, Muaaz Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, BMC Bioinformatics 21 (1), 1-29, September 2020.

Gradual polyploid genome evolution revealed by pan-genomic analysis of Brachypodium hybridum and its diploid progenitors, Sean P. Gordon, Bruno Contreras-Moreira, Joshua J. Levy, Armin Djamei, Angelika Czedik-Eysenberg, Virginia S. Tartaglio, Adam Session, Joel Martin, Amy Cartwright, Andrew Katz, Vasanth R. Singan, Eugene Goltsman, Kerrie Barry, Vinh Ha Dinh-Thi, Boulos Chalhoub, Antonio Diaz-Perez, Ruben Sancho, Joanna Lusinska, Elzbieta Wolny, Candida Nibau, John H. Doonan, Luis A. J. Mur, Chris Plott, Jerry Jenkins, Samuel P. Hazen, Scott J. Lee, Shengqiang Shu, David Goodstein, Daniel Rokhsar, Jeremy Schmutz, Robert Hasterok, Pilar Catalan, John P. Vogel, Nature Communications, 11 (3670) July 2020.

Optimizing high performance Markov clustering for pre-exascale architectures. Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydin Buluç. In Proceedings of the International Parallel and Distributed Processing Symposium, May 2020.

Parallelizing Irregular Applications for Distributed Memory Scalability: Case Studies from Genomics. Marquita Ellis, (Committee: Katherine A. Yelick, James Demmel, Aydin Buluç, and Daniel Rokhsar). PhD dissertation, University of California, Berkeley, May 2020.

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment, Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick, 2020.

GPU accelerated partial order multiple sequence alignment for long reads self-correction, Francesco Peverelli, Lorenzo Di Tucci, Marco D Santambrogio, Nan Ding, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick, bioRxiv, 2020.

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper, Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, bioRxiv, 2020. doi: https://doi.org/10.1101/464420.

Terabase-scale metagenome coassembly with MetaHipMer, Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C. Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydın Buluç, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Scientific Reports, Nature Publishing Group (10:1), pp. 1-11. July 1, 2020.

Parallel algorithms for finding connected components using linear algebra, Yongzhe Zhang, Ariful Azad, and Aydin Buluç, Journal of Parallel and Distributed Computing, April 2020.

The Parallelism Motifs of Genomic Data Analysis. Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, and Leonid Oliker, Philosophical Transactions of the Royal Society A, 2020, 378:20190394. doi: 10.1098/rsta.2019.0394

Exascale Applications: Skin in the Game. Francis Alexander, Ann Almgren, John Bell, Amitava Bhattacharjee, Jacqueline Chen, Phil Colella, David Daniel, Jack Deslippe, Lori Diachin, Erik Draeger, Anshu Dubey, Thom Dunning, Thomas Evans, Ian Foster, Marianne Francois, Tim Germann, Mark Gordon, Salman Habib, Mahantesh Halappanavar, Steven Hamilton, William Hart, Zhenyu (Henry) Huang, Aimee Hungerford, Daniel Kasen, Paul R. C. Kent, Tzanio Kolev, Douglas B. Kothe, Andreas Kronfeld, Ye Luo, Paul Mackenzie, David McCallen, Bronson Messer, Sue Mniszewski, Chris Oehmen, Amedeo Perazzo, Danny Perez, David Richards, William J. Rider, Rob Rieben, Kenneth Roche, Andrew Siegel, Michael Sprague, Carl Steefel, Rick Stevens, Madhava Syamlal, Mark Taylor, John Turner, Jean-Luc Vay, Artur F. Voter, Theresa L. Windus, and Katherine Yelick, Philosophical Transactions of the Royal Society A, 2020, 378:20190056. doi: 10.1098/rsta.2019.0056

2019

RDMA vs. RPC for implementing distributed data structures, Benjamin A Brock, Yuxin Chen, Jiakun Yan, John Owens, Aydın Buluç, Katherine Yelick, 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3) at SC19, Pages 17-22, November 18, 2019.

diBELLA: Distributed Long Read to Long Read Alignment. Marquita Ellis, Giulia Guidi, Aydin Buluc, Leonid Oliker, and Katherine Yelick, International Conference on Parallel Processing, Kyoto, Japan, August 5-8, 2019. DOI 10.1145/3337821.3337919

BCL: A Cross-Platform Distributed Data Structures Library. Benjamin Brock, Aydin Buluc, and Katherine Yelick, International Conference on Parallel Processing, Kyoto, Japan, August 5-8, 2019.

2018

Extreme Scale De Novo Metagenome Assembly. Evangelos Georganas, Steven Hofmeyr, Leonid Oliker, Rob Egan, Daniel Rokhsar, Aydin Buluc, Katherine Yelick. International Conference for High Performance Computing, Networking, Storage and Analysis (“Supercomputing”, SC’18), Dallas, Texas, November 2018. Best Paper Finalist.

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper. Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç. bioRxiv:464420, November 7, 2018.  Presented as a poster at the 3rd Annual Northern California Computational Biology Symposium (NCCB) on October 6, 2018 in San Francisco.  Also a poster presentation at the Biological Data Science meeting in Cold Spring Harbor Laboratory, New York on November 7-10, 2018.

HipMCL: A high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Ariful Azad, Georgios A. Pavlopoulos, Christos A. Ouzounis, Nikos C. Kyrpides, and Aydin Buluc. Nucleic Acids Research (NAR), Vol. 46, No. 6, April 6, 2018. 

2017

Extreme-Scale De Novo Genome Assembly. Evangelos Georganas, Steven Hofmeyr, Leonid Oliker, Rob Egan, Daniel Rokhsar, Aydin Buluc, Katherine Yelick.  Exascale Scientific Applications: Scalability and Performance Portability, CRC Press, November 13, 2017.

MerBench: PGAS Benchmarks for High Performance Genome Assembly. Evangelos Georganas, Marquita Ellis, Rob Egan, Steven Hofmeyr, Aydin Buluç, Brandon Cook, Leonid Oliker, Katherine Yelick, Proceedings of the Second Annual PGAS Applications Workshop (at SC'17), November 12, 2017.

Performance characterization of de novo genome assembly on leading parallel systems. Marquita Ellis, Evangelos Georganas, Rob Egan, Steven Hofmeyr, Aydin Buluc, Brandon Cook, Leonid Oliker, and Katherine Yelick.  In EuroPar - International European Conference on Parallel and Distributed Computing, 2017.

2016

Scalable Parallel Algorithms for Genome Analysis. Evangelos Georganas. PhD thesis, EECS Department, University of California, Berkeley, August 2016.

2015

HipMer: an extreme-scale de novo genome assembler. Evangelos Georganas, Aydın Buluç, Jarrod Chapman, Steven Hofmeyr, Chaitanya Aluru, Rob Egan, Leonid Oliker, Daniel Rokhsar, and Katherine Yelick. 2015. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). ACM, New York, NY, USA, Article 14, 11 pages. DOI: https://doi.org/10.1145/2807591.2807664

merAligner: A Fully Parallel Sequence Aligner. Evangelos Georganas, Aydın Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar and Katherine Yelick. 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015), Hyderabad, India, May 2015. DOI: https://doi.org/10.1109/IPDPS.2015.96

2014

Parallel de Bruijn graph construction and traversal for de novo genome assembly. Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, and Katherine Yelick. 2014. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '14). IEEE Press, Piscataway, NJ, USA, 437-448. DOI: http://dx.doi.org/10.1109/SC.2014.41