Plenary Title
Applying Large Protein Language Models
to Predict Phage-Host Interactions
Abstract
Phages are viruses that infect bacteria. They are seen as promising complements to conventional antibiotics, which are becoming increasingly ineffective due to the emergence and spread of antibiotic resistance. The growing interest in phages has driven the development of computational approaches for predicting phage-bacterium interactions. Until recently, most prediction methods relied on manual feature engineering, which made it difficult to select the most informative sequence properties. We address this issue by using a variety of protein language models to produce embeddings of phages’ receptor-binding proteins, which are then fed to downstream classifiers. We show that our approach outperforms machine learning models that use hand-crafted features as well as traditional sequence-alignment-based methods, especially in low-sequence-similarity settings.
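As a rough illustration of the embed-then-classify idea described in the abstract, the sketch below pools per-residue embeddings into one fixed-length vector per protein and assigns a host with a nearest-centroid rule. The abstract does not say how embeddings are pooled or which classifiers are used, so mean pooling, the nearest-centroid classifier, the function names, and the toy numbers are all assumptions made for this example; in practice the per-residue vectors would come from a protein language model.

```python
import math

def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length protein vector.

    Assumption: mean pooling is used here only for illustration.
    """
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]

def fit_centroids(vectors, labels):
    """Compute one centroid per host label (hypothetical stand-in classifier)."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    # The centroid of a group is the element-wise mean of its vectors.
    return {label: mean_pool(group) for label, group in grouped.items()}

def predict(centroids, vector):
    """Return the host label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))

# Toy per-residue embeddings (made-up values, 2 dimensions for readability).
protein_a = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # pooled to [2.0, 3.0]
protein_b = mean_pool([[9.0, 9.0], [11.0, 7.0]])  # pooled to [10.0, 8.0]
centroids = fit_centroids([protein_a, protein_b], ["host_A", "host_B"])
query = mean_pool([[2.0, 2.0], [2.0, 4.0]])  # pooled to [2.0, 3.0]
predicted_host = predict(centroids, query)
```

Any classifier that accepts fixed-length vectors could replace the nearest-centroid step; it is used here only because it needs no external libraries.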
Workshop Title
The Bioinformatics of Profiling Microbial Communities
in Wastewater from Shotgun Metagenome Samples
Abstract
Metagenomics is the application of high-throughput sequencing to DNA extracted directly from environmental, uncultured samples. With sequencing becoming more accessible to labs worldwide, we are witnessing increasing use of metagenomics to study microbial communities in environmental, ecological, agricultural, epidemiological, and clinical settings. The power of metagenomics, however, comes with the challenging task of handling and analyzing large volumes of sequence data. This hands-on workshop will use shotgun metagenomic samples of hospital wastewater as a case study to demonstrate bioinformatics workflows for typical tasks such as identifying microbial composition and diversity, making statistical comparisons between samples, and detecting the presence and diversity of antibiotic resistance genes.
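To give a flavor of the diversity and between-sample comparison tasks mentioned above, here is a minimal sketch that computes the Shannon diversity index within a sample and the Bray-Curtis dissimilarity between two samples. The abstract does not name the metrics or tools the workshop will use, so these two measures, the function names, and the taxon counts are assumptions chosen for illustration only.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles with no shared taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0

# Hypothetical read counts per taxon for two wastewater samples.
site_1 = {"Escherichia": 50, "Klebsiella": 50}
site_2 = {"Escherichia": 50, "Pseudomonas": 50}

alpha_site_1 = shannon_diversity(site_1.values())  # equals ln(2) for two even taxa
beta_1_vs_2 = bray_curtis(site_1, site_2)          # 0.5: half the counts differ
```

Real workflows usually normalize for sequencing depth before comparing samples; that step is omitted here to keep the example short.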