Structural characterization of a nematode’s orphan proteome
Meloidogyne (root-knot nematodes) is a genus of plant parasites responsible for the loss of approximately 20% of global crop yields. On average, 24% of the proteins in Meloidogyne proteomes are orphans, i.e., proteins with no known homologous sequences outside the genus. Experimental evidence suggests that these orphan proteins may play an important role in crop infestation, which makes the structural and functional characterization of the Meloidogyne orphan proteome of significant agronomic interest. However, orphan proteins with known structures are exceedingly rare. While transformer-based deep learning methods such as AlphaFold have revolutionized protein structure prediction, their performance on orphan proteins has not been thoroughly assessed. Methods like AlphaFold, whose accuracy depends on the quality of the input multiple sequence alignments (MSAs), are expected to perform poorly on orphan proteins because no homologs are available to build those alignments. Does the same limitation apply to single-sequence, end-to-end strategies such as ESMFold, OmegaFold, and ProstT5? To address this question, we used recent, high-quality sequencing data from Meloidogyne genomes to create the first proteome-wide benchmark of orphan proteins and tested several state-of-the-art protein structure prediction tools on it. Cross-evaluation of the predicted structures revealed important differences between the approaches, leading us to propose a homology-agnostic method for classifying the structures of orphan proteins.