ProteinGym

ProteinGym is an extensive set of Deep Mutational Scanning (DMS) assays curated to assess how well mutation effect predictors predict the fitness of mutated proteins. It comprises two benchmarks:

  • A substitution benchmark that consists of the experimental characterization of ∼1.5M missense variants across 87 DMS assays

  • An indel benchmark that includes ∼300k mutants across 7 DMS assays

This website aims to facilitate comparisons of a large collection of mutation effect predictors in various regimes (e.g., mutation depth, taxa, MSA depth).

Instructions to download the benchmarks are available on our GitHub repository.

More details about the benchmarks are provided in our paper.

This project has been developed by:

Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan N. Gomez, Debora S. Marks, Yarin Gal


OATML - Oxford Applied and Theoretical Machine Learning Group

Marks Lab - Harvard Medical School