by Noelia Ferruz from the Department of Structural and Molecular Biology, Molecular Biology Institute of Barcelona (IBMB-CSIC)
Self-supervised methods have become compelling tools in fields such as Natural Language Processing (NLP) and Computer Vision (CV), shaping the technology we use in our daily lives. Language models perform remarkably well at understanding and generating human text, often producing text indistinguishable from that written by humans. Inspired by these advances, we trained a language model, ProtGPT2, which effectively learned the protein language and generated sequences in unexplored regions of protein space. A critical feature in protein design is control over the design process, i.e., designing proteins with specific properties. For this reason, we trained ZymCTRL, a model trained on enzyme sequences and their associated Enzyme Commission (EC) numbers. ZymCTRL generates enzymes for user-specified catalytic reactions, thus enabling conditional de novo design of biocatalysts. Lastly, we have also trained a translation model for the generation of enzymes for specific catalytic reactions. Our experimental data show remarkable success, with high expression rates.
About the speaker: Noelia Ferruz is a chemist with a PhD in Computational Biophysics. After a short stay at Pfizer (Boston), she joined Prof. Höcker's lab (Bayreuth, Germany) as a postdoc, developing computational methods for protein design. Noelia is currently a Group Leader at the Centre for Genomic Regulation in Barcelona, Spain, focusing on the implementation of generative AI methods for protein design. In 2024, she was awarded an ERC Starting Grant.
Punt Avui: La nova fornada d’èxit de científiques barcelonines (The new successful crop of Barcelona women scientists). 13th September 2024.
El País: Noelia Ferruz, the chemist creating AI with ‘supernatural’ powers. 6th September 2024.
YouTube: Noelia Ferruz - Premio Investigación Fundación Princesa de Girona. November 2024.