Novozymes Enzyme Stability Prediction

Help identify thermostable mutations in enzymes

Project Goal

Enzymes are proteins that act as catalysts in the chemical reactions of living organisms. The goal of this competition is to predict the thermostability of enzyme variants. The experimentally measured thermostability (melting temperature) data include natural sequences as well as engineered sequences that carry single or multiple mutations relative to those natural sequences.
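One way to make the distinction between natural and engineered sequences concrete is to list each variant's substitutions relative to its natural (wild-type) sequence, using the common wild-type residue, position, mutant residue notation (e.g. A4V). The sketch below is illustrative only. The function name `point_mutations` and the toy sequences are hypothetical and do not come from the competition data. It also assumes equal-length sequences, so it handles substitutions but not insertions or deletions.

```python
from typing import List


def point_mutations(wild_type: str, variant: str) -> List[str]:
    """List substitutions of `variant` relative to `wild_type` (hypothetical helper).

    Positions are 1-based. Assumes both sequences have the same length,
    i.e. only substitutions; insertions and deletions are not handled.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences differ in length; indels are not handled here")
    # Compare residue by residue and keep only the positions that changed.
    return [
        f"{wt}{pos}{mut}"
        for pos, (wt, mut) in enumerate(zip(wild_type, variant), start=1)
        if wt != mut
    ]


if __name__ == "__main__":
    wild = "MKTAYIAKQR"     # toy "natural" sequence
    engineered = "MKTVYIAKQE"  # toy variant with two substitutions
    muts = point_mutations(wild, engineered)
    print(muts)  # ['A4V', 'R10E']
    print("single mutant" if len(muts) == 1 else f"{len(muts)} mutations")
```

A variant with one entry in the returned list is a single mutant, and one with several entries is a multiple mutant; an empty list means the sequence matches the natural one.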

Understanding and accurately predicting protein stability is a fundamental problem in biotechnology. Its applications include enzyme engineering to address the world’s challenges in sustainability, carbon neutrality, and more. Improvements in enzyme stability could lower costs and increase the speed at which scientists can iterate on concepts.

[Source: Kaggle]


Exploratory Data Analysis

Model 1: XGBoost with Bayesian Optimization

Model 2: Simple Neural Networks

Model 3: BERT for Proteins (Prot_Bert)

I am currently a physics doctoral candidate and an aspiring data scientist who enjoys problem-solving and connecting the dots: be it ideas from different disciplines or applications from different industries. I am deeply interested in fundamental physics, machine learning, and AI research and their applications across disciplines.


Questions?

Contact [email] for more information about the project.