Project Title: Study and Analysis of Advanced Techniques for Secondary Structure Prediction

Team Members: Rituraj Bhuyan, Steven VanLandingham, Upanita Goswami

A brief description of the project:

We will study different algorithms for secondary structure prediction, including genetic algorithms, neural networks, hidden Markov models, and statistical methods, and carry out a comparative analysis to see which techniques work better in which situations, why, and how they can be improved. For example, we will study several genetic algorithms for secondary structure prediction, focusing on which operations yield better results when creating new solutions from an initial population of solutions. We will look at algorithms that use both closed and open operators: a closed operator always turns a feasible solution into another feasible solution, while an open operator may turn a feasible solution into an infeasible one. We will also look at the penalty schemes used to penalize infeasible solutions, such as linear, non-linear, and constant penalties. In addition, we will learn existing tools such as “GOR”, “HNN”, “NetSurfP”, “PredictProtein”, and “HMMTOP”, which are based on these techniques, run them on different datasets from the Protein Data Bank, and analyze the results. Finally, we will try to propose our own technique for predicting secondary structure, based on our analysis of the different algorithms.
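As a rough illustration of the penalty schemes mentioned above, the sketch below scores a candidate structure string and subtracts a constant, linear, or non-linear (here quadratic) penalty when the candidate is infeasible. Everything in it is an assumption made for illustration: the H/E/C encoding, the feasibility rule (a helix run shorter than `MIN_HELIX_LEN` counts as one violation), the weights, and all function names. It is not taken from any of the algorithms we plan to study.

```python
from itertools import groupby

# Hypothetical feasibility rule: a helix run shorter than this is a violation.
MIN_HELIX_LEN = 4


def count_violations(structure):
    """Count helix ('H') runs that are shorter than MIN_HELIX_LEN."""
    return sum(
        1
        for state, run in groupby(structure)
        if state == "H" and len(list(run)) < MIN_HELIX_LEN
    )


def constant_penalty(violations, cost=10.0):
    """Same cost for any infeasible solution, regardless of how infeasible."""
    return cost if violations > 0 else 0.0


def linear_penalty(violations, weight=2.0):
    """Cost grows in proportion to the number of violations."""
    return weight * violations


def quadratic_penalty(violations, weight=2.0):
    """One possible non-linear scheme: cost grows with the square."""
    return weight * violations ** 2


def penalized_fitness(raw_fitness, structure, penalty):
    """Raw fitness minus the penalty for the candidate's violations."""
    return raw_fitness - penalty(count_violations(structure))


if __name__ == "__main__":
    candidate = "CCHHCCHHHHHCHC"  # two helix runs are too short
    for scheme in (constant_penalty, linear_penalty, quadratic_penalty):
        print(scheme.__name__, penalized_fitness(20.0, candidate, scheme))
```

With a closed operator the penalty term is always zero, so the choice of scheme only matters for algorithms that use open operators.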

Plan of action:

  1. Study various machine-learning-based algorithms for secondary structure prediction. Useful links:

    http://www.ncbi.nlm.nih.gov/, http://bioinformatics.oxfordjournals.org/

  2. Compare genetic-algorithm-based methods, neural-network-based methods, hidden Markov models, and statistical methods for protein secondary structure prediction. We will find the various methods through the following links:

    http://ca.expasy.org/tools/, http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software,

    http://molbiol-tools.ca/Protein_secondary_structure.htm

  3. Collect FASTA files from the PDB and SCOP databases. Useful links:

    http://www.rcsb.org/pdb/home/home.do, http://scop.mrc-lmb.cam.ac.uk/scop/

  4. Measure the accuracy with which the techniques mentioned above predict alpha helices, beta sheets, loops, and turns in protein sequences.

  5. Propose a new technique for secondary structure prediction.
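The accuracy measurement in the plan could be sketched as follows. This is a minimal example under stated assumptions, not the evaluation procedure of any particular tool: it assumes that the observed and predicted structures are available as equal-length strings with one label per residue, and that the labels are H (alpha helix), E (beta sheet), L (loop), and T (turn). The function names and the label encoding are hypothetical.

```python
def overall_accuracy(observed, predicted):
    """Fraction of residues whose predicted label matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def per_class_accuracy(observed, predicted, classes="HELT"):
    """For each class, the fraction of observed residues predicted correctly.

    A class that never occurs in the observed string maps to None.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    result = {}
    for cls in classes:
        positions = [i for i, o in enumerate(observed) if o == cls]
        if positions:
            correct = sum(predicted[i] == cls for i in positions)
            result[cls] = correct / len(positions)
        else:
            result[cls] = None
    return result


if __name__ == "__main__":
    observed = "HHHHEEELLT"
    predicted = "HHHEEEELLL"
    print(overall_accuracy(observed, predicted))
    print(per_class_accuracy(observed, predicted))
```

Reporting the per-class values alongside the overall figure would let us see whether a technique that does well overall is, for instance, weak on turns.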

Work Load Distribution:

Steven: Study and analysis of genetic-algorithm-based techniques.

Rituraj: Study and analysis of statistical and Markov-model-based techniques.

Upanita: Study and analysis of statistical and neural-network-based techniques.

Team Work: Exchanging ideas, brainstorming on various techniques, collecting data, performing experiments, and devising a new technique for predicting the secondary structure of proteins.

Tentative list of papers to be studied:

  1. “A Permutation-Based Genetic Algorithm for Predicting RNA Secondary Structure - A Practicable Approach”.

  2. “An Evolutionary Approach to Finding Schemas for 3-Class Protein Secondary Structure Prediction”. Hsiang Chi Huang.

  3. “Prediction of RNA Secondary Structures by Genetic Algorithms” (2002). Ming-cheng Lin, Ming-Cheng Lin Chang-Biau, Kuo-si Huang.

  4. “Analysis of an optimal hidden Markov model for secondary structure prediction”.

  5. “Segmental Semi Markov Model for protein secondary structure prediction” (http://www.sciencedirect.com/).

  6. “Protein secondary structure prediction with a neural network”. L H Holley and M Karplus.

  7. “Protein Secondary Structure Analysis using Neural Network”. K Nakata.

  8. “Neural networks for protein structure prediction: hype or hit?”. Burkhard Rost, CUBIC, Columbia University.