Convolutional Neural Network and Probabilistic Model for Transcription Factor Binding Site Motif Finding

-- Jinli Zhang, PhD student, GGB program

Convolutional Neural Network

The 1D convolutional neural network (CNN) has gained popularity in natural language processing. Trained on labeled data, the network learns high-dimensional feature vectors that capture recurring patterns, and these learned features are then used to classify unseen inputs. This pattern-learning process is very similar to DNA motif finding: discovering the short sequence patterns concealed in DNA fragments enriched in ChIP-seq data.
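To make the analogy concrete, here is a minimal NumPy sketch, assuming one-hot encoded DNA: a single convolutional filter slides along the sequence like a scoring window, ReLU keeps positive matches, and global max pooling reports the strongest hit. The `TATA` filter and the function names are hypothetical illustrations; a real CNN learns many filters from the training sequences rather than having them set by hand.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix (columns A, C, G, T)."""
    idx = {base: i for i, base in enumerate(ALPHABET)}
    mat = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in idx:  # non-ACGT characters such as N stay all-zero
            mat[pos, idx[base]] = 1.0
    return mat

def conv1d_scan(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) of a (L, 4) input with a (k, 4) filter."""
    k = kernel.shape[0]
    n_windows = x.shape[0] - k + 1
    return np.array([np.sum(x[i:i + k] * kernel) + bias for i in range(n_windows)])

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical hand-set filter that rewards the 4-mer "TATA": +1 for the matching
# base at each position, -1 for the other three. A trained CNN learns these weights.
tata_filter = one_hot("TATA") * 2.0 - 1.0

seq = "GCGCTATAAGGC"
activations = relu(conv1d_scan(one_hot(seq), tata_filter))
best_pos = int(np.argmax(activations))  # where the filter fires most strongly
pooled = activations.max()              # global max pooling: one number per filter
print(best_pos, pooled)
```

In a full model, the pooled value from each filter becomes one feature for a final classifier layer that predicts "bound" vs "not bound".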

In this project, I will reimplement a CNN and a probabilistic model to find DNA motifs, and compare the prediction precision of the two models.
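On the probabilistic side, one common motif representation is a position weight matrix (PWM): count the bases at each position of aligned binding sites, add pseudocounts, convert the counts to probabilities, and score a candidate k-mer by its log-odds against a background model. The sketch below assumes a uniform background, a pseudocount of 1, and uppercase ACGT-only input; the example sites are made up for illustration and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Per-position base probabilities from equal-length aligned sites (ACGT only)."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm

def log_odds_score(kmer, pwm, background=None):
    """Sum over positions of log2(P(base | motif column) / P(base | background))."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # assumed uniform background
    return sum(math.log2(col[base] / background[base]) for base, col in zip(kmer, pwm))

def best_hit(sequence, pwm):
    """Scan every window of the sequence and return (start, score) of the best one."""
    k = len(pwm)
    scores = [(i, log_odds_score(sequence[i:i + k], pwm))
              for i in range(len(sequence) - k + 1)]
    return max(scores, key=lambda item: item[1])

# Made-up aligned binding sites, for illustration only.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATATT", "GATAAT"]
pwm = build_pwm(sites)
print(best_hit("CCGCGTATAATGCC", pwm))
```

A positive log-odds score means the window looks more like the motif than like background; a threshold on this score turns the PWM into a binding-site classifier whose precision can be compared with the CNN's.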

Project progress

  • Jan/04 - Jan/15

Reading papers about convolutional neural networks and motif finding; learning how probabilistic models are used to find motifs.

  • Jan/16 - Jan/31

    • Downloading datasets (BED files) from ENCODE and preparing .fasta files as input for the PWM (position weight matrix) and the CNN.

    • Watching a YouTube video about how Gibbs sampling is used for motif finding.

  • Feb/1 - Feb/13

    • Implementing Gibbs sampling and a convolutional neural network on small test data.

    • Reading about how hidden Markov models are used for motif finding.

    • Implementing expectation maximization (EM) on small test data and watching a YouTube video: https://www.youtube.com/watch?v=kq5NAd4pnkU&t=587s

  • Feb/14 - Feb/27

  • Feb/27 - end
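Gibbs sampling is among the methods implemented on small test data in the plan above. The sketch below shows one common formulation, assuming exactly one motif occurrence per sequence, a fixed motif width `k`, and at least two input sequences: pick random start positions, repeatedly leave one sequence out, build a profile (with pseudocounts) from the others, and resample that sequence's start in proportion to each k-mer's profile probability, keeping the lowest-mismatch motif set over several random restarts. It is not necessarily the exact variant used in this project; the function names, defaults, and the synthetic planted-motif data are hypothetical.

```python
import random

ALPHABET = "ACGT"

def profile_from(motifs, pseudocount=1.0):
    """Column-wise base probabilities (with pseudocounts) for equal-length k-mers."""
    k = len(motifs[0])
    profile = []
    for j in range(k):
        counts = {b: pseudocount for b in ALPHABET}
        for m in motifs:
            counts[m[j]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in ALPHABET})
    return profile

def kmer_prob(kmer, profile):
    """Probability of a k-mer under the profile (product over columns)."""
    p = 1.0
    for base, col in zip(kmer, profile):
        p *= col[base]
    return p

def mismatch_score(motifs):
    """Number of bases that disagree with the column consensus (lower is better)."""
    return sum(len(col) - max(col.count(b) for b in ALPHABET) for col in zip(*motifs))

def gibbs_sampler(seqs, k, n_iter=500, n_restarts=10, seed=0):
    """Return (start positions, score) of the best motif set found. Needs len(seqs) >= 2."""
    rng = random.Random(seed)
    best_starts, best_score = None, float("inf")
    for _ in range(n_restarts):
        starts = [rng.randrange(len(s) - k + 1) for s in seqs]
        for _ in range(n_iter):
            i = rng.randrange(len(seqs))  # sequence whose start is resampled
            others = [s[p:p + k] for idx, (s, p) in enumerate(zip(seqs, starts)) if idx != i]
            profile = profile_from(others)
            seq = seqs[i]
            weights = [kmer_prob(seq[p:p + k], profile) for p in range(len(seq) - k + 1)]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
            score = mismatch_score([s[p:p + k] for s, p in zip(seqs, starts)])
            if score < best_score:
                best_starts, best_score = list(starts), score
    return best_starts, best_score

# Synthetic test data: random 40-bp sequences, each with a planted "TATAAT".
data_rng = random.Random(1)
seqs = []
for _ in range(8):
    background = "".join(data_rng.choice(ALPHABET) for _ in range(40))
    pos = data_rng.randrange(40 - 6 + 1)
    seqs.append(background[:pos] + "TATAAT" + background[pos + 6:])

starts, score = gibbs_sampler(seqs, k=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)], score)
```

Random restarts matter because a single Gibbs chain can settle on a local optimum; on real ChIP-seq peaks the motif width and the one-occurrence-per-sequence assumption would also need revisiting.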