I apply Deep Learning and other Machine Learning techniques to solve hard problems in Proteomics, Genomics, and other areas of Computational Biology. I work in the Bioinformatics research group of the David R. Cheriton School of Computer Science, University of Waterloo, in collaboration with Bioinformatics Solutions Inc., which gives me better insight into diverse Proteomics problems and data. My current research involves the application of deep learning to Quantitative Proteomics, which holds considerable promise for disease biomarker detection. I have also been analyzing longitudinal proteomics data from COVID-19 patients with different Machine Learning algorithms, with the aim of helping doctors make therapeutic decisions.
My past research experience covers Stringology, Metaheuristic Algorithms, Distributed Computing Systems, and Human Computer Interaction. In the past, I was a member of the AlEDA (http://teacher.buet.ac.bd/msrahman/aleda/) and Human Technology Interaction groups, Dept. of CSE, BUET.
Research Experience:
Research in Bioinformatics:
Application of Deep Learning in Quantitative Proteomics (as a part of Ph.D. research work)
Supervisor: Dr. Ming Li
Work Summary: We are the first to propose a deep learning based model, DeepIso, that combines recent advances in Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to detect peptide features from Liquid Chromatography Mass Spectrometry (LC-MS) data, a key step in quantitative proteomics that holds considerable promise for disease biomarker detection. Next, we developed PointIso, which uses point cloud based models and attention based segmentation. It supports arbitrary precision and high-dimensional data, properties that are desirable for analyzing complex peptide mixtures. On a benchmark dataset acquired with the widely used Orbitrap instrument, our models already achieve better accuracy than other existing algorithms. We are extending this model to support 4D LC-MS data obtained from a more advanced mass spectrometer, the TimsTOF instrument. We are also incorporating our model into the Label Free Quantification (LFQ) pipeline to make it more useful to the proteomics community.
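To make the notion of a "peptide feature" a little more concrete: within a single scan, a peptide shows up as an isotope envelope whose peaks are spaced roughly 1.003355/z apart in m/z, where z is the charge. The following is a minimal, hypothetical Python sketch of a classical greedy baseline that follows such an envelope and picks the charge explaining the most peaks. It is NOT DeepIso or PointIso, ignores the retention-time dimension of the LC-MS map, and the names isotope_envelope, best_charge and the absolute tolerance tol are illustrative assumptions.

```python
# Hypothetical greedy baseline for illustration only; this is NOT DeepIso or PointIso.
C13_SHIFT = 1.003355  # mass difference between 13C and 12C, in Da


def isotope_envelope(mzs, start, charge, tol=0.01):
    """Greedily collect peaks spaced C13_SHIFT / charge apart, starting at mzs[start].

    mzs must be sorted ascending; tol is an assumed absolute m/z tolerance.
    """
    envelope = [mzs[start]]
    step = C13_SHIFT / charge
    for mz in mzs[start + 1:]:
        expected = envelope[-1] + step
        if abs(mz - expected) <= tol:
            envelope.append(mz)      # next isotope peak found
        elif mz > expected + tol:
            break                    # passed the expected position: envelope ends
        # otherwise the peak lies before the expected position, so skip it
    return envelope


def best_charge(mzs, start, max_charge=4, tol=0.01):
    """Return the charge state whose envelope starting at mzs[start] is longest."""
    return max(range(1, max_charge + 1),
               key=lambda z: len(isotope_envelope(mzs, start, z, tol)))


peaks = [500.0, 500.5017, 501.0034, 501.5050, 502.7]
z = best_charge(peaks, 0)
print(z, isotope_envelope(peaks, 0, z))
```

A learned model is motivated precisely by the cases this greedy rule handles poorly, such as overlapping envelopes from co-eluting peptides and noisy or missing peaks.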
Publications:
Zohora, F.T., Rahman, M.Z., Tran, N.H., Xin, L., Shan, B. and Li, M., 2021. Deep neural network for detecting arbitrary precision peptide features through attention based segmentation. Scientific Reports, 11(1), pp. 1-16.
Zohora, F.T., Rahman, M.Z., Tran, N.H., Xin, L., Shan, B. and Li, M., 2019. DeepIso: A deep learning model for peptide feature detection from LC-MS map. Scientific Reports, 9(1), pp. 1-13.
Zohora, F.T., Tran, N.H., Zhang, X., Xin, L., Shan, B. and Li, M., 2017. DeepIso: a deep learning model for peptide feature detection. arXiv preprint arXiv:1801.01539.
Application of Stringology to the diagnosis of Allelic Heterogeneity (as a part of M.Sc.Engg. thesis work)
Supervisor: Dr. M. Sohel Rahman
Work Summary: Allelic heterogeneity refers to the case where a normal gene undergoes different mutations, resulting in two different gene sequences that cause two different genetic diseases. This is considered to be the greatest challenge for molecular genetic diagnosis. Since clinical diagnosis is extremely expensive, it is worth investigating whether a tractable (polynomial-time) algorithm exists to detect the possibility of allelic heterogeneity; this is the main goal of this work. For the first time, we map the problem of determining allelic heterogeneity to the consensus string matching problem. Given two input DNA sequences, our algorithms detect whether a common ancestor gene sequence exists under non-overlapping transposition and inversion mutations. The existence of such a common ancestor indicates possible allelic heterogeneity.
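The inversion case of the question above can be stated as a tiny brute-force check, which may help in seeing what the efficient algorithms decide. This is a hypothetical Python sketch, not the published algorithm: it is exponential in the sequence length, it assumes an inversion means plain reversal of a substring (no complementing), and it assumes the ancestor produces each of the two sequences through its own set of non-overlapping inversions. The function names are illustrative.

```python
from functools import lru_cache


def inversion_neighbourhood(s):
    """All strings obtained from s by reversing any set of non-overlapping substrings.

    Exponential in len(s): a brute-force reference for short strings only.
    """
    n = len(s)

    @lru_cache(maxsize=None)
    def variants(i):
        # All ways to rewrite the suffix s[i:] using non-overlapping inversions.
        if i == n:
            return frozenset({""})
        out = set()
        for rest in variants(i + 1):          # leave s[i] untouched
            out.add(s[i] + rest)
        for j in range(i + 1, n):             # invert the block s[i..j]
            block = s[i:j + 1][::-1]
            for rest in variants(j + 1):
                out.add(block + rest)
        return frozenset(out)

    return set(variants(0))


def has_common_ancestor(x, y):
    """True if some z yields both x and y via non-overlapping inversions.

    Re-applying the same non-overlapping inversions restores the original string,
    so the candidate ancestors of x are exactly inversion_neighbourhood(x).
    """
    if len(x) != len(y):
        return False
    return bool(inversion_neighbourhood(x) & inversion_neighbourhood(y))


print(has_common_ancestor("ACGT", "CATG"))
```

For instance, "CATG" is "ACGT" with the blocks "AC" and "GT" each inverted, so "ACGT" itself serves as a common ancestor. A check like this is only useful for validating a fast algorithm on short inputs.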
Publications:
Fatema Tuz Zohora, M. Sohel Rahman, "Application of Consensus String Matching in the Diagnosis of Allelic Heterogeneity - (Extended Abstract)", Bioinformatics Research and Applications, Lecture Notes in Computer Science Volume 8492, 2014, pp 163-175 (in the Proceedings of International Symposium on Bioinformatics Research and Applications (ISBRA 2014), Zhangjiajie, China, 2014)
Zohora, Fatema Tuz, and M. Sohel Rahman. "An efficient algorithm to detect common ancestor genes for non-overlapping inversion and applications." Theoretical Computer Science (2016).
Zohora, Fatema Tuz, and M. Sohel Rahman. "Application of consensus string matching in the diagnosis of allelic heterogeneity involving transposition mutation." International Journal of Data Mining and Bioinformatics 13.4 (2015): 360-377.
Application of Metaheuristic techniques to the well-known genome rearrangement problem: Sorting Unsigned Permutations by Reversals
Supervisor: Dr. M. Sohel Rahman
Work Summary: In this work, we propose a modified version of the basic Scatter Search algorithm. Experimental results show that the proposed method outperforms previous approaches on short permutations. I am currently tuning its parameters to obtain better results on long permutations as well.
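The problem itself is easy to state in code, even though the Scatter Search method is not reproduced here. The following is a minimal Python sketch, assuming the input is a permutation of 1..n: a reversal flips a contiguous block, breakpoints counts adjacent pairs that are not consecutive integers (with 0 and n+1 framing the ends), and reversal_distance finds the exact optimum by breadth-first search. Because each reversal removes at most two breakpoints, the breakpoint count gives the lower bound ceil(b/2). Both function names are illustrative; an exact solver of this kind is only feasible for small n, which is where it can serve as ground truth for a heuristic.

```python
from collections import deque


def breakpoints(perm):
    """Count adjacent pairs (framed by 0 and n+1) that are not consecutive integers."""
    framed = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact minimum number of reversals that sort perm, via breadth-first search.

    The state space has n! permutations, so this only works for small n.
    """
    start, target = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                # Reverse the block cur[i..j].
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


p = [3, 1, 4, 2]
print(breakpoints(p), reversal_distance(p))
```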
Research in Stringology:
Analysis of the closest string / consensus string problem under the inversion and transposition distance metrics (as a part of M.Sc.Engg. thesis work)
Supervisor: Dr. M. Sohel Rahman
Work Summary: In this work, we prove that the closest string problem is NP-hard under the inversion distance metric. We are now working to prove NP-hardness of the problem under the transposition metric. In addition, we propose an algorithm for the relaxed closest string problem that, given two input strings, determines whether a consensus string or closest string exists under the transposition and inversion distance metrics. This work is motivated by the importance of the consensus string / closest string problem in bioinformatics, computational geometry, networking, and other fields.
Algorithm for computing the Longest Common Almost Increasing Subsequence (LCAIS) (as a part of B.Sc.Engg. thesis work)
Supervisor: Dr. M. Sohel Rahman
Work Summary: I have been researching variations of the well-known Longest Common Subsequence (LCS) problem in theoretical computer science, and implemented an algorithm to compute the Longest Common Almost Increasing Subsequence (LCAIS). Apart from its theoretical interest, the LCAIS problem also has practical motivation. For example, LCAIS is useful for comparing the similarity of two related activities based on their historical snapshots. Additionally, since LCAIS is a relaxed version of the Longest Common Increasing Subsequence (LCIS) problem, the presence of noise in the data is itself a motivation for it.
Publication: (in order of authors' last names)
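As a point of reference, here is a minimal Python sketch of the strict problem that LCAIS relaxes: the classic O(nm) dynamic program for the length of a Longest Common Increasing Subsequence. It is not the published LCAIS algorithm. An LCAIS variant would have to relax the strict y < x comparison according to its "almost increasing" definition, which this sketch does not attempt. The function name lcis_length is illustrative.

```python
def lcis_length(a, b):
    """Length of the longest common strictly increasing subsequence, O(len(a) * len(b))."""
    best_end = [0] * len(b)  # best_end[j]: longest LCIS found so far that ends with b[j]
    for x in a:
        run = 0  # longest LCIS ending with some scanned b[k] < x
        for j, y in enumerate(b):
            if y == x and run + 1 > best_end[j]:
                best_end[j] = run + 1      # extend that subsequence with x == b[j]
            elif y < x:
                run = max(run, best_end[j])
    return max(best_end, default=0)


print(lcis_length([3, 4, 9, 1], [5, 3, 8, 9, 10, 2, 1]))  # 2, e.g. (3, 9)
```

The two branches never interfere within one pass: best_end[j] is only updated where b[j] equals x, and run only reads entries where b[j] is smaller than x.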
Johra Muhammad Moosa, M. Sohel Rahman, and Fatema Tuz Zohora, "Computing a Longest Common Subsequence that is Almost Increasing on Sequences Having No Repeated Elements", Journal of Discrete Algorithms, Elsevier, Volume 20:12–20, 2013
Research in Distributed Computing Systems:
Not so Synchronous RPC: RPC with Silent Synchrony Switch for Avoiding Repeated Marshalling of Data
Supervisor: Md. Yusuf Sarwar Uddin
Work Summary: For the first time, we avoid repeated marshalling in Remote Procedure Call (RPC) by introducing a behavioral pattern that we call usually synchronous, but conditionally asynchronous RPC (CA-RPC). CA-RPC performs a handshake between the client stub and the server stub to determine whether the connection failed after the data had been demarshalled at the server stub. If so, the client stub runs a pulling thread to retrieve the result without remarshalling the same data. In addition, our strategy improves performance by switching synchrony at runtime under certain conditions. These two strategies make our pattern suitable for RPC calls that involve large data objects, such as pictures, for which data marshalling incurs significant overhead over low-rate data connections. Our results show that CA-RPC outperforms traditional RPC in such environments.
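The failure-recovery path described above can be illustrated with an in-process simulation. The following is a minimal, hypothetical Python sketch, not the CA-RPC implementation: ServerStub, ClientStub, ConnectionLost and the drop_reply flag are invented for illustration, a method call stands in for the network, and pickle stands in for the marshalling layer. The client marshals its arguments once. If the reply is lost after the server confirms it demarshalled the data, the client switches to a pulling thread that fetches the result by call id instead of resending the payload.

```python
import pickle
import threading
import time
import uuid


class ConnectionLost(Exception):
    """Simulated network failure."""


class ServerStub:
    """Hypothetical in-memory server stub standing in for a remote endpoint."""

    def __init__(self, procedure):
        self.procedure = procedure
        self.received = set()   # call ids whose payload was demarshalled
        self.results = {}

    def receive(self, call_id, payload):
        args = pickle.loads(payload)            # demarshal exactly once
        self.received.add(call_id)
        self.results[call_id] = self.procedure(*args)

    def was_received(self, call_id):            # handshake query from the client
        return call_id in self.received

    def fetch(self, call_id):
        """Return (done, result); needs only the call id, never the arguments."""
        if call_id in self.results:
            return True, self.results[call_id]
        return False, None


class ClientStub:
    def __init__(self, server, drop_reply=False):
        self.server = server
        self.drop_reply = drop_reply   # simulate losing the reply after the server got the data
        self.marshal_count = 0

    def call(self, *args):
        call_id = uuid.uuid4().hex
        payload = pickle.dumps(args)              # marshal exactly once
        self.marshal_count += 1
        try:
            self.server.receive(call_id, payload)
            if self.drop_reply:
                raise ConnectionLost()
            return self.server.fetch(call_id)[1]  # usual synchronous path
        except ConnectionLost:
            if not self.server.was_received(call_id):
                raise                             # data never arrived; a resend is unavoidable
            return self._pull(call_id)            # conditional switch to pulling

    def _pull(self, call_id):
        """Poll for the result in a separate thread (joined here for simplicity)."""
        box = {}

        def poll():
            done, value = self.server.fetch(call_id)
            while not done:
                time.sleep(0.01)
                done, value = self.server.fetch(call_id)
            box["result"] = value

        worker = threading.Thread(target=poll)
        worker.start()
        worker.join()
        return box["result"]


client = ClientStub(ServerStub(lambda a, b: a + b), drop_reply=True)
print(client.call(2, 3), client.marshal_count)  # 5 1
```

A real client stub could return a future from the pulling path instead of joining the thread; the sketch joins only to keep the call synchronous from the caller's point of view.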
Publication: Fatema Tuz Zohora, Md. Yusuf Sarwar Uddin, and Johra Muhammad Moosa, "Not so Synchronous RPC: RPC with Silent Synchrony Switch for Avoiding Repeated Marshalling of Data", Distributed Computing and Networking. Springer Berlin Heidelberg, 2014. 544-549. (In Proceedings of International Conference on Distributed Computing and Networking (ICDCN), Coimbatore, India)
Research in Human Computer Interaction:
Development of the Concept and Use of Money using Computer Games for Children with Autism
Supervisors: Md. Mustafizur Rahman, Hasan Shahid Ferdous, Syed Ishtiak Ahmed
Work Summary: I researched the impact of multimedia on the learning process of autistic children as a member of the ‘Human Technology Interaction (HTI)’ group, which was created and is maintained by the Department of Computer Science and Engineering, BUET, and is dedicated to improving human life with technology. After working with the children of the Autism Welfare Foundation (AWF), Bangladesh, for about six months, we developed an educational game to teach them the concept of money and how to use it when shopping in a mall.
Publication: (in order of first names)
Arshia Zernab Hassan, Bushra Tasnim Zahed, Fatema Tuz Zohora, Johra Muhammad Moosa, Tasmiha Salam, Md. Mustafizur Rahman, Hasan Shahid Ferdous, and Syed Ishtiak Ahmed, "Developing the Concept of Money by Interactive Computer Games for Autistic Children", In Proceedings of IEEE Symposium on Multimedia, IEEE Computer Society Press, California, USA, 2011.
Research in Data Mining as a part of M.Sc.Engg. coursework:
-Survey on mining social media stream
-Survey on context-aware anomaly detection in indoor location traces