Research


Presented at the 19th International Conference on Bioinformatics (InCoB 2020). This work was published in Computational Biology and Chemistry, a Scopus-indexed Elsevier journal with an impact factor of 2.87.


Abstract

DNA replication plays a crucial part in biological inheritance, ensuring an even flow of genetic information from parent to offspring. The site where DNA replication begins, called the Origin of Replication (ORI), plays a significant role in understanding the molecular mechanisms and genomic analysis of DNA. Hence, it is paramount to accurately identify the origin of replication to gain a more accurate understanding of the biochemical and genomic properties of DNA. In this paper, we propose a new approach named OriC-ENS that combines sequence-based feature extraction techniques (K-mer, K-gapped Mono-Di, and K-gapped Di-Mono) with an ensemble classification technique that uses majority voting to identify the origin of replication. We use three SVM classifiers: one for the K-mer features, one for the K-gapped Mono-Di features, and one for the K-gapped Di-Mono features. Finally, we use majority voting to combine the predictions of the three classifiers. Experimental results on the S. cerevisiae dataset show that our method achieves an accuracy of 91.62%, outperforming other state-of-the-art methods by a significant margin. We also evaluated our method using the Matthews Correlation Coefficient (MCC), Area Under the Curve (AUC), sensitivity, and specificity, where it achieved scores of 0.83, 0.98, 0.90, and 0.92, respectively. We further evaluated our model on an independent test set collected from OriDB, consisting of sequences from Schizosaccharomyces pombe, where our model predicted the origin of replication efficiently and with great precision. Our Python-based source code is available at https://github.com/MehediAzim/OriC-ENS.
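The feature extraction and voting steps described above can be illustrated with a minimal sketch. This is not the published OriC-ENS code (see the repository linked above for that). The function names, the normalization by window count, and the exact reading of "K-gapped Mono-Di" (one nucleotide, then `gap` skipped positions, then a dinucleotide) are assumptions made for illustration, and SVM training is omitted entirely.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_features(seq, k=3):
    """Normalized frequency of every length-k word over ACGT (4**k values)."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero on short sequences
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=k)]


def gapped_mono_di_features(seq, gap=1):
    """Assumed definition: a nucleotide, `gap` skipped positions, then a
    dinucleotide. Returns 4 * 16 = 64 normalized frequencies."""
    seq = seq.upper()
    span = gap + 3  # 1 nucleotide + gap + 2 nucleotides
    n_windows = len(seq) - span + 1
    counts = Counter(
        (seq[i], seq[i + gap + 1:i + gap + 3]) for i in range(n_windows)
    )
    total = max(n_windows, 1)
    return [
        counts[(a, b + c)] / total
        for a in ALPHABET for b in ALPHABET for c in ALPHABET
    ]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of 0/1 labels per classifier; with an odd
    number of binary classifiers there are no ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Three hypothetical classifiers voting on three sequences.
votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
print(majority_vote(votes))
```

In a full pipeline, each feature vector would be fed to its own SVM, and `majority_vote` would be applied to the three SVMs' predicted labels.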

Abstract

Since December 2019, the novel coronavirus (COVID-19) has caused over 700,000 deaths and infected more than 10 million people. Bangladesh, one of the most densely populated countries in the world, is now under community transmission of the COVID-19 outbreak. This has created huge health, social, and economic burdens. As of 7 August 2020, Bangladesh had reported over 250,000 infected cases and 3,000 deaths. To prevent the situation from worsening, predicting its future course is very important. Studies have shown that machine learning (ML) models work extremely well in providing precise information regarding COVID-19 to the authorities, enabling them to make decisions accordingly. However, to the best of our knowledge, no ML models have been applied to help assess the pandemic situation for the demographics of Bangladesh. In this study, we explore different machine learning algorithms that can provide more accurate estimates of future cases, including infections and deaths due to COVID-19, for Bangladesh. Based on these estimates, the government and policymakers can make decisions about lockdowns, resource mobilization, and similar measures. Our study shows that, among the many predictive models considered, the Facebook Prophet model provided the best accuracy in predicting the course of the pandemic. We believe that using this information the authorities can make decisions that will save countless lives. This will also help reduce the immeasurable economic burden our country is facing due to the present situation. Furthermore, this study will help analysts construct predictive models for future work.


Early Prediction of Diabetes Mellitus using Machine Learning Techniques

Mazharul Islam Leon, Sayed Mehedi Azim and Md Ifraham Iqbal

(Under Review)

Abstract

Diabetes, a chronic disease that affects over 25 million people in the US alone, has taken its toll on ordinary people, particularly those of the older generation. 10.8% of the women in the US are affected by it. Over time, diabetes can lead to fatal diseases and death. In this study, we propose a method that provides a diagnostic aid for diabetes using Machine Learning (ML). The Pima Indian Diabetes Dataset (PIDD) was used for this study. As medical data is sensitive in most cases and all the attributes in the PIDD are significant for diabetes diagnosis, our objective is to create an intelligent model that can predict the presence of diabetes without any feature selection. We developed a Neural Network (NN) model and trained it on the PIDD. Testing and evaluation showed that the network achieves an accuracy of 94.87%, an improvement on the results of previous related studies. The model also achieves sensitivity and specificity scores higher than 0.90 while being computationally more efficient. We believe this system can be used to diagnose diabetes and can also serve as an assistive system for medical practitioners. There is no permanent cure for diabetes, but with early prediction using this system its impact can be limited, relieving patients of an immense economic and health burden.
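For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how accuracy, sensitivity, and specificity are computed from binary predictions. The function name and the toy labels are hypothetical and are not taken from the study or from the PIDD; label 1 is assumed to mean "diabetic".

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Sensitivity: fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters for a diagnostic aid, because a missed positive case and a false alarm carry different costs.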