RESEARCH INTEREST
Bioinformatics
Data Science
Data Mining
Soft Computing
Machine Learning
Machine Learning for Dengue Outbreak Prediction (M.Tech.)
Feature Selection and Disease Prediction using Soft Computing Techniques (Ph.D.)
Kindly cite the article & diagram as:
Iqbal, N., & Kumar, P. (2023). From Data Science to Bioscience: Emerging era of bioinformatics applications, tools and challenges. Procedia Computer Science, 218, 1516-1528. https://doi.org/10.1016/j.procs.2023.01.130. Elsevier. (ISSN: 1877-0509) (Web of Science- CPCI, Scopus)
----------------------------------------------------
#Feature Selection #Disease Prediction #Optimization
With the gradual technological changes of the current era, data science combined with soft computing approaches has played a major role in various areas, including bioinformatics and the health sector. Biological data mining is applied to extract interesting hidden knowledge that plays a significant role in bioinformatics and helps medical professionals. The term bioinformatics was coined in the late twentieth century for the investigation of, and search for solutions to, information processing in biotic systems. In the twenty-first century, the data science era began and has played a vital role in solving problems across a large domain, especially biological problems. In this study, we explore the role of data science and soft computing approaches in bioinformatics disciplines with various associated applications. We present the role of various simple soft computing techniques, such as machine learning, fuzzy sets, evolutionary computation, and Bayesian networks, as well as complex techniques for dimension reduction, such as formal concept analysis and principal component analysis. We propose an integrated framework for the prediction of outbreaks, followed by omics data processing and dimension reduction. We also shed light on the active research areas in bioinformatics as well as their current trends and challenges.
Abstract: The number of dengue patients is increasing rapidly, and according to World Health Organization (WHO) records dengue has now been recorded on every continent. According to WHO reports, the number of dengue cases reported each year grew from 0.4 to 1.3 million during 1996 to 2005 and then reached 2.2 to 3.2 million during 2010 to 2015. Consequently, it is essential to have a framework that can quickly and adequately recognize the prevalence of a dengue outbreak in a large number of specimens. To this end, the capability of seven prominent machine learning techniques was assessed for forecasting dengue outbreaks. These methods were evaluated on eight performance parameters. The LogitBoost ensemble model achieved the highest classification accuracy of 92%, with sensitivity and specificity of 90% and 94%, respectively.
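As a reading aid for the accuracy, sensitivity, and specificity figures quoted in the abstract above, the following minimal Python sketch shows how these three metrics are derived from a binary confusion matrix. The function name and the counts are hypothetical: the counts were chosen only so that the output reproduces the reported percentages and are not taken from the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # true positive rate (outbreak cases found)
        "specificity": tn / (tn + fp),   # true negative rate (non-outbreak cases found)
    }


# Hypothetical counts (assumption): 50 positive and 50 negative cases.
metrics = classification_metrics(tp=45, fn=5, tn=47, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because accuracy pools both classes, it can stay high even when sensitivity is low, which is why the abstracts on this page report all three values.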
Kindly cite the article & diagram as:
Iqbal, N., & Islam, M. (2019). Machine learning for dengue outbreak prediction: A performance evaluation of different prominent classifiers. Informatica, 43(3). https://doi.org/10.31449/inf.v43i3.1548
Abstract: In today's market, banks face cut-throat competition and struggle hard to gain a competitive advantage over each other. The banking industry has undergone tremendous changes in the way business is conducted. Banks recognize the need for data mining techniques, which are helpful tools to gather, store, and capture data and convert it into knowledge. The application of data mining enhances the performance of the telemarketing process in the banking industry. It also provides insight into how these techniques can be used effectively in the banking industry to make the decision-making process easier and more productive. This work describes a data mining approach to extract valuable knowledge and information from bank telemarketing campaign data. The potential of five data mining methods was explored for forecasting term deposit subscription. The performance of these techniques was evaluated on fourteen different classifier parameters. The best overall performance was achieved by the J48 decision tree, which correctly classified 91.2% of instances, with sensitivity and specificity of 53.8% and 95.9%, respectively, and the lowest error rate of 8.8%.
Kindly cite the article & diagram as:
Farooqi, R., & Iqbal, N. (2019). Performance evaluation for competency of bank telemarketing prediction using data mining techniques. International Journal of Recent Technology and Engineering, 8(2), 5666-5674. https://doi.org/10.35940/ijrte.A1269.078219
SARS-CoV-2, the novel coronavirus behind the COVID-19 infection, has caused destruction to human life around the world, presenting a range of complexities that has pushed medical care specialists to investigate innovative solutions and diagnostic strategies. Soft computing-based approaches have assumed a significant role in resolving complex issues, and numerous communities have moved to implement and adapt these innovations in response to the challenges created by the COVID-19 pandemic. This study aims to perform genome-wide association studies using RNA-Seq data of COVID-19 and to identify gene biomarkers, with classification and prediction of coronavirus disease using soft computing techniques, in order to fight this pandemic in the epidemiological domain and in disease prognosis. The RNA-Seq profiles of samples from both healthy and COVID-19-positive patients were considered. We propose an integrated pipeline that runs from the bioinformatics in-silico phase for -omic profile data processing to dimension reduction using prominent techniques such as formal concept analysis and principal component analysis, followed by a machine learning phase for prediction of the disease. In this experimental research, we implemented an integrated model using the Classifier Subset Evaluator (CSE) followed by principal component analysis (PCA) for dimension reduction to select highly significant features, and then applied several prominent classifiers to the selected features for classification and prediction of coronavirus disease. In this analysis, the Hoeffding Tree model was the top-performing classifier, with a classification accuracy of 99.21% and sensitivity and specificity of 99% and 100%, respectively.
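To illustrate the PCA dimension-reduction step named in the abstract above, here is a minimal, dependency-free Python sketch that finds the first principal component by power iteration and projects each sample onto it. It is an illustration only, not the tooling used in the study: the function name, the toy two-feature matrix, and the choice of power iteration are all assumptions, and the subset-evaluation and classification stages are not covered.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component of rows."""
    n, d = len(rows), len(rows[0])
    # Center each feature (column) at zero mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix converges
    # to its dominant eigenvector, provided the start vector is not orthogonal to it
    # and the data are not constant (otherwise the norm below would be zero).
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered sample onto the component to get 1-D scores.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy matrix (assumption): 4 samples x 2 features, second feature = 2 x first.
toy_expression = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(toy_expression)
print(direction, scores)
```

In this toy case all variance lies along one line, so a single component retains the full information of the two original features; real expression matrices would need several components and a numerical library.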
Kindly cite the article & diagram as:
Iqbal, N., & Kumar, P. (2021). Coronavirus Disease Predictor: An RNA-Seq based pipeline for dimension reduction and prediction of COVID-19. In Journal of Physics: Conference Series (Vol. 2089, p. 012025). IOP Publishing. https://doi.org/10.1088/1742-6596/2089/1/012025
Background: The world has been battling the ongoing COVID-19 pandemic, spread by the SARS-CoV-2 virus, for the last two years. The prediction of viral diseases has long been a matter of interest in virology and the study of disease transmission.
Objective: In this study, we aimed to perform genome association studies using RNA-Seq data of COVID-19, reveal highly expressed gene biomarkers, and build a machine learning model for COVID-19 prediction to combat this pandemic.
Method: We collected RNA-Seq gene count data for both healthy (Control) and non-healthy (Treated) COVID-19 cases. In this experiment, a sequence of bioinformatics strategies and statistical techniques, such as fold-change and adjusted p-value, was applied to identify differentially expressed genes (DEGs). We filtered biomarker sets of high DEGs, moderate DEGs, and low DEGs using the DESeq2, Limma Trend, and Limma Voom methods based on intersection and union operations, and applied machine learning techniques to predict COVID-19.
Result: Through experimental analysis, 67 potential biomarkers were extracted, comprising 49 up-regulated and 18 down-regulated genes, using statistical techniques and a set-theory consensus strategy. We trained the machine learning models on 12 different biomarker sets and found that the SVM model performed better than the other classifiers with 99.07% classification accuracy for moderate DEGs.
Conclusion: Our study revealed that identified differentially expressed genes of the moderate DEGs biomarker set, |log2FC| ≥ 2 with adjusted p-value < 0.05, work significantly as input features to implement a machine learning model using a kernel-based SVM technique to predict COVID-19.
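The threshold rule stated in the conclusion above (|log2FC| ≥ 2 with adjusted p-value < 0.05) and the intersection/union consensus across methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the gene names and numeric values are made up, the function name is hypothetical, and how the study maps intersection and union onto its high, moderate, and low DEG sets is not specified here, so the sketch only shows the two set operations themselves.

```python
def select_degs(results, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Return genes with |log2FC| >= lfc_cutoff and adjusted p-value < padj_cutoff."""
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if abs(log2fc) >= lfc_cutoff and padj < padj_cutoff
    }


# Made-up per-method results (assumption): gene -> (log2 fold change, adjusted p-value).
deseq2 = {"GENE_A": (3.1, 0.001), "GENE_B": (-2.4, 0.01), "GENE_C": (1.2, 0.001)}
limma_trend = {"GENE_A": (2.8, 0.002), "GENE_B": (-1.9, 0.02), "GENE_C": (2.2, 0.2)}
limma_voom = {"GENE_A": (2.9, 0.003), "GENE_B": (-2.1, 0.03), "GENE_D": (2.5, 0.04)}

per_method = [select_degs(r) for r in (deseq2, limma_trend, limma_voom)]

# Strict consensus: genes called by every method.
consensus_intersection = set.intersection(*per_method)
# Permissive consensus: genes called by at least one method.
consensus_union = set.union(*per_method)

print(sorted(consensus_intersection))
print(sorted(consensus_union))
```

The sign of the log2 fold change then separates up-regulated genes (positive) from down-regulated genes (negative) within whichever consensus set is used as classifier input.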
Kindly cite the article & diagram as:
Iqbal, N., & Kumar, P. (2022). Integrated COVID-19 predictor: Differential expression analysis to reveal potential biomarkers and prediction of coronavirus using RNA-Seq profile data. Computers in Biology and Medicine, 105684. https://doi.org/10.1016/j.compbiomed.2022.105684
Background: Combating the COVID-19 pandemic, which has been spread by the SARS-CoV-2 virus over the past three years, remains an enormous global challenge. Many waves have occurred with diverse variants, categorized as variants of concern and variants of interest by the World Health Organization (WHO). Moreover, the emergence of a new coronavirus and its susceptibility to various pandemic conditions throughout the world strongly imply that the virus is taking new shapes, resulting in divergent variants, so predicting the virus becomes more challenging.
Objective: This article proposes an integrated intelligent prediction model to predict coronavirus variants of concern, such as alpha, beta, gamma, delta, and omicron, using RNA-Seq and machine learning techniques.
Method: We propose a machine learning-based SARS-CoV-2 variant prediction model. A total of 2028 RNA-Seq sequences of the alpha, beta, gamma, delta, and omicron variants were gathered to train prominent classifiers for the prediction of unseen sequences of these variants. The feature vector was created from sequence motifs obtained through K-mer and N-gram methodologies to build the prediction model.
Result: The findings show that the prediction model based on Logistic Regression (LR) performs better than the other models, reaching an aggregated accuracy of 99% during training and testing. We also validated the models using unseen variant sequences and found a significant prediction rate.
Conclusion: Our experiment showed that the suggested SARS-CoV-2 variant prediction model, trained with logistic regression, is capable of predicting the SARS-CoV-2 variants of concern.
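The K-mer feature extraction mentioned in the abstract above can be illustrated with a short Python sketch that turns a nucleotide sequence into a fixed-length frequency vector suitable as classifier input. It is a minimal sketch under stated assumptions: the function name, the choice of k = 3, the ACGT alphabet, and the toy fragment are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=3, alphabet="ACGT"):
    """Count overlapping k-mers and return a fixed-length frequency vector."""
    sequence = sequence.upper()
    # Slide a window of width k over the sequence, one position at a time.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fixed vocabulary of all len(alphabet)**k possible k-mers, in a stable order.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # K-mers containing other symbols (for example N) are left out of the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


toy_sequence = "ATGGCATGGA"  # toy fragment (assumption), not real variant data
vector = kmer_features(toy_sequence, k=3)
print(len(vector), sum(vector))
```

Using a fixed vocabulary means every sequence maps to a vector of the same length (64 entries for k = 3), regardless of sequence length, so vectors from different genomes can be stacked into one training matrix.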
Kindly cite the article & diagram as:
Iqbal, N., & Bhardwaj, A. (2024). Decoding SARS-CoV-2 Variants: An in-silico approach to RNA-Seq feature extraction using K-mers and N-grams. In 2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India (pp. 1-10). https://doi.org/10.1109/SCEECS61402.2024.10481950. (IEEE, Scopus)