Bilal Mustafa, Ph.D.

Bioinformatician / Data Scientist

Skilled in developing custom scripts and algorithms for data processing, quality control, and downstream analysis in the field of bioinformatics.

Open to collaborations and excited to contribute to cutting-edge bioinformatics projects.

2024 - Present

Postdoc.

Bioinformatics analysis, tool development for multi-omics data, and statistical analyses

2022 - 2023

Collaborative Research.

Designing bioinformatics pipelines for complex genetics and statistical analyses

2020 - 2022

Incheon National University, Incheon, South Korea.

Postdoctoral researcher;

Designed and analyzed genomic and transcriptomic experiments, focusing on genome assemblies and RNA-Seq experiments on model organisms and disease-vector insects

2017 - 2021

Gachon University, Incheon, South Korea

Doctoral researcher;

Genomic and transcriptomic data analysis for cancer patients

2014 - 2018

National Testing Services, Islamabad, Pakistan.

Software Developer/Programmer;

Database management (SQL Server) and analysis.

Key skills and experience

Extensive research experience in cancer genomics, transcriptomics, and the identification of predictive and prognostic biomarkers in cancer.
Proficient in analyzing and interpreting genomic and transcriptomic data to uncover valuable insights and contribute to advancements in cancer research.
Strong expertise in insect genomics and transcriptomics, studying the genetic basis of disease vectors and their interactions with pathogens.
Skilled in performing genome assemblies of model organisms, leveraging cutting-edge tools and techniques to generate high-quality genome sequences.
Proficient in data processing, analysis, and visualization using R, Python, SQL, and Bash scripting, enabling efficient and reproducible research workflows.
Experienced in implementing statistical analysis and machine learning algorithms to derive meaningful conclusions from complex biological datasets.
Proven ability to communicate scientific findings through presentations, scientific reports, and publications in reputable journals.
Collaborative mindset, with active engagement in interdisciplinary research projects and effective teamwork.
Committed to staying updated with the latest advancements in the field of bioinformatics and adopting innovative approaches to address research challenges.

Data Analysis

I have extensive experience in data analysis across several biological domains, specializing in cancer genomics, transcriptomics, and insect genomics. I am proficient in processing and cleaning large-scale biological datasets using R, Python, SQL, and Bash. I have applied statistical methods, machine learning algorithms, and data visualization techniques to extract insights from complex biological data and to communicate research findings to both technical and non-technical audiences. I am skilled in integrating and analyzing multi-omics data, using bioinformatics tools and public databases such as TCGA, GTEx, NCBI, and ENCODE to retrieve, integrate, and compare data. Collaboration and interdisciplinary teamwork are integral to my work, and I have contributed data analysis expertise to various research projects.

Technologies

I have hands-on experience with a range of technologies in bioinformatics. This includes Next-Generation Sequencing (NGS) data analysis, where I have processed and analyzed high-throughput short-read DNA and RNA sequencing data from platforms such as Illumina. I have also worked with NanoString gene expression profiling data and with third-generation, long-read sequencing data from Oxford Nanopore. Alongside these technologies, I have worked extensively with genetic variation data, particularly in genotype-to-phenotype association studies using large-scale datasets such as UK Biobank. This experience has allowed me to explore the genetic basis of phenotypic traits and contribute to the understanding of complex genetic associations.

Programming & Tools

I have a strong programming background, particularly in R, Python, and SQL, which I have used extensively for data processing, analysis, visualization, and machine learning on diverse biological datasets. I am also skilled in Bash scripting, which I use to automate data processing and analysis workflows. Alongside programming, I have practical experience with bioinformatics tools including Maestro for molecular docking, CLC Workbench for data analysis and visualization, IBM Watson for Genomics for genomic analysis, QIIME for microbiome analysis, REGENIE for genetic association analysis, and PLINK for analyzing genetic variation data.

Projects

In my previous projects, I have worked on diverse bioinformatics datasets. This includes analyzing bladder cancer data obtained from TCGA, conducting integrative analysis of mRNA and miRNA data from liver cancer patients using in-house bulk RNA-Seq data, and using data from the NanoString platform to investigate rectal cancer, anorectal malignant melanoma, small intestinal adenocarcinoma, and pancreatic cancer. I have also performed de novo genome assemblies for honey bees and six species of malaria-vector mosquitoes found in South Korea. Furthermore, I have worked on cardiovascular disease data from UK Biobank, performing genetic association studies to explore the underlying genetics of cardiovascular diseases.

Highlight project experience

Patent:

Title: ANALYSIS METHOD FOR PREDICTION OF RESPONSE TO PREOPERATIVE CHEMORADIOTHERAPY IN RECTAL CANCER PATIENT.

PCT international application number PCT/KR2021/001605 filed on 08 February 2021 (08.02.2021)

In this study, we aimed to develop a predictive model for treatment response outcomes in locally advanced rectal cancer (LARC) patients undergoing preoperative chemoradiotherapy (PCRT) and subsequent surgery. We conducted a gene expression study using formalin-fixed paraffin-embedded (FFPE) tumor biopsy samples from 156 LARC patients. Through our analysis, we identified a nine-gene signature (FGFR3, GNA11, H3F3A, IL12A, IL1R1, IL2RB, NKD1, SGK2, and SPRY2) that effectively differentiated responders from non-responders in both the training cohort (accuracy = 86.9%) and the validation cohort (accuracy = 81.0%). This signature demonstrated independence from other clinical and pathological features, making it a robust predictor of PCRT response. Moreover, its practicality in clinical settings using FFPE samples and FDA-approved hardware and reagents further enhances its potential for guiding tailored therapies and improving oncologic outcomes for LARC patients.

https://doi.org/10.3390/cancers12040800

Conference Presentations and Abstracts

Genotype By Genotype (GxG) interaction analysis in the UK Biobank CAD. Maria M, Mustafa B, Örd T, Kaikkonen-Määttä M. SATY Meeting (2023)
The relationships between microbiome diversity and epidemiology in domestic species of malaria-mediated mosquitoes of Korea. Jeong-Hyeon Lee, Dong-In Kim, Giyoun Han, Bilal Mustafa, Sujin Lee, Myeong-Lyeol Lee, and Hyung Wook Kwon, RDA Annual Conference, South Korea (2021)
The effect of umami receptor (AmGr10) expression levels on growth and transcriptome in the hypopharyngeal glands of the western honey bees, Apis mellifera. Giyoun Han, Bilal Mustafa, Sujin Lee, and Hyung Wook Kwon, Spring International Conference of KSAE, South Korea (2021)

Academic, Scientific and Social Impact of my Research

I have established a successful career as a bioinformatician in the field of eukaryotic genomics, driven by a strong dedication to scientific research that has had a significant impact on clinical, academic, and social fronts. My achievements include:

Authoring publications in high-impact scientific journals, reflecting the quality and impact of my research. I earned a Bachelor of Science in Bioinformatics and completed a Master’s leading to a Ph.D. in Health Sciences and Technology, with a focus on gene expression profiling in cancer.
Contributing to a better understanding of disease mechanisms, the underlying biochemical pathways, and their clinical impact. I have achieved this through the development of gene signatures that aid in determining prognosis and personalized treatments (submitted for patent).
Providing molecular insights into disease vectors and social insects by generating gapless, telomere-to-telomere genome assemblies.
Contributing to the research community's understanding of molecular changes in model organisms by identifying new and complete gene sequences through genome annotations.
Addressing the social aspect of my research through personalized medicine, which aims to maintain quality of life for both responders and non-responders.
Disseminating my findings at various international forums, conferences, and seminars, ensuring their widespread impact and accessibility.

Overall, my dedication to advancing scientific research and my track record of publications and impactful findings demonstrate my ability to drive progress and make meaningful contributions to the field of eukaryotic genomics.