Recent advances in high-throughput technologies have enabled the collection of immense amounts of biological data. DNA microarray and sequencing technologies are used to quantify the amount of mRNA (called gene expression) for thousands of genes in parallel in a single biological experiment. Datasets are then assembled that contain such information from hundreds or thousands of individual samples. While traditional computational approaches have limited applications because of scalability and noise reduction issues, we at the Boolean Lab demonstrate that discrete characterization of gene expression greatly reduces noise and enables large-scale analysis of these high-throughput microarray datasets.
Gene expression data are widely used to study human diseases at the transcriptional level. Traditionally, relationships between pairs of genes are identified using a correlation coefficient. Boolean implication analysis is an alternative way to establish relationships between genes. Boolean analysis assigns each parameter (e.g., the RNA level of a gene) one of only two values, i.e., high/low, 1/0, or positive/negative. Applying the Boolean principle, it is possible to determine the relationship between the expression levels of any pair of genes. The Boolean principle allows only six possible relationships: two are symmetric (equivalent or opposite) and four are asymmetric (low => low, high => low, low => high, and high => high). Previous studies using Boolean implication have produced results in B cell differentiation, bladder cancer, colon cancer, and prostate cancer.
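As an illustration only, the six relationships can be sketched in Python by binarizing two genes across samples and checking which of the four high/low combinations are (almost) never observed. The fixed threshold, the 5% sparsity cutoff, and the names binarize and boolean_relationship are assumptions made for this sketch; they are not the lab's published procedure, which may choose thresholds and test sparsity differently.

```python
from collections import Counter

# Hypothetical helper: call a value "high" (True) if it exceeds the threshold.
def binarize(values, threshold):
    return [v > threshold for v in values]

# Map a single sparse quadrant (A state, B state) to the implication it supports.
# Example: if (A low, B high) is never seen, then A low => B low.
SINGLE_SPARSE = {
    (False, True): "low => low",
    (False, False): "low => high",
    (True, True): "high => low",
    (True, False): "high => high",
}

def boolean_relationship(a, b, max_sparse_fraction=0.05):
    """Classify the relationship between two binarized genes.

    a, b: lists of booleans (True = high), one entry per sample.
    max_sparse_fraction: assumed cutoff below which a quadrant counts as empty.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    counts = Counter(zip(a, b))
    sparse = {
        quadrant
        for quadrant in SINGLE_SPARSE
        if counts[quadrant] / n <= max_sparse_fraction
    }
    # Two sparse quadrants on a diagonal give the symmetric relationships.
    if sparse == {(False, True), (True, False)}:
        return "equivalent"
    if sparse == {(False, False), (True, True)}:
        return "opposite"
    # Exactly one sparse quadrant gives an asymmetric implication.
    if len(sparse) == 1:
        return SINGLE_SPARSE[next(iter(sparse))]
    return "none"

# Toy expression values for two genes across six samples (made-up numbers).
gene_a = binarize([1, 2, 8, 9, 9, 8], threshold=5)
gene_b = binarize([1, 1, 2, 9, 8, 9], threshold=5)
print(boolean_relationship(gene_a, gene_b))
```

In this toy input, no sample has gene A low and gene B high, so the sketch reports the asymmetric relationship for that single empty quadrant. Unlike a correlation coefficient, the check depends only on which quadrants are empty, which is what lets it capture asymmetric relationships.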
Our recent findings fundamentally change the way people think about discovering critical genes that determine cell fate and neoplastic potential. Dr. Sahoo’s MiDReG tool successfully predicted B cell precursor genes (Sahoo et al. PNAS 2010) and identified a new precursor for B and T cells (Inlay, Bhattacharya, Sahoo et al. Genes Dev 2009). Dr. Sahoo’s Boolean approach has been used to simultaneously identify markers of stem and progenitor cells and prognostic biomarkers in human bladder cancer (Volkmer, Sahoo, Chin et al. PNAS 2012), colon cancer (Dalerba, Kalisky, Sahoo et al. Nat Biotech 2011; Dalerba, Sahoo et al. NEJM 2016), and prostate cancer (Sahoo et al. Oncotarget 2018). These studies elucidate differentiation hierarchies that are preserved in both normal and cancer tissues.
Researchers at the Boolean Lab are creating disease models using Boolean implication networks. Our methods can model disease propagation by identifying patterns in large datasets, separating signal from noise, and assessing those patterns against fundamental principles underlying human biology. Analyzing our networks allows us to identify clinically relevant targets and predict trial outcomes.
The ML group within the Boolean Lab focuses on using and developing computational techniques for solving problems in healthcare. These techniques range from simple statistical models, decision trees, and ensembles to more complex deep learning (DL) models.
At the Boolean Lab, we maintain a two-way correspondence between computational and experimental biology. We validate gene targets using quantitative (RT-qPCR and LAMP) and qualitative (IHC) methods.