Debashis Sahoo

Boolean Lab UCSD

Assistant Professor
Department of Pediatrics
Department of Computer Science and Engineering
University of California San Diego
9500 Gilman Drive, MC 0703
Biomedical Research Facility II, Room 2119
La Jolla, CA 92093-0703
Phone: 858-246-1803
Fax: 858-246-0019

Dr. Sahoo has expertise in developing computational approaches for analyzing biological data. His work centers on identifying simple Boolean relationships between gene expression values. Boolean analysis is the mathematics of two values, such as 0/1 or low/high. Dr. Sahoo analyzes large publicly available biological datasets to identify strong Boolean rules within a particular domain, such as one tissue, one species, or across mammalian species. A rule is called a Boolean invariant if all the data collected in that domain follow the same Boolean formula.

Dr. Sahoo has developed several computational approaches for analyzing gene expression datasets: (1) StepMiner, a tool that identifies step-wise transitions in time-course microarray datasets (Sahoo et al. NAR 2007), (2) BooleanNet, a method for discovering Boolean implications between genes using large numbers of gene expression datasets (Sahoo et al. Genome Biology 2008), (3) MiDReG (Mining Developmentally Regulated Genes), a tool that uses Boolean implications to predict genes in developmental pathways (Sahoo et al. PNAS 2010), and (4) HEGEMON (Hierarchical Exploration of Gene Expression Microarrays ONline), a web-based tool that supports the analysis of big data from a browser (Dalerba, Kalisky, Sahoo et al. Nat Biotech 2011; Volkmer, Sahoo, Chin et al. PNAS 2012; Dalerba, Sahoo et al. NEJM 2016).
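To make the idea of a step-wise transition concrete, here is a minimal sketch of fitting a single step to a time course by least squares. It is only an illustration of the general idea and should not be read as the published StepMiner algorithm; the function name `fit_one_step` and its return format are assumptions made for this example.

```python
from statistics import mean


def fit_one_step(values):
    """Fit a single step to an ordered series (hypothetical helper).

    Tries every split point k, models the series as one constant level
    before k and another from k onward, and keeps the split with the
    smallest sum of squared errors.
    Returns (k, level_before, level_after, sse).
    """
    if len(values) < 2:
        raise ValueError("need at least two time points")
    best = None
    for k in range(1, len(values)):
        before, after = values[:k], values[k:]
        m_before, m_after = mean(before), mean(after)
        sse = sum((v - m_before) ** 2 for v in before)
        sse += sum((v - m_after) ** 2 for v in after)
        if best is None or sse < best[3]:
            best = (k, m_before, m_after, sse)
    return best


# Expression of one gene over five time points, switching on at index 3.
k, level_before, level_after, sse = fit_one_step([1.0, 1.0, 1.0, 5.0, 5.0])
```

In this toy series the best split falls where the values jump from the low level to the high level. A real analysis would also need a statistical test to decide whether a step fits better than a flat line.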

Among these successes is Dr. Sahoo’s earlier work identifying the expression level of CDX2 as a predictive biomarker for favorable response to conventional chemotherapy among stage II colon cancer patients (Dalerba, Sahoo et al. NEJM 2016). This discovery was made possible by mining patient-derived gene expression data with a mathematical principle known as Boolean analysis, which has not been commonly applied in the cancer genomics field. Boolean analysis assigns a parameter (e.g. the RNA level of a gene) only two values, i.e., high/low, 1/0, or positive/negative. Applying the Boolean principle, it is possible to determine the relationship between the expression levels of any pair of genes (Sahoo et al. Genome Biology 2008). The Boolean principle allows only six different relationships: two are symmetric (equivalent or opposite) and four are asymmetric (low => low, high => low, low => high, and high => high). Dr. Sahoo used Boolean analysis to search for genes (X’s) whose expression fulfilled the “X low => ALCAM high” relationship, where ALCAM (CD166) was used as a marker of colon epithelial precursor cells residing at the bottom of colonic crypts. ALCAM (CD166) is also expressed on human colon cancer cells with enriched tumorigenic capacity in mouse xenotransplantation models. His Boolean analyses yielded 16 candidate genes, among them CDX2, for which clinical-grade diagnostic assays were readily available. By mining a large pooled database of randomized adjuvant therapy trials in stage II colon cancer, Dr. Sahoo’s Boolean analysis established CDX2 as a biomarker for identifying stage II tumors with a favorable response to chemotherapy. Notably, earlier analyses of the same clinical trial data had been unable to deduce a biomarker for therapeutic response.
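The six relationships can be illustrated with a short sketch. Each gene is split into low/high at a threshold, the samples are counted in the four low/high quadrants, and an implication is reported when the quadrant that would contradict it is (nearly) empty. This is a simplified illustration, not the BooleanNet implementation: the function names, the fixed thresholds, and the `max_outliers` cutoff are assumptions made for this example, and a real analysis would replace the raw count cutoff with a statistical test.

```python
def binarize(values, threshold):
    """Map each expression value to True (high) or False (low)."""
    return [v > threshold for v in values]


def boolean_relationships(x, y, x_thr, y_thr, max_outliers=0):
    """Name the Boolean relationship(s) between genes X and Y.

    A quadrant counts as sparse when it holds at most `max_outliers`
    samples (an assumed, simplified criterion).
    """
    low, high = False, True
    counts = {(a, b): 0 for a in (low, high) for b in (low, high)}
    for a, b in zip(binarize(x, x_thr), binarize(y, y_thr)):
        counts[(a, b)] += 1
    sparse = {q for q, n in counts.items() if n <= max_outliers}

    # Symmetric relationships: two opposite quadrants are sparse.
    if {(low, high), (high, low)} <= sparse:
        return ["equivalent"]
    if {(low, low), (high, high)} <= sparse:
        return ["opposite"]

    # Asymmetric relationships: "X a => Y b" holds when the quadrant
    # (X a, Y not-b) is sparse.
    rules = {
        (low, high): "X low => Y low",
        (high, high): "X high => Y low",
        (low, low): "X low => Y high",
        (high, low): "X high => Y high",
    }
    return [name for quadrant, name in rules.items() if quadrant in sparse]


# Toy data: no sample has both X low and Y low.
x = [1, 2, 8, 9, 9]
y = [9, 8, 9, 8, 1]
found = boolean_relationships(x, y, x_thr=5, y_thr=5)
```

In the toy data every sample with low X has high Y, so the only relationship reported is "X low => Y high", the same form as the "X low => ALCAM high" search described above.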

Dr. Sahoo’s research focuses on large-scale Boolean analysis of biological systems. His background in the formal analysis of large digital systems not only provides a unique interpretation and simplification of complex biological systems, but also permits large-scale integration of information from multiple data sources. Recent advances in high-throughput technologies have enabled the collection of immense amounts of biological data. DNA microarray and sequencing technologies are used to quantify the amount of mRNA (called gene expression) for thousands of genes in parallel in a single biological experiment. Datasets are then assembled that contain such information from hundreds or thousands of individual samples. While traditional computational approaches have limited applications because of scalability and noise reduction issues, Dr. Sahoo has demonstrated that discrete characterization of gene expression enables large-scale analysis of these high-throughput microarray datasets and greatly reduces the noise.

Dr. Sahoo’s work has fundamentally changed the way people think about discovering critical genes that determine cell fate and neoplastic potential. Dr. Sahoo’s MiDReG tool successfully predicted B cell precursor genes (Sahoo et al. PNAS 2010) and identified a new precursor for B and T cells (Inlay, Bhattacharya, Sahoo et al. Genes Dev 2009). Dr. Sahoo’s Boolean approach has been used to simultaneously identify markers of stem and progenitor cells and prognostic biomarkers in human bladder cancer (Volkmer, Sahoo, Chin et al. PNAS 2012), colon cancer (Dalerba, Kalisky, Sahoo et al. Nat Biotech 2011; Dalerba, Sahoo et al. NEJM 2016), and prostate cancer (Sahoo et al. Oncotarget 2018). These studies elucidate differentiation hierarchies that are preserved in both normal and cancer tissues.