Abstract
With the recent advances in Artificial Intelligence and computer vision, object detection techniques are gaining traction, especially in Intelligent Transportation Systems. Typically, object detection for Intelligent Transportation Systems uses traffic cameras or various other vision-based sensors. These techniques play a pivotal role in tackling challenging problems like traffic density estimation, tracking traffic violations, lane changes, detecting speeds of individual vehicles, vehicle classification, and counting. Unlike traffic in many other countries, Indian traffic is heterogeneous, and lane discipline is rarely maintained. Here, we propose a custom real-time count-detect-track pipeline for Indian traffic conditions. First, we train the object detection model with a data augmentation strategy that combines low-quality images from Pascal VOC with a custom dataset of road traffic images captured under Indian conditions. Then, we convert the model to TensorRT to make it more portable. This model is integrated with the SORT tracker, and the directional counts are then estimated.
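A minimal sketch of the counting stage is given below, assuming per-frame detections come from the TensorRT-converted detector and tracks from the reference SORT implementation's `Sort.update()` interface; the counting-line position and the direction names are hypothetical placeholders, not the exact pipeline.

```python
# Sketch of directional counting from tracker output; the Sort class/API is
# assumed from the reference SORT implementation, and LINE_Y is a hypothetical
# virtual counting line in image coordinates.
import numpy as np
from collections import defaultdict
from sort import Sort  # assumed: Sort().update(dets) -> [[x1, y1, x2, y2, track_id], ...]

LINE_Y = 540                      # hypothetical counting line (pixels)
tracker = Sort(max_age=5, min_hits=3)
last_cy = {}                      # track_id -> previous centroid y
counts = defaultdict(int)         # direction -> count

def update_counts(detections):
    """detections: np.ndarray of [x1, y1, x2, y2, score] rows for one frame."""
    tracks = tracker.update(np.asarray(detections, dtype=float))
    for x1, y1, x2, y2, tid in tracks:
        cy = 0.5 * (y1 + y2)                       # vertical centroid of the box
        prev = last_cy.get(int(tid))
        if prev is not None:
            if prev < LINE_Y <= cy:                # crossed the line moving down
                counts["inbound"] += 1
            elif prev > LINE_Y >= cy:              # crossed the line moving up
                counts["outbound"] += 1
        last_cy[int(tid)] = cy
    return dict(counts)
```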
Abstract
Road crashes have become a public health issue in India, causing more than 150,000 deaths per year over the last five years. Several states in India do not maintain a comprehensive database on accidents; however, all states are required to file First Information Reports (FIRs) for accidents reported to the police. In this study, police FIRs from the state of Tamil Nadu are used to extract information about road crashes using natural language processing tools, and we build a model to predict road crash severity. The descriptive field in FIRs is the main source of road crash data in text format. We extract the following information from the FIR text: age, gender, and profession of the victim; date and time of the road crash; road facility involved; causes of the road crash; vehicle type; body parts injured; and the Indian Penal Code (IPC) section under which the FIR was registered. We report road crash severity in four categories - fatal injury, grievous injury, minor injury, and no injury. We observe underreporting of non-injury crashes and hence ignore this category in further analysis. We develop a neural network model to predict road crash severity from the information extracted from the FIRs.
Abstract
Injury severity modeling forms an integral part of road safety research. The high monetary and societal costs associated with vehicular crashes highlight the need for better prediction and understanding of the factors that affect the severity of accidents. This study compares Logistic Regression (LR), Random Forest (RF), and Gradient Boosting Machine (GBM) models for their predictive performance and interpretations in the case of motorcycle crashes on highways in the state of Tamil Nadu. Hyper-parameters of the models were tuned using randomized grid search cross-validation, and the predictive performance of the models was evaluated using the Matthews Correlation Coefficient (MCC). Univariate interpretations from the models were drawn using Individual Conditional Expectation (ICE) plots, and bivariate interaction effects were quantified using Friedman's H statistic, followed by their interpretation using two-dimensional Partial Dependence plots. The RF model had better predictive performance than the GBM and LR models for injury severity prediction. Collisions with heavy motor vehicles and the absence of a shoulder and median were the primary causes of crashes resulting in severe and fatal injury. A pairwise interaction of the colliding vehicle type variable with the central divider, shoulder type, and age of two-wheeler rider variables was also significant in all three models, but with differing magnitudes of interaction strength. The study provides new insights into the complex relationships between variables in determining injury severity.
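The sketch below illustrates the tuning and evaluation protocol described above with scikit-learn; the feature matrix, labels, and parameter ranges are placeholders rather than the study's actual data.

```python
# Randomized search over hyper-parameters with MCC as the scoring metric.
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, make_scorer
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = np.random.rand(500, 12), np.random.randint(0, 2, 500)   # placeholder crash data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

mcc_scorer = make_scorer(matthews_corrcoef)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(100, 500),
                         "max_depth": randint(3, 15),
                         "min_samples_leaf": randint(1, 10)},
    n_iter=25, scoring=mcc_scorer, cv=5, random_state=0)
search.fit(X_tr, y_tr)

print("best CV MCC:", search.best_score_)
print("test MCC:", matthews_corrcoef(y_te, search.best_estimator_.predict(X_te)))
```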
Abstract
The SARS-CoV-2 virus has brought the world and the economy to a standstill. Even with increasing vaccination rates, COVID-19 infections have recently been at an all-time high. The second wave of COVID-19 was deadlier than the first; India witnessed an unprecedented increase in infections and deaths during March-May 2021. Therefore, we are forced to use our available knowledge of microbiology and population behavior to suppress the spread of infection and control the pandemic. Knowledge of the behavior of a specific population, i.e. how well people adhere to voluntary physical distancing and the amount of essential travel in a given area, can aid the government in framing relevant policies. Mobility data can also warn health workers about the locality that could become the next hotspot by assessing the incoming risk of infection, so that timely measures such as increased testing and stay-at-home orders can be taken to contain the spread. Given the limited availability of testing kits, vaccines, and life-support devices like mechanical ventilators, the data could also be used to channel these resources to the locations where the greatest need is predicted next.
We partnered with Facebook Data for Good and analyzed Facebook's user location data to study the population behavior of India, and particularly Tamil Nadu, during this lockdown. We believe that making people compliant with the lockdown is vital to lowering the infection rate, so we monitored population movement and studied the behavioral patterns. We started this study with three main objectives: a) to measure compliance with the lockdown, b) to study movement patterns, and c) to find the disease hotspots in a region. We cover all three objectives in this study and reveal some interesting information about how COVID-19 and the lockdown impact population movement.
Abstract
Contextual bandit algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms represent context as an attribute-value vector, which makes them infeasible for real-world domains like social networks, which are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandit algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendation.
Abstract
In this project, we consider the multi-task learning framework, where a single agent is trained to perform multiple tasks so that it can generalise better across different tasks. One method for training an RL agent to achieve this is a distillation-based approach that consolidates multiple task-specific policies, each learned to perform a different task, into a single policy using the principles of transfer learning. The paper "Learning to Multi-Task by Active Sampling" (https://arxiv.org/abs/1702.06053) presents an efficient online learning method for training a multi-task RL agent using active learning principles that does not require learning individual task-specific policies before training the multi-task agent. In this project, we extend the work presented in that paper with two goals: 1. To develop an explainability mechanism for this multi-task learning framework that can identify whether the tasks in a multi-task instance share any sub-tasks and explain the performance of the multi-task agent based on this. This is done by observing the activation distributions of the neurons when each task is performed and their correlation with the actions taken by the multi-task agent. 2. To develop an efficient meta-learning method to extend a multi-task agent already trained on a set of tasks to a new task. This is done by drawing insights about the aspects shared across the tasks from part 1 and using these to increase the efficiency of fine-tuning the multi-task agent on the new task.
7. Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty Video
Aravind Venugopal
Abstract
Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty.
Abstract
We study a contextual bandit setting where the agent has the ability to perform interventions on targeted subsets, apart from possessing qualitative causal side-information. This novel formalism captures intricacies in real-world scenarios such as software product experimentation, where targeted experiments can be conducted. However, this fundamentally changes the set of options that the agent has compared to standard contextual bandit settings, creating a need for new techniques. This is also the first work that integrates causal side-information in a contextual bandit setting where the agent aims to learn a policy that maps contexts to arms (as opposed to just identifying one best arm). We propose a new algorithm, which we show empirically outperforms baselines on purely synthetic and real-world-inspired synthetic data. We also prove a bound on its simple regret that provides a theoretical guarantee on performance.
Systems Biology and Healthcare
9. Using AI to Improve Maternal and Child Health Outcomes by Increasing Program Engagement through Targeted Interventions Video
Siddharth Nishtala
Abstract
India accounts for 12% of maternal deaths and 16% of child deaths globally. Lack of access to preventive care information is a significant problem contributing to high maternal and child mortality, especially in low-income households. We partner with ARMMAN, a non-profit based in India, to further the use of call-based information programs by identifying, early on, women who might not engage with these programs, which are proven to positively affect health parameters. We analyzed anonymized call records of over 300,000 women registered in an awareness program created by ARMMAN that uses cellphone calls to regularly disseminate health-related information. We built machine learning models to predict long-term engagement patterns from call logs and beneficiaries' demographic information, and discuss the applicability of this method in the real world through a pilot validation. Through a randomized controlled trial, we show that using our model's predictions to make interventions boosts engagement metrics by 61.37%.
10. Sequence neighborhoods enable reliable prediction of pathogenic mutations in cancer genomes Video
Shayantan Banerjee
Abstract
Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ of each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performance across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations in sequenced cancer genomes.
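As an illustration of neighborhood-based sequence features, the sketch below builds simple k-mer count vectors from the bases flanking a mutated position; the window size, k, and example sequence are arbitrary and not the exact NBDriver feature set.

```python
# Toy neighborhood feature extraction around a mutated position.
from itertools import product
import numpy as np

def neighborhood(seq, pos, window):
    """Return the bases `window` positions 5' and 3' of the mutated position."""
    return seq[pos - window: pos] + seq[pos + 1: pos + 1 + window]

def kmer_counts(context, k=2):
    """Overlapping k-mer count vector over the fixed alphabet A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(context) - k + 1):
        kmer = context[i: i + k]
        if kmer in counts:
            counts[kmer] += 1
    return np.array([counts[km] for km in kmers], dtype=float)

# Example: a window of 5 bases on each side of the mutation at position 7
seq = "ACGTTGCAGGTCAAGT"
features = kmer_counts(neighborhood(seq, pos=7, window=5), k=2)
print(features.shape)   # (16,) dinucleotide count features
```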
11. Analysis of Radiomic Features in Correlation with Tumor Aggression in Lung PET-CT Scans Video
Tejas Umesh
Abstract
Lung cancer is the most common form of cancer. Modalities like PET and CT scans in medical imaging allow doctors to visualise and extract more information from tumors. In this study, we aim to determine which of the radiomic features that best represent the tumor correlate with the aggressiveness of the lung tumor.
We performed our analysis on 56 subjects, of whom 39 were diagnosed with various degrees of aggressive lung cancer and 17 had low-grade or non-existent aggression. We segmented the tumors for each patient from their lung PET and CT scans using binary thresholding and extracted 87 radiomic features from each modality, resulting in a total of 174 radiomic features per subject. The extracted features include first-order, second-order, and other higher-order statistical and radiomic features. We used a Random Forest model to select which features in the entire set correlate with the aggressiveness of the tumor; this resulted in 31 important features out of the 174. Furthermore, using all the extracted radiomic features as input, we built a Support Vector Machine (SVM) model to perform a binary classification task of predicting the aggression of the input tumor. We then rebuilt the model with only the 31 important features and performed the same classification. We compared the two models' predictions to validate that our chosen features indeed capture the information relevant to tumor aggression contained in the full feature set.
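The selection-then-classification procedure described above can be sketched with scikit-learn as follows; the feature matrix is a random placeholder standing in for the extracted radiomic features.

```python
# Random-Forest-based feature selection followed by SVM classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.rand(56, 174)                        # 174 radiomic features per subject (placeholder)
y = np.r_[np.ones(39), np.zeros(17)].astype(int)   # aggressive vs. low/non-aggressive

# Rank features by Random Forest importance and keep the top 31
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top_idx = np.argsort(rf.feature_importances_)[::-1][:31]

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
score_all = cross_val_score(svm, X, y, cv=5).mean()              # all 174 features
score_top = cross_val_score(svm, X[:, top_idx], y, cv=5).mean()  # 31 selected features
print(f"accuracy, all features: {score_all:.3f}; selected features: {score_top:.3f}")
```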
Abstract
Recent years have seen several advances in the use of bioprocessing to produce a variety of chemicals. Yet, there are many challenges in making bioprocesses more economically feasible. Several computational methods have been developed to predict strategies for engineering microbes to over-produce a given product. However, it may be more advantageous (economically, or otherwise) to co-produce two or more products. This is useful in the case of biofuels, where we need to co-produce a pool of metabolites rather than a single one. In this work, we propose an approach to identify potential intervention strategies to co-optimize for a set of metabolites. Knock-out targets can be obtained for metabolites independently using existing algorithms such as OptKnock (Burgard, Pharkya, & Maranas, 2003). However, the identification of amplification targets is more challenging. To identify amplification targets in addition to knock-out targets, we use a methodology based on Flux Scanning based on Enforced Objective Flux (FSEOF) (Choi, Lee, Kim, & Woo, 2010). We study the flux variability through all the reactions in the cell as increasing amounts of product flux are enforced through the constraints, and classify reactions as knock-out targets or amplification targets based on their response to the increase in product flux. We have applied this idea to the various exchange metabolites in Saccharomyces cerevisiae and identified the targets shared across multiple metabolites. Pyruvate decarboxylase is one of the identified targets which, when knocked out, can simultaneously improve the production of three metabolites, namely isobutyl alcohol, succinate, and pyruvate. We have also examined the ability of Saccharomyces cerevisiae to co-produce various other metabolite pairs, such as aspartate and isobutyl acetate, and formate and indole-3-ethanol.
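A minimal FSEOF-style scan of the kind described above could be sketched with COBRApy as follows; the model file, biomass reaction, and product exchange IDs are hypothetical placeholders, and the classification rule shown is a simplification of the full procedure.

```python
# FSEOF-like scan: enforce increasing product flux and watch how other fluxes respond.
import numpy as np
import cobra

model = cobra.io.read_sbml_model("yeast_GEM.xml")   # hypothetical S. cerevisiae GEM file
biomass_id = "BIOMASS_rxn"                          # hypothetical biomass reaction ID
product_id = "EX_succ_e"                            # hypothetical product exchange ID

# Theoretical maximum product flux
with model:
    model.objective = product_id
    v_max = model.optimize().objective_value

# Enforce increasing fractions of the maximum product flux and record all fluxes
records = []
for frac in np.linspace(0.1, 0.9, 9):
    with model:
        model.objective = biomass_id
        model.reactions.get_by_id(product_id).lower_bound = frac * v_max
        records.append(model.optimize().fluxes)

# Reactions whose absolute flux rises with enforced product flux are candidate
# amplification targets; reactions whose flux drops to zero are knock-out candidates.
rxn_ids = [r.id for r in model.reactions]
flux_matrix = np.abs(np.vstack([rec[rxn_ids].values for rec in records]))
increasing = np.all(np.diff(flux_matrix, axis=0) >= -1e-9, axis=0) & (flux_matrix[-1] > 1e-6)
amplification_targets = [rid for rid, keep in zip(rxn_ids, increasing) if keep]
print(len(amplification_targets), "candidate amplification targets")
```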
Abstract
Understanding the patterns of co-occurrence of somatic mutations in cancer
The genetic variations in the cancer genome affect the normal functioning of the cell by dysregulating biological pathways [1]. A single driver mutation alone cannot result in the pathway disruption leading to cancer; two or more genetic alterations must occur together and interact. These interactions can take the form of co-occurrence or mutual exclusivity of the alterations [2]. The objective of this work is to identify patterns of co-occurrence between somatic mutations in cancer and, correspondingly, their collective impact on biological pathways. An altered cancer gene profile in the form of a binary matrix was obtained from the COSMIC database for breast cancer data. The binary matrix contains genes as columns and samples as rows, with values of 1 or 0 indicating the presence or absence of an altered gene in that sample. Jaccard similarity was estimated between all possible gene pairs of the binary matrix across samples. Gene pairs with a higher Jaccard similarity value are similar in their distribution of mutations across samples, suggesting possible co-occurrence. A bootstrap test was performed to validate the significance of the similarity analysis: the Jaccard similarity distributions of the original cancer matrix and the bootstrapped matrix were compared using statistical tests and found to be significantly different. The gene pairs having a Jaccard similarity value above 0.5 were extracted and a network was constructed using Cytoscape. Genes of the similarity network were enriched using the STRING application in Cytoscape, and the genes that co-occur in the enriched pathways were identified.
References
[1] Li, H. T., et al. “Identification of Driver Pathways in Cancer Based on Combinatorial Patterns of Somatic Gene Mutations.” Neoplasma, vol. 63, no. 01, 2016, pp. 57–63., doi:10.4149/neo_2016_007.
[2] Kuipers, Jack, et al. “Mutational Interactions Define Novel Cancer Subgroups.” Nature Communications, vol. 9, no. 1, 2018, doi:10.1038/s41467-018-06867-x.
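The pairwise Jaccard analysis described in the abstract above can be sketched as follows; the binary matrix is randomly generated for illustration, and a simple column-shuffled null stands in for the bootstrap comparison.

```python
# Pairwise Jaccard similarity on a binary sample-by-gene alteration matrix,
# compared against a shuffled null that preserves per-gene mutation frequencies.
import numpy as np
from itertools import combinations
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(200, 30))          # 200 samples x 30 genes (placeholder)

def pairwise_jaccard(mat):
    """Jaccard similarity between every pair of gene columns."""
    sims = []
    for i, j in combinations(range(mat.shape[1]), 2):
        a, b = mat[:, i].astype(bool), mat[:, j].astype(bool)
        union = np.sum(a | b)
        sims.append(np.sum(a & b) / union if union else 0.0)
    return np.array(sims)

observed = pairwise_jaccard(M)

# Null model: shuffle each gene column independently to break co-occurrence
shuffled = np.column_stack([rng.permutation(M[:, j]) for j in range(M.shape[1])])
null = pairwise_jaccard(shuffled)

print(ks_2samp(observed, null))                 # are the two distributions different?
print("gene pairs with Jaccard > 0.5:", np.sum(observed > 0.5))
```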
14. iCOMIC: a graphical interface-driven pipeline for cancer genome data analysis workflows Video
Keerthika M
Abstract
Despite the tremendous increase in omic data generated by modern sequencing technologies, their analysis can be very tricky and often requires substantial bioinformatics expertise. To address this concern, we have developed a user-friendly pipeline to analyse (cancer) genomic data that takes raw sequencing data (FASTQ format) as input and outputs insightful statistics on the nature of the data. Our pipeline, the iCOMIC toolkit, is capable of analyzing both whole-genome and transcriptome data and is embedded in ‘Snakemake’, a workflow management system. iCOMIC is characterized by a user-friendly GUI that offers several advantages, including the execution of analyses with minimal steps, thereby eliminating the need for complex command-line arguments. The toolkit features many independent core workflows for both whole-genome and transcriptome data analysis. Even though all the necessary, well-established tools are integrated into the pipeline to enable ‘out-of-the-box’ analysis, we provide the user with the means to replace modules or alter the pipeline as needed. Importantly, we have integrated in-house algorithms for the prediction of driver and passenger mutations based on mutational context, as well as for the prediction of tumor suppressor genes and oncogenes.
Abstract
Identifying behaviour patterns, and the variations in them, can provide insights into the functioning of neural circuitry and also serve as an indication of neurological diseases. However, capturing such a behavioural repertoire from unstructured datasets proves to be prohibitively expensive. In this study, an unsupervised learning approach is considered to automatically identify and characterize various action primitives, allowing for an in-depth study of the dynamics of naturalistic behavior.
This work studies the open-field, naturalistic behaviour of mice. Pose information extracted from unstructured video data is used to derive geometric features that best represent the animal. Density-based clustering is applied to a lower-dimensional embedding of features extracted from millions of frames to identify distinct behaviour clusters. This labelled dataset is then used to train a neural network to classify frames into the identified behaviour clusters.
The trained model is used to study how the identified behaviours are recruited in mice across different genetic strains. Preliminary statistics about each behaviour are extracted and behavioural state maps are created to quantify the transitions between behaviours.
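A minimal sketch of this cluster-then-classify pipeline is shown below, using scikit-learn; the pose-derived feature matrix is synthetic, and PCA/DBSCAN stand in for whatever embedding and density-based method was actually used, with parameters that would need tuning to real data.

```python
# Cluster frames into behaviours, train a classifier on the labels, and count transitions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Placeholder per-frame geometric pose features (synthetic, grouped into 6 "behaviours")
X, _ = make_blobs(n_samples=20000, n_features=36, centers=6, random_state=0)

# 1) Embed frames in a lower-dimensional space and find density-based clusters
Z = PCA(n_components=8).fit_transform(StandardScaler().fit_transform(X))
labels = DBSCAN(eps=1.0, min_samples=20).fit_predict(Z)

# 2) Train a classifier on the non-noise frames to assign behaviours to new frames
mask = labels != -1                              # -1 marks DBSCAN noise points
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300)
clf.fit(X[mask], labels[mask])

# 3) Simple behavioural-state statistics: count consecutive-frame behaviour changes
pred = clf.predict(X)
transitions = np.sum(pred[1:] != pred[:-1])
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0), "| transitions:", transitions)
```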
Abstract
Given the low success rate, slow pace, and huge costs involved in discovering novel drugs through de novo or traditional methods, repositioning/repurposing existing approved drugs to treat common, novel, emerging, and re-emerging diseases is an effective strategy: it is highly efficient, time-saving, and low-cost, with minimal risk of failure. Various data-driven approaches have been suggested, and recent advances in AI approaches such as deep learning and machine learning models have led to large gains in property prediction performance, on par with experimental approaches. A very important component of the drug repurposing pipeline is the computational technique used for property prediction. In this work, we investigate a graph neural network approach named the Directed Message Passing Neural Network (D-MPNN), which models molecules based on their 2D graph structure for property prediction. The D-MPNN model is evaluated on different datasets, which include predicting growth inhibition activity against E. coli, predicting whether a molecule binds to the SARS-CoV-2 3CL protease, and predicting cleavage of an octamer (peptide) by HIV-1 protease. The D-MPNN performance was compared against the respective baselines. We also implemented a neural network model that represents a molecule as a set of functional sub-structures, to compare against the D-MPNN model on the E. coli growth inhibition prediction task.
Abstract
The last two decades have seen a drastic change in computer-aided drug design. With recent developments in machine learning, there have been significant advancements in the field of bioinformatics. We attempt to generate molecules with a desired set of properties based on various machine learning techniques that use protein-ligand 3D structure, prediction of binding affinity, and quantitative structure-activity relationships. We have implemented a novel approach that uses reinforcement learning to generate chemical libraries biased towards compounds with the desired properties. We used two machine learning models: the first, a generator, generates molecules in the form of simplified molecular-input line-entry system (SMILES) strings, and the second, a predictor, predicts the logP values of the generated compounds. Following this, we used an RL agent to bias the generation of chemical structures towards those with the desired properties. Synthetic accessibility scores (SAS) were tabulated for the generated compounds and the results are discussed.
Abstract
Hydrothermal vents are deep-sea habitats where the mixing of hot subsurface fluids with cold ocean water results in temperature and chemical gradients. This mixing also forms hydrothermal ‘plumes’, which substantially impact broader deep-sea microbial communities and biogeochemistry. In this project, we study the hydrothermal plume microbial community from the Guaymas Basin, Gulf of California. We characterize metabolic dependencies within this community by combining omics-based approaches and in silico modeling. Given the extreme conditions in which these communities thrive, we explore mechanisms responsible for the evolution of these dependencies and mine gene clusters that equip this community with bioactive compound production capabilities.
Rachita K. Kumar
Abstract
Recent studies have provided interesting insights into the persistence and succession of microorganisms aboard the International Space Station (ISS), notably the dominance of Klebsiella pneumoniae. However, the interactions between the various microbes on the ISS, and how they shape the microbiome, remain to be clearly understood. In this study, we apply a computational approach to predict possible interactions in the ISS microbiome and shed further light on its organisation.
Through a combination of a systems-based graph-theoretical approach and a constraint-based community metabolic modelling approach, we demonstrate several key interactions in the ISS microbiome. These complementary approaches provide insights into the metabolic interactions and dependencies present amongst the various microbes in a community, highlighting key interactions and keystone species. Our results show that the presence of K. pneumoniae is beneficial to many other microbes it co-exists with, notably those from Staphylococcus and Pantoea spp. Microorganisms from these two genera were also found to be highly dependent on other organisms in the community for survival. Species belonging to the Enterobacteriaceae family were found to be the most beneficial for the survival of other organisms in the ISS microbiome. Notably, our studies pointed towards an amensalistic interaction between K. pneumoniae and Aspergillus fumigatus, which we could verify through experiments.
Our study underscores the importance of K. pneumoniae in the ISS and its potential contribution to the survival of other microbes, including pathogens, aboard the ISS. Our integrated modelling approach, combined with experiments, demonstrates immense potential for understanding the organisation of other such microbiomes, unravelling key organisms and their interdependencies.
Network Science
Abstract
Multiplex networks are complex graph-structures in which a set of entities are connected to each other via multiple types of relations, each relation representing a distinct layer. Such graphs have been used to investigate many complex biological, social, and technological systems. In this work, we present a novel semi-supervised approach for structure-aware representation learning on multiplex networks. Our approach relies on maximizing the mutual information between local node-wise patch representations and label correlated structure-aware global graph representations to jointly model the nodes and cluster structures. Specifically, it leverages a novel cluster-aware, node-contextualized global graph summary generation strategy for effective joint-modeling of node and cluster representations across the layers of a multiplex network. Empirically, we demonstrate that the proposed architecture outperforms state-of-the-art methods in a range of tasks: classification, clustering, visualization, and similarity search on seven real-world multiplex networks for various experiment settings.
Abstract
Many real-world systems involve higher-order interactions and thus demand complex models such as hypergraphs. For instance, a research article can have multiple collaborating authors, and therefore a co-authorship network is best represented as a hypergraph. In this work, we focus on the problem of hyperedge prediction. This problem has immense applications in multiple domains, such as predicting new collaborations in social networks, discovering new chemical reactions in metabolic networks, etc. Despite its significant importance, the problem of hyperedge prediction has not received adequate attention, mainly because of its inherent complexity. In a graph with $n$ nodes, the number of potential edges is $\binom{n}{2}$, whereas in a hypergraph, the number of potential hyperedges is $2^n$. To avoid searching through this huge space of hyperedges, current methods restrict the original problem in one of the following two ways. One class of algorithms assumes the hypergraph to be k-uniform, where each hyperedge has exactly k nodes. However, many real-world systems are not confined to interactions involving exactly k components, so these algorithms are not suitable for many real-world applications. The second class of algorithms requires a candidate set of hyperedges from which the potential hyperedges are chosen. In the absence of domain knowledge, the candidate set can contain $2^n$ possible hyperedges, which makes the problem intractable. More often than not, domain knowledge is not readily available, making these methods limited in applicability. We propose HPRA - Hyperedge Prediction using Resource Allocation, the first algorithm of its kind, which overcomes these issues and predicts hyperedges of any cardinality without using any candidate hyperedge set. HPRA is a similarity-based method working on the principles of the resource allocation process. In addition to recovering missing hyperedges, we demonstrate that HPRA can predict future hyperedges in a wide range of hypergraphs. Our extensive set of experiments shows that HPRA achieves statistically significant improvements over state-of-the-art methods.
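For intuition, the sketch below computes the classical pairwise resource-allocation (RA) index on a toy hypergraph's clique expansion and aggregates it over a candidate hyperedge; the averaging rule is only one simple aggregation for illustration, not HPRA's exact scoring function.

```python
# Resource-allocation intuition: each common neighbour z of two nodes spreads a
# unit resource over its neighbours, so the pair score is sum(1/degree(z)).
from collections import defaultdict

hyperedges = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}, {"a", "d"}]

# Clique-expand the hypergraph into pairwise adjacency
neighbours = defaultdict(set)
for e in hyperedges:
    for u in e:
        neighbours[u] |= e - {u}

def ra_index(u, v):
    """Resource allocation score between two nodes."""
    common = neighbours[u] & neighbours[v]
    return sum(1.0 / len(neighbours[z]) for z in common)

def score_hyperedge(nodes):
    """Average pairwise RA score over a candidate hyperedge (one simple aggregation)."""
    nodes = list(nodes)
    pairs = [(nodes[i], nodes[j]) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
    return sum(ra_index(u, v) for u, v in pairs) / len(pairs)

print(score_hyperedge({"a", "b", "d"}))
```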
Abstract
Network reconstruction is the combined task of identifying the entities present and the relations between the identified entities. The constructed network can be used to summarize the information present in billions of papers; biologists can use this network to back-trace the research articles that contain specific information of interest. Today, the amount of available information is enormous and much of it is appended or modified daily, which makes manual network construction infeasible and demands automatic construction using text mining techniques.
In this paper, we discuss the Named Entity Recognition (NER) task (identifying the entities present) and the Relation Extraction (RE) task (extracting all the relations present between the entities), and construct the network on various datasets. We also discuss in detail the architecture of BERT/BioBERT, which was modified for the NER and RE tasks, as well as the required pre-processing. We further discuss Ex-BERT, in which only the extension vocabulary is trained. Another outcome of this paper is a dataset of 1.6 crore (version 1) and 18 lakh (version 2) examples, which can be used for pre-training DNNs like BERT/BioBERT/Ex-BERT.
Abstract
Biological networks, economic networks, water and power distribution networks, etc., can be visualized as a chain of interconnected physical or abstract edges through which different entities flow from one node to other nodes. Furthermore, the flow through the edges in such networks is conserved at each incident node. This property of conservation leads to a class of networks referred to as conserved networks. Reconstruction of network topology from data is one of the important problems in network science. In this work, we propose to develop algorithms for reconstructing multiple conserved networks from flow data.
The network data can belong to multiple networks, and hence it is important to cluster the data belonging to different networks and then apply the reconstruction algorithms to each individual network. To achieve this segregation of data belonging to different networks, we have applied different subspace clustering algorithms such as Generalized Principal Component Analysis, Sparse Subspace Clustering, Elastic Net Subspace Clustering, and Sparse Subspace Clustering with Orthogonal Matching Pursuit. The effectiveness of each of the algorithms is determined for data with varying dimensions. Some of these algorithms also proved to be effective in cases where noise is present in the data.
System Identification
Abstract
Convergent Cross Mapping (CCM) was introduced as a technique to identify causal links in weakly coupled deterministic non-linear systems where other causal definitions, like the celebrated Granger Causality, fail due to their limited applicability to stochastic systems. CCM tests causality by how well the cause can be recovered from the observed effect variable. It makes use of univariable state space reconstruction for recovering time series.
One of the major drawbacks of this method is its inability to distinguish between direct and indirect causal links, which is necessary for reconstructing the causal network from observed time series. In this work, we propose a two-step method to solve this issue: first, we perform the regular CCM analysis and identify all the effects (both direct and indirect) linked to a cause. Next, we perform a multivariable state-space reconstruction using the identified effect variables and use it to recover the cause variable. We then evaluate how much the recovery improves compared to the univariable case. A large improvement indicates that the effect is an indirect one, while the converse indicates a direct effect. The efficacy of this method is illustrated using simulated examples.
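A minimal sketch of a single cross-mapping step, the building block of both steps above, is given below: the effect variable's shadow manifold is reconstructed by delay embedding and its nearest neighbours are used to recover the cause. The embedding parameters and the toy coupled system are illustrative; the multivariable second step follows the same pattern with an embedding built from several effect variables.

```python
# Cross-mapping skill: how well the cause is recovered from the effect's manifold.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def delay_embed(x, dim, tau):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def cross_map_skill(effect, cause, dim=3, tau=1, k=None):
    """Correlation between the true cause and its estimate from the effect manifold."""
    k = k or dim + 1
    E = delay_embed(effect, dim, tau)
    target = cause[(dim - 1) * tau:]                     # align cause with embedded points
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(E).kneighbors(E)
    dist, idx = dist[:, 1:], idx[:, 1:]                  # drop the self-neighbour
    w = np.exp(-dist / (dist[:, [0]] + 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * target[idx]).sum(axis=1)             # weighted neighbour average
    return np.corrcoef(estimate, target)[0, 1]

# Toy unidirectionally coupled logistic maps: x drives y
n, x, y = 1000, np.empty(1000), np.empty(1000)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
print("recovering x from y's manifold:", cross_map_skill(y, x))
```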
Abstract
Detecting the onset of P-waves in seismic signals is a standard method in early-warning earthquake detection systems. This work develops a framework to detect P-waves by combining time series modeling of seismic noise with the Bayesian Online Changepoint Detection (BOCD) framework developed by Adams & MacKay (2007) and Fearnhead & Liu (2007). The non-Gaussian and non-linear models used in this domain produce intractable posteriors that do not have the closed-form analytical expressions required by the BOCD framework; this work uses particle Markov Chain Monte Carlo (MCMC) methods to overcome this. Unlike existing methods, the Bayesian framework used here also yields the probability of a P-wave detection, which can be used to gauge the system's confidence in detections and avoid false positives. Moreover, in contrast to existing methods, the limited assumptions about P-wave characteristics and the absence of moving-window-based techniques mean that minimal location-based tuning is needed when adopting this method. The performance of the method is illustrated on several real-life seismic events of varying signal-to-noise ratios (SNR) and compared with existing methods.
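For reference, the sketch below implements the standard BOCD run-length recursion with a conjugate Gaussian observation model of known noise variance; this is a simplified stand-in for the non-Gaussian seismic-noise models used in the work (which is precisely why particle MCMC is needed there). All parameter values are illustrative.

```python
# Simplified Bayesian online changepoint detection (Adams & MacKay style recursion).
import numpy as np
from scipy.stats import norm

def bocd_gaussian(x, hazard=1/250, mu0=0.0, var0=4.0, noise_var=1.0):
    """Return the run-length posterior P(r_t | x_1..t) for each time step."""
    T = len(x)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, var = np.array([mu0]), np.array([var0])            # posterior params per run length
    for t, xt in enumerate(x):
        pred = norm.pdf(xt, mu, np.sqrt(var + noise_var))  # predictive prob per run length
        growth = R[t, : t + 1] * pred * (1 - hazard)       # run length grows by one
        cp = np.sum(R[t, : t + 1] * pred * hazard)         # changepoint resets run length
        R[t + 1, 1: t + 2] = growth
        R[t + 1, 0] = cp
        R[t + 1] /= R[t + 1].sum()
        # Conjugate Gaussian update of the mean for each possible run length
        new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
        new_mu = new_var * (mu / var + xt / noise_var)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R

x = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(3, 1, 200)])
R = bocd_gaussian(x)
print("most likely run length at final step:", np.argmax(R[-1]))   # ~200 after the change
```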
26. Localgini: A novel Thresholding Algorithm for Developing Context-specific Models from Omics Data Video
Pavan Kumar S
Abstract
Genome-scale metabolic models (GEMs) allow us to reconstruct metabolic networks for a better understanding of biological processes. Typically, GEMs cover all the reactions in an organism; however, under specific conditions in a specific tissue, only a subset of these reactions is active. Hence, context-specific models are developed using GEMs and expression data. Developing context-specific models requires three decisions: 1) the thresholding applied to omics data to define whether a gene is active or inactive, 2) the constraints placed on the model, and 3) the model extraction method (algorithm) to be used. Constraining the model requires additional information such as metabolomics data or context-specific knowledge, while thresholding methods and extraction algorithms rely solely on omics data and the GEM. Most existing thresholding procedures apply a few arbitrary cut-offs to the gene expression data to define the importance of reactions, under the assumption that all genes have the same expression pattern. Housekeeping reactions, which are required in all contexts for cellular maintenance, and enzymes translated from stable mRNAs are not captured by these thresholding methods. In this work, we present a new approach called “Localgini”, which derives a gene-specific threshold by interpreting the distribution of gene expression across different contexts. This method considers both the magnitude of a gene's expression and its Gini coefficient to define whether a particular gene is active or inactive. Localgini-based thresholding is used to build context-specific models for the NCI60 cancer cell lines and 32 tissues. The Localgini approach is then compared with other thresholding algorithms in the literature. We show that Localgini-based models capture more housekeeping functions than the existing thresholding algorithms, and gene essentiality predictions of the models are also improved. Active reactions defined by Localgini are more self-consistent than those obtained from the existing standard algorithms in the literature.
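To illustrate the underlying idea, the sketch below computes a Gini coefficient per gene across contexts and uses it to set a gene-specific threshold; the combination rule shown (taking each gene's threshold at the Gini-valued quantile of its own expression, so evenly expressed housekeeping-like genes get lenient thresholds) is only an illustration of the concept, not necessarily the exact Localgini formula. The expression matrix is synthetic.

```python
# Gene-specific thresholds driven by each gene's Gini coefficient across contexts.
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative vector (0 = evenly expressed, ->1 = concentrated)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

rng = np.random.default_rng(1)
expr = rng.gamma(shape=2.0, scale=3.0, size=(1000, 32))   # genes x contexts (placeholder)

ginis = np.apply_along_axis(gini, 1, expr)
# Illustrative rule: low-Gini (housekeeping-like) genes get a low quantile threshold
thresholds = np.array([np.quantile(expr[g], ginis[g]) for g in range(expr.shape[0])])
active = expr >= thresholds[:, None]                       # gene x context activity calls
print("fraction of gene-context pairs called active:", active.mean())
```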
Computer Vision
Amrit Diggavi Seshadri
Abstract
We propose a novel Multi-Headed Spatial Dynamic Memory Generative Adversarial Network (MSDM-GAN) for the task of text-to-image generation. Synthesizing high-quality, realistic images from text descriptions is a challenging task, and current methods synthesize images from text in a multi-stage manner, typically by first generating a rough initial image and then refining image details at subsequent stages. However, existing methods that follow this paradigm suffer from three important limitations. Firstly, they synthesize initial images without attempting to separate image attributes at a word level. As a result, object attributes of initial images (which provide the basis for subsequent refinement) are inherently entangled and ambiguous. Secondly, by using common text representations for all regions, current methods prevent us from interpreting text in fundamentally different ways at different parts of the image; different image regions are therefore only allowed to assimilate the same type of information from text at each refinement stage. Finally, current methods modify an image only once at each refinement stage, limiting the scope of improvement at each stage to only a few closely related aspects of the image. To address these shortcomings, our proposed method introduces three novel components: (1) an initial generation stage that explicitly disentangles image attributes at a word level, (2) a spatial dynamic memory module for refinement of images, and (3) an iterative multi-headed mechanism to address multiple aspects of the image at each refinement stage. Experimental results demonstrate that our MSDM-GAN significantly outperforms the previous state-of-the-art, decreasing the lowest reported Fréchet Inception Distance by 21.58% on the CUB dataset and by 4.21% on the COCO dataset.
Abstract
3D hand pose estimation from depth images is a highly complex task. Current state-of-the-art 3D hand pose estimators focus only on the accuracy of the model on test data but overlook the anatomical correctness of the resulting hand pose. In this paper, we present the Single Shot Corrective CNN (SSC-CNN) framework to tackle the problem at the architecture level. In contrast to previous works that use post-facto pose filters, SSC-CNN predicts a hand pose that implicitly conforms to the human hand's biomechanical bounds and rules in a single forward pass. The model was trained and tested on the HANDS2017 and MSRA datasets. Experiments show that our proposed model achieves accuracy comparable to the state-of-the-art models; however, the previous methods exhibit high anatomical errors, whereas our model is free from such errors. Experiments also show that the ground truth provided in these datasets itself suffers from anatomical errors, and Anatomical Error Free (AEF) versions of the datasets, namely AEF-HANDS2017 and AEF-MSRA, were created. Future work includes incorporating biomechanically constrained velocity bounds in the network architecture.
Abstract
The ability to capture good quality images in the dark and near-zero lux conditions has been a long-standing pursuit of the computer vision community. The seminal work by Chen et al. has especially caused renewed interest in this area, resulting in methods that build on top of their work in a bid to improve the reconstruction. However, for practical utility and deployment of low-light enhancement algorithms on edge devices such as embedded systems, surveillance cameras, autonomous robots and smartphones, the solution must respect additional constraints such as limited GPU memory and processing power. With this in mind, we propose a deep neural network architecture that aims to strike a balance between the network latency, memory utilization, model parameters, and reconstruction quality.
The key idea is to forbid computations in the High-Resolution (HR) space and limit them to a Low-Resolution (LR) space. However, doing the bulk of computations in the LR space causes high frequency artifacts in the restored image. We thus propose a new architecture to limit this effect.
State-of-the-art algorithms on dark image enhancement need to pre-amplify the image before processing it. However, they generally use ground truth information to find the amplification factor even during inference, restricting their applicability for unknown scenes. In contrast, we propose a simple yet effective light-weight mechanism for automatically determining the amplification factor from the input image that can be used off the shelf with existing pretrained models.
We show that we can enhance a full resolution, $2848 \times 4256$, extremely dark single-image in the ballpark of $3$ seconds even on a CPU. We achieve this with $2-7\times$ fewer model parameters, $2-3\times$ lower memory utilization, $5-20\times$ speed up and yet maintain a competitive image reconstruction quality compared to the state-of-the-art algorithms.
Other Topics
Abstract
Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM based encoders can neither provide a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM based encoders the hidden representations at different time-steps are very similar to each other (high conicity) and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states (ii) are better indicative of words important for the model's predictions (iii) correlate better with gradient-based attribution methods.
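The conicity measure referred to above can be computed as the average cosine similarity (alignment to mean) between each hidden state and the mean hidden state; a minimal sketch follows, with random matrices standing in for actual LSTM hidden states.

```python
# Conicity of a set of hidden states: high conicity = states lie in a narrow cone.
import numpy as np

def conicity(H):
    """H: (timesteps, hidden_dim) matrix of hidden states for one input sequence."""
    mean_vec = H.mean(axis=0)
    cos = H @ mean_vec / (np.linalg.norm(H, axis=1) * np.linalg.norm(mean_vec) + 1e-12)
    return cos.mean()     # average alignment to the mean vector

H_narrow = np.random.rand(20, 64)                 # all-positive states -> narrow cone
H_spread = np.random.randn(20, 64)                # spread-out states
print(conicity(H_narrow), conicity(H_spread))     # expect high vs. near-zero conicity
```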
31. A Causal Approach for Unfairness Prioritization and Discrimination Removal Video
Pavan Ravishankar
Abstract
In budget-constrained scenarios, it is imperative to prioritize the sources of unfairness before mitigating their underlying unfairness. Unlike previous works that only make cautionary claims of discrimination and de-bias data after its generation, this paper also attempts to prioritize the unfair sources, which is essential for mitigating their unfairness in the real world. We assume that a non-parametric Markovian causal model representative of the data generation procedure, along with the sensitive nodes that result in unfairness, is given. We quantify edge flow, which is the belief flowing along an edge, using path-specific effects and use it to quantify edge unfairness. We then prove that cumulative unfairness is non-existent in any decision, such as judicial bail, towards any sensitive group, such as race, when edge unfairness is absent, given an error-free model of conditional probability tables. We prove this result for the non-trivial non-parametric model setting, where the cumulative unfairness cannot be expressed in terms of edge unfairness. We then measure the potential to mitigate the cumulative unfairness when edge unfairness is decreased. Based on these measures, we propose an unfair edge prioritization algorithm that prioritizes the unfair edges, which can then be used by policymakers to mitigate unfairness in the real world. We also propose a discrimination removal procedure that de-biases the data distribution. The experimental section validates the specifications used for quantifying the above measures.
Abstract
Hidden Markov models (HMMs) have been found useful in modeling complex time series data across a number of applications, and an appropriate distance measure between HMMs is of both theoretical and practical interest. Clustering of sequential or temporal data is typically more challenging than traditional clustering, as such data require dynamic rather than static processing. While a number of algorithms have been proposed to improve clustering performance, less attention has been paid to the definition of the distance measure between HMMs. At present, Kullback-Leibler (KL) divergence is widely used as a measure to discriminate between two HMMs, but all such metrics are pre-defined: the distance depends only on the two HMMs at hand. A learned metric is useful in many scenarios; in most applications the data come from the same source, so it is advantageous to learn the metric. In this work, we attempt to exploit the graph structure of HMMs using graph neural networks to learn an appropriate distance metric between two HMMs. We examine the performance of (1) a simple autoencoder, (2) a modified variational graph autoencoder, and (3) DiffPool-based metrics in comparison to existing metrics. We were surprised to find that the DiffPool-based supervised metric performed exceptionally well: it achieves 85% accuracy on the FSDD dataset while the rest reach at most 34%.
Abstract
Automated medical coding is a process of codifying clinical notes to appropriate diagnosis and procedure codes automatically from the standard taxonomies such as ICD (International Classification of Diseases) and CPT (Current Procedure Terminology). The manual coding process involves the identification of entities from the clinical notes followed by querying a commercial or non-commercial medical codes Information Retrieval (IR) system that follows the Centre for Medicare and Medicaid Services (CMS) guidelines. We propose to automate this manual process by automatically constructing a query for the IR system using the entities auto-extracted from the clinical notes. We propose \textbf{GrabQC}, a \textbf{Gra}ph \textbf{b}ased \textbf{Q}uery \textbf{C}ontextualization method that automatically extracts queries from the clinical text, contextualizes the queries using a Graph Neural Network (GNN) model and obtains the ICD Codes using an external IR system. We also propose a method for labelling the dataset for training the model. We perform experiments on two datasets of clinical text in three different setups to assert the effectiveness of our approach. The experimental results show that our proposed method is better than the compared baselines in all three settings.
Abstract
A classification tree is grown by repeated partitioning of a dataset based on a predefined split criterion. The node split in the growth process depends only on the class ratio of the data chunk that gets split in every internal node of the tree. In a classification tree learning task, when the class ratio of the unlabeled part of the dataset is available, it becomes feasible to use the unlabeled data alongside the labeled data to train the tree in a semi-supervised style. Our motivation is to facilitate the usage of the abundantly available unlabeled data for building classification trees, as it is laborious and expensive to acquire labels. In this paper, we propose a semi-supervised approach to growing classification trees, where we adapted the Maximum Mean Discrepancy (MMD) method for estimating the class ratio at every node split. In our experimentation using several binary and multiclass classification datasets, we observed that our semi-supervised approach to growing a classification tree is statistically better than traditional decision tree algorithms in 31 of 40 datasets.
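A simplified sketch of the class-ratio estimation step is shown below: the positive-class fraction of an unlabeled chunk is found by grid search for the mixture weight that minimizes MMD between the unlabeled sample and a weighted mixture of the labeled class samples. This is an illustrative stand-in for the adapted MMD procedure described above, with synthetic data.

```python
# Estimate the class ratio of an unlabeled chunk by minimizing MMD over the mixture weight.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def estimate_class_ratio(X_pos, X_neg, X_unl, grid=np.linspace(0, 1, 101)):
    """Return theta = estimated fraction of the positive class in X_unl."""
    Kpp = rbf_kernel(X_pos, X_pos).mean(); Knn = rbf_kernel(X_neg, X_neg).mean()
    Kpn = rbf_kernel(X_pos, X_neg).mean()
    Kpu = rbf_kernel(X_pos, X_unl).mean(); Knu = rbf_kernel(X_neg, X_unl).mean()
    # Squared MMD between the theta-mixture embedding and the unlabeled embedding
    # (the unlabeled-unlabeled term is constant in theta and dropped).
    mmd2 = [t * t * Kpp + (1 - t) ** 2 * Knn + 2 * t * (1 - t) * Kpn
            - 2 * (t * Kpu + (1 - t) * Knu) for t in grid]
    return grid[int(np.argmin(mmd2))]

rng = np.random.default_rng(0)
X_pos = rng.normal(+1, 1, size=(200, 2))
X_neg = rng.normal(-1, 1, size=(200, 2))
X_unl = np.vstack([rng.normal(+1, 1, size=(70, 2)), rng.normal(-1, 1, size=(30, 2))])
print("estimated positive-class ratio:", estimate_class_ratio(X_pos, X_neg, X_unl))  # ~0.7
```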
Abstract
When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from the earlier tasks to help learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most research works propose to either store a subset of examples from the past tasks in a replay buffer, dedicate a separate set of parameters to each task or penalize excessive updates over parameters by introducing a regularization term. While existing methods employ the general task-agnostic stochastic gradient descent update rule, we propose a task-aware optimizer that adapts the learning rate based on the relatedness among tasks. We utilize the directions taken by the parameters during the updates by additively accumulating the gradients specific to each task. These task-based accumulated gradients act as a knowledge base that is maintained and updated throughout the stream. We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also exhibits knowledge transfer. We also show that our method performs better than several state-of-the-art methods in lifelong learning on complex datasets.
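A hedged sketch of the core idea follows: keep an additively accumulated gradient per task as the knowledge base, and scale the step size by the agreement (cosine similarity) between the current task's gradient and the accumulated gradients of the other tasks. The exact scaling rule shown is illustrative, not the paper's formula.

```python
# Task-aware learning-rate adaptation from accumulated per-task gradients.
import numpy as np

class TaskAwareSGD:
    def __init__(self, base_lr=0.1):
        self.base_lr = base_lr
        self.acc = {}                      # task_id -> accumulated gradient (knowledge base)

    def step(self, params, grad, task_id):
        others = [g for t, g in self.acc.items() if t != task_id]
        if others:
            ref = np.mean(others, axis=0)
            cos = grad @ ref / (np.linalg.norm(grad) * np.linalg.norm(ref) + 1e-12)
            lr = self.base_lr * (1.0 + cos) / 2.0   # shrink the step when tasks disagree
        else:
            lr = self.base_lr
        self.acc[task_id] = self.acc.get(task_id, 0.0) + grad   # additive accumulation
        return params - lr * grad

opt = TaskAwareSGD()
params = np.zeros(5)
params = opt.step(params, np.ones(5), task_id=0)
params = opt.step(params, -np.ones(5), task_id=1)   # conflicting task -> smaller step
```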
Abstract
This research formulates and numerically quantifies the optimal response that can be discovered in a design space characterized by linear and two-way interaction effects. In an experimental design setup, this can be conceptualized as the response of the best treatment combination of a 2^k full factorial design. Assuming prior distributions for the strengths of the main effects and interaction effects, this study enables practitioners to estimate the maximum possible improvement achievable through design space exploration. For basic designs with up to two factors, we construct the full distribution of the optimal treatment response; for more than two factors, we construct formulations for a lower bound.
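A simple Monte Carlo counterpart of this setup is sketched below: main effects and two-way interaction effects are drawn from assumed Normal priors, all 2^k coded treatment combinations are evaluated, and the best response is recorded. The priors and the value of k are placeholders, not the study's analytical formulation.

```python
# Monte Carlo distribution of the optimal treatment response in a 2^k factorial design.
import numpy as np
from itertools import combinations, product

def sample_optimal_response(k=3, n_draws=10000, main_sd=1.0, inter_sd=0.5, seed=0):
    rng = np.random.default_rng(seed)
    levels = np.array(list(product([-1, 1], repeat=k)))      # coded factor levels
    pairs = list(combinations(range(k), 2))
    best = np.empty(n_draws)
    for d in range(n_draws):
        beta = rng.normal(0, main_sd, size=k)                 # main effects
        gamma = rng.normal(0, inter_sd, size=len(pairs))      # two-way interactions
        y = levels @ beta + sum(g * levels[:, i] * levels[:, j]
                                for g, (i, j) in zip(gamma, pairs))
        best[d] = y.max()                                     # optimal treatment response
    return best

best = sample_optimal_response()
print("mean optimal response:", best.mean(), "| 95th percentile:", np.quantile(best, 0.95))
```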
Abstract
Sample compression schemes (defined by Littlestone and Warmuth) are an underlying feature of many machine learning algorithms. A sample compression scheme of size k means that there is a subset of k samples from the original sample set from which we can recover the labels of all the other training samples in the dataset. Compression has been shown to imply learnability for binary classifiers, and finite VC dimension has likewise been shown to imply learnability. Moreover, every class of VC dimension d has a sample compression scheme of size exponential in d, which shows that learnability implies compression. In this project, we develop a new bagging algorithm, taking ideas from Shay Moran and Amir Yehudayoff's sample compression setup, that samples subsets of size O(d), where d is the VC dimension, from a uniform distribution; this makes it significantly less computationally expensive than the current bagging algorithm while giving similar accuracy. We also explore a variant of the algorithm that achieves even better accuracy while still taking much less time than existing bagging: instead of drawing subsets from a uniform distribution, we use boosting to sample the subsets.
Depen Morwani
Abstract
We analyze the inductive bias of gradient descent for weight-normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. We analyze both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these results to gradient descent and establish asymptotic relations between weights and gradients for both SWN and EWN. We also show that EWN causes weights to be updated in a way that prefers asymptotic relative sparsity. The asymptotic convergence rate of the loss for EWN is given by $\Theta(\frac{1}{t(\log t)^2})$ and is independent of the depth of the network. We demonstrate our results for SWN and EWN on synthetic data sets. Experimental results on simple data sets and architectures support our claim of sparse EWN solutions, even with SGD. This demonstrates its potential applications in learning prunable neural networks.