Complex AI models such as deep neural networks have achieved state-of-the-art results on many machine learning tasks. However, their black-box nature makes them opaque and difficult to interpret. In this project, we design and develop novel methods for explaining black-box AI models, with a focus on models applied in the biomedical domain.
The first paper published as part of this project describes a post-hoc explanation method that discovers how a black-box classifier relates inputs to class labels. The method improves the fidelity and interpretability of explanations across various black-box classifiers and several tabular and textual classification tasks. The paper can be accessed via this link.
In the second paper, we propose a method for explaining black-box biomedical text classifiers. Sources of domain knowledge are used to extract biomedical concepts from the input text, and semantic relations are then extracted between the input text and the class labels in different decision subspaces of the black-box.
Deep neural language models have shown great capability in encoding the lexical, semantic, and syntactic properties of language. In this project, we investigate the efficacy of these models for summarizing biomedical text documents, especially scientific articles. We use well-known, state-of-the-art neural models to design and develop novel text summarization systems.
We utilize the well-known BERT language model to quantify the informative content of sentences in a document. Sentences are represented as contextual embeddings and then clustered to identify groups of sentences that share a similar context. The most informative sentences within each cluster are extracted to build the final summary. The paper can be accessed via this link.
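As a minimal sketch of the cluster-then-select step, one could cluster sentence embeddings with k-means and keep, from each cluster, the sentence closest to the centroid. Random vectors stand in for BERT contextual embeddings here, and the "closest to centroid" criterion is a simplification of the paper's informativeness measure:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_summarize(embeddings, sentences, n_clusters=2):
    """Cluster sentence embeddings; from each cluster, keep the
    sentence whose embedding is closest to the cluster centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    summary_idx = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        summary_idx.append(members[int(np.argmin(dists))])
    # preserve the original sentence order in the summary
    return [sentences[i] for i in sorted(summary_idx)]

# Toy usage: random vectors stand in for real BERT embeddings.
rng = np.random.default_rng(0)
sentences = ["s1", "s2", "s3", "s4"]
emb = rng.normal(size=(4, 8))
summary = cluster_summarize(emb, sentences, n_clusters=2)
```

In a real pipeline, `emb` would come from a pretrained BERT encoder (e.g. mean-pooled token embeddings per sentence).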
We design and develop a text summarizer that uses word embedding models trained on large corpora of biomedical text. The input document is modelled as a graph in which sentences are connected to each other based on the similarity between their embeddings. Graph ranking algorithms are utilized to extract informative and important sentences. The paper can be accessed via this link.
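A minimal sketch of the graph-ranking step: build a cosine-similarity graph over sentence embeddings and rank sentences with a basic PageRank power iteration. This is one common graph ranking algorithm, not necessarily the exact one used in the paper, and the embeddings here are random stand-ins for biomedical word-embedding representations:

```python
import numpy as np

def pagerank_summary(embeddings, sentences, k=2, d=0.85, iters=50):
    """Rank sentences by PageRank over a cosine-similarity graph
    and return the top-k as the summary."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    sim = np.clip(sim, 0.0, None)          # keep only positive similarities
    row_sums = sim.sum(axis=1, keepdims=True)
    # row-normalize; isolated nodes get a uniform transition row
    trans = np.divide(sim, row_sums,
                      out=np.full_like(sim, 1.0 / len(sim)),
                      where=row_sums > 0)
    n = len(sentences)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):                 # power iteration
        r = (1 - d) / n + d * trans.T @ r
    top = np.argsort(-r)[:k]
    return [sentences[i] for i in sorted(top)]

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(1)
sentences = ["s1", "s2", "s3", "s4", "s5"]
emb = rng.normal(size=(5, 16))
top = pagerank_summary(emb, sentences, k=2)
```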
Shallow word-based and positional features cannot effectively measure the informative content of text, especially in domain-specific summarization. In this project, we adopt a concept-based approach to biomedical text summarization, combining biomedical concept extraction with different machine learning-based text modelling approaches to develop summarization systems that can effectively deal with the peculiarities of biomedical text.
We propose novel feature extraction approaches based on biomedical concepts that appear in the input text. A Bayesian classification heuristic is used to produce a summary in which the distribution of important concepts follows the distribution in the original text. The paper can be accessed via this link.
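The distribution-matching idea can be illustrated with a greedy selection that keeps the summary's concept distribution close to the document's. This is a simplified stand-in for the paper's Bayesian classification heuristic, and the concept names in the usage example are invented:

```python
from collections import Counter

def distribution_gap(doc_counts, summary_counts):
    """L1 distance between the normalized concept distributions
    of the document and of the candidate summary."""
    total_d = sum(doc_counts.values())
    total_s = sum(summary_counts.values()) or 1
    return sum(abs(doc_counts[c] / total_d - summary_counts.get(c, 0) / total_s)
               for c in doc_counts)

def greedy_summary(sentence_concepts, budget):
    """Greedily add the sentence that best keeps the summary's concept
    distribution close to the document's."""
    doc_counts = Counter(c for cs in sentence_concepts for c in cs)
    chosen, summary_counts = [], Counter()
    while len(chosen) < budget:
        best, best_gap = None, None
        for i, cs in enumerate(sentence_concepts):
            if i in chosen:
                continue
            gap = distribution_gap(doc_counts, summary_counts + Counter(cs))
            if best_gap is None or gap < best_gap:
                best, best_gap = i, gap
        chosen.append(best)
        summary_counts += Counter(sentence_concepts[best])
    return sorted(chosen)

# Toy usage with invented biomedical concepts.
sentence_concepts = [
    ["neoplasm", "therapy"],
    ["therapy", "dose"],
    ["neoplasm"],
    ["dose", "neoplasm", "therapy"],
]
picked = greedy_summary(sentence_concepts, budget=2)
```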
We apply a frequent itemset mining method on concepts extracted from a document to discover important topics. Frequent itemsets are used to quantify the informativeness of sentences and show how sentences cover the main topics. The final summary is generated by including those sentences that best cover the main topics of the text. The paper can be accessed via this link.
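The itemset-based scoring can be sketched with a brute-force enumeration of frequent concept itemsets (fine for short documents; a real system would use Apriori or FP-growth), where a sentence scores higher the more frequent itemsets it covers:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Enumerate concept itemsets appearing in at least `min_support`
    sentences (brute force over all small combinations)."""
    items = sorted({c for t in transactions for c in t})
    frequent = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= set(t))
            if support >= min_support:
                frequent.append((combo, support))
    return frequent

def score_sentence(concepts, itemsets):
    """A sentence is more informative the more frequent itemsets it covers."""
    return sum(1 for combo, _ in itemsets if set(combo) <= set(concepts))

# Toy usage: each sentence is its list of extracted concepts.
sents = [["gene", "protein"], ["gene", "cell"], ["gene", "protein", "cell"]]
fi = frequent_itemsets(sents, min_support=2)
scores = [score_sentence(s, fi) for s in sents]
```

The summary is then assembled from the highest-scoring sentences until the length budget is reached.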
We use biomedical concepts and frequent itemset mining to discover main subtopics within an input text. The similarity between sentences is quantified based on the itemsets they have in common. A hierarchical clustering method divides the sentences into groups, where sentences in each cluster share the same subtopics. The most informative sentences within each cluster are extracted to generate the final summary. The paper can be accessed via this link.
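A minimal sketch of the clustering step, assuming each sentence has already been mapped to the frequent itemsets it contains: sentences are compared by Jaccard similarity over their itemsets and grouped with average-linkage hierarchical clustering (the paper's exact similarity measure and linkage may differ):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def itemset_clusters(sentence_itemsets, n_clusters):
    """Hierarchically cluster sentences by the itemsets they share
    (Jaccard distance over each sentence's set of itemsets)."""
    n = len(sentence_itemsets)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a, b = set(sentence_itemsets[i]), set(sentence_itemsets[j])
            union = a | b
            sim = len(a & b) / len(union) if union else 0.0
            dist[i, j] = dist[j, i] = 1.0 - sim
    z = linkage(squareform(dist), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")

# Toy usage: itemsets are tuples of (invented) concepts per sentence.
sents = [
    [("gene",), ("gene", "protein")],
    [("gene",), ("protein",)],
    [("cell",), ("membrane",)],
    [("cell",), ("cell", "membrane")],
]
labels = itemset_clusters(sents, n_clusters=2)
```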
We utilize a meaningfulness measure to discover the significant topics of an input text. The input text is modelled as a small-world network in which sentences are connected according to the meaningful topics they share. A degree measure is then used to identify central nodes, which correspond to the important sentences of the input text. The paper can be accessed via this link.
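The degree-based selection can be sketched as follows, assuming each sentence's meaningful topics have already been identified (the small-world construction and the meaningfulness measure itself are omitted, and the threshold parameter is an assumption):

```python
def central_sentences(sentence_topics, threshold, k):
    """Connect sentences that share at least `threshold` meaningful topics,
    then return the indices of the k highest-degree nodes."""
    n = len(sentence_topics)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if len(set(sentence_topics[i]) & set(sentence_topics[j])) >= threshold:
                degree[i] += 1
                degree[j] += 1
    return sorted(range(n), key=lambda i: -degree[i])[:k]

# Toy usage: invented topic labels per sentence.
topics = [["t1", "t2"], ["t1"], ["t2"], ["t3"]]
idx = central_sentences(topics, threshold=1, k=1)
```

Here sentence 0 shares a topic with both sentences 1 and 2, so it has the highest degree and is selected.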
We propose a new multi-agent Grid job scheduling method, CLDS, based on a reinforcement learning mechanism. It tackles the problem of a single point of failure by adopting a multi-agent scheduling approach, and it utilizes an efficient coordination mechanism between agents with limited communication. CLDS maintains a high degree of load balancing under different system scales and loads. The paper can be accessed via this link.
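To give a flavour of the reinforcement signal, here is a toy single-agent sketch: an ε-greedy agent learns per-node value estimates from a load-based reward and dispatches jobs accordingly. The actual method is multi-agent with coordination under limited communication; none of the constants or the reward shape below come from the paper:

```python
import random

def rl_dispatch(n_nodes, n_jobs, alpha=0.5, epsilon=0.2, seed=0):
    """Dispatch jobs with an epsilon-greedy agent whose reward is the
    (negated) current load of the chosen node, so lightly loaded nodes
    look more valuable. Toy stand-in for multi-agent CLDS coordination."""
    rng = random.Random(seed)
    q = [0.0] * n_nodes          # learned value of dispatching to each node
    loads = [0] * n_nodes
    for _ in range(n_jobs):
        if rng.random() < epsilon:
            a = rng.randrange(n_nodes)                   # explore
        else:
            a = max(range(n_nodes), key=q.__getitem__)   # exploit
        reward = -loads[a]       # lighter nodes yield higher reward
        loads[a] += 1
        q[a] += alpha * (reward - q[a])
    return loads, q

loads, q = rl_dispatch(n_nodes=3, n_jobs=30)
```

As a node fills up, its value estimate drops and the agent shifts jobs to other nodes, which is what drives the load balancing.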