Ranking-based approach to construct explanations for multiple-choice elementary science QA

About

Explanation construction for multiple-choice elementary science QA involves ranking candidate explanation sentences, drawn from an unordered collection, so that they justify a question and its correct answer (QA).

Specifically, given a question, its known correct answer, and a list of n candidate explanation sentences, the goal is to (1) determine whether each sentence is relevant as justification for the QA pair, and if so, (2) rank the relevant sentences by their role in forming a coherent logical discourse fragment.
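To make the task's input and output concrete, here is a minimal baseline sketch that scores each candidate sentence by TF-IDF cosine similarity to the question-plus-answer text and returns the candidates in ranked order. It is only an illustrative baseline (the question, answer, and candidates are made up), not the feature-rich system described further below.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(question: str, answer: str, candidates: list[str]) -> list[str]:
    query = question + " " + answer
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the query plus all candidates so they share one vocabulary.
    matrix = vectorizer.fit_transform([query] + candidates)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    # Higher cosine similarity means more lexical overlap with the QA pair.
    return [c for _, c in sorted(zip(scores, candidates), reverse=True)]

question = "Which change happens when a dead organism decays?"
answer = "it breaks down"
candidates = [
    "decomposition is when organisms break down dead organisms",
    "a rock is a kind of object",
    "an organism is a living thing",
]
print(rank_candidates(question, answer, candidates))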

Shown below on the left is an example depicting the task. It also illustrates the lexical hop phenomenon (the underlined words) characteristic of the corpus: the question-and-answer pair shares lexical overlap not just with correct facts, but also with incorrect fact candidates. On the right is a lecture about the project.

Our Feature-rich Task System in a Natural Language Engineering Journal Article

In this work, we designed a feature-rich support vector machine (SVM) classifier, comparing a pointwise regression algorithm with a pairwise learning-to-rank algorithm. Our features fall into six main categories: bags of lexical features (70,949 features); ConceptNet (294,249 features); Open Information Extraction relations (36,989 features); multihop-inference-specific features (2,620 features); offline TF-IDF ranking features (750,283 features); and BERT embeddings. The two main takeaways of this system: 1) it outperforms the well-known neural approaches; and 2) the features themselves offer interpretable insights into the computational task. More details can be found in our publication below.
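For readers unfamiliar with pairwise learning-to-rank, the sketch below shows the classic SVMrank-style reduction: every pair of candidates with different relevance grades becomes a feature-difference vector, and a linear SVM learns to classify which of the two should rank higher. The random feature vectors stand in for the six feature categories above; this is a sketch of the general technique, not our exact pipeline.

import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))             # 8 candidate sentences, 50 features
y = np.array([2, 0, 1, 0, 2, 1, 0, 1])   # gold relevance grades

pairs, labels = [], []
for i, j in combinations(range(len(X)), 2):
    if y[i] == y[j]:
        continue  # equal grades carry no ranking signal
    # +1 if i should outrank j, -1 otherwise; add both orderings
    # so the pairwise training set stays balanced.
    pairs.append(X[i] - X[j]); labels.append(np.sign(y[i] - y[j]))
    pairs.append(X[j] - X[i]); labels.append(np.sign(y[j] - y[i]))

ranker = LinearSVC().fit(np.array(pairs), np.array(labels))

# At test time the learned weight vector scores candidates directly,
# and sorting by score yields the predicted ranking.
scores = X @ ranker.coef_.ravel()
ranking = np.argsort(-scores)
print(ranking)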

Jennifer D’Souza, Isaiah Onando Mulang' and Sören Auer (2022). Ranking facts for explaining answers to elementary science questions. Natural Language Engineering, 1-26. https://doi.org/10.1017/S1351324921000358

Psycholinguistic Focus Words for Explanation Generation

In this work, concentrating exclusively on enhancing the lexical match between a question, its answer, and an explanation, we encode a novel lightweight feature based on the psycholinguistic concept of focus words defined by Brysbaert et al. Loosely, a focus word is one that is neither so concrete that it can be experienced directly through the five natural senses (i.e., smell, touch, sight, taste, and hearing), nor so abstract (e.g., acquirable) that its meaning cannot be illustrated without using other words. We noted that in elementary science, the content words that define the semantics of a QA pair and its explanation are indeed focus words, such as break down, fall, decompose, organism, and dead. We thus demonstrate for the first time the application of focus words in contemporary neural transformer models for the task of explanation generation, and we observe that employing focus words enhances the lexical attention capability of transformer-based BERT models. More details can be found in our paper listed below.
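As one hedged illustration of the idea, the sketch below selects focus words by keeping words whose concreteness rating (in the Brysbaert et al. norms, on a scale from 1 = abstract to 5 = concrete) falls in a mid-range band, i.e., neither fully perceptual nor fully abstract. The file name, column names, and band thresholds here are illustrative assumptions, not the paper's exact settings.

import csv

def load_concreteness(path: str) -> dict[str, float]:
    # Assumes a tab-separated file with "Word" and "Conc.M" columns,
    # as in common distributions of the Brysbaert et al. norms.
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Word"].lower(): float(row["Conc.M"])
                for row in csv.DictReader(f, delimiter="\t")}

def focus_words(sentence: str, conc: dict[str, float],
                low: float = 2.0, high: float = 4.0) -> list[str]:
    # Keep tokens whose concreteness rating lies inside the band;
    # unknown words default to 0.0 and are filtered out.
    tokens = [t.strip(".,;?!").lower() for t in sentence.split()]
    return [t for t in tokens if low <= conc.get(t, 0.0) <= high]

conc = load_concreteness("Concreteness_ratings_Brysbaert_et_al_BRM.txt")
print(focus_words("organisms break down dead organisms", conc))

The extracted focus words can then be supplied as an extra signal when fine-tuning a transformer, for example by marking or weighting them in the model input; see the paper below for how this was done in our system.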

Isaiah Onando Mulang’, Jennifer D’Souza, and Sören Auer. 2020. Fine-tuning BERT with Focus Words for Explanation Regeneration. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics (*SEM), pages 125–130, Barcelona, Spain (Online). Association for Computational Linguistics.