1. AutOnMCQ - Automated Ontology-based MCQ Generation
Ontologies are knowledge representation structures that can serve as a platform for building many intelligent applications. With recent advances in Semantic Web technologies and the ease of publishing knowledge as ontologies, many researchers have turned to using these knowledge structures in educational (e-learning) applications. One major research area within e-learning is ontology-based assessment systems, in which ontologies are used to generate multiple choice questions (MCQs) for tests that assess the knowledge and skills of learners.
Objective questions such as MCQs are more widely adopted in large-scale assessment tests than their counterpart, subjective questions (e.g., essay or short answer). Most country-wide and world-wide tests, and tests conducted as part of online courses such as MOOCs (Massive Open Online Courses), consist mainly of MCQs. MCQs have clear advantages: they let the questioner cover a large content area, and they are easy to administer and score by computer. However, research shows that designing valid questions and responses requires skill and can be time-consuming.
Existing approaches that use ontologies expressed in the Web Ontology Language (OWL) for MCQ generation are limited to questions of the type "What is C?" or "Which of the following is an example of C?" (where C is a concept symbol). There are also no systematic methods for generating distracting answers (distractors) from ontologies. Distractor generation deserves particular attention, since the distractors determine the quality and hardness of an MCQ. In this project, we propose two new MCQ generation approaches that are more useful and realistic for conducting assessment tests, along with the corresponding distractor generation techniques. Unlike other methods, our distractor generation techniques take the Open World Assumption into account, so the generated MCQs are always precise (the falsity of every distracting answer is ensured). Furthermore, we present a measure to determine the difficulty level (a value between 0 and 1) of the generated MCQs. The proposed system has been implemented, and experiments on specific ontologies have shown the effectiveness of the approaches. We also conducted an empirical study by generating question items from a real-world ontology and had human experts validate our claims.
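To illustrate why the Open World Assumption matters for distractors, here is a minimal sketch, not the project's actual algorithm. It assumes a toy knowledge base in which class membership and disjointness are given as plain Python dictionaries; the classes, individuals, and function names are all hypothetical. An individual is accepted as a distractor only when it belongs to a class explicitly declared disjoint with the target class, because the mere absence of a membership assertion does not prove the answer is false.

```python
# Toy knowledge base; every name here is a hypothetical illustration.
INSTANCES = {
    "Mammal": {"whale", "bat"},
    "Bird": {"sparrow"},
    "Pet": {"goldfish"},
}

# Explicit disjointness axioms. Under the open-world assumption, an individual
# that is not asserted to be in a class is not thereby known to be outside it.
DISJOINT = {frozenset({"Mammal", "Bird"})}


def safe_distractors(target):
    """Return individuals that are provably NOT instances of `target`."""
    result = set()
    for cls, members in INSTANCES.items():
        if frozenset({cls, target}) in DISJOINT:
            result |= members
    return sorted(result)


def make_mcq(target, n_distractors=1):
    """Build a 'Which of the following is an example of C?' item."""
    key = sorted(INSTANCES[target])[0]  # pick one asserted instance as the key
    distractors = safe_distractors(target)[:n_distractors]
    return {
        "stem": f"Which of the following is an example of {target}?",
        "key": key,
        "distractors": distractors,
    }


if __name__ == "__main__":
    print(make_mcq("Mammal"))
```

In this toy data, "goldfish" is never offered as a distractor for Mammal: nothing states that Pet and Mammal are disjoint, so its falsity cannot be guaranteed.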
System-generated MCQs: Download
Test Questions generated from Plant-Protection ontology: Download question-set file
Participants' response sheets: Score_Sheet1 Score_Sheet2 Score_Sheet3 Score_Sheet4 Score_Sheet5 Score_Sheet6 Score_Sheet7
Technical report on "Verbalization of node-label-sets"
Publication:
Vinu E.V., P. Sreenivasa Kumar, A novel approach to generate MCQs from domain ontology: Considering DL semantics and open-world assumption, Web Semantics: Science, Services and Agents on the World Wide Web (2015), http://dx.doi.org/10.1016/j.websem.2015.05.00
2. ATG (Automatic Test Generation) systems
The ATG system and its extended version, E-ATG, are automated assessment test generation systems that can be employed in a pedagogical environment.
Currently, the system is built as an autonomous tool that takes an OWL ontology as input and generates the required number of (knowledge-level) questions as output. Unlike the conventional method of frequency-based predicate selection for relevant question generation, we adopt various heuristics that mimic the selection process of a human expert when choosing relevant questions for assessment. The system's unique stem-difficulty estimation feature helps in generating question sets of a required difficulty level.
More details of the project can be found at: Project info.
3. Augmenting Linked Data Ontologies with New Object Properties
3a. DARO (Detecting Arbitrary Relations for enriching Ontology of Linked Data):
Although several RDF knowledge bases are available through the Linked Open Data (LOD) initiative, the ontology schema of such linked datasets is not very rich. In particular, they lack object properties. The problem of finding new object properties between any two given classes has not been investigated in detail in the context of Linked Data. In this project, we present DARO (Detecting Arbitrary Relations for enriching Ontology of Linked Data) - an unsupervised solution to enrich the LOD cloud with new object properties (and their instances) between two given classes. DARO first identifies text patterns from the web corpus that can potentially represent relations between individuals. These text patterns are then clustered based on semantic similarity to capture the object properties between the two given classes. We have empirically evaluated our approach on several pairs of classes and found that the system can indeed be used for enriching the linked datasets with new object properties and their instances. We have compared DARO with newOntExt which is an offshoot of the NELL (Never-Ending Language Learning) effort. Our experiments reveal that DARO gives better results than newOntExt as a recall-oriented system.
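The pattern-clustering step can be pictured with a small sketch. This is only an illustration under stated assumptions, not DARO's implementation: token-overlap (Jaccard) similarity stands in for the semantic similarity measure, the greedy single-pass clustering and the 0.5 threshold are arbitrary choices, and the example patterns are invented for a Rivers/Cities class pair.

```python
def jaccard(a, b):
    """Token-overlap similarity, a crude stand-in for semantic similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def cluster_patterns(patterns, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed pattern is similar enough."""
    clusters = []
    for p in patterns:
        for c in clusters:
            if jaccard(p, c[0]) >= threshold:
                c.append(p)
                break
        else:
            # No existing cluster matched, so this pattern seeds a new one.
            clusters.append([p])
    return clusters


if __name__ == "__main__":
    # Hypothetical text patterns observed between a river and a city.
    patterns = [
        "flows through",
        "is the capital of",
        "flows right through",
        "is capital of",
    ]
    for cluster in cluster_patterns(patterns):
        print(cluster)
```

Each resulting cluster would then be a candidate object property, with the sentences matching its patterns supplying candidate instances.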
Results on input class-pairs from the NELL KB: All_class_pairs_under_Animal_domain
All_class_pairs_under_Sports_domain
All_class_pairs_under_Construction_domain
Rivers_Cities Languages_Countries Vegetables_Diseases CEOs_Companies
Results on input class-pairs from LOD: Empires_Rulers Religions_Countries Writers_Novels Actors_Movies
3b. Identifying relation-gaps:
We propose a methodology to predict pairs of classes that could be connected by object properties but are not yet connected. We claim that evidence obtained from external textual resources and their Word2Vec representations can be used for this purpose. Our approach gives results that are complementary to those of the traditional techniques found in the literature, so it can be combined with them for maximum benefit.
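One plausible way to picture the use of vector representations, offered purely as an assumption since the text above does not specify the scoring, is to rank unconnected class pairs by the cosine similarity of their class vectors. In the sketch below the three-dimensional vectors are made-up toy values (real Word2Vec vectors have hundreds of dimensions), and the class names and the `rank_gaps` function are hypothetical.

```python
import math
from itertools import combinations

# Made-up toy embeddings; stand-ins for Word2Vec vectors of class names.
VECTORS = {
    "Writer": [0.9, 0.1, 0.0],
    "Novel": [0.8, 0.2, 0.1],
    "River": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gaps(vectors, connected):
    """Rank class pairs that have no object property yet, most similar first."""
    pairs = [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if frozenset({a, b}) not in connected
    ]
    return sorted(
        pairs,
        key=lambda p: cosine(vectors[p[0]], vectors[p[1]]),
        reverse=True,
    )


if __name__ == "__main__":
    # No pair is connected yet in this toy schema.
    for a, b in rank_gaps(VECTORS, connected=set()):
        print(a, b, round(cosine(VECTORS[a], VECTORS[b]), 3))
```

Pairs near the top of such a ranking would be the candidates worth checking for a missing object property.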
Results of experiments conducted on input classes from the DBpedia dataset: Results
Publication:
1. Subhashree, S., Kumar, P.S.: Augmenting linked data ontologies with new object properties. New Gener. Comput. 38(1), 125-152 (2020), https://doi.org/10.1007/s00354-020-00085-0.
4. Ontology Enrichment using Question-Answer Datasets
Ontologies are knowledge representation structures that are used to model a domain in the form of concepts, entities and relations between them. Existing works on domain ontology enrichment mainly make use of the web corpus. Community-generated Question-Answer data, in spite of being a rich source of information on various topics, has not been exploited much for ontology enrichment. Such datasets offer good scope for extracting new entities and relations, and hence help to broaden the coverage of the domain ontology under consideration. In this work, we propose a novel approach to extract triples from Question-Answer pairs for ontology enrichment, focusing particularly on T-Box enrichment. Initial experiments on two domain ontologies give preliminary results that reveal the potential of the system to convert Question-Answer pairs into meaningful triples that can be added to the ontologies, thus enhancing their quality.
5. DOPLEX (discovering Disjoint Object Property pairs using LEXical evidence)
Although Knowledge Graphs (KGs) have become a popular and powerful tool in industry, most researchers have focused only on adding more and more triples to the A-Boxes of the KGs. An often overlooked but important part of a KG is its T-Box. If the T-Box contains incorrect statements, or if certain correct statements are absent from it, the result can be inconsistent knowledge in the KG or information loss. In this work, we propose a novel system, DOPLEX, based on Probabilistic Soft Logic (PSL), to detect disjointness between pairs of object properties present in the KG. Current approaches mainly rely on checking the absence of common triples and fail to exploit the semantics of property names. In the proposed system, in addition to checking common triples, PSL is used to determine whether property names imply disjointness. Our evaluation demonstrates that the proposed approach discovers disjoint property pairs with better precision than the state-of-the-art system, without compromising much on the number of disjoint pairs discovered.
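The common-triples check mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not DOPLEX itself: it treats two object properties as candidate disjoint when no (subject, object) pair appears under both, and it omits the PSL phase entirely. The triples, property names, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical A-Box triples: (subject, property, object).
TRIPLES = [
    ("alice", "bornIn", "Paris"),
    ("alice", "diedIn", "Rome"),
    ("bob", "bornIn", "Oslo"),
    ("bob", "livesIn", "Oslo"),
]


def pairs_by_property(triples):
    """Index the (subject, object) pairs asserted under each property."""
    index = {}
    for s, p, o in triples:
        index.setdefault(p, set()).add((s, o))
    return index


def candidate_disjoint_pairs(triples):
    """Property pairs that share no (subject, object) pair in the data."""
    index = pairs_by_property(triples)
    return [
        (p, q)
        for p, q in combinations(sorted(index), 2)
        if not index[p] & index[q]
    ]


if __name__ == "__main__":
    print(candidate_disjoint_pairs(TRIPLES))
```

With sparse data, two properties can share no pairs without being disjoint in meaning, so such a check yields only candidates; this is the kind of gap that additional evidence from property names is meant to address.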
Evaluation on the NELL KG:
NELL_filtered_object_properties NELL_filtered_object_property_pairs NELL_filtered_mutexpredicates
DOPLEX_output_of_first_phase DOPLEX_output_of_PSL_phase DOPLEX_full_output
Intersection_DOPLEXfulloutput_and_NELLfilteredmutexpredicates
Results of the manual evaluations:
Sample_of_DOPLEX_bonus_pairs Sample_of_NELL_mutexpredicates_minus_DOPLEX_output
Evaluation on the PATTY dataset:
(Person, University) (Musician, Artist) (Organization, City)