"Ensuring Robustness in LLM-based Research: Reproducibility, Interoperability, and Reliable Evaluation" --- BLAH9
Supported by BLAH9, we are calling on discussion and collaboration of "Ensuring Robustness in LLM-based Research: Reproducibility, Interoperability, and Reliable Evaluation". Project issues and discussion points are as below:
How to effectively use multicultural LLMs and PubAnnotation , to build a more knowledge-supportive Ontology.
Interoperate human expert with LLM and Pubannotation, providing concrete definition and literature annotation of each concept.
Showcase the user-friendly web design to explain Ontology Description with concrete explanations and concepts.
Pipeline designed in this project, can be implemented to other species and research areas.
The development of specialized ontologies such as the Rice trait Ontology facilitates the organization and standardization of knowledge in the area of plant biology. Studies of rice, which is one of the most important food crops in the world, have been conducted for many years. Ontologies are also important tools for the integration and retrieval of phylogenetic information enabling the more comprehensive investigation of biological systems.
Yet even now the task of constructing and maintaining high-quality ontologies, for example: Rice Trait Ontology (RTO) [Yao, 2022], is a daunting task or remains one of the biggest challenges due to the volume of biological information and the rate. Expert can't interact every time with many tool to get the proper definition of trait ontology particularly in specialized fields. Many trait Ontology have same definition in different fields , so it can be confusing and ambiguous for the expert to have a solid definition. Ontologies are often highly hierarchical, with multiple levels of parent-child relationships. Navigating this complexity to find the exact term that matches a specific experimental condition or biological context can be difficult.
Leveraging tools like Large Language Models (LLM) and PubAnnotation has made ontology development more efficient and accurate. LLMs generate concise, context-rich definitions for ontology terms based on existing literature, while PubAnnotation allows researchers to collaboratively annotate scientific papers with definitions, comments, and contextual information linked directly to ontology terms. This structured approach creates a clear relationship between traits, definitions, and relevant literature. For human experts, these tools provide comprehensive definitions and literature annotations, requiring only verification to ensure reliability, ultimately enhancing the accuracy and robustness of the ontology as a trustworthy resource for researchers.
Utilizing PubAnnotation enhances the reliability of data curation in research by enabling collaborative annotation of scientific literature. Researchers can add definitions, comments, and contextual information, which can be linked directly to ontology terms. This structured approach fosters a comprehensive understanding of traits and their relationships within the ontology, ensuring that the curated data is accurate and trustworthy. By integrating these annotations, researchers can efficiently verify concepts, leading to a more robust and reliable ontology resource.
BLAH9 (the 9th Biomedical Linked Annotation Hackathon) : BLAH is an annual hackathon events to promote the development of BioNLP community, which contains the biomedical literature annotation and mining resources sharing and linking. In this year, the BLAH9 is organized with a special theme which is "Ensuring Robustness in LLM-based Research: Reproducibility, Interoperability, and Reliable Evaluation". The registration, timeline and more information about BLAH9 can be found here.
Ontologies are understood to be organized structures that depict how different concepts, entities and terms in a field of knowledge interact with one another. In particular, when building ontologies through the application of LLMs or BioNLP tools, the purpose is to facilitate the extraction, verification and structuring of these terms so as to keep the ontology up to date and relevant. More information about Ontologies can be found here.
Website of RTO: Rice Trait Ontology
Here is some Ontology concept example : "Plant Morphology Trait" and it's definition which gives concrete definition and concepts : A plant trait (TO:0000387) which is a morphological quality of a plant anatomical entity (PO:0025131) or a constituent cellular component (GO:0005575) contained therein.
Enable the human experts to interoperate with LLM and PubAnnotation to provide concrete definition and literature annotation of each concepts.
Focus on testing and understanding the two platform APIs (KIMI and PubAnnotation) to support knowledge-driven ontology development
Test the results from API individually, as results can sometimes be ambiguous. Ensure that queries return accurate and reliable information
Both platform work in different way, so it is crucial to construct tailored queries for each to obtain solid results. The goal is to create web services that can seamlessly interact with a web interface.
Provide a unified interface for domain experts to interact with and discover specific ontologies, offering solid and precise definitions
Interface will integrate both LLM (Large Language Models) and PubAnnotation for generating concrete definitions and annotating literature.
Human experts will collaborate with the LLM and PubAnnotation platforms to verify concepts, update definitions, and refine descriptions.
Project aims to enhance interoperability between humans and machines. Experts will evaluate whether the machine-generated definitions are accurate and reliable
Performance evaluation will follow a day-by-day plan, spanning five days, to track efficiency improvements and results .
The current concepts and methodologies can be reproduced in different studies and research areas, especially those encountering issues with ontology concepts
Figure 1: Attempt of LLM-aided prompt and annotation
Focus on testing and understanding the KIMI API
Evaluate the accuracy and relevance of the results of the API , focusing on ensuring they represent key concepts in Rice trait Ontology.
Examine whether the recovers the RTO concepts and we can achieve our goal.
Setup webservice to examine and filter the data to get concrete definition
Aim to extract concrete definitions and ensure clarity in the information provided.
Conduct comprehensive testing of the PubAnnotation API to understand its functionalities and capabilities.
Run large-scale term extraction and relationship identification using the PubAnnotation API, integrating LLM functionalities to process a broad range of scientific literature.
Evaluate the efficiency and effectiveness of the PubAnnotation API in managing and handling large-scale data.
Assess how well the API supports the extraction and annotation processes, ensuring that it meets the project's goals.
Establish connections between data generated by KIMI rice ontology terms and external biological databases (PubAnnotation) to ensure semantic consistency and relevance.
Verify that the extracted terms and relationships follow global ontology standards , enabling smooth integration with other biological datasets and tools.
Implement a validation process to assess the consistency, accuracy, and completeness of the linked data, ensuring that the newly generated terms are biologically meaningful and aligned with existing knowledge.
Create a well-structured and intuitive interface that allows users to easily navigate, query, and explore the rice ontology data.
The webpage should enable researchers and domain experts to access, review, and contribute to the rice ontology, promoting continuous improvement and updates.
Develop tools within the interface that guide experts through the process of reviewing and refining ontology concepts. This may include providing suggested terms, relationships, or alerts for potential inconsistencies that require human intervention.
Integrate features that allow for the quick validation and curation of ontology terms, helping experts ensure that the ontology remains biologically accurate and up-to-date.
Establish clear guidelines and documentation for how the ontology development process, and validation can be reproduced by other researchers. This includes sharing the tools, and workflows used in the project.
Conduct a full evaluation of the ontology development process, including data quality, term coverage, integration with existing biological databases, and the usability of the interface. Ensure that all aspects meet the project's scientific and technical goals.
Based on the evaluation, outline potential next steps for scaling the project, including extending the approach to other crops or domains, and ensuring long-term sustainability.
*Specific scheduling may be flexible according to hackathon discussions.
Xinzhi Yao, Yun Liu, Qidong Deng, Yusha Liu, Xinchen Ma, Yufei Shen, Qianqian Peng, Zaiwen Feng, Jingbo Xia*. RTO, A Specific Crop Ontology for Rice Trait Concepts. Annual International Conference on International Society for Computational Biology (ISMB), Madison, WI, 10-14 July 2022 (Session Bio-Ontologies COSI) .
College of Informatics
Huazhong Agricultural Univ
Wuhan, Hubei 430070, China
Muhammad Ahmad Javeed, ahmadjaved870@gmail.com
Jingbo Xia, xiajingbo.math@gmail.com