"how far natural language processing (NLP) technology can help fight Covid-19, if forces are joined? "
--- BLAH7
TOPIC 1. How a BioNLP corpus can work for Covid-19. (A cross-disciplinary topic)
As the Covid-19 literature grows rapidly, a large amount of information on the biological activities and mutations of Covid-19 has been extracted by an AGAC-based pipeline. (The data are available in the "DATA" section.)
Supported by the BLAH7 hackathon, we invite discussion and collaboration on using these data for scientific discovery through text mining of the Covid-19 literature. The sub-topics include, but are not limited to, the following aspects:
i) What comes next after extracting a large amount of information from the Covid-19 literature by applying AGAC?
ii) How can the mutation information or biological processes extracted via text mining contribute to fighting Covid-19?
iii) Analysis of the AGAC corpus design for drug, mutation, gene, protein, and pathway mining, for the sake of containing Covid-19.
TOPIC 2. Improving performance on the AGAC track, a task in BioNLP-OST 2019. (An NLP topic)
The AGAC track is one of the shared tasks provided in BioNLP-OST 2019. It includes three sub-tasks: named entity recognition, relation extraction, and triple link discovery. Because these tasks are difficult, there is still room for improvement.
The sub-topics include, but are not limited to, the following aspects:
i) State-of-the-art NLP techniques for improving AGAC track performance.
ii) Self-learning strategies for corpus expansion.
BLAH7 (the 7th Biomedical Linked Annotation Hackathon): BLAH is an annual hackathon event that promotes the development of the BioNLP community, covering the sharing and linking of biomedical literature annotation and mining resources. This year, BLAH7 is organized as an online event with a special theme, "Covid-19", answering the question "how far natural language processing (NLP) technology can help fight Covid-19, if forces are joined?". The registration, timeline, and more information about BLAH7 can be found here.
AGAC (Active Gene Annotation Corpus): AGAC is a corpus that aims to annotate functional mutations and their subsequent biological processes. The "Variation" named entity and 4 other bio-concept named entities help recognize the semantics of a mutation and the subsequent biological processes in text, while the direction of the mutation's effect on biological processes is annotated by 3 regulatory named entities. In addition, 2 types of thematic relations annotate the semantic relations between labels. More information about AGAC can be found here.
The bio-concept named entities annotate the diverse biological processes described in the literature, which provide further information about the effect of mutations on subsequent biological processes. When the focus is on the changed biological processes, a subset of the AGAC annotations can be used to extract information for different research scenarios.
Therefore, if you are interested in the biological processes mentioned in the Covid-19 literature, the bio-concept named entities are helpful for extracting this information. If all AGAC annotations are retained, it is possible to extract the variations of Covid-19 and the changed biological processes of this virus.
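The two usage modes described above (keeping only the bio-concept entities, or keeping all annotations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the label names in `BIO_CONCEPT_LABELS` and `REGULATORY_LABELS`, the `Entity` record, the helper functions, and the example sentence are hypothetical and are not taken from the released data, so please check the AGAC guideline for the actual label set and file format.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative label groups. The description above only says that
# AGAC has "Variation" plus 4 other bio-concept entities and 3 regulatory
# entities; the concrete names used here must be checked against the guideline.
BIO_CONCEPT_LABELS = {"Var", "MPA", "CPA", "Interaction", "Pathway"}
REGULATORY_LABELS = {"Reg", "PosReg", "NegReg"}


@dataclass(frozen=True)
class Entity:
    """One named-entity annotation: a labelled character span in a document."""
    doc_id: str
    start: int
    end: int
    label: str
    text: str


def make_entity(doc_id, sentence, phrase, label):
    """Build an Entity by locating `phrase` in `sentence` (first occurrence)."""
    start = sentence.index(phrase)
    return Entity(doc_id, start, start + len(phrase), label, phrase)


def select_entities(entities, labels):
    """Keep only the annotations whose label is in `labels`."""
    return [e for e in entities if e.label in labels]


# Hypothetical sentence and annotations (placeholder text, not pipeline output).
sentence = "Mutation X123Y reduces enzyme activity."
annotations = [
    make_entity("doc1", sentence, "X123Y", "Var"),
    make_entity("doc1", sentence, "reduces", "NegReg"),
    make_entity("doc1", sentence, "enzyme activity", "MPA"),
]

# Mode 1: only the bio-concept entities (what was mentioned).
bio_only = select_entities(annotations, BIO_CONCEPT_LABELS)

# Mode 2: all annotations (what was mentioned, and in which direction it changed).
all_kept = select_entities(annotations, BIO_CONCEPT_LABELS | REGULATORY_LABELS)

print([e.text for e in bio_only])
print([(e.label, e.text) for e in all_kept])
```

In this sketch the first mode drops the regulatory entity, so only the variation and the biological activity remain, while the second mode also keeps the direction of the effect.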
The mutations and biological processes extracted from the Covid-19 literature can be found here.
Information on the AGAC track in BioNLP-OST 2019 and the AGAC corpus can be found here.
Introduction to AGAC and related work: HZAU BioNLP AGAC
Task settings of the AGAC track and training data: BioNLP OST 2019 (AGAC Track)
BLAH7 registration, timeline and more information: BLAH7
Covid-19-relevant documents: LitCovid
Yuxing Wang, et al. Guideline Design of an Active Gene Annotation Corpus for the Purpose of Drug Repurposing. 11th CISP-BMEI, Oct 2018, Beijing.
Yuxing Wang, et al. An Active Gene Annotation Corpus and Its Application on Anti-epilepsy Drug Discovery. BIBM 2019: International Conference on Bioinformatics & Biomedicine, San Diego, U.S., Nov 2019.
Yuxing Wang, Kaiyin Zhou, Mina Gachloo, Jingbo Xia*. An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks. BioNLP Open Shared Task 2019, workshop in EMNLP-IJCNLP 2019, Hong Kong.
College of Informatics
Huazhong Agricultural Univ
Wuhan, Hubei 430070
China
Jingbo Xia, xiajingbo.math@gmail.com
Kaiyin Zhou, kaiyinzhouhazu@gmail.com
Yuxing Wang, yuxingwang.www@gmail.com
Mina Gachloo, m_gachloo@yahoo.com