The AGAC corpus contains eight trigger labels and two thematic roles.
Trigger words:
Thematic roles:
NOTE: Important annotation guideline: only sentences that simultaneously refer to a specific mutation and to a biological function or disease are annotated in AGAC. See the full annotation guideline in xxxx.
The AGAC track consists of three tasks: trigger word NER, thematic role extraction, and mutation-disease knowledge discovery. Participants may choose any of the tasks described below, but Task 2 requires Task 1, and Task 3 can be performed independently or based on Task 1 and Task 2.
Task 1: Trigger words NER
Task 2: Thematic roles identification
Task 3: "Gene;Function change;disease" link discovery
SHP-2;GOF;juvenile myelomonocytic leukemia.
MLH1;REG;Lynch syndrome.
GIRK2;COM;hyperkinetic movement disorder.
Sample data for Tasks 1, 2, and 3:
{ "target": "http://pubannotation.org/docs/sourcedb/PubMed/sourceid/25805808", "sourcedb": "PubMed", "sourceid": "25805808", "text": "Loss-of-function de novo mutations play an important role in severe human neural tube defects.\nBACKGROUND: Neural tube defects (NTDs) are very common and severe birth defects that are caused by failure of neural tube closure and that have a complex aetiology. Anencephaly and spina bifida are severe NTDs that affect reproductive fitness and suggest a role for de novo mutations (DNMs) in their aetiology.\nMETHODS: We used whole-exome sequencing in 43 sporadic cases affected with myelomeningocele or anencephaly and their unaffected parents to identify DNMs in their exomes.\nRESULTS: We identified 42 coding DNMs in 25 cases, of which 6 were loss of function (LoF) showing a higher rate of LoF DNM in our cohort compared with control cohorts. Notably, we identified two protein-truncating DNMs in two independent cases in SHROOM3, previously associated with NTDs only in animal models. We have demonstrated a significant enrichment of LoF DNMs in this gene in NTDs compared with the gene specific DNM rate and to the DNM rate estimated from control cohorts. 
We also identified one nonsense DNM in PAX3 and two potentially causative missense DNMs in GRHL3 and PTPRS.\nCONCLUSIONS: Our study demonstrates an important role of LoF DNMs in the development of NTDs and strongly implicates SHROOM3 in its aetiology.", "project": "AGAC2_PubMed_2","denotations": [ { "id": "T8", "span": { "begin": 771, "end": 778 }, "obj": "Protein" }, { "id": "T7", "span": { "begin": 779, "end": 789 }, "obj": "NegReg" }, { "id": "T6", "span": { "begin": 790, "end": 794 }, "obj": "Var" }, { "id": "T9", "span": { "begin": 823, "end": 830 }, "obj": "Gene" }, { "id": "T10", "span": { "begin": 936, "end": 939 }, "obj": "NegReg" }, { "id": "T11", "span": { "begin": 940, "end": 944 }, "obj": "Var" }, { "id": "T12", "span": { "begin": 961, "end": 965 }, "obj": "Disease" }, { "id": "T3", "span": { "begin": 1224, "end": 1227 }, "obj": "NegReg" }, { "id": "T1", "span": { "begin": 1228, "end": 1232 }, "obj": "Var" }, { "id": "T2", "span": { "begin": 1255, "end": 1259 }, "obj": "Disease" }, { "id": "T5", "span": { "begin": 1284, "end": 1291 }, "obj": "Gene" } ], "relations": [ { "id": "R1", "pred": "CauseOf", "subj": "T1", "obj": "T3" }, { "id": "R10", "pred": "ThemeOf", "subj": "T12", "obj": "T10" }, { "id": "R11", "pred": "ThemeOf", "subj": "T5", "obj": "T1" }, { "id": "R2", "pred": "ThemeOf", "subj": "T2", "obj": "T3" }, { "id": "R5", "pred": "CauseOf", "subj": "T6", "obj": "T7" }, { "id": "R6", "pred": "ThemeOf", "subj": "T8", "obj": "T7" }, { "id": "R7", "pred": "ThemeOf", "subj": "T9", "obj": "T6" }, { "id": "R8", "pred": "ThemeOf", "subj": "T9", "obj": "T11" }, { "id": "R9", "pred": "CauseOf", "subj": "T11", "obj": "T10" } ]}
The data is in JSON format. "target" is the address of the annotated text. "sourcedb" is the database the text originally comes from; all texts in the AGAC corpus are from PubMed. "sourceid" is the PMID of the text. "text" contains the raw abstract.
"denotations" for Task 1:
"denotations" contains the trigger word annotations corresponding to Task 1. Each trigger word annotation has an "id"; a "span", which gives its position in the abstract; and an "obj", which is the trigger label it belongs to.
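As a minimal sketch of how "denotations" can be read, the Python snippet below slices each annotated span out of "text". It assumes that "begin" is an inclusive and "end" an exclusive character offset, which appears consistent with the sample (for example, 823 to 830 for the seven-character "SHROOM3"). The miniature document and the function name trigger_mentions are made up for illustration and are not part of the corpus.

```python
import json

# Hypothetical miniature document in the same shape as the sample above.
doc = json.loads("""
{
  "text": "LoF DNMs in SHROOM3 cause NTDs.",
  "denotations": [
    {"id": "T1", "span": {"begin": 0,  "end": 3},  "obj": "NegReg"},
    {"id": "T2", "span": {"begin": 4,  "end": 8},  "obj": "Var"},
    {"id": "T3", "span": {"begin": 12, "end": 19}, "obj": "Gene"},
    {"id": "T4", "span": {"begin": 26, "end": 30}, "obj": "Disease"}
  ]
}
""")


def trigger_mentions(document):
    """Return (id, label, surface text) for every trigger word annotation.

    Assumption: "begin" is inclusive and "end" is exclusive, so a plain
    Python slice recovers the annotated string.
    """
    text = document["text"]
    return [
        (d["id"], d["obj"], text[d["span"]["begin"]:d["span"]["end"]])
        for d in document["denotations"]
    ]


mentions = trigger_mentions(doc)
for ann_id, label, surface in mentions:
    print(ann_id, label, surface)
```

Printing the surface string next to each label is a quick way to check that predicted offsets line up with the abstract before submitting.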
"relations" for Task 2:
"relations" contains the thematic roles between the trigger words, which corresponds to Task 2. Each relation contains an "id"; a "pred", which is the thematic role; and a "subj" and an "obj", which are the "id"s of the trigger words that the relation connects. The direction of the relation is from "subj" to "obj".
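To illustrate how "relations" refer back to "denotations", the sketch below resolves each "subj" and "obj" id to its surface text and lists the relation in the subj-to-obj direction. The miniature document and the helper name resolve_relations are hypothetical; the snippet again assumes an inclusive "begin" and an exclusive "end".

```python
# Hypothetical miniature document; labels and roles mirror the sample above.
doc = {
    "text": "LoF DNMs in SHROOM3 cause NTDs.",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 3}, "obj": "NegReg"},
        {"id": "T2", "span": {"begin": 4, "end": 8}, "obj": "Var"},
        {"id": "T3", "span": {"begin": 12, "end": 19}, "obj": "Gene"},
        {"id": "T4", "span": {"begin": 26, "end": 30}, "obj": "Disease"},
    ],
    "relations": [
        {"id": "R1", "pred": "CauseOf", "subj": "T2", "obj": "T1"},
        {"id": "R2", "pred": "ThemeOf", "subj": "T3", "obj": "T2"},
        {"id": "R3", "pred": "ThemeOf", "subj": "T4", "obj": "T1"},
    ],
}


def resolve_relations(document):
    """Return (subj text, pred, obj text) for each relation, subj -> obj."""
    text = document["text"]
    spans = {d["id"]: d["span"] for d in document["denotations"]}

    def surface(ann_id):
        # A KeyError here means a relation points at a trigger word that
        # is missing from "denotations", i.e. Task 1 output is incomplete.
        span = spans[ann_id]
        return text[span["begin"]:span["end"]]

    return [
        (surface(r["subj"]), r["pred"], surface(r["obj"]))
        for r in document["relations"]
    ]


resolved = resolve_relations(doc)
for subj, pred, obj in resolved:
    print(f"{subj} --{pred}--> {obj}")
```

Because every relation is expressed through trigger word ids, the lookup fails as soon as a referenced id is absent from "denotations".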
Note that Task 2 requires the result of Task 1.
Triples for Task 3:
25805808;SHROOM3;LOF;Neural tube defects
The triple shown above is the result of Task 3, which must be extracted from the sample text.
The format of triples is:
pmid;gene;function change;disease.
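The triple format can be handled with a few lines of Python. This is only a sketch: the names Triple, parse_triple and format_triple are hypothetical, and it assumes that no field contains a semicolon.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    pmid: str
    gene: str
    function_change: str
    disease: str


def parse_triple(line: str) -> Triple:
    """Split one 'pmid;gene;function change;disease' line into fields.

    Assumption: fields never contain ';', so exactly four parts are expected.
    """
    fields = [f.strip() for f in line.strip().split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return Triple(*fields)


def format_triple(triple: Triple) -> str:
    """Join the four fields back into a single semicolon-separated line."""
    return ";".join(triple)


sample = parse_triple("25805808;SHROOM3;LOF;Neural tube defects")
print(sample.gene, sample.function_change, sample.disease)
print(format_triple(sample))
```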
Task 1: Please submit a JSON file in the same format as the example above. Exclude the "relations" section.
Task 2: Please submit a JSON file in the same format as the example above.
Task 3: Please submit the triples in a plain text file, one triple per line.
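The three submission formats could be produced along the following lines. This is a hedged sketch, not an official submission tool: the function names and the tiny prediction document are hypothetical, and the actual file naming and upload procedure are not specified here.

```python
import json

# Hypothetical prediction for one abstract, in the same shape as the sample.
prediction = {
    "sourcedb": "PubMed",
    "sourceid": "25805808",
    "text": "LoF DNMs in SHROOM3 cause NTDs.",
    "denotations": [
        {"id": "T1", "span": {"begin": 4, "end": 8}, "obj": "Var"},
        {"id": "T2", "span": {"begin": 12, "end": 19}, "obj": "Gene"},
    ],
    "relations": [
        {"id": "R1", "pred": "ThemeOf", "subj": "T2", "obj": "T1"},
    ],
}


def task1_json(document):
    """Task 1: same JSON as the example, without the "relations" key."""
    trimmed = {k: v for k, v in document.items() if k != "relations"}
    return json.dumps(trimmed, indent=2)


def task2_json(document):
    """Task 2: same JSON as the example, "relations" included."""
    return json.dumps(document, indent=2)


def task3_text(triples):
    """Task 3: plain text, one 'pmid;gene;function change;disease' per line."""
    return "".join(";".join(t) + "\n" for t in triples)


task1_out = task1_json(prediction)
task2_out = task2_json(prediction)
task3_out = task3_text([("25805808", "SHROOM3", "LOF", "Neural tube defects")])
print(task3_out, end="")
```

Each function returns a string, so the caller decides where to write the result.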