Question answering (QA) systems for biomedical experiments facilitate cross-disciplinary communication and serve as a foundation for downstream tasks such as laboratory automation. High Information Density (HID) and Multi-Step Reasoning (MSR) pose unique challenges for biomedical experimental QA. Although extracting structured knowledge, e.g., Knowledge Graphs (KGs), can substantially benefit biomedical experimental QA, existing biomedical datasets focus on general or coarse-grained knowledge and thus fail to support the fine-grained experimental reasoning demanded by HID and MSR. To address this gap, we introduce the Biomedical Protocol Information Extraction Dataset (BioPIE), a dataset that provides procedure-centric KGs of experimental entities, actions, and relations at a scale that supports reasoning over biomedical experiments across protocols. We evaluate information extraction methods on BioPIE and implement a QA system that leverages BioPIE, obtaining performance gains on the test, HID, and MSR question sets. These results indicate that the structured experimental knowledge in BioPIE underpins both AI-assisted and more autonomous biomedical experimentation.
Figure 1. Illustration of BioPIE. (A) An annotated example of a biomedical experimental protocol for plasmid DNA preparation, illustrating how diverse laboratory operations are decomposed into structured procedural entities and relations under our annotation schema, independent of domain-specific biological semantics. (B) Statistics of entity types and relation types in the BioPIE dataset. (C) Representative entity and relation labels in our annotation schema, with definitions and examples.
Table 1. Test F1 scores of different baselines on our proposed dataset. “Joint” denotes joint information extraction (IE), while “Pipeline” refers to performing named entity recognition (NER) and relation extraction (RE) separately. “Rel” and “Rel+” indicate relation extraction from the original text under boundary and strict evaluation, respectively, and “RE” denotes relation extraction with gold entities, applicable only to pipeline methods.
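To make the difference between the two relation metrics concrete, the following is a minimal sketch that assumes the convention common in the joint IE literature: under boundary evaluation ("Rel") a predicted relation counts as correct when both argument spans and the relation label match the gold annotation, while strict evaluation ("Rel+") additionally requires both entity types to match. The caption does not spell out the exact matching rules, so this convention is an assumption, and the entity and relation labels used here (`Action`, `Reagent`, `Device`, `acts_on`, `uses`) are hypothetical placeholders rather than BioPIE's actual label set.

```python
from typing import Iterable, NamedTuple


class Entity(NamedTuple):
    start: int   # index of the first token in the span
    end: int     # index of the last token in the span
    label: str   # entity type (hypothetical labels in this sketch)


class Relation(NamedTuple):
    head: Entity
    tail: Entity
    label: str   # relation type (hypothetical labels in this sketch)


def _key(rel: Relation, strict: bool):
    """Build the tuple used to match a predicted relation against gold."""
    if strict:
        # Strict ("Rel+"): spans, entity types, and relation label must match.
        return (rel.head, rel.tail, rel.label)
    # Boundary ("Rel"): only span boundaries and relation label must match.
    return (
        (rel.head.start, rel.head.end),
        (rel.tail.start, rel.tail.end),
        rel.label,
    )


def relation_f1(gold: Iterable[Relation], pred: Iterable[Relation], strict: bool) -> float:
    """Micro F1 over relations for a single document (or a pooled corpus)."""
    gold_keys = {_key(r, strict) for r in gold}
    pred_keys = {_key(r, strict) for r in pred}
    true_positives = len(gold_keys & pred_keys)
    precision = true_positives / len(pred_keys) if pred_keys else 0.0
    recall = true_positives / len(gold_keys) if gold_keys else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction has the right spans and relation label
# but the wrong type for its tail entity.
centrifuge = Entity(0, 1, "Action")
lysate = Entity(2, 3, "Reagent")
rotor = Entity(5, 6, "Device")

gold = [Relation(centrifuge, lysate, "acts_on"), Relation(centrifuge, rotor, "uses")]
pred = [
    Relation(centrifuge, lysate, "acts_on"),
    Relation(centrifuge, Entity(5, 6, "Reagent"), "uses"),
]

boundary_f1 = relation_f1(gold, pred, strict=False)
strict_f1 = relation_f1(gold, pred, strict=True)
```

In this toy case the mistyped tail entity is forgiven under boundary matching but penalized under strict matching, which is why a strict score can never exceed the corresponding boundary score for the same predictions.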
Figure 2. BioPIE enables knowledge integration in lab automation. Models trained on BioPIE can convert large volumes of biomedical protocols into structured knowledge, which knowledge-based QA systems can then use.