NERBio
Demo systems
NERBio with Batch Processing for gene mention recognition and protein-protein interaction article classification (since 2012): http://bws.iis.sinica.edu.tw/NERBio
NERBio for BioCreAtIvE II (since 2007): http://asqa.iis.sinica.edu.tw/biocreative2
NERBio (since 2005): http://qa.iis.sinica.edu.tw/BioNER/Default.aspx
Resources
Related Annotation Tools
NLProt: a tool for finding protein names in natural-language text. NLProt is based on Support Vector Machines (SVMs), which are trained on contextual features of named entities (NEs) in scientific language. Additionally, simple filtering rules and a protein-name dictionary are used to increase performance. NLProt reached a precision (accuracy) of 70% at a recall (coverage) of 85% after running it on the 166 abstracts of EMBL and Cell (Nov/Dec 2003).
ABNER: a software tool for molecular biology text analysis. At ABNER's core is a statistical machine learning system using linear-chain conditional random fields (CRFs) with a variety of orthographic and contextual features. ABNER 1.5 includes two models trained on the NLPBA and BioCreative corpora, for which performance is roughly state of the art (F1 scores of 70.5 and 69.9 respectively).
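The tool descriptions above report performance in different terms: precision and recall for NLProt, F1 for ABNER. As a rough aid to reading those figures, the minimal sketch below computes the standard balanced F1 score (the harmonic mean of precision and recall) from the NLProt numbers quoted above. The function name `f1_score` is a hypothetical helper chosen for illustration, not part of either tool, and because the tools were evaluated on different corpora the resulting value is not directly comparable with ABNER's scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall, both in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted for NLProt above (70% and 85%).
nlprot_f1 = f1_score(0.70, 0.85)
print(f"NLProt F1 (from quoted precision/recall): {nlprot_f1:.3f}")
```

With the quoted figures this comes to about 0.768, or 76.8 on the 0 to 100 scale used for the ABNER scores.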
Corpus
GENETAG: a corpus of 20K MEDLINE sentences for gene/protein NER. 15K GENETAG sentences were used for the BioCreAtIvE Task 1A Competition. GENETAG Train, Test and Round1 data and ancillary programs are freely available at ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/GENETAG.tar.gz