IASL-IISR Gene Mention/Normalization Tool
Overview
The IASL-IISR Gene Mention/Normalization Tool was developed to normalize genes mentioned in biomedical articles.
The system uses selected word conjunctions, term normalization, and global patterns to improve the performance of biomedical named entity recognition [1], and exploits contextual information [2] to deal with the ambiguity problem in gene normalization.
The tool is a core component of our rank 1 system [3] in the BioCreAtIvE II.5 Interactor Normalization Task, as well as of the PubMed-EX [4] and BIOSMILE Web Search [5] services.
For any inquiry, please contact me.
The IASL-IISR Gene Normalization Tool uses several external resources, which you must download before using the system.
After downloading these files, place them in the "AMBISOURCE" folder.
How to Use
The files to be processed must be placed in the "docs" folder. Please use the following format to arrange the sentences of an abstract. The processed results will be generated in the "gns" folder.
For example, the file for the article (PMID: 20479501) "Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles" must be formatted as follows.
| Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles|
The interactor normalization task (INT) is to identify genes that play
the interactor role in protein-protein interactions (PPIs), to map these
genes to unique IDs, and to rank them according to their normalized
confidence. INT has two subtasks: gene normalization (GN) and interactor
ranking. The main difficulties of INT GN are identifying genes across
species and using full papers instead of abstracts. To tackle these
problems, we developed a multistage GN algorithm and a ranking method,
which exploit information in different parts of a paper. Our system
achieved a promising AUC of 0.43471. Using the multistage GN algorithm,
we have been able to improve system performance (AUC) by 1.719 percent
compared to a one-stage GN algorithm. Our experimental results also show
that with full text, versus abstract only, INT AUC performance was 22.6
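
Before running the tool, it can help to confirm that the folders described above are in place. The following is a minimal sketch, not part of the distributed package: the function name `check_layout` is hypothetical, and the only details taken from this page are the folder names "AMBISOURCE" and "docs". It assumes both folders sit directly under one working directory, which this page does not state explicitly.

```python
from pathlib import Path

# Folder names taken from the instructions on this page.
RESOURCE_DIR = "AMBISOURCE"  # downloaded external resources
INPUT_DIR = "docs"           # files to be processed


def check_layout(root):
    """Return a list of problems found under `root` (hypothetical helper).

    An empty list means both folders exist and contain at least one entry.
    Assumes the folders are direct children of `root`.
    """
    root = Path(root)
    problems = []
    for name in (RESOURCE_DIR, INPUT_DIR):
        folder = root / name
        if not folder.is_dir():
            problems.append(f"missing folder: {name}")
        elif not any(folder.iterdir()):
            problems.append(f"empty folder: {name}")
    return problems
```

Calling `check_layout(".")` from the directory that holds the tool returns an empty list when both folders are present and non-empty. The "gns" folder is not checked, because this page does not say whether the tool creates it.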
Currently, the binary packages of IASL-IISR Gene Normalization Tool are available for download as follows.
You can test the package for free, but please cite [1-3] if you use our package in your research.
Please contact me if you need a binary package for another platform, or a source package.
1. Tsai RT-H, Sung C-L, Dai H-J, Hung H-C, Sung T-Y, Hsu W-L: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics 2006, 7(Suppl 5):S11.
2. Lai P-T, Bow Y-Y, Huang C-H, Dai H-J, Tsai RT-H, Hsu W-L: Using Contextual Information to Clarify Gene Normalization Ambiguity. In: The IEEE International Conference on Information Reuse and Integration (IEEE IRI 2009). Las Vegas, USA; 2009: 1-5.
3. Dai H-J, Lai P-T, Tsai RT-H: Multi-stage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7(3):412-420.
4. Tsai RT-H, Dai H-J, Lai P-T, Huang C-H: PubMed-EX: A web browser extension to enhance PubMed search with text mining features. Bioinformatics 2009, 25:3031-3032.
5. Dai H-J, Huang C-H, Lin RTK, Tsai RT-H, Hsu W-L: BIOSMILE web search: a web application for annotating biomedical entities and relations. Nucl Acids Res 2008, 36(Web Server issue):W390-W398.