Software

IASL-IISR Gene Mention/Normalization Tool

Overview
The IASL-IISR Gene Mention/Normalization Tool normalizes the genes mentioned in a biomedical article.

The system uses selected word conjunctions, term normalization, and global patterns to improve the performance of biomedical named entity recognition [1], and exploits contextual information [2] to resolve ambiguity in gene normalization.

The tool is a core component of our system [3], which ranked first in the BioCreAtIvE II.5 Interactor Normalization Task, and of the PubMed-EX [4] and BIOSMILE Web Search [5] services.

For inquiries, please contact me.
 
System Requirements
The IASL-IISR Gene Normalization Tool depends on several external resources, which you must download before using the system.
After downloading these files, place them in the "AMBISOURCE" folder.

How to Use
Place the files to be processed in the "docs" folder, arranging the title and abstract of each article in the following format. The results are written to the "gns" folder.

File Format
"TITLE"\n
\n
"ABSTRACT"\n

For example, the file for the article (PMID: 20479501) "Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles" must be formatted as follows:

Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles

The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), to map these genes to unique IDs, and to rank them according to their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of INT GN are identifying genes across species and using full papers instead of abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, which exploit information in different parts of a paper. Our system achieved a promising AUC of 0.43471. Using the multistage GN algorithm, we have been able to improve system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that with full text, versus abstract only, INT AUC performance was 22.6 percent higher.
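The format above can also be produced programmatically. The following is a minimal Python sketch, not part of the tool itself: the helper names format_document and write_document, the file name passed to write_document, and the UTF-8 encoding are all assumptions made for illustration. It also assumes that the title and the abstract each occupy a single line, as in the example above.

```python
from pathlib import Path


def format_document(title: str, abstract: str) -> str:
    """Return the title, an empty line, and the abstract, each ending in a newline.

    Hypothetical helper: internal line breaks and repeated spaces are collapsed
    so that the title and the abstract each occupy exactly one line (assumption).
    """
    title_line = " ".join(title.split())
    abstract_line = " ".join(abstract.split())
    return f"{title_line}\n\n{abstract_line}\n"


def write_document(docs_dir: Path, file_name: str, title: str, abstract: str) -> Path:
    """Write one formatted article into docs_dir and return the path of the new file.

    The file name and the UTF-8 encoding are assumptions; the tool's
    documentation only states that input files go into the "docs" folder.
    """
    docs_dir.mkdir(parents=True, exist_ok=True)
    path = docs_dir / file_name
    # newline="\n" keeps the line endings exactly as shown in the file format.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(format_document(title, abstract))
    return path
```

For instance, write_document(Path("docs"), "20479501.txt", title, abstract) would create a file in the "docs" folder whose first line is the title and whose third line is the abstract; the name "20479501.txt" is only an illustrative choice.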


Download
Binary packages of the IASL-IISR Gene Normalization Tool are currently available for download as follows.
The package is free to try, but please cite [1-3] if you use it in your research.
Please contact me if you need a binary package for another platform or a source package.

References
  1. Tsai RT-H, Sung C-L, Dai H-J, Hung H-C, Sung T-Y, Hsu W-L: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics 2006, 7(Suppl 5):S11.
  2. Lai P-T, Bow Y-Y, Huang C-H, Dai H-J, Tsai RT-H, Hsu W-L: Using Contextual Information to Clarify Gene Normalization Ambiguity. In: The IEEE International Conference on Information Reuse and Integration (IEEE IRI 2009). Las Vegas, USA; 2009: 1-5.
  3. Dai H-J, Lai P-T, Tsai RT-H: Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles. IEEE Transactions on Computational Biology and Bioinformatics 2010, 7(3):412-420.
  4. Tsai RT-H, Dai H-J, Lai P-T, Huang C-H: PubMed-EX: A web browser extension to enhance PubMed search with text mining features. Bioinformatics 2009, 25:3031-3032.
  5. Dai H-J, Huang C-H, Lin RTK, Tsai RT-H, Hsu W-L: BIOSMILE web search: a web application for annotating biomedical entities and relations. Nucleic Acids Research 2008, 36(Web Server issue):W390-W398.