BioCaster

The BioCaster project ran from 2006 to 2012.

About

BioCaster is a research project aimed at providing advanced search and analysis of Internet news and research literature for public health workers, clinicians and researchers interested in communicable diseases. The portal is currently under development at the National Institute of Informatics by Dr. Nigel Collier in cooperation with colleagues at the National Institute of Infectious Diseases, the National Institute of Genetics, Okayama University, Vietnamese National University at Ho Chi Minh City and Kasetsart University. Using text mining technology, we aim to provide intelligent tools that help users obtain a clearer picture of actual and potential disease outbreaks in a timely manner.

Detecting and tracking infectious disease outbreaks requires access to information from a variety of sources. Increasingly, this means monitoring many hundreds of Internet news feeds simultaneously. However, three difficulties arise when searching for this information with traditional methods. First, the massive volume of dynamically changing, unstructured news data on the Internet makes it extremely difficult for governments and public health workers to obtain a clear picture of an outbreak. Second, the initial reports of an outbreak appear in only a few news articles, which are usually overlooked by simple keyword indexing methods. Third, the initial reports of an infectious disease usually appear in local, non-English news media. To capture outbreak information in the most timely manner, it is therefore crucial for computer systems to understand several languages.

The BioCaster system has two major components: a web/database server, and a backend cluster computer running text mining software that continuously scans hundreds of RSS newsfeeds from local and national news providers. Because the text mining system has detailed knowledge of important concepts such as diseases, pathogens, symptoms, people, places and drugs, we can semantically index the relevant parts of news articles, giving users quicker and more precise access to information. This knowledge comes from annotated text collections, gazetteer lists of nomenclature and the BioCaster ontology, all of which are currently under development. We are making the BioCaster ontology available for public access and feedback in the hope that it will be useful to those interested in the field. Software resources are also expected to be released as the project progresses.
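The idea of semantic indexing described above can be illustrated with a minimal sketch, assuming plain dictionary lookup: a hypothetical multilingual gazetteer maps surface terms to concept identifiers, and an inverted index maps each concept to the articles that mention it. The names `GAZETTEER`, `normalize`, `tag_concepts` and `build_index`, the concept IDs and the sample articles are all illustrative assumptions. They are not taken from BioCaster's software or ontology, which relied on much richer text mining than string matching.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical mini-gazetteer: surface term (any language) -> (concept ID, concept type).
# Entries and IDs are illustrative assumptions, not taken from the BioCaster ontology.
GAZETTEER = {
    "avian influenza": ("DIS:H5N1", "DISEASE"),
    "bird flu": ("DIS:H5N1", "DISEASE"),
    "cúm gia cầm": ("DIS:H5N1", "DISEASE"),  # Vietnamese for "avian influenza"
    "fever": ("SYM:FEVER", "SYMPTOM"),
    "hanoi": ("LOC:HANOI", "PLACE"),
}


def normalize(text):
    """Lower-case and NFC-normalize so accented terms compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def tag_concepts(text, gazetteer=GAZETTEER):
    """Return the set of (concept ID, type) pairs whose terms occur in text."""
    lowered = normalize(text)
    found = set()
    for term, concept in gazetteer.items():
        # Word boundaries stop "fever" from matching inside a longer word.
        pattern = r"\b" + re.escape(normalize(term)) + r"\b"
        if re.search(pattern, lowered):
            found.add(concept)
    return found


def build_index(articles):
    """Inverted index: concept ID -> sorted IDs of articles mentioning it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for concept_id, _concept_type in tag_concepts(text):
            index[concept_id].add(article_id)
    return {cid: sorted(ids) for cid, ids in index.items()}


articles = {
    "a1": "Bird flu confirmed in poultry near Hanoi.",
    "a2": "Phát hiện cúm gia cầm tại tỉnh Nam Định.",  # "Avian influenza detected in Nam Dinh province."
    "a3": "Local team wins football final.",
}
index = build_index(articles)
print(index["DIS:H5N1"])  # ['a1', 'a2']
```

Because the index is keyed on concepts rather than words, a lookup for `DIS:H5N1` returns both the English and the Vietnamese report, which a keyword search for "bird flu" alone would miss.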

We gratefully acknowledge grant-in-aid support for parts of the BioCaster project from the Japan Science and Technology Agency's PRESTO program, the Transdisciplinary Research Integration Center fund at the Research Organization of Information and Systems (ROIS), and the Japan Society for the Promotion of Science.

Publications

  • Collier, N., Doan, S., Matsuda Goodwin, R., McCrae, J., Conway, M., Shigematsu, M. and Kawazoe, A. (2010), “Navigating the Information Storm: Web-based Global Health Surveillance in BioCaster”, invited contribution under preparation for “BioSurveillance: A Health Protection Priority”, Kass-Hout, T. and Zhang, X. (eds) (in press).
  • Doan, S., Conway, M. and Collier, N. (2010), “An Empirical Study of Sections in Classifying Disease Outbreak Reports”, invited chapter in Annals of Information Systems, Special Issue “Web-based Applications in Health Care & Biomedicine”, Springer (in press).
  • Conway, M., Kawazoe, A., Chanlekha, H. and Collier, N. (2010), “Developing a disease outbreak corpus”, under review for the Journal of Medical Internet Research.
  • Collier, N. (2010), “What’s unusual in online disease outbreak news?”, under review for the Journal of Biomedical Semantics.
  • Hartley, D., Nelson, N., Walters, R., Arthur, R., Yangarber, R., Madoff, L., Linge, J., Mawudeku, A., Collier, N., Brownstein, J., Thinus, G. and Lightfoot, N. (2010), “The landscape of international event-based biosurveillance”, Emerging Health Threats Journal, 3:e3.
  • Conway, M., Doan, S., Kawazoe, A. and Collier, N. (2009), “Classifying disease outbreak reports using n-grams and semantic features”, International Journal of Medical Informatics (in press), DOI: 10.1016/j.ijmedinfo.2009.03.010.
  • Doan, S., Kawazoe, A., Conway, M. and Collier, N. (2009), “Towards role-based filtering of disease outbreak reports”, Journal of Biomedical Informatics, Elsevier, DOI: 10.1016/j.jbi.2008.12.009.
  • Conway, M., Doan, S., Kawazoe, A. and Collier, N. (2009), “Using hedges to enhance a disease outbreak report text mining system”, Proc. BioNLP 2009, pp. 142-143.
  • Collier, N., Doan, S., Kawazoe, A., Matsuda Goodwin, R., Conway, M., Tateno, Y., Ngo, Q., Dien, D., Kawtrakul, A., Takeuchi, K., Shigematsu, M. and Taniguchi, K. (2008), “BioCaster: detecting public health rumors with a Web-based text mining system”, Bioinformatics, 24(24):2940-2941, Oxford University Press, DOI: 10.1093/bioinformatics/btn534.
  • Kawazoe, A., Jin, L., Shigematsu, M., Bekki, D., Barrero, R., Taniguchi, K. and Collier, N. (2008), “The development of a schema for the annotation of terms in the BioCaster disease detection/tracking system”, Journal of Applied Ontology, IOS Press.
  • Conway, M., Doan, S., Kawazoe, A. and Collier, N. (2008), “Classifying disease outbreak reports using n-grams and semantic features”, Proc. 3rd International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland, September 2-3, pp. 29-36.
  • Kawazoe, A., Chanlekha, H., Shigematsu, M. and Collier, N. (2008), “Structuring an event ontology for disease outbreak detection”, BMC Bioinformatics, 9 (Suppl 3): S8, DOI: 10.1186/1471-2105-9-S3-S8.
  • Doan, S., Hung-Ngo, Q., Kawazoe, A. and Collier, N. (2008), “Global Health Monitor - a Web-based system for detecting and mapping infectious diseases”, Proc. International Joint Conference on Natural Language Processing (IJCNLP), Companion Volume, Hyderabad, India, January 7-12, pp. 951-956.
  • Collier, N., Kawazoe, A., Doan, S., Shigematsu, M., Taniguchi, K., Jin, L., McCrae, J., Chanlekha, H., Dien, D., Hung, Q., Nam, V., Takeuchi, K. and Kawtrakul, A. (2007), “Detecting Web rumours with a multilingual ontology-supported text classification system”, Advances in Disease Surveillance, 4: 242, ISDS.
  • Collier, N., Kawazoe, A., Shigematsu, M., Taniguchi, K., Jin, L., McCrae, J., Dien, D., Hung, Q., Takeuchi, K. and Kawtrakul, A. (2007), “Ontology-driven influenza surveillance from Web rumours”, Proc. Options for the Control of Influenza VI (Options 2007), Toronto, Ontario, Canada, June 17-23.
  • Collier, N., Kawazoe, A., Jin, L., Shigematsu, M., Dien, D., Barrero, R., Takeuchi, K. and Kawtrakul, A. (2006), “A multilingual ontology for infectious disease surveillance: rationale, design and challenges”, Language Resources and Evaluation, 40(3-4): 405-413, Springer Netherlands, DOI: 10.1007/s10579-007-9019-7.
  • Kawazoe, A., Jin, L., Shigematsu, M., Barrero, R., Taniguchi, K. and Collier, N. (2006), “The development of a schema for the annotation of terms in the BioCaster disease detection/tracking system”, Bodenreider, O. (ed.), Proc. International Workshop on Biomedical Ontology in Action (KR-MED 2006), Baltimore, Maryland, USA, November 8, pp. 77-85.

Members

  • Son Doan (NII, now at Vanderbilt University Medical Center)
  • Ai Kawazoe (NII, now at Tsuda College)
  • Reiko Matsuda Goodwin (Fordham University)
  • Mike Conway (NII, now at the University of Pittsburgh)
  • Quoc Hung-Ngo (VNU)
  • Mika Shigematsu (NIID)
  • Kiyosu Taniguchi (NIID)
  • Dinh Dien (VNU)
  • Asanee Kawtrakul (Kasetsart University and NECTEC)
  • Koichi Takeuchi (Okayama University)
  • Nigel Collier (NII and JST)

Funding

BioCaster has been partly funded by various grants-in-aid. The core text mining system and the bio-geographic interface were supported by grants from the Japan Society for the Promotion of Science (JSPS); the first stage of the BioCaster Ontology for infectious disease detection was supported by a grant-in-aid from the Transdisciplinary Research Integration Center of the Research Organization of Information and Systems (ROIS). Work on event alerting has been supported by the Japan Science and Technology Agency (JST).