Text mining for chemical health risk assessment

Given the double exponential growth rate of biomedical literature over recent years, there is a pressing need to develop technology that can make information in published literature more accessible and useful for scientists. Such technology can be based on text mining. Drawing on techniques from natural language processing, information retrieval and data mining, text mining can automatically retrieve, extract and discover novel information even in huge collections of written text. 

The CRAB project develops text mining technology to support one of the most literature-dependent areas of biomedicine: chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this is highly varied and therefore difficult to harvest from literature databases via manual means. 

The project has developed a tool that automates literature review and analysis in chemical risk assessment by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users. 

Currently applicable to cancer, the tool could be straightforwardly adapted to support the assessment and study of other important health risks related to chemicals (e.g. allergy, asthma, reproductive disorders, among many others).