admetSAR - Absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties play key roles in the discovery/development of drugs, pesticides, food additives, consumer products, and industrial chemicals. This information is especially useful when conducting environmental and human hazard assessments. The most critical rate-limiting step in the chemical safety assessment workflow is the availability of high-quality data. This paper describes an ADMET structure-activity relationship database, abbreviated as admetSAR. It is an open source, text and structure searchable, and continually updated database that collects, curates, and manages available ADMET-associated properties data from the published literature. In admetSAR, over 210 000 ADMET annotated data points for more than 96 000 unique compounds with 45 kinds of ADMET-associated properties, proteins, species, or organisms have been carefully curated from a large and diverse body of literature. The database provides a user-friendly interface to query a specific chemical profile, using either CAS registry number, common name, or structure similarity. In addition, the database includes 22 qualitative classification and 5 quantitative regression models with high predictive accuracy, allowing users to estimate ecological/mammalian ADMET properties for novel chemicals.
AffinDB - AffinDB is a database of affinity data for structurally resolved protein-ligand complexes from the Protein Data Bank (PDB). It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein-ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code and molecular weight of the ligand; links to other databases; and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal and year). Search results can be saved as tabular reports in text files. The database is intended to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.
AIST Chemical ID Converter Hyperlink Management System - Cross database chemical search tool and chemical conversion tool
Alkemio - The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search for reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools for retrieving chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool for ranking chemicals for any biomedical topic, and it is free to non-commercial users.
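The ranking by simulated P-values described for Alkemio can be illustrated with a minimal sketch. This is not Alkemio's implementation: the function names, the add-one correction, and the idea of drawing null scores from a user-supplied `simulate` callable are all assumptions made for illustration. A chemical's observed relatedness score is compared with scores from random simulations, and chemicals are ranked by the resulting empirical P-value.

```python
import random


def empirical_p_value(observed, simulated):
    """Fraction of simulated scores >= observed (add-one correction, an assumption)."""
    hits = sum(1 for s in simulated if s >= observed)
    return (hits + 1) / (len(simulated) + 1)


def rank_chemicals(observed_scores, simulate, n_sim=1000, seed=0):
    """Rank chemicals by empirical P-value, smallest first.

    observed_scores: dict mapping a chemical name to its relatedness score.
    simulate: hypothetical callable taking a random.Random and returning one
              score drawn under the null (random) model.
    """
    rng = random.Random(seed)
    simulated = [simulate(rng) for _ in range(n_sim)]
    p_values = {
        chem: empirical_p_value(score, simulated)
        for chem, score in observed_scores.items()
    }
    return sorted(p_values.items(), key=lambda kv: kv[1])
```

In use, `rank_chemicals({"a": 2.0, "b": -1.0}, lambda rng: rng.random())` places "a" first, because no simulated score in [0, 1) reaches 2.0.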
BindingDB - BindingDB (http://www.bindingdb.org) is a publicly accessible database currently containing approximately 20,000 experimentally determined binding affinities of protein-ligand complexes, for 110 protein targets including isoforms and mutational variants, and approximately 11,000 small molecule ligands. The data are extracted from the scientific literature, data collection focusing on proteins that are drug-targets or candidate drug-targets and for which structural data are present in the Protein Data Bank. The BindingDB website supports a range of query types, including searches by chemical structure, substructure and similarity; protein sequence; ligand and protein names; affinity ranges and molecular weight. Data sets generated by BindingDB queries can be downloaded in the form of annotated SDfiles for further analysis, or used as the basis for virtual screening of a compound database uploaded by the user. The data in BindingDB are linked both to structural data in the PDB via PDB IDs and chemical and sequence searches, and to the literature in PubMed via PubMed IDs.
BiNChe - Chemical ontology enrichment tool developed by ChEBI
CARLSBAD database - Many bioactivity databases offer information regarding the biological activity of small molecules on protein targets. Information in these databases is often hard to resolve with certainty because different subsets of data are stored in a variety of formats, different bioactivity metrics and different identifiers for chemicals and proteins are used, and different query interfaces must be accessed. Given the multitude of data sources, interfaces and standards, it is challenging to gather relevant facts and make appropriate connections and decisions regarding chemical-protein associations. The CARLSBAD database has been developed as an integrated resource, focused on high-quality subsets from several bioactivity databases, which are aggregated and presented in a uniform manner, suitable for the study of the relationships between small molecules and targets. In contrast to data collection resources, CARLSBAD provides a single normalized activity value of a given type for each unique chemical-protein target pair. Two types of scaffold perception methods have been implemented and are available for data mining: HierS (hierarchical scaffolds) and MCES (maximum common edge subgraph). The 2012 release of CARLSBAD contains 439 985 unique chemical structures, mapped onto 1 420 889 unique bioactivities, and annotated with 277 140 HierS scaffolds and 54 135 MCES chemical patterns, respectively. Of the 890 323 unique structure-target pairs curated in CARLSBAD, 13.95% are aggregated from multiple structure-target values: 94 975 are aggregated from two bioactivities, 14 544 from three, 7 930 from four and 2 214 from five. CARLSBAD captures bioactivities and tags for 1 435 unique chemical structures of active pharmaceutical ingredients (i.e. 'drugs'). CARLSBAD processing resulted in a net 17.3% data reduction for chemicals, 34.3% reduction for bioactivities, 23% reduction for HierS and 25% reduction for MCES, respectively.
The CARLSBAD database supports a knowledge mining system that provides non-specialists with novel integrative ways of exploring chemical biology space to facilitate knowledge mining in drug discovery and repurposing.
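The idea of one activity value per chemical-target pair and activity type can be sketched as a simple grouping step. The abstract does not state which aggregation rule CARLSBAD applies, so the median used here is purely an assumption, as are the record layout and the function name.

```python
from collections import defaultdict
from statistics import median


def aggregate_activities(records):
    """Collapse repeated measurements into one value per (chemical, target, type).

    records: iterable of (chemical_id, target_id, activity_type, value) tuples.
    The median is an illustrative choice, not CARLSBAD's documented rule.
    """
    grouped = defaultdict(list)
    for chemical_id, target_id, activity_type, value in records:
        grouped[(chemical_id, target_id, activity_type)].append(value)
    return {key: median(values) for key, values in grouped.items()}
```

With this sketch, three pIC50 measurements of 6.0, 7.0 and 8.0 for the same chemical-target pair collapse to a single value of 7.0.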
CDRUG - Cancer is the leading cause of death worldwide. Screening anticancer candidates from tens of millions of chemical compounds is expensive and time-consuming. A rapid and user-friendly web server, known as CDRUG, is described here to predict the anticancer activity of chemical compounds. In CDRUG, a hybrid score was developed to measure the similarity of different compounds. The performance analysis shows that CDRUG achieves an area under the curve (AUC) of 0.878, indicating that CDRUG is effective at distinguishing active from inactive compounds.
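For readers unfamiliar with the AUC figure quoted for CDRUG, the following minimal sketch shows how such a value can be computed from prediction scores. It is a generic pairwise formulation, not CDRUG's evaluation code, and the function name and inputs are hypothetical. The AUC equals the probability that a randomly chosen active compound scores higher than a randomly chosen inactive one, with ties counted as one half.

```python
def roc_auc(scores_active, scores_inactive):
    """Area under the ROC curve from pairwise comparisons of scores."""
    wins = 0.0
    for a in scores_active:
        for i in scores_inactive:
            if a > i:
                wins += 1.0   # active ranked above inactive
            elif a == i:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_active) * len(scores_inactive))
```

A value of 1.0 means every active compound outscores every inactive one, while 0.5 corresponds to random ranking.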
ChEBI - Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.
Chem2Bio2RDF - Background: Recently there has been an explosion of new data sources about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Integration of these data sources and the identification of patterns that go across them is of critical interest. Initiatives such as Bio2RDF and LODD have tackled the problem of linking biological data and drug data respectively using RDF. Thus far, the inclusion of chemogenomic and systems chemical biology information that crosses the domains of chemistry and biology has been very limited. Results: We have created a single repository called Chem2Bio2RDF by aggregating data from multiple chemogenomics repositories that is cross-linked into Bio2RDF and LODD. We have also created a linked-path generation tool to facilitate SPARQL query generation, and have created extended SPARQL functions to address specific chemical/biological search needs. We demonstrate the utility of Chem2Bio2RDF in investigating polypharmacology, identification of potential multiple pathway inhibitors, and the association of pathways with adverse drug reactions. Conclusions: We have created a new semantic systems chemical biology resource, and have demonstrated its potential usefulness in specific examples of polypharmacology, multiple pathway inhibition and adverse drug reaction - pathway mapping. We have also demonstrated the usefulness of extending SPARQL with cheminformatics and bioinformatics functionality.
ChEMBL - ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets.
ChemGPS-NP - The Internet has become a central source of information, tools, and services facilitating the work of medicinal chemists and drug discoverers worldwide. In this paper we introduce a web-based public tool, ChemGPS-NP(Web) (http://chemgps.bmc.uu.se), for comprehensive chemical space navigation and exploration in terms of global mapping onto a consistent, eight-dimensional map of structure-derived physico-chemical characteristics. ChemGPS-NP(Web) can assist in compound selection and prioritization; property description and interpretation; cluster analysis and neighbourhood mapping; as well as comparison and characterization of large compound datasets. By using ChemGPS-NP(Web), researchers can analyze and compare chemical libraries in a consistent manner. In this study it is demonstrated how ChemGPS-NP(Web) can assist in interpreting results from two large datasets tested for activity in biological assays for pyruvate kinase and Bcl-2 family related protein interactions, respectively. Furthermore, a more than 30-year-old suggestion of "chemical similarity" between the natural pigments betalains and muscaflavins is tested.
Chemical Toxicity Database - This database contains toxicological data for about 150,000 compounds (including a large number of chemical drugs), such as acute toxicity, chronic toxicity, genotoxicity, carcinogenicity, reproductive toxicity and irritation data, together with the data sources.
ChemIDPlus - NLM toxicology data search tool
ChemMapper - SUMMARY: ChemMapper is an online platform for predicting polypharmacology effects and modes of action of small molecules based on 3D similarity computation. ChemMapper collects over 350,000 chemical structures with bioactivities and associated target annotations (as well as over 3,000,000 non-annotated compounds for virtual screening). Taking the user-provided chemical structure as the query, the most similar compounds in terms of 3D similarity are returned with associated pharmacology annotations. ChemMapper is designed to provide versatile services for a variety of chemogenomics, drug repurposing, polypharmacology, novel bioactive compound identification, and scaffold hopping studies.
ChemProt - Systems pharmacology is an emergent area that studies drug action across multiple scales of complexity, from molecular and cellular to tissue and organism levels. There is a critical need to develop network-based approaches to integrate the growing body of chemical biology knowledge with network biology. Here, we report ChemProt, a disease chemical biology database, which is based on a compilation of multiple chemical-protein annotation resources, as well as disease-associated protein-protein interactions (PPIs). We assembled more than 700,000 unique chemicals with biological annotation for 30,578 proteins. We gathered over 2 million chemical-protein interactions, which were integrated in a quality-scored human PPI network of 428,429 interactions. The PPI network layer allows for studying disease and tissue specificity through each protein complex. ChemProt can assist in the in silico evaluation of environmental chemicals, natural products and approved drugs, as well as the selection of new compounds based on their activity profile against most known biological targets, including those related to adverse drug events. Results from the disease chemical biology database associate citalopram, an antidepressant, with osteogenesis imperfecta and leukemia, and bisphenol A, an endocrine disruptor, with certain types of cancer.
ChemSpider - ChemSpider is a free chemical structure database providing fast access to over 25 million structures, properties and associated information. By integrating and linking compounds from more than 400 data sources, ChemSpider enables researchers to discover the most comprehensive view of freely available chemical data from a single online search. It is owned by the Royal Society of Chemistry.
ConsensusPathDB - ConsensusPathDB is a meta-database that integrates different types of functional interactions from heterogeneous interaction data resources. Physical protein interactions, metabolic and signaling reactions and gene regulatory interactions are integrated in a seamless functional association network that simultaneously describes multiple functional aspects of genes, proteins, complexes, metabolites, etc. With 155,432 human, 194,480 yeast and 13,648 mouse complex functional interactions (originating from 18 databases on human and eight databases on yeast and mouse interactions each), ConsensusPathDB currently constitutes the most comprehensive publicly available interaction repository for these species. The Web interface at http://cpdb.molgen.mpg.de offers different ways of utilizing these integrated interaction data, in particular with tools for visualization, analysis and interpretation of high-throughput expression data in the light of functional interactions and biological pathways.
Connectivity Map - To pursue a systematic approach to the discovery of functional connections among diseases, genetic perturbation, and drug action, we have created the first installment of a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules, together with pattern-matching software to mine these data. We demonstrate that this "Connectivity Map" resource can be used to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs. These results indicate the feasibility of the approach and suggest the value of a large-scale community Connectivity Map project.
COPICAT - Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein–chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e. amino acid sequences of proteins and structure formulas of chemical compounds. In this article, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein–chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended by >1000 times compared with currently well-used high-throughput screening methodologies.
CoPub - In this article, we present CoPub 5.0, a publicly available text mining system, which uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened the scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features, and provided a new advanced interface, which can be used to answer a variety of biological questions. CoPub 5.0 allows searching for keywords of interest and their relations to curated thesauri and provides highlighting and sorting mechanisms, using its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape web, allowing for literature-based systems biology. Finally, operations of the CoPub 5.0 Web service enable the CoPub technology to be integrated into bioinformatics workflows.
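The ABC principle mentioned for CoPub can be made concrete with a small sketch. It is not CoPub's code: the co-occurrence data structure and the function name are hypothetical, and CoPub additionally weighs relations with its co-occurrence statistics, which this sketch omits. Given term A, it collects every term C that never co-occurs with A directly but shares at least one intermediate B with it.

```python
def indirect_relations(cooccurrence, a):
    """Find C terms linked to `a` only through shared B intermediates.

    cooccurrence: dict mapping each term to the set of terms it co-occurs with
                  (assumed symmetric; a hypothetical input format).
    Returns a dict mapping each C term to the set of B terms connecting it to `a`.
    """
    direct = cooccurrence.get(a, set())
    result = {}
    for b in direct:
        for c in cooccurrence.get(b, set()):
            # Skip A itself and anything already directly connected to A.
            if c != a and c not in direct:
                result.setdefault(c, set()).add(b)
    return result
```

For example, if a drug co-occurs with two genes and both genes co-occur with a disease that is never mentioned alongside the drug, the disease is returned together with the two genes as its B intermediates.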
CPI-Predict - Elucidation of chemical-protein interactions (CPI) is the basis of target identification and drug discovery. It is time-consuming and costly to determine CPI experimentally, and computational methods will facilitate the determination of CPI. In this study, two methods, multitarget quantitative structure-activity relationship (mt-QSAR) and computational chemogenomics, were developed for CPI prediction. Two comprehensive data sets were collected from the ChEMBL database for method assessment. One data set consisted of 81 689 CPI pairs among 50 924 compounds and 136 G-protein coupled receptors (GPCRs), while the other one contained 43 965 CPI pairs among 23 376 compounds and 176 kinases. The range of the area under the receiver operating characteristic curve (AUC) for the test sets was 0.95 to 1.0 and 0.82 to 1.0 for 100 GPCR mt-QSAR models and 100 kinase mt-QSAR models, respectively. The AUCs of 5-fold cross validation were about 0.92 for both 176 kinases and 136 GPCRs using the chemogenomic method. However, the performance of the chemogenomic method was worse than that of mt-QSAR for the external validation set. Further analysis revealed that there was a high false positive rate for the external validation set when using the chemogenomic method. In addition, we developed a web server named CPI-Predictor, which is available for free. The methods and tool have potential applications in network pharmacology and drug repositioning.
CTD - BACKGROUND: The etiology of many chronic diseases involves interactions between environmental factors and genes that modulate physiological processes. Understanding interactions between environmental chemicals and genes/proteins may provide insights into the mechanisms of chemical actions, disease susceptibility, toxicity, and therapeutic drug interactions. The Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org) provides these insights by curating and integrating data describing relationships between chemicals, genes/proteins, and human diseases. To illustrate the scope and application of CTD, we present an analysis of curated data for the chemical arsenic. Arsenic represents a major global environmental health threat and is associated with many diseases. The mechanisms by which arsenic modulates these diseases are not well understood. METHODS: Curated interactions between arsenic compounds and genes were downloaded using export and batch query tools at CTD. The list of genes was analyzed for molecular interactions, Gene Ontology (GO) terms, KEGG pathway annotations, and inferred disease relationships. RESULTS: CTD contains curated data from the published literature describing 2,738 molecular interactions between 21 different arsenic compounds and 1,456 genes and proteins. Analysis of these genes and proteins provides insight into the biological functions and molecular networks that are affected by exposure to arsenic, including stress response, apoptosis, cell cycle, and specific protein signaling pathways. Integrating arsenic-gene data with gene-disease data yields a list of diseases that may be associated with arsenic exposure and genes that may explain this association. CONCLUSION: CTD data integration and curation strategies yield insight into the actions of environmental chemicals and provide a basis for developing hypotheses about the molecular mechanisms underlying the etiology of environmental diseases.
While many reports describe the molecular response to arsenic, CTD integrates these data with additional curated data sets that facilitate construction of chemical-gene-disease networks and provide the groundwork for investigating the molecular basis of arsenic-associated diseases or toxicity. The analysis reported here is extensible to any environmental chemical or therapeutic drug.
DART - An adverse drug reaction (ADR) often results from interaction of a drug or its metabolites with specific protein targets important in normal cellular function. Knowledge about these targets is important both in facilitating the study of the mechanisms of ADRs and in new drug discovery. It is also useful in the development and testing of rational drug design and safety evaluation tools. The Drug Adverse Reaction Database (DART) is intended to provide comprehensive information about adverse effect targets of drugs described in the literature. Moreover, proteins involved in adverse effects of chemicals but not yet confirmed as ADR targets are also included as potential targets. This database gives the physiological function of each target, binding drugs/agonists/antagonists/activators/inhibitors, IC(50) values of the inhibitors, corresponding adverse effects, and type of ADR induced by drug binding to a target. Cross-links to other databases are also introduced to facilitate access to information about the sequence, 3-dimensional structure, function, and nomenclature of each target along with drug/ligand binding properties, and related literature. The database currently contains entries for 147 ADR targets and 89 potential targets. A total of 187 adverse reaction conditions, 257 drugs, and 1080 ligands known to bind to each of these targets are also currently described. Each entry can be retrieved through multiple search methods including target name, target physiological function, adverse effect, ligand name, and biological pathways. A special page is provided for contribution of new or additional information.
DGA - Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.
DGIdb - The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially 'druggable' genes.
DINIES - DINIES (drug-target interaction network inference engine based on supervised analysis) is a web server for predicting unknown drug-target interaction networks from various types of biological data (e.g. chemical structures, drug side effects, amino acid sequences and protein domains) in the framework of supervised network inference. The originality of DINIES lies in prediction with state-of-the-art machine learning methods, in the integration of heterogeneous biological data and in compatibility with the KEGG database. The DINIES server accepts any 'profiles' or precalculated similarity matrices (or 'kernels') of drugs and target proteins in tab-delimited file format. When a training data set is submitted to learn a predictive model, users can select either known interaction information in the KEGG DRUG database or their own interaction data. The user can also select an algorithm for supervised network inference, select various parameters in the method and specify weights for heterogeneous data integration. The server can provide integrative analyses with useful components in KEGG, such as biological pathways, functional hierarchy and human diseases. DINIES (http://www.genome.jp/tools/dinies/) is publicly available as one of the genome analysis tools in GenomeNet.
DITOP - MOTIVATION: Drug-induced toxicity related proteins (DITRPs) are proteins that mediate adverse drug reactions (ADRs) or toxicities through their binding to drugs or reactive metabolites. Collection of these proteins facilitates better understanding of the molecular mechanisms of drug-induced toxicity and rational drug discovery. The drug-induced toxicity related protein database (DITOP) is intended to provide comprehensive information on DITRPs. Currently, DITOP contains 1501 records, covering 618 distinct literature-reported DITRPs, 529 drugs/ligands and 418 distinct toxicity terms. These proteins were confirmed experimentally to interact with drugs or their reactive metabolites, and thus directly or indirectly cause adverse effects or toxicities. Five major types of drug-induced toxicities or ADRs are included in DITOP: idiosyncratic adverse drug reactions, dose-dependent toxicities, drug-drug interactions, immune-mediated adverse drug effects (IMADEs) and toxicities caused by genetic susceptibility. Molecular mechanisms underlying the toxicity and cross-links to related resources are also provided where available. Moreover, a series of user-friendly interfaces were designed for flexible retrieval of DITRP-related information.
DRAR-CPI - Identifying new indications for existing drugs (drug repositioning) is an efficient way of maximizing their potential. Adverse drug reaction (ADR) is one of the leading causes of death among hospitalized patients. As both new indications and ADRs are caused by unexpected chemical–protein interactions on off-targets, it is reasonable to predict these interactions by mining the chemical–protein interactome (CPI). Making such predictions has recently been facilitated by a web server named DRAR-CPI. This server has a representative collection of drug molecules and targetable human proteins built up from our work in drug repositioning and ADR. When a user submits a molecule, the server will give the positive or negative association scores between the user’s molecule and our library drugs based on their interaction profiles towards the targets. Users can thus predict the indications or ADRs of their molecule based on the association scores towards our library drugs. We have matched our predictions of drug–drug associations with those predicted via gene-expression profiles, achieving a matching rate as high as 74%. We have also successfully predicted the connections between anti-psychotics and anti-infectives, indicating the underlying relevance of anti-psychotics in the potential treatment of infections, and vice versa.
DrugBank - DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. Since its first release in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. The latest version of DrugBank (release 2.0) has been expanded significantly over the previous release. With approximately 4900 drug entries, it now contains 60% more FDA-approved small molecule and biotech drugs including 10% more 'experimental' drugs. Significantly, more protein target data has also been added to the database, with the latest version of DrugBank containing three times as many non-redundant protein or drug target sequences as before (1565 versus 524). Each DrugCard entry now contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to pharmacological, pharmacogenomic and molecular biological data. A number of new data fields, including food-drug interactions, drug-drug interactions and experimental ADME data have been added in response to numerous user requests. DrugBank has also significantly improved the power and simplicity of its structure query and text query searches.
DrugMatrix - DrugMatrix is the scientific community's largest molecular toxicology reference database and informatics system. DrugMatrix is populated with the comprehensive results of thousands of highly controlled and standardized toxicological experiments in which rats or primary rat hepatocytes were systematically treated with therapeutic, industrial, and environmental chemicals at both non-toxic and toxic doses. Following administration of these compounds in vivo, comprehensive studies of the effects of these compounds were carried out at multiple time points and in multiple target organs. These studies included extensive pharmacology, clinical chemistry, hematology, histology, body and organ weights, and clinical observations. Additionally, a curation team extracted all relevant information on the compounds from the literature, the Physicians' Desk Reference, package inserts, and other relevant sources. The heart of the DrugMatrix database is large-scale gene expression data generated by extracting RNA from the toxicologically relevant organs and tissues and applying these RNAs to the GE Codelink™ 10,000 gene rat array and more recently the Affymetrix whole genome 230 2.0 rat GeneChip® array. DrugMatrix contains toxicogenomic profiles for 638 different compounds; these compounds include FDA approved drugs, drugs approved in Europe and Japan, withdrawn drugs, drugs in preclinical and clinical studies, biochemical standards, and industrial and environmental toxicants. Contained in the database are 148 scorable genomic signatures derived using MOSEK computational software that cover 96 distinct phenotypes. The signatures are informative of organ-specific pathology (e.g., hepatic steatosis) and mode of toxicological action (e.g., PXR activation in the liver). The phenotypes cover a number of common target tissues in toxicity testing (including liver, kidney, heart, bone marrow, spleen and skeletal muscle).
The primary value that DrugMatrix provides to the toxicology community is in its capacity to use toxicogenomic data to perform rapid toxicological evaluations. Further value is provided by DrugMatrix ontologies that help characterize mechanisms of pharmacological/toxicological action and identify potential human toxicities. Overall, DrugMatrix allows a toxicologist to formulate a comprehensive picture of toxicity with greater efficiency than traditional methods.
Drug2Gene - Drug2Gene is an integrated database of relations between drugs and targets or, more generally speaking, between compounds and genes. Most relations represent binding and are based on bioactivity data (such as IC50 values and potency), but there are also other relation types, e.g. compound X induces gene Y.
DTome - BACKGROUND: Understanding drug bioactivities is crucial for early-stage drug discovery, toxicology studies and clinical trials. Network pharmacology is a promising approach to better understand the molecular mechanisms of drug bioactivities. With a dramatic increase of rich data sources that document drugs' structural, chemical, and biological activities, it is necessary to develop an automated tool to construct a drug-target network for candidate drugs, thus facilitating the drug discovery process. RESULTS: We designed a computational workflow to construct drug-target networks from different knowledge bases including DrugBank, PharmGKB, and the PINA database. To automatically implement the workflow, we created a web-based tool called DTome (Drug-Target interactome tool), which comprises a database schema and a user-friendly web interface. The DTome tool utilizes web-based queries to search candidate drugs and then constructs a DTome network by extracting and integrating four types of interactions. The four types are adverse drug interactions, drug-target interactions, drug-gene associations, and target-/gene-protein interactions. Additionally, we provided a detailed network analysis and visualization process to illustrate how to analyze and interpret the DTome network. The DTome tool is publicly available at http://bioinfo.mc.vanderbilt.edu/DTome. CONCLUSIONS: As demonstrated with the antipsychotic drug clozapine, the DTome tool was effective and promising for the investigation of relationships among drugs, adverse interaction drugs, drug primary targets, drug-associated genes, and proteins directly interacting with targets or genes. The resultant DTome network provides researchers with direct insights into their drug(s) of interest, such as the molecular mechanisms of drug actions. We believe such a tool can facilitate identification of drug targets and drug adverse interactions.
DssTox - SUMMARY: The Distributed Structure-Searchable Toxicity (DSSTox) ARYEXP and GEOGSE files are newly published, structure-annotated files of the chemical-associated and chemical exposure-related summary experimental content contained in the ArrayExpress Repository and Gene Expression Omnibus (GEO) Series (based on data extracted on September 20, 2008). ARYEXP and GEOGSE contain 887 and 1064 unique chemical substances mapped to 1835 and 2381 chemical exposure-related experiment accession IDs, respectively. The standardized files allow one to assess, compare and search the chemical content in each resource, in the context of the larger DSSTox toxicology data network, as well as across large public cheminformatics resources such as PubChem (http://pubchem.ncbi.nlm.nih.gov). AVAILABILITY: Data files and documentation may be accessed online at http://epa.gov/ncct/dsstox/.
eChemPortal - eChemPortal allows simultaneous searching of reports and datasets by chemical name and number and by chemical property. It returns direct links to collections of chemical hazard and risk information prepared for government chemical review programmes at national, regional and international levels. Classification results according to national/regional hazard classification schemes or to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS) are provided when available. In addition, eChemPortal also provides exposure and use information on chemicals.
EDGE - Transcriptional profiling via microarrays holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different microarray platforms, protocols, and informatics often hinders the meaningful comparison of transcriptional profiling data across laboratories. One solution to this problem is to provide a low-cost and centralized resource that enables researchers to share toxicogenomic data that has been generated on a common platform. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols to simplify the analysis of liver gene expression in the mouse model. This resource, referred to as EDGE, was then used to generate a training set of 117 publicly accessible transcriptional profiles that can be accessed at http://edge.oncology.wisc.edu/. The Web-accessible database was also linked to an informatics suite that allows on-line clustering and K-means analyses as well as Boolean and sequence-based searches of the data. We propose that EDGE can serve as a prototype resource for the sharing of toxicogenomics information and be used to develop algorithms for efficient chemical classification and hazard prediction.
ExpressionBlast - ExpressionBlast is a search engine for gene expression data. It allows you to compare your own expression results to thousands of studies and close to a million samples currently available at GEO, and to find other studies (even across species) that have similar (or opposite) expression patterns to your results.
FINDSITE(comb) - Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITE(filt), that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITE(filt) with FINDSITE(X) that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITE(comb), is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITE(comb) is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITE(comb) gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. 
The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITE(comb) makes the screening of millions of compounds across entire proteomes feasible.
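The enrichment factor quoted in the FINDSITE(comb) entry can be made concrete with a small calculation. The sketch below uses the common definition of the enrichment factor (fraction of actives in the top slice of a ranked library divided by the fraction of actives in the whole library). It is not taken from FINDSITE(comb) itself; the function name, the scores and the activity labels are hypothetical and only serve as an illustration.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Enrichment factor for the top `top_fraction` of a ranked library.

    EF = (actives in top slice / size of top slice)
         / (actives in library / size of library)
    A value of 1.0 corresponds to random selection.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice; always keep at least one compound.
    n_top = max(1, int(round(n_total * top_fraction)))

    # Rank compounds from best (highest) to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits_top = sum(1 for _, active in ranked[:n_top] if active)

    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library: 1000 compounds, 10 actives,
# 5 of which are ranked among the 10 best-scoring compounds.
scores = list(range(1000, 0, -1))
is_active = [i < 5 or 500 <= i < 505 for i in range(1000)]
ef_top1 = enrichment_factor(scores, is_active, top_fraction=0.01)
```

In this made-up example half of the top 1% are actives against a 1% base rate, so the enrichment factor is about 50; values above 1 correspond to the "better than random" case mentioned in the entry.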
GeneCards - GeneCards is a searchable, integrated database of human genes that provides concise genome-related information on all known and predicted human genes. The GeneCards human gene database extracts and integrates a carefully selected subset of gene-related transcriptomic, genetic, proteomic, functional and disease information from dozens of relevant sources. It provides robust user-friendly access to up-to-date knowledge. GeneCards overcomes barriers of data format heterogeneity, and uses standard nomenclature and approved gene symbols. GeneCards presents a complete summary for each gene, and provides the means to obtain a deep understanding of biology and medicine.
GeneChaser - BACKGROUND: The amount of gene expression data in public repositories such as the NCBI Gene Expression Omnibus (GEO) has grown exponentially and provides a gold mine for bioinformaticians, but has not been easily accessible to biologists and clinicians. RESULTS: We developed an automated approach to annotate and analyze all GEO data sets, including 1,515 GEO data sets from 231 microarray types across 42 species, and performed 12,658 group versus group comparisons of 24 GEO-specified types. We then built GeneChaser, a web server that enables biologists and clinicians without bioinformatics skills to easily identify biological and clinical conditions in which a gene or set of genes was differentially expressed. 
GeneChaser displays these conditions in graphs, gives statistical comparisons, allows sort/filter functions and provides access to the original studies. We performed a single gene search for Nanog and a multiple gene search for Nanog, Oct4, Sox2 and LIN28, confirmed their roles in embryonic stem cell development, identified several drugs that regulate their expression, and suggested their potential roles in sex determination, abnormal sperm morphology, malaria infection, and cancer. CONCLUSION: We demonstrated that GeneChaser is a powerful tool to elucidate information on function, transcriptional regulation, drug-response and clinical implications for genes of interest.
GeneFriends - BACKGROUND: Although many diseases have been well characterized at the molecular level, the underlying mechanisms are often unknown. Nearly half of all human genes remain poorly studied, yet these genes may contribute to a number of disease processes. Genes involved in common biological processes and diseases are often co-expressed. Using known disease-associated genes in a co-expression analysis may help identify and prioritize novel candidate genes for further study. We have created an online tool, called GeneFriends, which identifies co-expressed genes in over 1,000 mouse microarray datasets. GeneFriends can be used to assign putative functions to poorly studied genes. Using a seed list of disease-associated genes and a guilt-by-association method, GeneFriends allows users to quickly identify novel genes and transcription factors associated with a disease or process. We tested GeneFriends using seed lists for aging, cancer, and mitochondrial complex I disease. We identified several candidate genes that have previously been predicted as relevant targets. Some of the genes identified are already being tested in clinical trials, indicating the effectiveness of this approach. Co-expressed transcription factors were investigated, identifying C/ebp genes as candidate regulators of aging. Furthermore, several novel candidate genes that may be suitable for experimental or clinical follow-up were identified. Two of the novel candidates of unknown function that were co-expressed with cancer-associated genes were selected for experimental validation. Knock-down of their human homologs (C1ORF112 and C12ORF48) in HeLa cells slowed growth, indicating that these genes of unknown function, identified by GeneFriends, may be involved in cancer. GeneFriends is a resource for biologists to identify and prioritize novel candidate genes involved in biological processes and complex diseases. It is an intuitive online resource that will help drive experimentation.
GPCRDB - The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis.
Human-gpDB - G-protein coupled receptors (GPCRs) are a major family of membrane receptors in eukaryotic cells. They play a crucial role in the communication of a cell with the environment. Ligands bind to GPCRs on the outside of the cell, activating them by causing a conformational change, and allowing them to bind to G-proteins. Through their interaction with G-proteins, several effector molecules are activated leading to many kinds of cellular and physiological responses. The great importance of GPCRs and their corresponding signal transduction pathways is indicated by the fact that they take part in many diverse disease processes and that a large part of efforts towards drug development today is focused on them. We present Human-gpDB, a database which currently holds information about 713 human GPCRs, 36 human G-proteins and 99 human effectors. The collection of information about the interactions between these molecules was done manually and the current version of Human-gpDB holds information for about 1663 connections between GPCRs and G-proteins and 1618 connections between G-proteins and effectors. Major advantages of Human-gpDB are the integration of several external data sources and the support of advanced visualization techniques. Human-gpDB is a simple, yet a powerful tool for researchers in the life sciences field as it integrates an up-to-date, carefully curated collection of human GPCRs, G-proteins, effectors and their interactions. The database may be a reference guide for medical and pharmaceutical research, especially in the areas of understanding human diseases and chemical and drug discovery.
IUPHAR Receptor Database - This contribution highlights efforts by the International Union of Basic and Clinical Pharmacology (IUPHAR) Nomenclature Committee (NC-IUPHAR) to classify human receptors and ion channels, to document their properties, and to recommend ligands that are useful for characterization. This effort has inspired the creation of an online database (IUPHAR-DB), which is intended to provide free information to all scientists, summarized from primary literature by experts.
KEGG - Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ or http://www.kegg.jp/) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms; namely, in the forms of KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system. Here we report KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping, enabling integration and interpretation of large-scale data sets. We also report a variant of the KEGG mapping procedure to extend the knowledge base, where different types of data and knowledge, such as disease genes and drug targets, are integrated as part of the KEGG molecular networks. Finally, we describe recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.
LINCS Canvas Browser - For the Library of Integrated Network-based Cellular Signatures (LINCS) project, many gene expression signatures have been produced using the L1000 technology. The L1000 technology is a cost-effective method to profile gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology.
LIPIDBANK - "LIPIDBANK for Web" is a new database system offering information on lipids over the Internet. This database was developed through a joint research project between the International Medical Center of Japan and the Japan Science and Technology Corporation from 1996 to 1999. It is composed of three parts: the factual data, including the name of the lipid, chemical and physical properties, information on biological activities, genetic information, etc.; the graphic data, including pictures and figures; and the related reference data. This is the first lipid database from Japan open to the world through the Internet. This article describes its brief history and how to search the glycolipid data from "LIPIDBANK for Web".
LIPID MAPS - LIPID Metabolites And Pathways Strategy (LIPID MAPS) is a multi-institutional effort created in 2003 to identify and quantitate, using a systems biology approach and sophisticated mass spectrometers, all of the major — and many minor — lipid species in mammalian cells, as well as to quantitate the changes in these species in response to perturbation. The ultimate goal of our research is to better understand lipid metabolism and the active role lipids play in diabetes, stroke, cancer, arthritis, Alzheimer's and other lipid-based diseases in order to facilitate development of more effective treatments. Since our inception, we have made great strides toward defining the "lipidome" (an inventory of the thousands of individual lipid molecular species) in the mouse macrophage. We have also worked to make lipid analysis easier and more accessible for the broader scientific community and to advance a robust research infrastructure for the international research community. We share new lipidomics findings and methods, hold annual meetings open to all interested investigators, and are exploring joint efforts to extend the use of these powerful new methods to new applications.
MetaADEDB - Prediction and identification of adverse drug events (ADEs) play an important role in developing personalized medicines. Many ADEs are not identified during clinical trials and emerge only after a drug has been approved for use in the clinic, which results in morbidity and mortality around the world. Consequently, we developed a comprehensive computer-available ADEs database (abbreviated as MetaADEDB) by integrating the Comparative Toxicogenomics Database (CTD), SIDER and OFFSIDES, which connects 3060 chemicals (including 1300 FDA approved and experimental drugs) and 13,256 ADEs. All drugs and diseases in MetaADEDB were annotated by the Medical Subject Headings and the Unified Medical Language System vocabularies. In total, 527,216 drug-ADE associations were created in MetaADEDB. MetaADEDB provides a user-friendly interface to search for a specific drug-ADE association by drug name, ADE name, MeSH or UMLS identifiers, and similarity search.
NAR Databases - Compendium of the databases published in Nucleic Acids Research
NucleaRDB - The NucleaRDB is a Molecular Class-Specific Information System that collects, combines, validates and disseminates large amounts of heterogeneous data on nuclear hormone receptors. It contains both experimental and computationally derived data. The data and knowledge present in the NucleaRDB can be accessed using a number of different interactive and programmatic methods and query systems. A nuclear hormone receptor-specific PDF reader interface is available that can integrate the contents of the NucleaRDB with full-text scientific articles.
Pathguide - Pathguide contains information about 325 biological pathway related resources and molecular interaction related resources. Click on a link to go to the resource home page or 'Details' for a description page. Databases that are free and those supporting BioPAX, CellML, PSI-MI or SBML standards are respectively indicated.
Pathway Commons - Pathway Commons is a convenient point of access to biological pathway information collected from public pathway databases, which you can browse or search.
PCI DB - We started to compile a Protein-Compound Interaction Database (PCI DB) by collecting and merging information from dispersed databases with the aim of preparing information infrastructure for the next generation protein science.
PDTD - BACKGROUND: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking) http://www.dddc.ac.cn/tarfisdock, which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. DESCRIPTION: PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literatures and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information of >830 known or potential drug targets, including protein and active sites structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search function, such as PDB ID, target name, and disease name. Data set generated by PDTD can be viewed with the plug-in of molecular visualization tools and also can be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. 
CONCLUSION: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers.
PharmGKB - The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB: http://www.pharmgkb.org) is devoted to disseminating primary data and knowledge in pharmacogenetics and pharmacogenomics. We are annotating the genes that are most important for drug response and present this information in the form of Very Important Pharmacogene (VIP) summaries, pathway diagrams, and curated literature. The PharmGKB currently contains information on over 500 drugs, 500 diseases, and 700 genes with genotyped variants. New features focus on capturing the phenotypic consequences of individual genetic variants. These features link variant genotypes to phenotypes, increase the breadth of pharmacogenomics literature curated, and visualize single-nucleotide polymorphisms on a gene's three-dimensional protein structure.
PreDC - MOTIVATION: Drug combinations are a promising strategy for combating complex diseases by improving the efficacy and reducing corresponding side effects. Currently, a widely studied problem in pharmacology is to predict effective drug combinations, either through empirically screening in clinic or pure experimental trials. However, the large-scale prediction of drug combination by a systems method is rarely considered. RESULTS: We report a systems pharmacology framework to predict drug combinations on a computational model, termed PEA (Probability Ensemble Approach), for analysis of both the efficacy and adverse effects of drug combinations. Firstly, a Bayesian network integrating with a similarity algorithm is developed to model the combinations from drug molecular and pharmacological phenotypes, and the predictions are then assessed with both clinical efficacy and adverse effects. It is illustrated that PEA can predict the combination efficacy of drugs spanning different therapeutic classes with high specificity and sensitivity (AUC = 0.90), which was further validated by independent data or new experimental assays. PEA also evaluates the adverse effects (AUC = 0.95) quantitatively and detects the therapeutic indications for drug combinations. Finally, the PreDC (Predict Drug Combination) database includes 1571 known and 3269 predicted optimal combinations as well as their potential side effects and therapeutic indications.
ProtChemSI - Biological networks are powerful tools for predicting undocumented relationships between molecules. The underlying principle is that existing interactions between molecules can be used to predict new interactions. Here we use this principle to suggest new protein-chemical interactions via the network derived from three-dimensional structures. For pairs of proteins sharing a common ligand, we use protein and chemical superimpositions combined with fast structural compatibility screens to predict whether additional compounds bound by one protein would bind the other. The method reproduces 84% of complexes in a benchmark, and we make many predictions that would not be possible using conventional modeling techniques. Among 19,578 novel predicted interactions are 7,793 involving 718 drugs, including filaminast, coumarin, alitretinoin and erlotinib. The growth rate of confident predictions is twice that of experimental complexes, meaning that a complete structural drug-protein repertoire will be available at least ten years earlier than by X-ray and NMR techniques alone.
ProTox - Animal trials are currently the major method for determining the possible toxic effects of drug candidates and cosmetics. In silico prediction methods represent an alternative approach and aim to rationalize the preclinical drug development, thus enabling the reduction of the associated time, costs and animal experiments. Here, we present ProTox, a web server for the prediction of rodent oral toxicity. The prediction method is based on the analysis of the similarity of compounds with known median lethal doses (LD50) and incorporates the identification of toxic fragments, therefore representing a novel approach in toxicity prediction. In addition, the web server includes an indication of possible toxicity targets which is based on an in-house collection of protein-ligand-based pharmacophore models ('toxicophores') for targets associated with adverse drug reactions. The ProTox web server is open to all users and can be accessed without registration at: http://tox.charite.de/tox. The only requirement for the prediction is the two-dimensional structure of the input compounds. All ProTox methods have been evaluated based on a diverse external validation set and displayed strong performance (sensitivity, specificity and precision of 76, 95 and 75%, respectively) and superiority over other toxicity prediction tools, indicating their possible applicability for other compound classes.
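The three performance figures reported in the ProTox entry (sensitivity, specificity and precision) all follow from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name and the counts are made up for illustration and are not the ProTox validation data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp: toxic compounds predicted toxic      (true positives)
    fp: non-toxic compounds predicted toxic  (false positives)
    tn: non-toxic compounds predicted non-toxic (true negatives)
    fn: toxic compounds predicted non-toxic  (false negatives)
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each denominator must be non-zero")
    return {
        "sensitivity": tp / (tp + fn),  # fraction of toxic compounds found
        "specificity": tn / (tn + fp),  # fraction of non-toxic compounds cleared
        "precision": tp / (tp + fp),    # fraction of 'toxic' calls that are right
    }


# Hypothetical validation counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
```

With these invented counts, sensitivity and precision are both 0.75 and specificity is about 0.83. A high specificity combined with lower sensitivity, as in the entry, means few false alarms at the cost of some missed toxic compounds.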
PTID - SUMMARY: Although in silico drug discovery approaches are crucial for the development of pharmaceuticals, their potential advantages in the agrochemical industry have not been realized. The challenge for computer-aided methods in the agrochemical arena is a lack of sufficient information on both pesticides and their targets. Therefore, it is important to establish a knowledge repository that contains comprehensive pesticide profiles, including physicochemical properties, environmental fates, toxicities and modes of action. Here, we present an integrated platform called Pesticide-Target interaction database (PTID), which comprises a total of 1 347 pesticides with rich annotation of ecotoxicological and toxicological data as well as 13 738 interactions of pesticide-target and 4 245 protein terms via text mining. Additionally, through the integration of ChemMapper, an in-house computational approach to polypharmacology, PTID can be used as a computational platform to identify pesticide targets and design novel agrochemical products.
Reactome - Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSIQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.
SEA search tool - The identification of protein function based on biological information is an area of intense research. Here we consider a complementary technique that quantitatively groups and relates proteins based on the chemical similarity of their ligands. We began with 65,000 ligands annotated into sets for hundreds of drug targets. The similarity score between each set was calculated using ligand topology. A statistical model was developed to rank the significance of the resulting similarity scores, which are expressed as a minimum spanning tree to map the sets together. Although these maps are connected solely by chemical similarity, biologically sensible clusters nevertheless emerged. Links among unexpected targets also emerged, among them that methadone, emetine and loperamide (Imodium) may antagonize muscarinic M3, alpha2 adrenergic and neurokinin NK2 receptors, respectively. These predictions were subsequently confirmed experimentally. Relating receptors by ligand chemistry organizes biology to reveal unexpected relationships that may be assayed using the ligands themselves.
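The idea of relating two targets through the chemical similarity of their ligand sets can be sketched with Tanimoto coefficients on fingerprints. The code below is only an illustration under stated assumptions: fingerprints are represented as Python sets of "on" bit indices, the raw set-versus-set score is a sum of pairwise Tanimoto values above a cutoff, and the cutoff value and all names are hypothetical. The statistical model that the entry describes for ranking the significance of such scores is not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints
    return len(fp_a & fp_b) / union


def raw_set_similarity(ligands_a, ligands_b, cutoff=0.4):
    """Sum of pairwise Tanimoto coefficients at or above `cutoff`.

    `cutoff` is an assumed, illustrative value; pairs below it are ignored
    so that weak, noise-level similarities do not accumulate.
    """
    total = 0.0
    for fp_a in ligands_a:
        for fp_b in ligands_b:
            tc = tanimoto(fp_a, fp_b)
            if tc >= cutoff:
                total += tc
    return total


# Toy fingerprints for the ligand sets of two hypothetical targets.
target_1 = [{1, 2, 3}, {7, 8}]
target_2 = [{2, 3, 4}, {7, 8}]
score = raw_set_similarity(target_1, target_2)
```

In practice the fingerprints would come from a cheminformatics toolkit, and a raw score like this one grows with set size, which is why a significance model is needed before scores for different target pairs can be compared.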
SMPDB - The Small Molecule Pathway Database (SMPDB) is an interactive, visual database containing more than 350 small-molecule pathways found in humans. More than 2/3 of these pathways (>280) are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in clinical metabolomics, transcriptomics, proteomics and systems biology. SMPDB provides exquisitely detailed, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways. All SMPDB pathways include information on the relevant organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the Human Metabolome Database (HMDB) or DrugBank and each protein or enzyme complex is hyperlinked to UniProt. All SMPDB pathways are accompanied by detailed descriptions, providing an overview of the pathway, condition or processes depicted in each diagram. The database is easily browsed and supports full text searching. Users may query SMPDB with lists of metabolite names, drug names, gene/protein names, SwissProt IDs, GenBank IDs, Affymetrix IDs or Agilent microarray IDs. These queries will produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB's mapping interface. All of SMPDB's images, image maps, descriptions and tables are downloadable.
SPACE - MOTIVATION: The Anatomical Therapeutic Chemical (ATC) classification system, applied in almost all drug utilization studies, is currently the most widely recognized classification system for drugs. Currently, new drug entries are added to the system only on users' requests, which leaves its drug coverage seriously incomplete; bioinformatics prediction can help fill this gap. RESULTS: Here we propose a novel prediction model of drug-ATC code associations, using logistic regression to integrate multiple heterogeneous data sources including chemical structures, target proteins, gene expression, side-effects and chemical-chemical associations. The model performs well in predicting not only ATC codes of unclassified drugs but also new ATC codes of classified drugs, as assessed by cross-validation and independent test sets, and its efficacy exceeds that of previous methods. To facilitate its use, the model has been developed into a user-friendly web service, SPACE (Similarity-based Predictor of ATC CodE), which, for each submitted compound, gives candidate ATC codes (ranked by decreasing probability_score predicted by the model) together with the corresponding supporting evidence. This work not only contributes to knowing drugs' therapeutic, pharmacological and chemical properties, but also provides clues for drug repositioning and side-effect discovery. In addition, the construction of the prediction model provides a general framework for similarity-based data integration that is suitable for other drug-related studies, such as target and side-effect prediction.
STITCH3 - Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs.
STRING - An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources.
SuperTarget - There are at least two good reasons for the on-going interest in drug-target interactions: first, drug-effects can only be fully understood by considering a complex network of interactions to multiple targets (so-called off-target effects) including metabolic and signaling pathways; second, it is crucial to consider drug-target-pathway relations for the identification of novel targets for drug development. To address this on-going need, we have developed a web-based data warehouse named SuperTarget, which integrates drug-related information associated with medical indications, adverse drug effects, drug metabolism, pathways and Gene Ontology (GO) terms for target proteins. At present, the updated database contains >6000 target proteins, which are annotated with >330,000 relations to 196,000 compounds (including approved drugs); the vast majority of interactions include binding affinities and pointers to the respective literature sources. The user interface provides tools for drug screening and target similarity inclusion. A query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target proteins within a certain affinity range.
SuperToxic - Within our everyday life, we are confronted with a variety of toxic substances of natural or artificial origin. Toxins are already used, e.g. in medicine, but there is still an increasing number of toxic compounds, representing a tremendous potential to extract new substances. As predictive toxicology gains in importance, the careful and extensive investigation of known toxins is the basis for assessing the properties of unknown substances. To this end, we have collected toxic compounds from literature and web sources in the database SuperToxic. The current version of this database compiles about 60,000 compounds and their structures. These molecules are classified according to their toxicity, based on more than 2 million measurements. The SuperToxic database provides a variety of search options such as name, CASRN, molecular weight and measured values of toxicity. With the aid of implemented similarity searches, information about possible biological interactions can be gained. Furthermore, connections to the Protein Data Bank, UniProt and the KEGG database are available, to allow the identification of targets and of the pathways in which the searched compounds are involved.
TargetHunter - Target identification of known bioactive compounds and novel synthetic analogs is a very important research field in medicinal chemistry, biochemistry, and pharmacology. It is also a challenging and costly step towards chemical biology and phenotypic screening. In silico identification of potential biological targets for chemical compounds offers an alternative avenue for the exploration of ligand-target interactions and biochemical mechanisms, as well as for investigation of drug repurposing. Computational target fishing mines biologically annotated chemical databases and then maps compound structures into chemogenomical space in order to predict the biological targets. We summarize the recent advances and applications in computational target fishing, such as chemical similarity searching, data mining/machine learning, panel docking, and the bioactivity spectral analysis for target identification. We then describe in detail a new web-based target prediction tool, TargetHunter (http://www.cbligand.org/TargetHunter). This web portal implements a novel in silico target prediction algorithm, the Targets Associated with its MOst SImilar Counterparts, by exploring the largest chemogenomical database, ChEMBL. Prediction accuracy reached 91.1% from the top 3 guesses on a subset of high-potency compounds from the ChEMBL database, which outperformed a published algorithm, multiple-category models. TargetHunter also features an embedded geography tool, BioassayGeoMap, developed to allow the user to easily search for potential collaborators who can experimentally validate the predicted biological target(s) or off target(s). TargetHunter therefore provides a promising alternative to bridge the knowledge gap between biology and chemistry, and to significantly boost the productivity of chemogenomics researchers for in silico drug design and discovery.
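As a rough illustration of the "most similar counterparts" idea named above, the following Python sketch ranks candidate targets for a query compound by the similarity of the closest annotated reference compound. It is not TargetHunter's implementation: the set-based fingerprints, the Tanimoto coefficient as the similarity measure, the function names and the toy reference data are all assumptions made for this example.

```python
# Hypothetical sketch of similarity-based target fishing.
# Fingerprints are modeled as sets of feature IDs; all data are invented.

def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_targets(query, reference, top_n=3):
    """Return up to top_n (target, score) pairs, where each score is the
    similarity of the most similar reference compound with that target."""
    best = {}
    for fingerprint, targets in reference:
        score = tanimoto(query, fingerprint)
        for target in targets:
            if score > best.get(target, -1.0):
                best[target] = score
    # Highest similarity first; ties broken alphabetically by target name.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy reference set: (fingerprint, annotated targets). Purely illustrative.
reference = [
    (frozenset({1, 2, 3, 4}), ["target_A"]),
    (frozenset({1, 2, 5}), ["target_B", "target_A"]),
    (frozenset({7, 8, 9}), ["target_C"]),
]
query = frozenset({1, 2, 3})

for target, score in predict_targets(query, reference):
    print(f"{target}\t{score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit and the reference annotations from a bioactivity database such as ChEMBL; the "top 3 guesses" reported above correspond to `top_n=3` in this sketch.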
Toxygates - In early stage drug development, it is desirable to assess the toxicity of compounds as quickly as possible. Biomarker genes can help predict whether a candidate drug will adversely affect a given individual, but they are often difficult to discover. In addition, the mechanism of toxicity of many drugs and common compounds is not yet well understood. The Japanese Toxicogenomics Project (TGP) provides a large database of systematically collected microarray samples from rats (liver, kidney, and primary hepatocytes) and human cells (primary hepatocytes) after exposure to 170 different compounds in different dosages and at different time intervals. However, until now, no intuitive user interface has been publicly available, making it time consuming and difficult for individual researchers to explore the data. We present Toxygates, a user-friendly integrated analysis platform for this database. Toxygates combines a large microarray dataset with the ability to fetch semantic linked data, such as pathways, compound-protein interactions and orthologs, on demand. It can also perform pattern-based compound ranking with respect to the expression values of a set of relevant candidate genes. By using Toxygates, users can freely interrogate the transcriptome's response to particular compounds and conditions, which enables deep exploration of toxicity mechanisms.
TPDB - BACKGROUND: The toxic effects of many simple organic compounds stem from their biotransformation to chemically reactive metabolites which bind covalently to cellular proteins. To understand the mechanisms of cytotoxic responses it may be important to know which proteins become adducted and whether some may be common targets of multiple toxins. The literature of this field is widely scattered but expanding rapidly, suggesting the need for a comprehensive, searchable database of reactive metabolite target proteins. DESCRIPTION: The Reactive Metabolite Target Protein Database (TPDB) is a comprehensive, curated, searchable, documented compilation of publicly available information on the protein targets of reactive metabolites of 18 well-studied chemicals and drugs of known toxicity. TPDB software enables i) string searches for author names and protein names/synonyms, ii) more complex searches by selecting chemical compound, animal species, target tissue and protein names/synonyms from pull-down menus, and iii) commonality searches over multiple chemicals. Tabulated search results provide information, references and links to other databases. CONCLUSION: The TPDB is a unique on-line compilation of information on the covalent modification of cellular proteins by reactive metabolites of chemicals and drugs. Its comprehensiveness and searchability should facilitate the elucidation of mechanisms of reactive metabolite toxicity.
TTD - Increasing numbers of proteins, nucleic acids and other molecular entities have been explored as therapeutic targets, hundreds of which are targets of approved and clinical trial drugs. Knowledge of these targets and corresponding drugs, particularly those in clinical use and trials, is highly useful for facilitating drug discovery. The Therapeutic Target Database (TTD) has been developed to provide information about therapeutic targets and corresponding drugs. In order to accommodate increasing demand for comprehensive knowledge about the primary targets of the approved, clinical trial and experimental drugs, numerous improvements and updates have been made to TTD. These updates include information about 348 successful, 292 clinical trial and 1254 research targets, 1514 approved, 1212 clinical trial and 2302 experimental drugs linked to their primary targets (3382 small molecule and 649 antisense drugs with available structure and sequence), new ways to access data by drug mode of action, recursive search of related targets or drugs, similarity target and drug searching, customized and whole data download, standardized target ID, and a significant increase in data (1894 targets, 560 diseases and 5028 drugs compared with the 433 targets, 125 diseases and 809 drugs in the original release described in the previous paper).
T3DB - In an effort to capture meaningful biological, chemical and mechanistic information about clinically relevant, commonly encountered or important toxins, we have developed the Toxin and Toxin-Target Database (T3DB). The T3DB is a unique bioinformatics resource that compiles comprehensive information about common or ubiquitous toxins and their toxin-targets into a single electronic repository. The database currently contains over 2900 small molecule and peptide toxins, 1300 toxin-targets and more than 33 000 toxin-target associations. Each T3DB record (ToxCard) contains over 80 data fields providing detailed information on chemical properties and descriptors, toxicity values, protein and gene sequences (for both targets and toxins), molecular and cellular interaction data, toxicological data, mechanistic information and references. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, government documents, textbooks and scientific journals. A key focus of the T3DB is on providing 'depth' over 'breadth' with detailed descriptions, mechanisms of action, and information on toxins and toxin-targets. T3DB is fully searchable and supports extensive text, sequence, chemical structure and relational query searches, similar to those found in the Human Metabolome Database (HMDB) and DrugBank. Potential applications of the T3DB include clinical metabolomics, toxin target prediction, toxicity prediction and toxicology education.
WikiPathways - Here, we describe the development of WikiPathways (http://www.wikipathways.org), a public wiki for pathway curation, since it was first published in 2008. New features are discussed, as well as developments in the community of contributors. New features include a zoomable pathway viewer, support for pathway ontology annotations, the ability to mark pathways as private for a limited time and the availability of stable hyperlinks to pathways and the elements therein. WikiPathways content is freely available in a variety of formats such as the BioPAX standard, and the content is increasingly adopted by external databases and tools, including Wikipedia. A recent development is the use of WikiPathways as a staging ground for centrally curated databases such as Reactome. WikiPathways is seeing steady growth in the number of users, page views and edits for each pathway. To assess whether the community curation experiment can be considered successful, here we analyze the relation between use and contribution, which gives results in line with other wiki projects. The novel use of pathway pages as supplementary material to publications, as well as the addition of tailored content for research domains, is expected to stimulate growth further.