GNPS
Wang, Mingxun, et al. "Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking." Nature Biotechnology 34.8 (2016): 828-837. PMID: 27504778
Network Annotation Propagation via GNPS
da Silva, Ricardo R., et al. "Propagating annotations of molecular networks using in silico fragmentation." PLoS Computational Biology 14.4 (2018): e1006089.
The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass spectrometry (MS/MS) data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
Representation of interactions among the NP community, GNPS spectral libraries, and GNPS data sets. At present 221,083 MS/MS spectra from 18,163 unique compounds are used for searches in GNPS. These include third-party libraries, such as MassBank, ReSpect, and NIST, as well as spectral libraries created for GNPS (GNPS-Collections) and spectra from the NP community (GNPS-Community). GNPS spectral libraries grow through user contributions of new identifications of MS/MS spectra. To date, 55 community members have contributed 8,853 MS/MS spectra from 5,568 unique compounds (30.5% of the unique compounds available). In addition, ongoing curation efforts have already yielded 563 annotation updates for library spectra. These libraries are used to dereplicate compounds (that is, to recognize previously characterized and studied compounds) in both public and private data. This dereplication process is performed on all public data sets and results are automatically reported, thus enabling users to query all data sets, organisms, and conditions. Automatic reanalysis of all public data creates a virtuous cycle in which contributions to libraries can be matched to all public data. Combined with molecular networking (Fig. 3), this automatic reanalysis empowers community members to identify analogs that can then be added to GNPS spectral libraries.
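The dereplication step described above, matching a query MS/MS spectrum against reference library spectra, can be illustrated with a minimal sketch. The text does not state which scoring function GNPS uses, so the greedy cosine score, the 0.02 m/z tolerance, the 0.7 score threshold, the function names, and the toy peak lists below are all assumptions chosen for illustration and do not represent the GNPS implementation.

```python
from math import sqrt

def cosine_score(query, reference, tol=0.02):
    """Greedy cosine similarity between two peak lists [(mz, intensity), ...].

    Illustrative only: tolerance and matching strategy are assumptions.
    """
    # Collect every query/reference peak pair whose m/z values agree within tol.
    pairs = [
        (q_int * r_int, qi, ri)
        for qi, (q_mz, q_int) in enumerate(query)
        for ri, (r_mz, r_int) in enumerate(reference)
        if abs(q_mz - r_mz) <= tol
    ]
    # Take the strongest pairs first, using each peak at most once.
    pairs.sort(reverse=True)
    used_q, used_r, dot = set(), set(), 0.0
    for product, qi, ri in pairs:
        if qi not in used_q and ri not in used_r:
            used_q.add(qi)
            used_r.add(ri)
            dot += product
    norm_q = sqrt(sum(i * i for _, i in query))
    norm_r = sqrt(sum(i * i for _, i in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0

def dereplicate(query, library, min_score=0.7, tol=0.02):
    """Return (name, score) for library entries scoring >= min_score, best first."""
    hits = [(name, cosine_score(query, peaks, tol)) for name, peaks in library.items()]
    return sorted((h for h in hits if h[1] >= min_score), key=lambda h: -h[1])

# Hypothetical toy library and query spectrum (not real reference data).
library = {
    "compound_A": [(100.0, 1.0), (150.0, 0.5)],
    "compound_B": [(200.0, 1.0)],
}
query = [(100.0, 1.0), (150.0, 0.5)]
hits = dereplicate(query, library)
```

Here `hits` contains only `compound_A`, because its peaks coincide with the query while `compound_B` shares none. Rerunning such a search whenever the library grows is one simple way to picture the automatic reanalysis of public data described in the text.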
The GNPS platform has grown to serve a global user base of >9,200 users from 100 countries.