The Data Interest Group/Chemistry: DIGChem


The Data Interest Group/Chemistry (DIGChem) is an effort to foster a culture of data sharing within the chemistry community. To accomplish this vision, the group is analyzing the existing landscape of chemistry data standards and chemical data repositories, evaluating and updating existing standards, assessing the need for domain-specific repositories, and advocating for research data sharing while educating researchers, librarians, publishers, and vendors on its benefits. Anyone in the broader chemistry community who is interested is welcome to join the discussions. We are working in conjunction with the International Union of Pure and Applied Chemistry (IUPAC) Subcommittee on Cheminformatics Data Standards (SCDS) and the Research Data Alliance (RDA) Chemistry Research Data Interest Group (CRDIG). Discussions of the interest group have been held at the ACS National Meetings with the Division of Chemical Information (CINF), at the RSC Chemical Information and Computer Applications Group (CICAG), at the IUPAC General Assembly, at RDA Plenaries, and at the Beilstein Institute.

A number of working groups are being formed, using Google Drive as a platform for developing and sharing documentation. Individual groups may use alternate sites, such as GitHub, for development efforts where appropriate. The groups are briefly described below, with links to the Google Drive directories containing the shared documents.

  • Journal Data Publication Guidelines: A spreadsheet indicating the requirements and recommendations for supporting data from a broad range of chemistry journals and publishers. Ideally, this analysis could drive efforts at standardizing recommendations across journals.
  • Survey of Existing Cheminformatics Standards/Repositories: Annotated bibliography of existing standards in cheminformatics as well as chemical data repositories, including status on ownership, curation, business model, etc.
  • Workflows and Tools for Chemical Research Data Publication: In conjunction with researchers, instrument and software vendors, repositories, librarians, and publishers, develop exemplar workflows and tools that support researchers who want to share their data and that ease the process for repositories to import data.
  • JCAMP-DX Update: Update the existing JCAMP-DX standard to include optional identifiers for journal article citations (DOIs), researcher identity (ORCID), substances (InChI, PDB), and others. In addition, vendor-specific additions to the standard will be reviewed for possible normalization.
  • NMR/Spectra Repositories: Many longstanding repositories already exist in chemistry, with established workflows to import and curate data from a variety of sources. New repositories are being established to encourage researchers themselves to upload and publish content, often without curation. Generic repositories may not include the appropriate metadata for chemical data. This group will focus on the data and metadata needs for NMR and other spectra repositories.
  • Open Chemical Structure Representation: While a number of chemical structure representations are de facto standards, their details may not be published comprehensively or openly, and software packages that rely on them are at risk if the standards are updated in undocumented ways. This group will study whether specific standards could or should be made open and, where appropriate, advocate for opening them.
  • Metadata Recommendations for DataCite Registration of DOIs: Researchers are already beginning to register data publications via DataCite, with extensions as allowed to the DataCite metadata schema. Recommendations are needed to ensure that chemical data can be handled in a uniform and reusable way.
  • Education: [pedagogy, data science, quantitative analysis, cheminformatics, laboratory notebook skills]
  • Professional Training: Develop material to support and train researchers and support staff in best practices for data publication, using the best practice recommendations of the chemistry domain.
  • Cheminformatics Color Book Dissemination: There is an IUPAC project to disseminate the results of these working groups as a formal IUPAC Color Book, but with a focus on digital publication and ease of re-use by both humans and computers.
  • Gold Book Website: The IUPAC Gold Book is an aggregation of IUPAC standards and recommendations from the other IUPAC Color Books, which are themselves based on Pure and Applied Chemistry (PAC) reports and recommendations after consideration by the Interdivisional Committee on Terminology, Nomenclature and Symbols (ICTNS). This group will focus on enhancing the machine readability of terms in the Gold Book.
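
As an illustration of the JCAMP-DX update item above, the following minimal sketch shows how optional identifiers might be carried in a JCAMP-DX header as user-defined labeled data records and read back out. The label names `$DOI`, `$ORCID`, and `$INCHI` are hypothetical and are not defined by the published standard or by this group, and the DOI and ORCID values are placeholders. Only the general `##LABEL=value` record syntax, the `$` prefix for user-defined labels, and the `$$` comment marker are taken from JCAMP-DX.

```python
import re

# Hypothetical JCAMP-DX header. The ##$... labels are illustrative
# assumptions, not labels defined by the standard or by this group.
SAMPLE = """\
##TITLE=Ethanol 1H NMR (example)
##JCAMP-DX=5.01
##DATA TYPE=NMR SPECTRUM
##$DOI=10.0000/example-doi  $$ placeholder article DOI
##$ORCID=0000-0000-0000-0000  $$ placeholder researcher ID
##$INCHI=InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
##END=
"""


def normalize_label(label: str) -> str:
    """Uppercase the label and drop spaces, dashes, slashes, underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_header(text: str) -> dict[str, str]:
    """Collect single-line ##LABEL=value records into a dictionary."""
    records = {}
    for line in text.splitlines():
        line = line.split("$$", 1)[0].rstrip()  # strip inline comment
        if not line.startswith("##"):
            continue  # data or continuation lines are ignored in this sketch
        label, _, value = line[2:].partition("=")
        records[normalize_label(label)] = value.strip()
    return records


header = parse_header(SAMPLE)
# User-defined labels keep their leading "$" after normalization.
identifiers = {k: v for k, v in header.items() if k.startswith("$")}
```

This sketch handles only single-line records; multi-line values and tabular data blocks would need additional handling. Whether such identifiers end up as user-defined labels or as new core labels is a decision for the working group.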

Note that all of these efforts can be viewed from the main DIGChem directory.

We welcome broad input from the chemistry community. If you are interested in joining this effort, please contact the co-Chairs at: DIGchem@outlook.com.

Thanks for your interest.