BraSNAM 2023 Dataset Submission Guidelines

Open Science has become an irreversible path for many scientists, who aim to make their research products openly available to the public and the scientific community (Kidwell et al., 2016; Popkin, 2019). The XII Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2023) acknowledges this movement and offers the opportunity to submit dataset papers. This document defines guidelines for authors who intend to submit their work and briefly discusses what it means to publish data.


BraSNAM 2023 aims to develop a clear and voluntary policy about access to data from scientific works in support of research and data papers. Therefore, we strongly encourage the authors of research papers to provide access to their data in open datasets instead of as supplementary material files with restricted access conditions.


What is a dataset paper?


A dataset paper is a peer-reviewed document describing a dataset. It documents the authors' effort to prepare, curate, and describe the data they have organized.


The aim of a dataset paper is to describe data and the process of their collection. It is not intended to report hypotheses and conclusions, as conventional research papers do. Dataset papers give recognition to the members of the scientific community who dedicate effort to providing valuable data from many different sources and contexts, so that other researchers can reuse them in their own studies.


Guidelines for submitting dataset papers


Peer review of dataset papers is not yet a well-defined process. Nevertheless, the goal of the peer review process is to increase trust in scientific data and results, as well as to certify the quality of datasets (Mayernik et al., 2015).


The datasets submitted to the workshop should follow the FAIR guiding principles (Wilkinson et al., 2016). The FAIR guidelines state that data must be Findable, Accessible, Interoperable, and Reusable. They also emphasize enhancing the ability of machines to automatically find and use the data.


To be Findable, datasets must have:

1. A persistent Digital Object Identifier (DOI). For this workshop, the DOI must be generated after the acceptance of the paper.

2. Rich metadata. Please provide a separate file that explains the meaning of all variables in the dataset.

3. Dataset citation in the data availability statement and in the reference list of the manuscript. The citation must be done after the work is accepted for publication.


To be Accessible, datasets must be:

1. Openly and freely retrievable by their identifier (DOI).

2. Hosted by a stable and recognized open repository. For the workshop, we will require that the data be available in a Google Drive repository for blind review and, if the paper is accepted, stored on Zenodo.

3. Submitted to discipline-specific repositories, that is, the content of your submission should fit the scope of the repository. For the workshop, we will use Zenodo, as it is a generalist repository.


To be Interoperable, datasets must be:

1. Logically and consistently formatted, facilitating use by others. For example, convert Excel spreadsheets to plain-text formats (CSV or TXT, for instance).

2. Described using a standard vocabulary, if possible. Standard vocabularies consist of lists of terms that cover disciplines of relevance to the study topic.
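
As an illustration of what "consistently formatted" can mean in practice, the following minimal Python sketch checks that every row of a CSV file has the same number of fields as its header. It is only one possible check, not a requirement of the workshop. The function name find_inconsistent_rows and the sample columns (user_id, followers, joined) are hypothetical and chosen for illustration.

```python
import csv
import io


def find_inconsistent_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical sample data: the third line is missing the "joined" field.
sample = "user_id,followers,joined\n1,120,2020-01-05\n2,87\n3,45,2021-07-19\n"
print(find_inconsistent_rows(sample))  # [(3, 2)]
```

An empty result suggests the file has a regular tabular shape; it does not validate the values themselves.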


To be Reusable, datasets should:

1. Provide a metadata file including the exact variable names as in the data file, the measurement units, and a longer explanation of what each variable means.

2. Provide a clear and accessible data usage license, preferably a Creative Commons license.

3. Provide details, such as the version and availability, of any software that is required to view the data or to replicate the analysis. If the software was programmed by the authors, the source code must also be provided together with the dataset.
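
The requirement that the metadata file use the exact variable names of the data file can be checked mechanically. The sketch below is a minimal example under stated assumptions: the metadata file is itself a CSV with a column named "variable" (plus "unit" and "description"), which is a hypothetical layout and not one prescribed by these guidelines. The function name compare_variables and the sample data are likewise illustrative.

```python
import csv
import io


def compare_variables(data_csv, metadata_csv, name_column="variable"):
    """Compare the data file header with the variable names listed in the metadata.

    Returns (undocumented, orphaned): data columns missing from the metadata,
    and metadata entries that match no data column. Matching is case-sensitive,
    because the names are expected to be exactly the same.
    """
    data_header = next(csv.reader(io.StringIO(data_csv)), [])
    described = [row[name_column] for row in csv.DictReader(io.StringIO(metadata_csv))]
    undocumented = [name for name in data_header if name not in described]
    orphaned = [name for name in described if name not in data_header]
    return undocumented, orphaned


# Hypothetical data file and metadata file; "Joined" differs from "joined" in case.
data = "user_id,followers,joined\n1,120,2020-01-05\n"
metadata = (
    "variable,unit,description\n"
    "user_id,none,Anonymized account identifier\n"
    "followers,count,Number of followers at collection time\n"
    "Joined,date,Account creation date (ISO 8601)\n"
)
undocumented, orphaned = compare_variables(data, metadata)
print(undocumented)  # ['joined']
print(orphaned)      # ['Joined']
```

Two empty lists indicate that every variable in the data file is described and that the metadata describes nothing that is absent from the data.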


Step-by-step instructions


Considering the guidelines discussed above, we provide a step-by-step list of tasks for dataset paper submissions to the workshop.


1. The peer review process for the dataset papers is single-blind. Therefore, make sure the authors' names appear neither in the paper nor in the link to the dataset. The authors are also responsible for any sensitive information in the datasets. Upload your dataset to a temporary (volatile) repository for evaluation, that is, to a service that provides a link allowing anyone who holds it to download your data. Make sure that the authorship of the data is not traceable through the link. We strongly recommend using Google Drive, as it generates a random link to your dataset folder with reading permissions without connecting the folder to your account.

2. After the evaluation of your dataset paper and (hopefully) its acceptance, you should upload your dataset to Zenodo (https://zenodo.org/), generate the DOI, make sure that all the information required by the guidelines is present in the repository, and replace the link in the paper with the link to the permanent repository for the camera-ready version of your work.



References:


Kidwell, M.C., et al. (2016). Badges to Acknowledge Open Practices: A Simple, Low-Cost, Effective Method for Increasing Transparency. PLoS Biology 14(5): e1002456. doi:10.1371/journal.pbio.1002456


Mayernik, M. S., Callaghan, S., Leigh, R., Tedds, J., & Worley, S. (2015). Peer Review of Datasets: When, Why, and How. Bulletin of the American Meteorological Society, 96(2), 191-201. https://doi.org/10.1175/BAMS-D-13-00083.1


Popkin, G. (2019). Data sharing and how it can benefit your scientific career. Nature 569:445-447. doi:10.1038/d41586-019-01506-x


Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18


Templates: