
Putting the Data Curation Network to the Test

posted Oct 12, 2018, 2:37 PM by Elizabeth Coburn   [ updated Oct 15, 2018, 6:15 AM ]

We are happy to report that the implementation phase of the DCN is well underway! Shortly after the first DCN All Hands Meeting, held in July 2018 in Minneapolis, MN, we began piloting our shared staffing model for curating research data across a network of data repositories. Our goals for these first six months are to test and improve the DCN workflow, to inform the development of our tracking system (in Atlassian’s Jira), where all of the curation work will take place, and to give every curator in the network a chance to test the curation process with real data. We’ve run two datasets through the workflow so far. These exercises have been informative and successful, and their usefulness has exceeded our expectations.

The first pilot involved a bioinformatics dataset (R and CSV files) submitted by a DCN curator from the University of Illinois. The dataset was matched, based on expertise, with a DCN curator at Penn State University. The resulting curation steps and timeline for this first pilot are detailed in the Value Stream Map (Figure 1), below.


Figure 1. Pilot 1 Value Stream Map

Value Stream Maps are a useful tool in the DCN’s workflow development. By tracking a dataset as it makes its way through our workflow from start to finish (for example, how much time is spent working versus waiting), we can identify the steps that our software needs to track. In Figure 2, below, the Jira workflow steps (“statuses”) are matched to the corresponding Value Stream Map steps.
Figure 2. Jira workflow, derived from and confirmed by the Pilot 1 Value Stream Map
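
The working-versus-waiting comparison that a Value Stream Map captures can be illustrated with a few lines of Python. This is only a minimal sketch: the step names and hour values are hypothetical placeholders, not the actual DCN statuses or the measurements from Pilot 1, and the `Step` class and `summarize` function are names invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str             # hypothetical step label, not an actual DCN Jira status
    working_hours: float  # time actively spent on the step
    waiting_hours: float  # idle time before the next step begins


def summarize(steps):
    """Total the working and waiting time across all steps."""
    working = sum(s.working_hours for s in steps)
    waiting = sum(s.waiting_hours for s in steps)
    total = working + waiting
    # Share of elapsed time spent on active work; 0.0 if there are no steps.
    efficiency = working / total if total else 0.0
    return {
        "working": working,
        "waiting": waiting,
        "total": total,
        "efficiency": efficiency,
    }


# Placeholder values for illustration only; these are not pilot results.
steps = [
    Step("submit", 0.5, 24.0),
    Step("assign", 0.25, 48.0),
    Step("curate", 3.0, 72.0),
    Step("review", 1.0, 0.0),
]

summary = summarize(steps)
print(f"working: {summary['working']} h, waiting: {summary['waiting']} h")
print(f"share of time spent working: {summary['efficiency']:.1%}")
```

A summary like this makes it easy to see which steps are dominated by waiting, which is the kind of information that helps decide which statuses a tracking system needs to record.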


In addition to confirming our assumptions about the workflow and giving curators an opportunity to test-drive the process, the pilots have been illuminating in other ways. Our test cases have revealed real-world challenges that the Network will face. For example, because each pilot has operated on a fairly relaxed timeline (we have not imposed deadlines or due dates up to this point), we’ve experienced delays caused by professional obligations and priorities outside of the Network, vacation schedules, weekends, and waiting on researchers and data submitters for more information, among other issues. Timing and scheduling are especially important for the Network given its end-user-oriented nature, so having a chance to develop methods for handling these issues now, during the implementation phase, is essential.

We will continue testing the DCN throughout the fall, giving every curator at least one opportunity to participate and many opportunities to discuss the process and provide feedback. Our goal is to have the DCN operational in early 2019 and to begin curating datasets originating from institutions in the Network.

Seeing everything come together, and our success with these first two test datasets, has been both exciting and encouraging!
