Sharing and integrating scientific research data are common requirements of international and interdisciplinary data-intensive research collaborations, but they are often difficult for a variety of technical, cultural, policy and legal reasons. The NSF’s INTEROP and DataNet programs, for example, are addressing many of the technical and cultural issues through their funded projects, including DataONE, but the legal and policy issues surrounding data are conspicuously missing from that work. The ultimate success of programs like DataNet depends on scalable data sharing that includes data governance.

Data governance is the system of decision rights and accountabilities that describe who can take what actions with what data, and when, under what circumstances, using what methods. It includes laws and policies associated with data, as well as strategies for data quality control and management in the context of an organization. It also covers the processes that ensure important data assets are formally managed throughout an organization, including business processes and risk management. This extends to virtual organizations such as large, international research collaborations. Data governance ensures that data can be trusted and that people can be held accountable for actions affecting the data.

The research community recognizes that data governance issues such as legal licensing, and related technical issues such as implementing attribution on the Web, would benefit from wider community discussion. The workshop will address questions of how large-scale, international scientific research collaborations can legally and effectively share research data. The workshop will cover:
• Legal/policy issues (e.g. data copyrights, sui generis data rights, licensing and contracts for data)
• Attribution and/or citation requirements (often mandated by license)
• Persistence of data and its citability (e.g. identifiers for data and data creators)
• Discovery and provenance metadata, including its governance (e.g. licenses for metadata)
• Schema/ontology discovery and sharing, including governance (e.g. licenses for ontologies)

The primary goal of the workshop is to develop recommendations to research sponsors and the broader community of scientific stakeholders on these issues. In particular, the workshop will discuss how NSF OCI (e.g. DataNet) projects might address these data governance questions as part of a sound data management plan, as mandated by the NSF grant proposal guidelines. One approach to understanding what is needed for a good data management plan is to work with current projects focused on data management to develop reasonable strategies for data governance.

Invited participants have varying expertise in the scientific, technical, legal and cultural aspects of research data, and will develop both short-term recommendations and a long-term research agenda for the continuation of this work. Participants will be asked to provide a brief, one-page position paper on data governance barriers and challenges they have experienced or would like to see addressed, such as: license choice (or copy/data rights, contracts, and public domain dedications), attribution, publishing, citing, provenance, metadata, and standards adoption (e.g. for ontologies).