Data Management Plan

The NSTX-U Data Management Plan (DMP) is a critical component of the program’s fundamental research data pipeline, ensuring data standards, validation, security, and integrity. The DMP describes the types of data that are measured or produced through analysis, as well as the resources available for data management and preservation during research operations. In addition, the DMP describes the software and methods available for data sharing, and it provides a link to the NSTX-U Data Usage and Publications agreement for data access. Finally, information on NSTX-U and PPPL research computing resources is provided. The DMP as a whole satisfies DOE requirements. Oversight of the NSTX-U DMP is provided by the NSTX-U Director of Research, Stan Kaye. Questions should be directed by email to kaye@pppl.gov.

I. Data Categories

Data from NSTX-U discharges will be obtained from a suite of diagnostics measuring a broad range of plasma characteristics, as well as from analysis codes whose input is based on measured data. The three main categories of NSTX-U data are raw, reduced, and analyzed. The NSTX-U data described below are not proprietary or protected. All data types are preserved, and they are shareable as described in Sections II and III.


A. Raw

Raw (measured) data may take the form of voltages, emissivities, etc., and are not directly usable as input to higher-level analysis routines. Raw data may nevertheless be useful for validation or for checking repeatability. The raw data measured by the various diagnostics are stored in formats specific to each diagnostic. The raw data consist of:

1. 0D - temporally and spatially constant information during the course of a plasma discharge, such as fixed operational settings, device/facility conditions, etc.

2. 1D - temporally varying measurements (magnetic fluxes, neutron rates, etc.), or spatially varying data taken only at one time

3. 2D - measurements that vary both in time and space (kinetic profiles, etc.)

4. 3D - temporally varying 2D images (visible camera, gas puff imaging, etc.)
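The four raw-data categories can be read as a count of independent axes: time, plus zero, one, or two spatial dimensions. The sketch below encodes that reading. The `RawSignal` class and its fields are hypothetical and not part of any NSTX-U software, and the axis-counting rule is an interpretation that happens to reproduce the four definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSignal:
    """Hypothetical descriptor for a raw signal (illustrative only)."""

    name: str
    time_varying: bool
    spatial_dims: int  # 0 = single point, 1 = profile or chord array, 2 = image

    def category(self) -> str:
        # Assumed rule: category = number of independent axes (time + space).
        if self.spatial_dims not in (0, 1, 2):
            raise ValueError("spatial_dims must be 0, 1, or 2")
        return f"{int(self.time_varying) + self.spatial_dims}D"


if __name__ == "__main__":
    examples = [
        RawSignal("fixed operational setting", time_varying=False, spatial_dims=0),
        RawSignal("neutron rate", time_varying=True, spatial_dims=0),
        RawSignal("profile at a single time", time_varying=False, spatial_dims=1),
        RawSignal("kinetic profile", time_varying=True, spatial_dims=1),
        RawSignal("visible camera", time_varying=True, spatial_dims=2),
    ]
    for sig in examples:
        print(f"{sig.name}: {sig.category()}")
```

Under this rule a single still image would also count as 2D, a case the list above does not address explicitly.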


B. Reduced

Raw data will be converted to reduced data through diagnostic-specific analysis software. Reduced data will be expressed in physical units (e.g., temperatures, densities, etc.) and, once validated by the responsible diagnostician, can be used as input to higher-level analysis codes. A listing of NSTX-U diagnostics, the units of their measurements, and the person responsible for each diagnostic is provided here.
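In the simplest case, the raw-to-reduced step amounts to applying a calibration to each raw sample. The sketch below assumes a linear calibration (value = gain × (volts - offset)) and keeps a dated calibration history so that each shot is reduced with the calibration in effect at that time, in the spirit of the calibration histories mentioned in Section III.A. The `Calibration` and `Channel` classes, their fields, and the numbers in the example are hypothetical; actual NSTX-U reduction codes are diagnostic-specific and generally more involved.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Calibration:
    """Hypothetical linear calibration: value = gain * (volts - offset)."""

    gain: float
    offset: float
    units: str
    valid_from: date


@dataclass
class Channel:
    """One hypothetical diagnostic channel with a dated calibration history."""

    name: str
    history: list = field(default_factory=list)

    def add_calibration(self, cal: Calibration) -> None:
        self.history.append(cal)
        self.history.sort(key=lambda c: c.valid_from)  # keep history in date order

    def calibration_for(self, shot_date: date) -> Calibration:
        # Use the most recent calibration already in effect on shot_date.
        applicable = [c for c in self.history if c.valid_from <= shot_date]
        if not applicable:
            raise LookupError(f"no calibration for {self.name} on {shot_date}")
        return applicable[-1]

    def reduce(self, volts, shot_date: date):
        """Convert raw voltages to physical units; returns (values, units)."""
        cal = self.calibration_for(shot_date)
        return [cal.gain * (v - cal.offset) for v in volts], cal.units


if __name__ == "__main__":
    ch = Channel("example_channel")
    ch.add_calibration(Calibration(gain=2.0, offset=0.5, units="keV", valid_from=date(2016, 1, 1)))
    ch.add_calibration(Calibration(gain=2.5, offset=0.5, units="keV", valid_from=date(2016, 6, 1)))
    values, units = ch.reduce([1.5, 2.5], date(2016, 3, 15))
    print(values, units)  # [2.0, 4.0] keV
```

Values produced this way would still need validation by the responsible diagnostician before being used in higher-level analysis codes.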


C. Analyzed

Analyzed data are validated reduced data that have been synthesized through direct analysis or through higher-level analysis codes specific to each diagnostic. Analyzed data, along with some validated reduced data, form the basis for the figures and physics conclusions presented in publications; these data are shareable through both the MDSplus data storage system and the Princeton University DataSpace repository (see Sections III.C and V).


II. Data Management Resources, Storage, and Archival

This section describes data management resources, including those for the storage and preservation of data. The resources used are shared and open-source, which minimizes costs and staffing requirements. The storage and preservation resources ensure data integrity and security and enable data sharing (see Section III).


A. Resources

On-site data management resources include:

1. real-time and post-experiment data reduction;

2. MDSplus, a standardized, open-source data acquisition and storage architecture common to a large number of experiments, both domestic and international;

3. on-call support from software and hardware engineers;

4. coordinated hardware maintenance, upgrades, and compatibility management for computers owned by PPPL as well as those owned by collaborators;

5. shared CPU resources, with some CPUs dedicated to specific data acquisition and reduction tasks;

6. web-based visualization tools.

In particular, because researchers around the world are familiar with MDSplus, they can easily access NSTX-U data and participate in the NSTX-U research program. Web-based information about the NSTX-U hardware and software environment is publicly available and can be found here. This website also describes the available tools for generating plots and describing discharge conditions, and it provides information on and links to supporting tools commonly used for data analysis (MDSplus tools, Python, IDL, MATLAB), which are maintained on the local PPPL cluster. Outside resources include Google Mail, Sites, and Docs (NSTX-U web pages are managed through Google) and mdsplus.org for downloading MDSplus tools and their documentation.


B. Data Storage

On-site data are stored in MDSplus, the standard architecture for data management within the magnetic fusion community. With few exceptions, all data are stored within this architecture; fast-camera videos, for example, are stored in their own repository, CAMDATA. No standard format is mandated for the video data, but the format in use has evolved into a de facto standard. Data storage is centrally managed and contained in a dedicated project space. Access to NSTX-U data is described in Section III. Data contributed to international databases are stored on off-site servers but are accessible through the Web.


C. Data Preservation

Experimental and process data are preserved using the NSTX-U Instrumentation and Control systems enumerated under System Breakdown Structure element 1.6. The data are archived using the EPICS archiver, which serves as the engineering operations data repository, and, for diagnostics, through MDSplus-driven acquisition. Data are made available through interfaces provided by the MDSplus software package. Other data and metadata, such as logbooks, analyzed data, and related databases, are also archived and made available through interfaces such as web pages, APIs, and custom tools.

Long-term preservation uses a tape-backup library running Veritas NetBackup, with a self-service archiving system for end users. Procedure ITD-003 (Nov. 2010) governs the PPPL backup policy, which covers both on- and off-site storage. Assistance with storage and archiving is provided by the Instrumentation & Control group, with requests routed through the PPPL help desk.


III. Data Access and Sharing

The raw and reduced data (defined above) are shared among NSTX-U Team members and can be accessed with MDSplus tools, which can be used directly or embedded in software written in languages such as Python, IDL, or MATLAB. The analyzed data (defined above) are shared selectively, either through MDSplus or through digital data files associated with published results.


A. Resources

Data sharing is facilitated by:

1. web-based visualization tools accessible to the public;

2. the open-source, common MDSplus architecture and tools, including shared analysis codes;

3. the NTCC module library, which contains a number of physics analysis codes;

4. a common login cluster, which provides access to the main computer cluster from on- or off-site;

5. trusted data movement mechanisms among PPPL, ORNL, GA, NERSC, MIT, and ITER;

6. common, standardized output file formats (e.g., the Plasma State file from TRANSP runs, netCDF, HDF5, Excel, ASCII files, etc.).

Data are transferred over the internet through a 10 Gb/s ESnet connection to all national laboratories and through Globus. Data provenance tracking is limited to maintaining histories of data calibrations and the like in MDSplus, and to recording data smoothing, averaging, and similar operations in UFILES (for TRANSP runs); both help ensure data integrity.


B. Access and Sharing

The online data are shared and accessible through a number of software tools, including direct access to the MDSplus storage architecture, as described above. All research data displayed in publications are made digitally accessible to the public at the time of publication. The data files are stored in a dedicated project space in the Princeton University DataSpace repository. The data stored in DataSpace include those displayed in charts, figures, images, etc., and they are identified uniquely by Archival Resource Keys (ARKs). The ARKs and/or URLs for accessing the data files will be given in the publication. The underlying digital research data used to generate the displayed data are made available through the establishment of a collaboration, whose requirements are given below (Sec. III.C).
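An ARK has the general form `ark:/NAAN/Name`, where the NAAN (Name Assigning Authority Number) identifies the assigning organization and the Name is assigned by that organization; the ARK is often embedded in a resolver URL. The sketch below extracts these two parts and rebuilds a resolver URL; it also accepts the slash-less `ark:NAAN/Name` form used by newer versions of the ARK specification. The `parse_ark` and `resolver_url` helpers are hypothetical, the identifier `ark:/12345/x6np1wh8k` is a made-up example rather than a real NSTX-U data set, and the default resolver base `https://n2t.net` is an assumption; the ARK or URL printed in the publication is authoritative.

```python
import re
from typing import NamedTuple


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number
    name: str  # assigned name, including any qualifiers after it


# Matches "ark:/NAAN/Name" and the slash-less "ark:NAAN/Name" anywhere in a string.
_ARK_RE = re.compile(r"ark:/?(?P<naan>[0-9A-Za-z]+)/(?P<name>[^\s?#]+)")


def parse_ark(text: str) -> Ark:
    """Extract the NAAN and name from an ARK or from a URL containing one."""
    match = _ARK_RE.search(text)
    if match is None:
        raise ValueError(f"no ARK found in {text!r}")
    return Ark(match.group("naan"), match.group("name"))


def resolver_url(ark: Ark, base: str = "https://n2t.net") -> str:
    """Build a resolver URL for an ARK (the default base is an assumption)."""
    return f"{base.rstrip('/')}/ark:/{ark.naan}/{ark.name}"


if __name__ == "__main__":
    ark = parse_ark("https://n2t.net/ark:/12345/x6np1wh8k")
    print(ark)
    print(resolver_url(ark))
```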


C. Requirements

All funded NSTX-U Team members (users) have open access to the data stored in MDSplus and CAMDATA. Researchers outside the NSTX-U Team are granted access to the data on request, through a collaboration with an NSTX-U researcher of similar interest. Establishing a collaboration requires both identifying a point of contact among the NSTX-U researchers and reading and signing the NSTX-U Data Usage and Publication agreement, which governs the requirements for using data and publishing the results. This policy ensures lines of communication between the collaborator and the responsible NSTX-U point of contact, which in turn ensures proper use of the data and review of the work before it is submitted for publication.


IV. Links to NSTX-U and PPPL Data Management Resources

The following links provide additional information for the management and analysis of NSTX-U data.


General information on PPPL computing


PPPL Research Computing


PPPL Information Technology


NSTX-U Software and Systems

NSTX-U Software Page

NSTX-U Diagnostics

NSTX-U EPICS


Transport Analysis Code

TRANSP


V. Digital Data

Digital data in support of publications are provided in accordance with DOE policy. These will include data displayed in charts, figures, images, etc., each identified uniquely by an Archival Resource Key (ARK). The ARKs and/or URLs for accessing the data files will be given in the publication. The data files will be stored in the Princeton University DataSpace repository. Both "readme" and "digital data" files are uploaded to DataSpace; the readme files describe the data contained in the digital data files.
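As an illustration of the readme/data-file pairing described above, the sketch below writes the data behind one figure to a CSV file and writes a matching plain-text readme that names the data file and lists its columns. The file-naming scheme, the CSV format, and the `write_figure_package` helper are assumptions for illustration only; the author instructions referenced below remain the authoritative guide for preparing and uploading files.

```python
import csv
import tempfile
from pathlib import Path


def write_figure_package(out_dir, figure_id, columns, rows, description):
    """Write <figure_id>_data.csv and <figure_id>_readme.txt (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Digital data file: one header row (with units), then one row per data point.
    data_path = out / f"{figure_id}_data.csv"
    with data_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)

    # Readme file: names the data file and describes its contents.
    readme_lines = [f"Data file: {data_path.name}", "", description, "", "Columns:"]
    readme_lines += [f"  {col}" for col in columns]
    readme_path = out / f"{figure_id}_readme.txt"
    readme_path.write_text("\n".join(readme_lines) + "\n")
    return data_path, readme_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_figure_package(
            tmp,
            "fig1",
            ["time (s)", "value (arb. units)"],
            [[0.0, 1.0], [0.1, 1.5]],
            "Hypothetical example data for Figure 1.",
        )
        for p in paths:
            print(p.name)
```

Keeping units in the column headers and repeating them in the readme lets the data file be read on its own, without the publication at hand.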

Instructions for authors on how to include the ARKs in the publications, as well as how to upload the readme and data files to the DataSpace repository, can be found here.