Data Management Plan

The NSTX-U Data Management Plan (DMP) is a critical component of the program’s fundamental research data pipeline, ensuring data standards, validation, security, integrity, sharing, and preservation. The DMP describes the types of data that are measured or produced through analysis, and it describes the resources available for data management and preservation during the course of research operations. In addition, the DMP describes the software and methods available for data sharing, and it provides a link to the NSTX-U Data Usage and Publications Agreement for data access. Finally, information on the NSTX-U and PPPL research computing resources is provided. The DMP in its entirety satisfies Department of Energy (DOE) requirements. Oversight of the NSTX-U DMP is provided by the NSTX-U Director of Research, Stan Kaye. Questions should be directed by email to kaye@pppl.gov.

I. Data Categories

Data from NSTX-U discharges will be obtained from a suite of diagnostics measuring a broad range of plasma characteristics, as well as from analysis codes whose input is based on measured data. The three main categories of NSTX-U data are raw, reduced, and analyzed. The NSTX-U data described below are not proprietary or protected. All data types are preserved, and they are shareable as described in Sections II and III.

A. Raw

Raw (measured) data may take the form of voltages, emissivities, etc., and are not directly usable as input to higher level analysis routines. The raw data may be usable, however, for validation or repeatability checks. The raw data measured by the various diagnostics are stored in formats specific to each diagnostic. The raw data consist of:

1. 0D - temporally and spatially constant information during the course of a plasma discharge (e.g., fixed operational settings, device/facility conditions).

2. 1D - temporally varying measurements (e.g., magnetic fluxes, neutron rates), or spatially varying data taken at only one time.

3. 2D - measurements that vary in both time and space (e.g., kinetic profiles).

4. 3D - temporally varying 2D images (e.g., visible camera, gas puff imaging).
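
The labels in the list above can be read as a count of the independent axes (time plus spatial axes) along which a signal varies. The following Python sketch illustrates that reading. The function name raw_data_label and its arguments are hypothetical and are not part of any NSTX-U tool; labeling a 2D image taken at a single time as "2D" is an assumption, since the list does not address that case.

```python
def raw_data_label(varies_in_time: bool, spatial_dims: int) -> str:
    """Return a dimensionality label ("0D" to "3D") for a raw signal.

    Hypothetical helper: the label is the number of independent axes,
    i.e. the spatial axes plus one more if the signal varies in time.
    """
    if spatial_dims not in (0, 1, 2):
        raise ValueError("spatial_dims must be 0, 1, or 2")
    return f"{spatial_dims + int(varies_in_time)}D"


# Examples drawn from the list above.
print(raw_data_label(False, 0))  # fixed operational setting
print(raw_data_label(True, 0))   # neutron rate versus time
print(raw_data_label(False, 1))  # spatial data taken at a single time
print(raw_data_label(True, 1))   # kinetic profile evolving in time
print(raw_data_label(True, 2))   # camera images versus time
```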

B. Reduced

Raw data will be converted to reduced data through diagnostic-specific analysis software. Reduced data will be in real physics units (e.g., temperatures, densities) and, once validated by the responsible diagnostician, can be used as input to high level analysis codes. A listing of NSTX-U diagnostics, units for the measurements, and the person responsible for each diagnostic is provided here.

C. Analyzed

Analyzed data are validated reduced data that have been synthesized through direct analysis or through higher level analysis codes that are specific to each diagnostic. Analyzed data, along with some validated reduced data, form the basis for figures and physics conclusions presented in publications, which are shareable through both the MDSplus data storage system and the Princeton (University) Data Commons repository (see Sections III.B and V).

II. Data Management Resources, Storage, and Preservation

This section discusses data management resources, including those for storage and preservation of data. The resources used are shared and open-source in order to minimize costs and personnel requirements. The storage and preservation resources provide data integrity and security and enable data sharing (see Section III).

A. Resources

On-site data management resources include:


In particular, the familiarity of researchers around the world with MDSplus allows for easy access to NSTX-U data and participation in the NSTX-U research program. Web-based information about the NSTX-U hardware and software environment is publicly available and can be found here. This website also describes the tools available for generating plots and describing discharge conditions, and it links to supporting tools commonly used for data analysis (MDSplus tools, Python, IDL, MATLAB), which are maintained on the local PPPL cluster. External resources include Google Mail, Sites, and Docs (the NSTX-U website is managed through Google), and mdsplus.org for downloading MDSplus tools and their documentation.

B. Data Storage

On-site data are stored in MDSplus, the standard architecture for data management within the magnetic fusion community. Most data are stored within this architecture; one exception is fast camera videos, which are stored in their own repository, CAMDATA. Data storage is centrally managed and is contained in a dedicated project space. There is no standard format required for the video data. Access to NSTX-U data is described in Section III. Data contributed to international databases are stored on off-site servers but are accessible through the Web.

C. Data Preservation

Experimental and process data are preserved using the NSTX-U Instrumentation and Control systems enumerated under System Breakdown Structure element 1.6. The data are archived using the EPICS system archiver (engineering operations data repository) and using MDSplus-driven acquisition for diagnostics. Data are made available using interfaces provided by the MDSplus software package. Other types of data and metadata, such as logbooks, analyzed data, and related databases, are also archived and made available through various interfaces such as the web, APIs, and custom tools.


Long-term preservation uses a tape-backup library running the Veritas "netbackup" software, with a self-help archiving system for end users. Procedure TCR-P-106 (Jan. 2018) governs the PPPL backup policy, which includes both on- and off-site storage. Assistance with storage and archiving is provided by the Instrumentation & Control group; requests are routed through PPPL's Helpdesk (login required).


III. Data Access and Sharing

The raw and reduced data (defined above) are shared among NSTX-U Team members, and the data can be accessed through MDSplus tools, which can be used directly or embedded in other software codes, such as those written in Python, IDL, or MATLAB. The analyzed data (defined above) are shared selectively either through MDSplus or through digital data files associated with published results.
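
As an illustration of embedding MDSplus access in Python, the sketch below reads one signal and its time base through the MDSplus thin-client interface (Connection, openTree, get, and the TDI function dim_of). It is a minimal sketch and not the NSTX-U team's own tooling: it assumes the MDSplus Python package is installed, and the host name, tree name, shot number, and node path shown in the usage comment are hypothetical placeholders.

```python
def connect(host):
    """Open a thin-client connection to an MDSplus data server.

    The import is deferred so that read_signal can be used (and tested)
    without the MDSplus package being installed.
    """
    from MDSplus import Connection  # requires the MDSplus Python package
    return Connection(host)


def read_signal(conn, tree, shot, node):
    """Return (times, values) for one node of the given tree and shot."""
    conn.openTree(tree, shot)
    values = conn.get(node).data()
    times = conn.get(f"dim_of({node})").data()  # time base of the signal
    return times, values


# Usage (all names below are hypothetical placeholders):
#   conn = connect("mdsplus.example.org")
#   times, values = read_signal(conn, "example_tree", 123456, r"\example_node")
```

Passing the connection in as an argument keeps the reading logic independent of how access is granted (see Section III.C), and lets the same function be exercised against a stand-in object during testing.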

A. Resources

Data sharing is facilitated through web-based visualization tools accessible to the public; the open-source, common MDSplus architecture and tools, including shared analysis codes; the NTCC module library, which contains a number of physics analysis codes; a common login cluster (allowing access to the main computer cluster from on- or off-site); trusted data movement mechanisms among PPPL, ORNL, GA, NERSC, MIT, and ITER; and common, standardized output file formats (e.g., Plasma State files from TRANSP runs, NETCDF, HDF5, Excel, and ASCII files). Data transfer over the internet is accomplished by a 100 Gigabit ESNET connection to all National Labs and by Globus. Data provenance is limited to maintaining histories of data calibrations through MDSplus, keeping track of data smoothing, averaging, etc. in UFILES (for TRANSP runs), and ensuring data integrity.

B. Access and Sharing

The online data are shared and accessible through a number of software tools, including direct access to the MDSplus storage architecture, as described above. All research data displayed in publications are made digitally accessible to the public at the time of publication. The data files are stored in the Princeton Data Commons (PDC) repository in a dedicated project space. The data stored in PDC include those displayed in charts, figures, images, etc., and they are identified uniquely by Digital Object Identifiers (DOIs). The DOIs and/or URLs for accessing the data files will be provided in the publications, conference papers, and presentations/posters. The underlying digital research data used to generate the displayed data are made available through the establishment of a collaboration, whose requirements are given below (Section III.C).

C. Requirements

All funded NSTX-U Team members (users) have open access to the data stored in MDSplus and CAMDATA. Researchers outside the NSTX-U Team are granted access to the data upon request and through a collaboration with an NSTX-U researcher of similar interest. The establishment of a collaboration is contingent on both identifying an NSTX-U researcher as a point of contact and reading and signing the NSTX-U Data Usage and Publication Agreement, which governs the requirements for using data and publishing the results. This policy ensures that there will be lines of communication between the collaborator and the responsible NSTX-U point of contact, which, in turn, ensures proper use of the data and review of the work before it is submitted for publication.


IV. Links to NSTX-U and PPPL Data Management Resources

The following links provide additional information for the management and analysis of NSTX-U data.


General information on PPPL computing


NSTX-U Software and Systems


Transport Analysis Code

V. Digital Data

Digital data in support of publications are provided in accordance with DOE policies and plans (e.g., DOE Order 241.1B, Public Access Plan). This includes data displayed in charts, figures, images, etc. Each such dataset is given a unique DOI. The DOIs and/or URLs for accessing the data files will be provided in publications, conference papers, and presentations/posters. The data files are stored in the Princeton Data Commons (PDC) Repository. Documentation (e.g., readme files), digital data files, and other information are uploaded to PDC.


Instructions for authors on how to include the DOIs in the publications can be found here. Instructions for uploading readme and data files to the PDC repository are contained in the guide.
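
Because a publication may cite either a DOI or a URL, it can help to convert between the two consistently. The Python sketch below turns a DOI into a resolvable link through the public doi.org resolver. The function name doi_to_url and the DOI in the example are hypothetical; this is an illustration only and not part of the PDC upload instructions.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI given bare, as "doi:...", or as a URL."""
    doi = doi.strip()
    # Drop a leading scheme or resolver prefix if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Basic sanity check only: DOIs start with "10." and have a "/" separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example"))
```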