Loading Data

The essential purpose of the DataNav suite of applications is the effective display of scientific data, whether that be in a standalone figure for journal publication or in a large repository managed as a DataNav portal. A key issue you must face, then, is loading that data -- potentially LOTS of data -- in a compact and efficient form that the DataNav applications can recognize and consume, and doing it as conveniently and quickly as possible.

How data sets are defined in DataNav

A data set object in DataNav is defined by an identifier, a data set format code, a breadth B and length L, additional parameters that vary with the set format, and a single-precision floating-point array that holds the actual raw data. The identifier is a non-empty ASCII string up to 40 characters long and consisting only of alphanumeric characters and selected punctuation marks ( $_[](){}+-^!=@|.<> ). The contents of the raw data array and the number and significance of the additional parameters depend on the data set format. Currently, DataNav supports seven different formats:

- ptset : A single point set with optional standard deviation data. Each data point in the set is represented by a tuple of the form {x y [yStd ye xStd xe]}. All tuples must have the same length B, which must be between 2 and 6, and L such tuples are stored sequentially in the raw data array. There are no additional parameters for this format.
- series : A single data series sampled at regular intervals in x, with optional standard deviation data. Each sample in the series is represented by a tuple of the form {y [yStd ye]}. The implied x-coords are {x0 + 0*dx, x0 + 1*dx, x0 + 2*dx, ...}, where the sample interval dx and the initial value x0 are additional parameters associated with this set format. All tuples have the same length B, which must be between 1 and 3, and L such tuples are stored sequentially in the raw data array.
- mset : A collection of 1+ (usually many) individual point sets all sharing the same x-coords ("m" for "multiple"). In typical usage, each individual point set represents a repeated measure of the same stochastic phenomenon, so the variation in that phenomenon is captured in the collection. The collection is represented as a sequence of L tuples of the form {x y1 [y2 y3 ...]}. All tuples have the same length B, the number of individual point sets in the collection is B-1, and each set contains L points. There are no additional parameters.
- mseries : A collection of 1+ (usually many) individual data series all sampled at {x0 + 0*dx, x0 + 1*dx, x0 + 2*dx, ...}, as for the series format. The collection is represented as a sequence of L tuples of the form {y1 [y2 y3 ...]}. All tuples have the same length B, which is the number of individual series in the collection.
- raster1d : A collection of B rasters in x. In this format, the data array begins with a list of B raster lengths, followed by each raster's samples: {n1 n2 .. nB x1(1) x1(2) .. x1(n1) x2(1) x2(2) .. x2(n2) ... xB(1) xB(2) .. xB(nB)}. The individual rasters can, and likely will, have different lengths. Note that the total number of raster samples is n1 + n2 + ... + nB = L, so the total length of the raw data array is B+L. There is no y-coordinate associated with this data set format, which can only be rendered by the raster presentation node. There are no additional parameters associated with this format.
- xyzimg : A 3D data set in the form {x, y, z(x,y)}, where one measured variable z is a function of two independent variables x,y. The raw data array in this case is a B×L matrix holding the "intensity" z(x,y) at each "pixel" (x,y), where x=[1..B] and y=[1..L]. The image-like matrix is stored row-wise in the 1D data array. Additional defining parameters include the actual range [x0..x1] spanned by the data in x (in "user" units), and the range [y0..y1] spanned in y.
- xyzset : A generic 3D point set, where each point in the set is represented by the tuple {x, y, z}. All tuples have the same length B==3, and L such tuples are stored sequentially in the raw data array. There are no additional parameters for this format.
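To make the raw-array layouts above concrete, here is a minimal Python sketch of three of them: sequential tuple storage (as used by ptset, mset and xyzset), the implied x-coordinates of a series, and the raster1d layout. The function names (pack_tuples, series_x, pack_raster1d) are hypothetical and are not part of DataNav or its Matlab utilities. The sketch also uses ordinary Python floats, whereas DataNav stores the array in single precision.

```python
from typing import List, Sequence, Tuple


def pack_tuples(tuples: Sequence[Sequence[float]]) -> Tuple[int, int, List[float]]:
    """Flatten L equal-length tuples into one sequential array.

    Returns (B, L, raw), where B is the tuple length and L the tuple count.
    Hypothetical helper; format-specific limits on B are not checked here.
    """
    if not tuples:
        raise ValueError("need at least one tuple")
    breadth = len(tuples[0])
    if any(len(t) != breadth for t in tuples):
        raise ValueError("all tuples must have the same length B")
    raw = [float(v) for t in tuples for v in t]
    return breadth, len(tuples), raw


def series_x(x0: float, dx: float, length: int) -> List[float]:
    """Implied x-coordinates of a series: x0 + 0*dx, x0 + 1*dx, ..."""
    return [x0 + i * dx for i in range(length)]


def pack_raster1d(rasters: Sequence[Sequence[float]]) -> Tuple[int, int, List[float]]:
    """Lay out B rasters as {n1 .. nB, raster 1 samples, ..., raster B samples}.

    Returns (B, L, raw), where L is the total sample count, so len(raw) == B + L.
    """
    lengths = [float(len(r)) for r in rasters]
    samples = [float(x) for r in rasters for x in r]
    return len(rasters), len(samples), lengths + samples


# Three ptset points of the form {x y yStd}: B = 3, L = 3.
b, l, raw = pack_tuples([(0, 1, 0.1), (1, 2, 0.2), (2, 4, 0.4)])
print(b, l, raw)

# Three rasters of unequal length: B = 3, L = 6, and the array holds B + L values.
b, l, raw = pack_raster1d([[0.5, 1.5], [2.0], [0.1, 0.2, 0.3]])
print(b, l, len(raw))
```

Note that in the raster1d case the breadth B counts rasters rather than tuple elements, which is why the array length is B+L instead of B*L.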

For the ptset format, (xStd, yStd) are the standard deviations in x and y at each point (x,y), while (xe,ye) determine how nonzero standard deviations are rendered as error bars. If ye = 0, an error bar of length 2·yStd is drawn through the data point (x,y), from (x, y-yStd) to (x, y+yStd). If ye = 1, a one-sided error bar is drawn from (x,y) to (x, y+yStd); if ye = -1, the one-sided error bar is drawn from (x,y) to (x,y-yStd). If ye is any other value, then no error bar is drawn, even if yStd is nonzero. Analogously for xe. Only the data point's coordinates (x, y) are required. An error bar is drawn only if the corresponding standard deviation is explicitly specified and nonzero; only the absolute value of the standard deviation is used. If an error bar position code is omitted (i.e., if the datum tuple is {x y yStd} or {x y yStd ye xStd}), the error bar will be centered on the data point. [NOTE: To get a data point with a horizontal error bar only, we CANNOT use {x y xStd}, because the third number is always interpreted as yStd. Use {x y 0 0 xStd xe} instead.]
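The error bar rules above can be summarized in a short Python sketch. It is only an illustration of the rules as stated in this section, not DataNav's rendering code, and the names bar_span and ptset_error_bars are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Span = Optional[Tuple[float, float]]


def bar_span(center: float, std: Optional[float], code: float = 0) -> Span:
    """Return the (start, end) of an error bar along one axis, or None.

    No bar is drawn if the standard deviation is omitted or zero, or if
    the position code is anything other than 0, 1 or -1.
    """
    if std is None or std == 0:
        return None
    s = abs(std)  # only the absolute value of the standard deviation is used
    if code == 0:
        return (center - s, center + s)  # two-sided, total length 2*std
    if code == 1:
        return (center, center + s)      # one-sided, positive direction
    if code == -1:
        return (center, center - s)      # one-sided, negative direction
    return None


def ptset_error_bars(t: Sequence[float]) -> Dict[str, Span]:
    """Interpret a ptset tuple {x y [yStd ye xStd xe]} of length 2 to 6."""
    if not 2 <= len(t) <= 6:
        raise ValueError("ptset tuple length must be between 2 and 6")
    x, y = t[0], t[1]
    y_std = t[2] if len(t) > 2 else None
    ye = t[3] if len(t) > 3 else 0  # omitted code: bar centered on the point
    x_std = t[4] if len(t) > 4 else None
    xe = t[5] if len(t) > 5 else 0
    return {"vertical": bar_span(y, y_std, ye),
            "horizontal": bar_span(x, x_std, xe)}


# The third number is always yStd, so a horizontal-only bar needs zero placeholders.
print(ptset_error_bars((1.0, 2.0, 0.5)))
print(ptset_error_bars((1.0, 2.0, 0, 0, 0.25, 1)))
```

The second call shows the {x y 0 0 xStd xe} form from the note: the zero yStd suppresses the vertical bar, leaving only the horizontal one.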

How to enter data into DataNav apps

The simplest (but not easiest!) way to get data into Figure Composer is to enter it manually, using the Dataset Editor dialog. With this dialog you can change the data set ID, format, parameters, and raw data content. You can enter data one sample at a time, or you can paste numbers copied from a spreadsheet program like Excel or from a text editor, as long as the "shape" and numeric content of the copied selection "make sense". For more information, review the section on the Dataset Editor.

Entering data manually, even if you're copying it from a spreadsheet or text editor, quickly becomes tedious when you're dealing with a large data set or a large number of individual data sets. A better way to handle data is to store your sets in DataNav-compatible source files, which can then be read by Figure Composer or DataNav Builder.

While loading data from file in this way is a much more powerful avenue than manual data entry, there's still the issue of how to generate the data set source file in the first place. DataNav supports a number of different data file formats, including legacy formats that were used by Phyplot, Figure Composer's predecessor. The most important file format is a custom binary file with a "table of contents" to support randomly accessing any given data set stored in the file. This file can "grow" to contain a very large number of sets; in fact, it was once used in the backing store for the data repository of a DataNav hub. Naturally, it is critical that we provide tools for reading and writing such files. Since the vast majority of analysis work in the Lisberger and other laboratories is done in Matlab, we have written and maintain two Matlab scripts for writing and reading DataNav data sets to and from a source file: putdatanavsrc() and getdatanavsrc().

Another utility, put2fyp(), may be used to inject data sets prepared in Matlab directly into a figure file that was constructed previously in Figure Composer. This comes in handy when you want to re-use the same figure to display closely related sets of data, such as the results from several different repetitions of the same experiment. In fact, as of version 4.6.2, you can use put2fyp() to inject not only raw data, but also text content and even an entire graph.

However, given that most researchers use Matlab to analyze and process their experimental data, and even to prepare draft figures, the matfig2fyp() utility is probably the single most useful function in the DataNav support package for Matlab. This function will convert an open Matlab figure to an equivalent (to the extent possible) FypML figure. You no longer have to construct a figure from scratch in Figure Composer, and you don't even have to think about how to get data into that figure. Just prepare the draft figure in Matlab, call matfig2fyp() to convert it and save it as a .fyp file, then open that file in Figure Composer or DataNav Builder. [NOTE: As of Version 4.2.3, you may not even need to use matfig2fyp(). Instead, you can save your Matlab figure as a FIG file, then open it directly in Figure Composer!]

Finally, the dn_vibatch() function automates the process of preparing a view instance data batch file (*.dvb) which, in turn, is used by DataNav Builder to populate a data hub with instance data associated with a particular navigation view. See the Builder tutorial for a detailed explanation of how dn_vibatch() plays a role in constructing a large data hub.

These various Matlab tools are an indispensable part of the DataNav suite; for more information on them, see the subsection on Matlab Utilities.

"NaN" in data sets

Note that NaN is acceptable as a floating-point number in the raw data array of any DataNav-compatible data set. It is used to mark an ill-defined point in the data. When rendered in a DataNav figure, an ill-defined data point usually creates a "gap" in the rendered data trace.
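As an illustration of how a NaN can produce such a gap, the Python sketch below splits a trace into contiguous runs of well-defined points, each of which could then be drawn as a separate polyline. This is an assumption about one plausible approach, not a description of how DataNav renders traces; split_at_nan is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def split_at_nan(xs: Sequence[float], ys: Sequence[float]) -> List[List[Point]]:
    """Split a trace into runs of well-defined points.

    A point is treated as ill-defined if either coordinate is NaN; it ends
    the current run and is itself dropped, which leaves a gap in the trace.
    """
    segments: List[List[Point]] = []
    current: List[Point] = []
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            if current:
                segments.append(current)
                current = []
        else:
            current.append((x, y))
    if current:
        segments.append(current)
    return segments


nan = float("nan")
# The NaN at x = 2 breaks the trace into two separate segments.
print(split_at_nan([0, 1, 2, 3, 4], [1.0, 2.0, nan, 4.0, 5.0]))
```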
