
GALEON Use Cases for Scientific Data Types

One interesting lesson from GALEON (Geo-interface for Air, Land, Environment, Ocean NetCDF) Phase 1 is the value of a relatively simple interface that allows the client to specify a space-time bounding box and a set of variables and get back the data values in a useful form such as GeoTIFF or CF-netCDF.  This document is an overview of variations on such use cases for different scientific data types prevalent in the GALEON community.  GALEON has focused on the OGC (Open Geospatial Consortium) WCS (Web Coverage Service) specification, but it is not entirely clear that WCS is the best or proper interface for serving all these data types.  However, it does seem clear that the WCS ability to specify the data of interest in terms of a space-time bounding box and a set of variables is an important and valuable characteristic that should also be available for collections of non-gridded data.

Simple Space-time Bounding Box-Variable Data Request

By developing a Python scripting client library for WCS 1.0 and WCS 1.1, Dominic Lowe made it convenient to develop rudimentary WCS clients that make use of the bounding box/variable data request.  This is illustrated in:


The application of this simple data request to a wide variety of data types is illustrated in Airport Weather Use Case and Related Standards.  The idea underlying this catch-all use case is that a client wants all the atmospheric data available near an airport for a study of the behavior of storms in that region.
The data types involved illustrate many of the Scientific Data Types of the Unidata Common Data Model (CDM) and the Scientific Feature Types of the BADC Climate Science Modelling Language (CSML):
  • point data from lightning strike observations
  • "station" observations from fixed weather stations
  • vertical profiles from balloon soundings and wind profilers
  • trajectory data obtained from instruments onboard aircraft which have taken off and landed recently
  • volumetric scans from ground-based radars
  • visible, infrared, and water-vapor (and possibly other wavelength) satellite imagery 
  • gridded output from national or hemispheric weather forecasts (typically run at centers like NCEP and ECMWF), sometimes used as boundary conditions for a higher-resolution local forecast model
The "feature of interest" (O&M reference here?) can be defined in terms of a bounding box with three spatial dimensions and a time frame that encompasses a storm that passed through the airport.
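As a rough sketch of the idea, a single four-dimensional box can be applied to samples from very different platforms.  The class name, field names, and sample values below are all hypothetical and made up for illustration; the sketch also assumes longitude/latitude in degrees, altitude in metres, and a box that does not cross the date line.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SpaceTimeBox:
    """Hypothetical bounding box: three spatial dimensions plus a time frame."""
    west: float
    east: float
    south: float
    north: float
    bottom_m: float
    top_m: float
    start: datetime
    end: datetime

    def contains(self, lon, lat, alt_m, when):
        """Return True if the sample lies inside the box (bounds inclusive)."""
        return (self.west <= lon <= self.east
                and self.south <= lat <= self.north
                and self.bottom_m <= alt_m <= self.top_m
                and self.start <= when <= self.end)


# Made-up samples: (data type, lon, lat, altitude in metres, time).
SAMPLES = [
    ("lightning", -104.70, 39.90, 0.0, datetime(2008, 6, 1, 18, 5)),
    ("aircraft", -104.60, 39.80, 3200.0, datetime(2008, 6, 1, 18, 20)),
    ("station", -98.00, 35.00, 400.0, datetime(2008, 6, 1, 18, 0)),     # outside horizontally
    ("sounding", -104.65, 39.85, 9000.0, datetime(2008, 6, 2, 12, 0)),  # outside the time frame
]

# Made-up box around an airport for the hours a storm passed through.
STORM_BOX = SpaceTimeBox(-105.5, -104.0, 39.3, 40.4, 0.0, 15000.0,
                         datetime(2008, 6, 1, 17, 0), datetime(2008, 6, 1, 21, 0))


def select(samples, box):
    """Keep only the samples that fall inside the space-time box."""
    return [s for s in samples if box.contains(s[1], s[2], s[3], s[4])]


for sample in select(SAMPLES, STORM_BOX):
    print(sample[0], sample[4].isoformat())
```

The point of the sketch is only that the same request shape works regardless of whether the underlying data are points, profiles, or trajectories; how a server would organize and encode the result is a separate question.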


Multiple Platforms Sampling the Atmosphere


Starting with Gridded Data

As noted in the Phase 1 Summary, GALEON focused initially on gridded data because WCS, in its current form, is restricted to gridded data.  For regularly spaced grids such as the output of weather forecast models, GOES satellite images, and gridded mosaics of radar observation products, WCS delivery of data encoded in CF-netCDF proved very useful for many clients.
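For gridded data, the bounding box/variable request can be sketched as a WCS 1.0.0 GetCoverage request in key-value-pair form.  The function below only assembles the URL and makes no network call.  The endpoint, coverage name, and format string are assumptions for illustration; real servers advertise their own coverage names and supported formats in their capabilities and coverage descriptions.  WCS 1.0.0 also expects a grid size or resolution (WIDTH/HEIGHT or RESX/RESY), which is left optional here.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage, bbox, time, fmt,
                        width=None, height=None):
    """Build a WCS 1.0.0 GetCoverage URL for one coverage, box, and time.

    bbox is (west, south, east, north) in degrees (EPSG:4326 assumed).
    """
    west, south, east, north = bbox
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": "EPSG:4326",
        "BBOX": f"{west},{south},{east},{north}",
        "TIME": time,
        "FORMAT": fmt,  # server-specific format name (assumption)
    }
    # Optional output grid size; some servers require it, others do not.
    if width is not None and height is not None:
        params["WIDTH"] = str(width)
        params["HEIGHT"] = str(height)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name; the box roughly covers South America.
url = wcs_getcoverage_url(
    "http://example.org/thredds/wcs/gfs",
    "Temperature_surface",
    (-82.0, -56.0, -34.0, 13.0),
    "2008-06-01T12:00:00Z",
    "NetCDF3",
)
print(url)
```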


Global Forecast System (GFS) 2.5 Degree Surface Temperature Forecast over South America.

While the straightforward interface and CF-netCDF encoding proved very useful in many use cases, a number of WCS limitations were brought to light.  Additional gridded dataset use cases and illustrations of WCS 1.0 limitations are given in Gridded Output from Weather Forecast Models.

Non-gridded Data in the Form of Station Observation Datasets

Comparing Gridded Forecast Output with Station Observation Data



Overlay of observed temperatures at weather stations on a
background image showing forecast temperatures for
the same area in space and time.

The numerical values are temperatures observed at reporting stations, whereas the colored grid in the background represents the temperatures forecast by the North American Model (80 km resolution) for the same time.  In terms of standard interfaces, one might argue that a client should be able to request the observational samplings at irregularly spaced points using the same space-time bounding box that one uses to request the regularly gridded data from the weather forecast model.

More examples of station/point dataset use cases can be found in Station Data Collections Via Standard Interfaces.

Status


An extension to the CF (Climate and Forecast) Conventions has been proposed for station/point datasets encoded in netCDF.  Once this extension is adopted by the CF community, the CF-netCDF extension to WCS should apply to it with suitable modifications.  There are some fundamental issues to be confronted here because these station data collections are not the traditional grids that come to mind as coverages.  They definitely do not fit the regularly spaced definition of a WCS grid.  The usual way to provide Earth referencing coordinates is in a table that lists the station identifiers and their locations.  The observations are indexed by the station identifiers.  On the other hand, the collections of station data are valid discrete point coverages as defined by the ISO 19123 Coverage specification and it is often convenient to request collections of such observational data in terms of a space-time bounding box and a set of parameters (or fields or properties, e.g., temperature, pressure, wind speed and direction).   That aspect of the WCS coverage request would be useful for requesting such collections.
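The station-table arrangement described above can be sketched in a few lines: locations live in a table keyed by station identifier, observations carry only the identifier, and a bounding box/parameter request has to join the two.  The identifiers, coordinates, and values below are made up, and the function name and record layout are hypothetical; this is not the proposed CF encoding, only an illustration of the request pattern.

```python
from datetime import datetime

# Station table: identifier -> (lon, lat).  All values are made up.
STATIONS = {
    "STN_A": (-104.7, 39.9),
    "STN_B": (-105.2, 40.0),
    "STN_C": (-98.0, 35.0),
}

# Observations are indexed by station identifier, not by location.
OBSERVATIONS = [
    {"station": "STN_A", "time": datetime(2008, 6, 1, 18, 0), "temperature": 24.5, "pressure": 835.0},
    {"station": "STN_B", "time": datetime(2008, 6, 1, 18, 0), "temperature": 22.0, "pressure": 830.5},
    {"station": "STN_C", "time": datetime(2008, 6, 1, 18, 0), "temperature": 30.1, "pressure": 970.2},
    {"station": "STN_A", "time": datetime(2008, 6, 2, 6, 0), "temperature": 12.3, "pressure": 836.1},
]


def subset(stations, observations, bbox, start, end, parameters):
    """Return observations inside a space-time box, limited to the named parameters."""
    west, south, east, north = bbox
    # Step 1: use the station table to find which stations lie inside the box.
    inside = {sid for sid, (lon, lat) in stations.items()
              if west <= lon <= east and south <= lat <= north}
    # Step 2: keep observations from those stations within the time frame,
    # attaching the location so each returned row is Earth-referenced.
    rows = []
    for obs in observations:
        if obs["station"] in inside and start <= obs["time"] <= end:
            lon, lat = stations[obs["station"]]
            row = {"station": obs["station"], "lon": lon, "lat": lat, "time": obs["time"]}
            row.update({name: obs[name] for name in parameters})
            rows.append(row)
    return rows


rows = subset(STATIONS, OBSERVATIONS, (-105.5, 39.3, -104.0, 40.4),
              datetime(2008, 6, 1, 17, 0), datetime(2008, 6, 1, 21, 0),
              ["temperature"])
for row in rows:
    print(row["station"], row["lon"], row["lat"], row["temperature"])
```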

Note that the cf-pcmdi web pages referenced below at llnl.gov have a self-signed security certificate, so your browser may come up with dire security warnings for these URLs.  The information is there, but it is for you to decide whether to make security exceptions for these web pages.

Radar Data Collections as Coverages?

As noted above, one of the main "discoveries" of Phase 1 of GALEON was the value of the relatively simple WCS use case of specifying a bounding box and property of interest and getting back a dataset containing that data.  Whether or not collections of radar data of the sort shown below are technically "coverages" in the sense of OGC or ISO standards, it would be useful to be able to access such collections with an interface similar to that of WCS, in which one specifies the spatial and temporal bounding box and a variable (or parameter or field or property, depending on which standard specification one is working with) and gets back a data collection such as the one in the illustration.



Radar reflectivity from three radars showing a storm on the California coast

When viewed graphically, this collection of radar observations looks a lot like a coverage.  But the radar range rings do emphasize the fact that the specification of the location of individual observation points is a bit more complicated than it is for the case of a regularly gridded dataset.  More examples of collections of radar data are given at:



Status


At this point, there are no CF conventions proposed for collections of radar observations of this sort.  This is an important area for future work and it would benefit from taking advantage of the precedents being set in the realm of gridded datasets and collections of station/point datasets.
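To illustrate why locating radar observations is more involved than locating grid points, the following sketch converts a radar-relative position (azimuth and ground range from the radar site) to longitude and latitude using the standard great-circle destination formula on a spherical Earth.  It is a simplification under stated assumptions: it ignores beam elevation, atmospheric refraction, and the Earth's ellipsoidal shape, and it takes ground range rather than slant range as input.  The function name and the site coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def gate_location(site_lon, site_lat, azimuth_deg, ground_range_km):
    """Approximate (lon, lat) in degrees of a radar sample.

    azimuth_deg is measured clockwise from north at the radar site.
    """
    lat1 = math.radians(site_lat)
    lon1 = math.radians(site_lon)
    theta = math.radians(azimuth_deg)
    delta = ground_range_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lon2), math.degrees(lat2)


# Made-up radar site; samples every 90 degrees of azimuth on a 100 km range ring.
for azimuth in (0.0, 90.0, 180.0, 270.0):
    lon, lat = gate_location(-120.0, 35.0, azimuth, 100.0)
    print(f"azimuth {azimuth:5.1f}: lon {lon:.3f}, lat {lat:.3f}")
```

Once each sample has Earth-referenced coordinates of this kind, the same space-time bounding box test used for station data could in principle be applied to radar collections.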

Other Data Types: E.g., Trajectories and Swaths


Among the most challenging data collections are those where the observing platform is actually moving relative to the Earth as the observations are taken.  Most commercial aircraft are instrumented to take atmospheric measurements along their flight paths.  These would be classified as trajectories.  And the question in the context of this document is how one would serve collections of such trajectories (say in the area near an airport) via standard protocols.

On the other hand, there are important cases where the observing platform is moving and possibly changing orientation (aircraft, ship, satellite) but onboard instruments are scanning.  The observations of such systems usually are classified as swaths and are among the most challenging in terms of determining the location of each observation point in terms of Earth-referenced coordinates.

<<<< GMU citation here. >>>

Alternative Protocols for Data Access

Because they are not regular grids, the observational data collections discussed in the preceding sections do not fit the coverage definition of the current WCS specification.  However, they do conform to the ISO 19123 definition of a coverage.  This issue is thoroughly addressed in:

S. Nativi, J. Caron, B. Domenico, L. Bigagli, "Unidata's Common Data Model Mapping to the ISO 19123 Data Model", Earth Sci Inform DOI 10.1007/s12145-008-0011-6, Springer-Verlag 2008.

But this raises the question of which OGC protocol is appropriate for delivering collections of non-gridded observational data.  Traditionally, station/point data have been thought of as "features" to be delivered via a WFS (Web Feature Service).  In its current version, however, WFS does not handle time-varying data or collections of such data.  On the other hand, the SOS protocol is a natural fit for cases where one is looking for a stream of time-series information from a sensor or set of sensors, but in its current definition it is not well adapted to the space-time bounding box query.  Moreover, from the GALEON perspective, if the collection is encoded into a CF-conforming netCDF object, it would be natural to deliver it in the same fashion as gridded CF-netCDF objects, so WCS is still quite attractive.  The bottom line is that none of the current OGC data access protocols is ideally suited to this type of request for the types of data collections discussed here.

The following sections outline work currently underway for delivering these observations via the WFS and SWE suite.

WFS

Andrew Woolf and his colleagues at the British Atmospheric Data Center have addressed the issue of "features vs. coverages" in:

http://epubs.cclrc.ac.uk/work-details?w=49662

and also in a presentation given at the OGC Interoperability Day in Boulder:

http://epubs.cclrc.ac.uk/work-details?w=41614


SWE/SOS

The OGC Ocean Sciences Interoperability Experiment is coming at the issues from the angle of SWE (Sensor Web Enablement) and SOS (Sensor Observation Service).


WMS


For completeness, the WMS interface can be very useful for delivering data in the form of fully rendered maps.  <<< Cite efforts underway at the British Met Office?? >>>

Interacting Directly with Remote Data Collections Using Existing Working (but not formally standard) Community Systems

To get a hands-on sense of how this can be made to work, it can be instructive to use tools that have been implemented for communities of practice but that do not conform to formal, international standards.  These examples can then be seen as very concrete use cases for what the standards should enable.  Hopefully this provides a practical, bottom-up grounding for the discussion of abstract standard interfaces.

The examples in this section make use of a pure Java application called the Integrated Data Viewer (IDV), which can be invoked via the Java Web Start facility.  So, if the reader has the latest version of the Java Runtime Environment with Web Start installed, it is possible to download and bring up the IDV, which will in turn access data from remote servers using protocols such as OPeNDAP, ADDE, and HTTP.  Note that the first invocation of one of these URLs involves downloading the entire IDV application, which takes a while.  In addition, for each invocation, the IDV downloads the needed data from a remote THREDDS Data Server, which also takes time, but that is inherently part of the remote data access process.  Hopefully, though, these exercises will provide a sense of what is now possible with community standard protocols and what should be possible with the final version of the formal standard protocol interfaces: