AQ Use Case 3: Publish, Harvest, and Query Metadata via Clearinghouse

Transverse Use Cases 1-10 are activity/capability elements common to most SBA scenarios. This is a workspace for the implementation of Use Case 3 specific to the Air Quality/Health SBA. The implementation of the UC components is accessible through the links.

Clearinghouse consolidation letter - need letter to policy level about clearinghouse consolidation

 Use Case Title

 Publish, Harvest, and Query Metadata


Use Case 3 intends to verify the publishing, harvesting, and discovery of metadata through the GEOSS Common Infrastructure (GCI) components, the GEOSS Registry and GEOSS Clearinghouse, and through the Air Quality Community Catalog.

Actors and Interfaces

  • AQ Community Catalog (Catalog Service Provider, CSR);
  • AQ Community Portal; General AQ Client
  • AQ Community Catalog <- GEOSS Clearinghouse(s); 
  • GEOSS Clearinghouse(s) <- AQ Portal & AQ Client

Initial Status and Preconditions

  1. AQ Community Catalog (AQComCat), populated with Service Records (as needed)
    1. AQ Data Access Service metadata developed (link to ISO dev page) iteratively as ISO 19115 (example)
    2. AQComCat creates Web Accessible Folder (WAF) for harvesting by Clearinghouse 
    3. Data providers/mediators prepare (link to ISO prep options/tools page) ISO 19115 metadata records and, through an ISO 19139 validator, submit registration to AQComCat
  2. AQComCat is registered (AQ Use Case 1 - Register Resources) in GEOSS Registry as a Catalog (done)
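
Before a record is submitted, a quick presence check on core elements can catch obvious gaps. The sketch below is not an ISO 19139 schema validator. It only looks for a few elements in a record, and both the element paths and the sample record are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 encoded records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Illustrative subset of core elements; the paths are assumptions about
# where each value sits in the record, not a complete core list.
CORE_PATHS = {
    "title": ".//gmd:CI_Citation/gmd:title/gco:CharacterString",
    "abstract": ".//gmd:abstract/gco:CharacterString",
    "topic category": ".//gmd:topicCategory/gmd:MD_TopicCategoryCode",
    "metadata date stamp": "gmd:dateStamp/*",
}

# Hypothetical record, deliberately missing its topic category.
SAMPLE_RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:dateStamp><gco:Date>2009-01-19</gco:Date></gmd:dateStamp>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Surface ozone</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>Hypothetical record.</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def missing_core_elements(record_xml):
    """Return the names of checked elements that are absent or empty."""
    root = ET.fromstring(record_xml)
    missing = []
    for name, path in CORE_PATHS.items():
        element = root.find(path, NS)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


missing = missing_core_elements(SAMPLE_RECORD)
```

A record that passes this check can still fail full schema validation, so it is best treated as a pre-submission screen only.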


  1. By GEOSS protocol, Clearinghouse queries GEOSS Registries for registered catalog components (done)
  2. Clearinghouse extracts harvest policies from the GEOSS Registry record for AQComCat (not done, TBD... - needed?)
  3. Clearinghouse accesses the AQ WAF and harvests all or part of the available metadata holdings
    1. AQComCat permits harvesting (WAF open - no access protocol - needed?)
    2. Clearinghouse harvests CSR holdings (done by 3 Clearinghouses, but ad hoc; a harvest protocol is needed and has now been added in GEOSS Service Registration)
    3. Harvest to be repeated periodically (based on GEOSS CSR Registration) 
  4. From the harvests, Clearinghouse extracts discovery metadata records (desired for AQ Community in marun?)
    1. Use discovery metadata to filter clearinghouse contents - (see Use Case 4 for using discovery metadata to filter)
    2. For each 'Discovered' record, returns discovery metadata + link to full submitted metadata ??? (ESRI and USGS return a link to the full submitted metadata record)
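
The harvest step over an open WAF can be sketched as reading the folder's HTML index and collecting the links to XML records. This is a minimal illustration, assuming the WAF is served as a plain directory listing. The listing, the base URL, and the function names are hypothetical, and a real harvester would fetch the listing over HTTP and then retrieve each record.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WafLinkParser(HTMLParser):
    """Collect the href values of all anchors in a directory listing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_waf_records(listing_html, base_url):
    """Return absolute URLs of the .xml files linked from a WAF listing."""
    parser = WafLinkParser()
    parser.feed(listing_html)
    records = []
    for href in parser.hrefs:
        if href.lower().endswith(".xml"):
            records.append(urljoin(base_url, href))
    return records


# Hypothetical listing, shaped like a plain web-server directory index.
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent Directory</a>
<a href="ozone_wms.xml">ozone_wms.xml</a>
<a href="pm25_wcs.xml">pm25_wcs.xml</a>
<a href="readme.txt">readme.txt</a>
</body></html>
"""

records = list_waf_records(SAMPLE_LISTING, "http://example.org/aq-waf/")
```

Running this on a schedule, and comparing the returned list with the previous harvest, would be one way to approach the periodic harvest mentioned above.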

 Post Condition

 The GEOSS Clearinghouse is prepared to accept and process resource discovery queries from Clearinghouse clients and return desired metadata records. Clearinghouse has a service interface (API) for machine communication  ???? 


What's New:
2008-04-01??: WAF with ISO 19115 metadata (see spreadsheet below for all fields in record) is available through the GEOSS ESRI Clearinghouse and the GEOSS USGS Clearinghouse, and was harvested, with issues, and made available through Compusult. For all details on the harvest, see the crosswalk between metadata and clearinghouses.
2008-04-04: AQ UC3 Status Report for 2008-04-08 telecon:

AQ Community Catalog Core Elements of ISO 19115:
  1. Fields generated automatically from GetCapabilities documents, which the AQ Community Catalog uses to build part of the ISO 19115 metadata record (white fields)
  2. Fields related to metadata type, etc., that will be filled in automatically (light green fields)
  3. Fields that must be entered by hand because they are not found in the OGC document (orange fields)
  4. Fields that could be captured in the OGC GetCapabilities document, but for which a convention needs to be established (blue fields)
INSPIRE's ISO 19115 is also compared to the core metadata and is found to have a few more fields, mainly about data usage/access constraints and provider information. Whether to include them is another convention decision that needs to be made.
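
As an illustration of the automatically generated fields, the sketch below pulls a few values out of a WMS 1.1.1 style GetCapabilities document. The sample document, the function name, and the output keys are assumptions for this example and are not the catalog's actual generator. Using the start of the time range as the creation date follows the convention noted in the process steps on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WMS 1.1.1 capabilities document.
SAMPLE_CAPABILITIES = """
<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Example AQ map service</Title>
    <Abstract>Hypothetical daily surface ozone maps.</Abstract>
  </Service>
  <Capability>
    <Layer>
      <Name>ozone</Name>
      <Title>Surface ozone</Title>
      <LatLonBoundingBox minx="-125" miny="24" maxx="-66" maxy="50"/>
      <Extent name="time">2000-01-01/2008-01-01/P1D</Extent>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""


def iso_fields_from_capabilities(capabilities_xml, capabilities_url):
    """Collect values for the metadata fields that GetCapabilities can supply."""
    root = ET.fromstring(capabilities_xml)
    layer = root.find("Capability/Layer")
    bbox = layer.find("LatLonBoundingBox")
    extent = layer.find("Extent[@name='time']")

    # A time extent written as start/end/period yields the temporal range.
    time_start = time_end = None
    if extent is not None and extent.text:
        parts = extent.text.strip().split("/")
        time_start = parts[0]
        time_end = parts[1] if len(parts) > 1 else parts[0]

    return {
        "capabilities_url": capabilities_url,
        "title": layer.findtext("Title"),
        "abstract": root.findtext("Service/Abstract"),
        "west": float(bbox.get("minx")),
        "south": float(bbox.get("miny")),
        "east": float(bbox.get("maxx")),
        "north": float(bbox.get("maxy")),
        "time_start": time_start,
        "time_end": time_end,
        # Convention from the process notes: start time as creation date.
        "creation_date": time_start,
    }


fields = iso_fields_from_capabilities(
    SAMPLE_CAPABILITIES, "http://example.org/wms?SERVICE=WMS&REQUEST=GetCapabilities"
)
```

Fields that GetCapabilities cannot supply, such as the metadata contact or topic category, would still have to be entered by hand or filled from defaults.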


Goal for AQ Implementation of Use Case 3: To allow data providers to register their data in an AQ community catalog that has a catalog service (CSW 2.0.2) registered in the GEOSS Service Registry. The registered community catalog will then be harvested by the GEOSS Clearinghouse and the AQ Community Portal, making data discoverable through the Clearinghouse.

Process completed so far: 
1. Create valid ISO 19115 metadata for dataset with ISO 19115 Core elements (done) 
To create the initial valid ISO 19115 doc (template):
- Created a metadata record for each dataset parameter.
- Used GetCapabilities for: capabilities URL, dataset name, bbox, time range, provider abbr., distributor contact
- Used start time for dataset creation date
- Used Dataspaces for the abstract, and moved part of the abstract into the lineage statement if it told where the dataset came from.
- Fixed metadata contact
- Fixed keywords and topic categories for AQ.
2. Make several ISO 19115 records by hand using the valid templates. (Done)
3. Put in the WAF (Done)
4. Register WAF as component in GEOSS Registry and register WAF as service underneath the component (Done)
* In the updated CSR, include how often harvesting should be done, so that harvesters will harvest on a schedule.
5a. Alert GEO Portals that the WAF is registered (Done)
6a. Test GEO Portal Harvest (Done)
7a. Access Data access services through GEO Portals. 
8a. Access raw metadata through GEO Portal RSS Feed (done)

Tasks for ISO Metadata: 
1. Restructure catalog records in WAF, so that one record describes dataset with many parameters encoded. 
- Currently we point to the getcapabilities, but the user doesn't know which 
2. Restructure OGC GetCapabilities docs so that each GetCapabilities is for one dataset and can be used to automatically create the ISO 19115 record.
3. Find additional fields not in GetCapabilities that need to be entered by the provider to register in the community catalog - if any.

CSW Interface:
CSW 2.0.1:
All metadata created by Erin through the WAF are ingested and hosted by the CSW instance.
A simple CSW 2.0.1 client has been developed; it will be deployed to the AQ portal and ready for testing by the next telecon.
An initial upgrade from 2.0.1 to 2.0.2 is done and under test; will update when done.
The finished 2.0.2 CSW will be registered in the GEOSS Clearinghouse.

Tasks for CSW Interface:
- CSW 2.0.2 will be integrated with the Semantic search.
- All catalogs connected with ESIP portal will be added to the semantic search within AQ portal
- Air Quality related semantic search will be added.

2. Register CSW service in GEOSS Service Registry and remove the WAF component
3. Test harvest and access of data access services through the GEO Portals 

parallel tasks: 
- Automatically generate DataFed records once we have a valid ISO 19115/19119 schema for discovery
- With ISO 19115/19119 template for discovery, we can work with GIOVANNI/others? to create additional metadata records for WAF
- Continue to add more metadata to the ISO records about usage, lineage etc. 
- Need to figure out how Dataspaces can feed user-added content into ISO records and how ISO records can feed content to the wiki

Implementation Background:
  • CSW 2.0.2 Common Returnable but not queryable: Table 8. Mapping to Common Returnable Properties - Pg. 33
  • CSW 2.0.2 OGC Core Queryable Prop: Table 6. OGC Core Queryable Properties - Pg. 30
  • CSW 2.0.2 Additional Queryable for Info Sources: Table 10. Additional Queryable Properties Common to all Information Resources - Pg. 35
  • CSW 2.0.2 ISO 19115 Queryable&Returnable for Datasets: Table 11. Additional Queryable Properties for Datasets, Dataset Collections and Applications - Pg. 35
  • GEO Portals: The three GEO Portals and the GEOSS Service Registry were compared by the AQ WG and found to query the yellow fields with Y's.
CSW 2.0.2: The GEOSS Architecture Implementation Pilot CFP describes the community catalog records and the catalog interface to the GEOSS Clearinghouse in Annex B, Architecture. It says that ISO 19115 is the recommended geospatial standard in the GEOSS 10-Yr Plan and that it will be listed in the GEOSS Standards Registry as a comprehensive geospatial metadata standard (Pg. 67). For this second phase of the Pilot, the CFP recommends that the common metadata (core queryables and common responses) be the OGC CSW:Record, which in turn is based on the Dublin Core metadata with a handful of extension elements, plus 'profiles' that support greater search/return by GEOSS communities. One of those 'profiles' is the CSW ISO 19115 Profile. The CSW core queryable fields for a dataset plus those needed for the ISO 19115 Profile are the white column in the table (ISO 19115/19119-Appl Profile CSW 2.0 pg. 30, tables 6, 10, 11).

The AQ Community uses standard OGC data access services, WMS and WCS, so we plan to build our catalog by harvesting the metadata already included in the WMS/WCS GetCapabilities documents and then adding the additional queryable fields needed. KML is included in the green columns because KML could be extended to hold more metadata, and those files could also be harvested by the community catalog.



GetRecords Example:
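
Until real request links are filled in here, the following is a minimal sketch of how key-value-pair requests to a CSW 2.0.2 endpoint could be assembled. The endpoint URL and the CQL filter text are hypothetical, and individual servers may differ in which parameters and queryables they accept. The GetCapabilities suffix is the one quoted in the discussion at the bottom of this page.

```python
from urllib.parse import urlencode

# Output schema identifier for the common csw:Record.
CSW_RECORD_SCHEMA = "http://www.opengis.net/cat/csw/2.0.2"


def get_capabilities_url(endpoint):
    """Append the GetCapabilities query string to a CSW endpoint."""
    return endpoint + "?" + urlencode({"SERVICE": "CSW", "REQUEST": "GetCapabilities"})


def get_records_url(endpoint, cql_text, start_position=1, max_records=10):
    """Build a GetRecords KVP request that returns full csw:Record entries."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "outputSchema": CSW_RECORD_SCHEMA,
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_text,
        "startPosition": str(start_position),
        "maxRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and free-text filter.
ENDPOINT = "http://example.org/csw"
capabilities_url = get_capabilities_url(ENDPOINT)
records_url = get_records_url(ENDPOINT, "AnyText like '%ozone%'")
```

Paging through a large result set would mean repeating the request with `startPosition` advanced by `maxRecords` each time.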

Schema - Return records 
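
A returned set of common csw:Record entries can be read with a few lines of XML parsing. The response below is a hypothetical sample written for this sketch and is not output from any of the clearinghouses. The bounding box corners are kept as raw number lists because axis order can vary between servers.

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "ows": "http://www.opengis.net/ows",
}

# Hypothetical GetRecords response containing one record.
SAMPLE_RESPONSE = """
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:ows="http://www.opengis.net/ows">
  <csw:SearchResults numberOfRecordsMatched="1" numberOfRecordsReturned="1">
    <csw:Record>
      <dc:identifier>aq-ozone-wms</dc:identifier>
      <dc:title>Surface ozone</dc:title>
      <dc:type>service</dc:type>
      <dc:subject>air quality</dc:subject>
      <dc:subject>ozone</dc:subject>
      <dct:abstract>Hypothetical record.</dct:abstract>
      <ows:BoundingBox>
        <ows:LowerCorner>-125 24</ows:LowerCorner>
        <ows:UpperCorner>-66 50</ows:UpperCorner>
      </ows:BoundingBox>
    </csw:Record>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""


def corner(bbox, name):
    """Return a bounding box corner as a list of floats, or None."""
    text = bbox.findtext("ows:" + name, default="", namespaces=NS)
    return [float(value) for value in text.split()] or None


def parse_records(response_xml):
    """Turn each csw:Record in a response into a plain dictionary."""
    root = ET.fromstring(response_xml)
    parsed = []
    for record in root.iter("{%s}Record" % NS["csw"]):
        bbox = record.find("ows:BoundingBox", NS)
        parsed.append({
            "identifier": record.findtext("dc:identifier", namespaces=NS),
            "title": record.findtext("dc:title", namespaces=NS),
            "type": record.findtext("dc:type", namespaces=NS),
            "subjects": [s.text for s in record.findall("dc:subject", NS)],
            "abstract": record.findtext("dct:abstract", namespaces=NS),
            "lower_corner": corner(bbox, "LowerCorner") if bbox is not None else None,
            "upper_corner": corner(bbox, "UpperCorner") if bbox is not None else None,
        })
    return parsed


parsed_records = parse_records(SAMPLE_RESPONSE)
```

The common record has no dedicated temporal extent element, so any time range would have to come from a profile or an extension agreed by the community.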

Other Catalogs GetCapabilities(Change DescribeRecords to GetCapabilities in links):

DescribeRecords examples from CSW 2.0.2:

Add tutorial to guide through process.



Discussion: Use space below for discussion on this topic. New threads are at the top. 

From Erin (1/19/09): Hi All, The AQ WG is creating a Community Catalog in order to register the metadata for AQ datasets and make the datasets discoverable and accessible through the GEOSS clearinghouse. The Community Catalog will have a CSW 2.0.2 interface.  We have started a workspace page on the google sites to outline the process of: creating the community catalog to ultimately discovering and accessing datasets through clearinghouse. We are going to start our catalog using the CSW record (from Record.xsd ) that is common to all CSW 2.0.2 implementations (neon green cells in image). I saw that this is what ESRI is using now. Then we'll test the access of our datasets through the clearinghouse/GEO Portals. After that we will expand our catalog to include the other fields in the attached image for the ISO 19115 Profile in CSW 2.0.2. From the CCRM perspective do you all see any issues with this method? With the catalog fields the next step for us is to implement the service - I think we can use the CSW2.0.2 GetCapabilities and DescribeRecords from Marten as templates, since our catalog will have the same fields. Marten, could you or someone else point me to what the GetRecords and return for that would look like? Is there anything that we are missing? 
From Marten: hi, are you going to build a catalog service from scratch? or implementing available software? there are many aspects of CSW to account for that a single sample won't cover. I suggest to setup a conference call this week. would that work? I can host a webcast where I show some sample queries/responses
From Erin: Marten, other than ESRI GIS Portal Toolkit is there free software that we could implement? I looked at Geonetwork and it looked like they only support CSW 2.0.1. Our programmer is looking at the CSW today - he may be interested in a webcast/telecon later this week. I'll check. Thanks for the offer. I'm asking about the open source so that this can be more reusable. Thanks again!

From Archie (1/14/09): I've added to my status page for the FGDC Clearinghouse on the GEOSS Pilot web site. It includes a summary table of our harvesting status for the catalog sites listed in the Component and Services Registry.

From Erin Robinson: Hi Archie -Thanks so much for exposing what you are harvesting and the harvest is ok. This will be really helpful for AQ b/c we can look at these sites that are ok as examples of catalog services. We are going to implement a CSW 2.0.2 for our AQ Community Catalog as part of Use Case 3 of the pilot. For the listed CSW 2.0.X could you link to the get capabilities and describe record xmls? 
From Archie: I wanted to shorten the links.  The GetCapabilities link should always be the same - just add "?SERVICE=CSW&REQUEST=GetCapabilities" Sometimes there seem to be problems with case-sensitivity, but I claim those are bugs.  The spec clearly gives the above.  I thought about making the URLs links - I suppose I could go ahead and do that.  Think it would be useful from the context of the spreadsheet? The DescribeRecord link ought to be derivable from the Capabilities response.  I'm not tracking those anyway since we don't really use the DescribeRecord response contents - what we're extracting to our internal database is pretty modest.
From Erin Robinson: Maybe just a note on the page that if you want to see get capabilities add "?SERVICE=CSW&REQUEST=GetCapabilities" to the links below. The few links I tried didn't return anything or returned an error (JAXA) and I think they'd return capabilities if they had the extension. 
From Archie: OK. If adding the string to the URL doesn't work in a particular case, I should probably add the URL that does work anyhow.
From Erin Robinson: For the AQ WG implementation of Use Case 3, I have been working on a comparison of the queries by each of the GEO Portals and the CSW 2.0.2 required fields. Would you mind looking at this comparison to see if the FGDC portal query is right? To create the table, I went to the FGDC portal and marked what I could search on as the queryable fields. Are the fields you harvest for the FGDC portal from the CSW 2.0.x catalogs extracted from the DescribeRecord XML?
From Archie: Probably.  I haven't written the complete ISO record parser yet except for a pretty simplistic handling of some WMS records I get from GOS. We're really trying to do the bare minimum handling we can get away with.  But no less.  Most of the records harvested in right now are from the CSR, or are FGDC records obtained via Z39.50.

From Erin Robinson(1/9/09): Hello AQ WG and CCRM WG, The AQ WG has been working on development of a community catalog that will allow AQ datasets provided by the community to be accessed through the GEOSS Clearinghouse. Use Case 3 for AIP-2 is designed to publish, harvest and query metadata systems so that component and services are discoverable through GEOSS Clearinghouse. This task seems to match the AQ Community Catalog goal. I have posted an initial method and suggested set of minimum fields for the AQ Community Catalog to the AIP-2 Google Site as a way to work on this AQ implementation of use case 3. It needs community participation - both from the technical perspective of the CCRM WG as well as the AQ WG. From the CCRM, I think it would be helpful to have instructions/examples to implement the CSW 2.0.2 service. From the AQ WG perspective we need to agree on a common set of catalog fields that we will use to search and harvest data. I'd like to suggest that we use the comment part of this page to respond in order to capture the process that we will be going through to implement use case 3. I've posted this as a comment to get things started.

From Marten Hogeweg: As it happens, I just posted a message on the CSW 2.0.2 interface we have running on our GEOSS Portal/Clearinghouse. The main entry point for this is the GetCapabilities request:

 what you may be more directly interested in is the structure of the records we return as included here:

This CSW  service is part of the GIS Portal Toolkit. I’m not sure what your approach is to provide CSW, but you may want to consider using GIS Portal Toolkit.

From Erin Robinson: Marten, We aren't exactly sure what our approach to provide CSW will be. We had thought we'd create a sql database and then provide the service interface on top of that. Would this work? Would the GIS Portal Toolkit provide an easier method? Also, I noticed the record response didn't include time of the measurement. Could we extend the record return to include temporal extent? 

From Marten: If you choose to use GIS Portal Toolkit (GPT) then you won’t have to do any coding. It comes with its own CSW interface. I checked and saw that WSU already has a license for GPT (since 2006) and that WSU has a Educational site license to ESRI software. I checked my records and found Scott Horn ( took a training class in November 2007. If Scott is involved he may already have GPT 9.3 setup, otherwise we can arrange for a license. GPT 9.3 has a no-cost license. GPT can run with Oracle, SQLServer, and PostgreSQL databases. More info on If you register with the ESRI GEOSS portal and send me your username, I can give you publishing permissions. That will give you a good idea of what you can do with GPT.

From Zhenlong Li: Some changes are done to AQ Portal in Tools tab.
  1. Re-built the Time-enabled WMS Viewer portlet to fit the portal.
  2. Trim the boundary of the semantic search tool:
GEOSS site link:
