July 9 Clearinghouse Ad Hoc

posted Jul 7, 2010, 5:24 AM by Joshua Lieberman   [ updated Jul 9, 2010, 8:22 AM ]

Clearinghouse issues and roadmap vis-a-vis AIP-3, e.g.
    Harvesting protocol and parameters for community catalog
    Support for GEOSS Common Record & ISO AP
    Role of brokers and query expansion / vocabulary mapping
    Links between data, services, and applications
    Clearinghouse harvesting and update policy
    Clearinghouse synchronization with GEOSS Portals


Josh Lieberman, Archie Warnock, Joan Maso, Lionel Menard, Marten Hogeweg, Yuqi Bai, Mattia Santoro, Steve Browdy, Erin Robinson, Roberto Lucchi


Email from Doug:
Through the AIP-3 project in GEOSS, and prior projects, a set of requirements or concerns has been raised about what the common queryables across remote catalogs would be, and what can be reliably found through the harvest mechanism. Most of this predates the deployment of the GeoNetwork implementation of the Clearinghouse, which converts all remote metadata into ISO metadata using developed or included XSLT stylesheets. This provides a more stable query target with more consistency of content and presentation. It also provides a greater number of queryable properties than the small set provided by other clearinghouses using only a CSW 2.0.2 'baseline' capability per csw:Record results. The current GEOSS Clearinghouse implementation, using GeoNetwork software, provides an ISO (Metadata) Application Profile version of CSW and will return verbose metadata in ISO form in response to a "Full" record request. Unless the original metadata was ISO, this "Full" record will not be formatted as its original representation; we do not know whether this will be an issue or should simply be documented. I would suggest that the notion of a "GEOSS Common Record" is now better defined in practice, though additional requirements may still be a point of research and evaluation. We should discuss what the realistic core requirements are for GEOSS in such a record as part of the discovery use case.
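
For illustration, a "Full" ISO record request of the kind described above might be built as a CSW 2.0.2 GetRecords KVP URL. This is a minimal sketch only: the endpoint is a placeholder (the actual Clearinghouse CSW endpoint is not given in these notes), the function name is made up for the example, and the AnyText constraint assumes the server accepts CQL_TEXT with % wildcards.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption); substitute the real Clearinghouse CSW URL.
CSW_ENDPOINT = "http://clearinghouse.example.org/csw"

# Namespace used to request ISO (gmd) output from an ISO AP catalog.
ISO_GMD_SCHEMA = "http://www.isotc211.org/2005/gmd"


def build_full_record_request(search_text, endpoint=CSW_ENDPOINT, max_records=10):
    """Return a GetRecords KVP URL asking for 'full' ISO records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        "resultType": "results",
        "elementSetName": "full",
        "outputSchema": ISO_GMD_SCHEMA,
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": "AnyText like '%{}%'".format(search_text),
        "maxRecords": str(max_records),
    }
    # urlencode percent-escapes the wildcard and quote characters for us.
    return endpoint + "?" + urlencode(params)


url = build_full_record_request("water")
query = parse_qs(urlsplit(url).query)
print(query["elementSetName"], query["outputSchema"])
```

Swapping elementSetName to "brief" or "summary", or outputSchema to the csw namespace, would show how the returned properties differ from the "Full" ISO form.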

The GEOSS Clearinghouse harvests using the following protocols:
    Z39.50 "GeoProfile" (manually-initiated, under testing)
    CSW 2.0.2 baseline, AP ISO, ebRIM with no extensions
    WebDAV, sitemaps, and Web Accessible Folders
    OGC GetCapabilities (WMS, WFS, WCS) endpoints
    Local file access for batch ingest of packaged, static metadata
The following protocols are also available to the system:
    ISO 23950 "SRU"
    GeoNetwork "native"
As for formats, the following "Full" metadata formats are recognized and parsed into ISO 19139 metadata for ingest:
    ISO 19115, 19119 XML (no transform, per 19139)
    ebRIM common information model XML, including the CSR form
    CSW csw:Record XML
    OGC GetCapabilities XML
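
As a rough sketch of how an ingest step could tell these formats apart before choosing a transform, the root element of each harvested document can be inspected. This is not GeoNetwork's actual logic; the function name and the returned labels are made up for illustration, and the check for capabilities documents (root name ending in "Capabilities") is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

GMD_NS = "http://www.isotc211.org/2005/gmd"
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
EBRIM_NS = "urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"


def detect_format(xml_text):
    """Classify a metadata document by the namespace and name of its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace}local".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag

    if namespace == GMD_NS and local == "MD_Metadata":
        return "iso19139"          # already ISO, no transform needed
    if namespace == CSW_NS and local == "Record":
        return "csw-record"
    if namespace == EBRIM_NS:
        return "ebrim"
    if local.endswith("Capabilities"):
        return "ogc-capabilities"  # WMS / WFS / WCS capabilities documents
    return "unknown"


sample = '<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"/>'
print(detect_format(sample))
```

The returned label would then select which stylesheet, if any, is applied to produce ISO 19139.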
The two remaining topics for the telecon are of interest to the Clearinghouse team: 1) the role of brokers and query expansion or term mapping, and 2) links between data, services, and applications. GeoNetwork provides a means to add vocabularies as SKOS structures that can be used to browse categories, but there is no automatic mapping of entered terms to preferred terms done within the system. Some logical mapping to categories such as SBA or ISO Topic could be done to assist in browsing, but a more intelligent means to index and access information using ontologies/vocabularies would be desirable. As for links between data, services, and applications, we have only what providers make available as metadata. It is rare that publishers encode the associations (that are permitted in ISO metadata and supported in CSW) between such artifacts. GeoNetwork does not yet exploit the concept of 'related records' though this could be the subject of applied research into interpreting, building, and traversing trees of associations based on discovery requirements using RDF or similar technology.
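
The missing step mentioned above, mapping entered terms to preferred terms, could look roughly like the sketch below. It assumes a vocabulary has already been reduced to preferred labels with alternate labels (as in SKOS prefLabel / altLabel); the sample labels and function names are illustrative only and are not taken from any loaded GeoNetwork thesaurus.

```python
# Preferred label -> alternate labels (illustrative sample data, not a real thesaurus).
VOCABULARY = {
    "Water": ["hydrology", "freshwater"],
    "Energy": ["power", "renewables"],
}


def build_lookup(vocabulary):
    """Index every preferred and alternate label, case-insensitively."""
    lookup = {}
    for preferred, alternates in vocabulary.items():
        lookup[preferred.casefold()] = preferred
        for alternate in alternates:
            lookup[alternate.casefold()] = preferred
    return lookup


def to_preferred(term, lookup):
    """Return the preferred term for an entered term, or None if unmapped."""
    return lookup.get(term.strip().casefold())


lookup = build_lookup(VOCABULARY)
print(to_preferred(" Hydrology ", lookup))
print(to_preferred("soil", lookup))
```

Unmapped terms return None so that a caller can fall back to a plain free-text search rather than silently dropping the term.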

On coupling metadata records and linking to a client, see this detailed metadata record at the EnerGEO portal: http://energeo.researchstudio.at/geoportal/catalog/search/viewMetadataDetails.page?uuid=%7BDB153�-6F0B-406A-81B8-67349�B28%7D

Doug: Useful action to look at clearinghouse capabilities for queryable properties to compare with GEOSS Common Record term proposals.

Josh: Use Clearinghouse mailing list to discuss e2e mappings and proposed additional queryables to support them.

Marten: how are new harvest mappings implemented?

Doug: largely “organic” right now

Josh: action to start documenting mappings, queryables, and harvesting experiences in Sites and develop a methodology for managing the e2e publish - harvest -query connection.

Lionel: EnerGEO portal is looking at metadata records and client links, working with SSR. The view above shows one metadata file containing the client link.

Josh: This is one side of possible link discovery, assuming that the service is the starting point.

Doug: different ways of putting links in.

Josh: 3 ways of returning links: hard links, dynamic links, or query to links. How to support queries for appropriate linked resources? Oh, and a fourth way: third-party recommendations for working linkages.

Steve: concern for level of effort to support “social network” discovery versus more formal semantic interoperability.

Doug: let’s continue to experiment, find simple working approaches. Need to examine both approaches.

Steve: Data Harmonization WG is interested in issues such as semantic annotation and may provide some attention to this experimentation.

Mattia: EuroGEOSS broker connection with Clearinghouse:
Enhanced OpenSearch / SPARQL endpoint can accept semantically enhanced queries (e.g. SKOS concepts). (Expanded) results can then be federated manually / automatically to CSW AP catalogs such as the Clearinghouse.

Doug: How are keywords exposed to a user? Mattia: visible usage where related concepts are shown as (intermediate) results of queries. There is also a tool to visualize keyword graphs returned by queries.

Josh: Three (at least) issues:
    1) Logistics of exposing broker client and federating to Clearinghouse
    2) Loading terms into the broker repository
    3) Describing registered / harvested resources using the available terms and allocating them to the correct queryables. Might have to apply the expanded keywords to AnyText instead of keywords.
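
For issue 3, applying expanded keywords to AnyText could amount to OR-ing one clause per term, roughly as sketched here. The function name is hypothetical, the CQL text form is an assumption about what the target catalog accepts, and the expanded terms would in practice come from the broker.

```python
def anytext_constraint(terms):
    """Build a CQL constraint with one AnyText clause per distinct, non-empty term."""
    distinct = []
    seen = set()
    for term in terms:
        cleaned = term.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            distinct.append(cleaned)
    # Double any single quote so a term cannot break out of the string literal.
    clauses = ["AnyText like '%{}%'".format(t.replace("'", "''")) for t in distinct]
    return " OR ".join(clauses)


# Example: a query term plus related concepts returned by an expansion step.
print(anytext_constraint(["drought", "water scarcity", "Drought"]))
```

The same list of terms could instead target a keyword queryable if the harvested records turn out to carry the vocabulary terms there.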

Josh: Action to try out federation to the Clearinghouse for next Friday and do a demo at that time.

Doug: In the longer term, may add “metasearch” term augmentation of OpenSearch to the catalog / clearinghouse capabilities themselves.

Josh: segue to harvesting (keyword harvesting will be a needed process eventually).

Doug: catalogs are registered with a harvesting parameter, e.g. once / periodically / as updated / only distribute queries. This field for harvesting control is part of CSR registration for metadata sources. Separate groups to review CSR registrations and Clearinghouse harvesting choices.
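
A sketch of how the harvesting parameter might drive a scheduler is given below. The four modes follow the options Doug listed; the enum, function name, the seven-day default interval, and the idea that a source exposes a last-updated time are all assumptions for illustration, not the CSR or Clearinghouse implementation.

```python
from datetime import datetime, timedelta
from enum import Enum


class HarvestMode(Enum):
    ONCE = "once"
    PERIODIC = "periodically"
    ON_UPDATE = "as updated"
    DISTRIBUTED_ONLY = "only distribute queries"


def should_harvest(mode, last_harvest, now, source_updated=None,
                   interval=timedelta(days=7)):
    """Decide whether a registered catalog is due for harvesting."""
    if mode is HarvestMode.DISTRIBUTED_ONLY:
        return False                      # never harvested, only queried live
    if last_harvest is None:
        return True                       # first harvest for every other mode
    if mode is HarvestMode.ONCE:
        return False
    if mode is HarvestMode.PERIODIC:
        return now - last_harvest >= interval
    # ON_UPDATE: harvest only if the source reports a newer update time.
    return source_updated is not None and source_updated > last_harvest


now = datetime(2010, 7, 9, 14, 0)
print(should_harvest(HarvestMode.PERIODIC, datetime(2010, 7, 1), now))
```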

Lionel: how are old records and old metadata sources eliminated? It seems that records from removed WAFs are not being removed. Newly registered metadata sources are not showing up yet.

Marten: ESRI Portal searches both the Clearinghouse and Lionel’s catalog directly.

Josh: Issues: 1) Do deleted CSR registrations propagate to removal of the harvested records, and 2) is there any disambiguation when the same resource can be harvested from multiple metadata sources?

Marten: If records are actually removed from the WAF, then the harvested records will be removed in the ESRI clearinghouse. Removal of the WAF registration doesn’t seem to have this effect, though.
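
The two issues Josh raised (orphaned records after a registration is removed, and the same resource harvested from several sources) could be handled in one reconciliation pass, sketched below. The record layout, field names, and the first-source-wins rule are assumptions for illustration; neither clearinghouse implementation is known to work this way.

```python
def reconcile(harvested, registered_sources):
    """Split harvested records into kept, orphaned, and duplicate lists.

    harvested: list of dicts with 'resource_id' and 'source' keys (assumed layout).
    registered_sources: set of source names still registered.
    """
    kept_by_resource = {}
    orphaned, duplicates = [], []
    for record in harvested:
        if record["source"] not in registered_sources:
            orphaned.append(record)        # registration was removed
        elif record["resource_id"] in kept_by_resource:
            duplicates.append(record)      # same resource, another source
        else:
            kept_by_resource[record["resource_id"]] = record
    return list(kept_by_resource.values()), orphaned, duplicates


records = [
    {"resource_id": "r1", "source": "waf-a"},
    {"resource_id": "r1", "source": "csw-b"},
    {"resource_id": "r2", "source": "waf-old"},
]
kept, orphaned, duplicates = reconcile(records, {"waf-a", "csw-b"})
print(len(kept), len(orphaned), len(duplicates))
```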

Doug: immediate issue is that EnerGEO CSW has not been set up for harvesting. Someone with admin privileges needs to do that.

Action: Doug to send the Clearinghouse CSW endpoint to Mattia and/or add to site.


Site: http://sites.google.com/a/aip3.ogcnetwork.net/home/home/end2engineering-2/e2e-telecons/june29telecon
Local time: http://timeanddate.com/worldclock/fixedtime.html?month=7&day=9&year=2010&hour=14&min=0&sec=0&p1=0
Dimdim & Telecon Info: http://my.dimdim.com/joshlieberman/
Telecon Access Code: 719290#
Telecon: Conference Dial-in Number:
    USA: (712) 432-1600
    Austria: 0820 4000 1552
    Belgium: 070 35 9974
    France: 0826 100 256
    Germany: 01805 00 76 09
    Ireland: 0818 270 021
    Italy: 848 390 156
    Netherlands: 0870 001 920
    Spain: 902 886025
    Switzerland: 0848 560 179
    UK: 0844 58 191 02