
AQ Use Case 4: Client Search of Metadata

https://6283588568219177751-a-1802744773732722657-s-sites.googlegroups.com/site/geosspilot2/air-quality-and-health-working-group/aq-documents/ESRIGEOPortal-satellitesearch.pdf?attredirects=0&auth=ANoY7cpM6IK6k4An97GcebNpbLkS5nTuz1cvXmtAwOJ01nda0Olyl2uQzkNKgKkoy-ihLlX41NUfmHedIuxR8UD4kpF1mMyzSoApFruz1au4-BP8g1NDBxOabnyYNC9bTHebVQ6eEAqaVN7rojwT7KtBWCrpom1KL2I6MRwjh0jVnHFMDDnrz1stuTNVWHRbqtRkjktfyREC7RW2-U5w6nIV47L-rwpE-5Mfdr6zoBApojA0QG4XpjPK85tjpdwXj6BPZ5TYuvbBPSxi4I7SSHKLCU78M9jSqFZX1LXlJDjv0Ztx7WOAVGg%3D

Transverse Use Cases 1-10 are activity/capability elements common to most SBA scenarios. This workspace covers the implementation of this Use Case that is specific to the Air Quality/Health SBA. The implementation of the UC components is accessible through the links.

 Title

 Client Search Metadata

 Overview

 This use case describes the conditions and steps for portals and application clients to support the GEOSS user in searching the GEOSS Clearinghouses and Community Catalogs. This use case is a precondition for Use Case 5, Presentation & Alerts.

 Actors and Interfaces


Actors:
(1) Client Apps: Portals, WebApps, DeskApps; (2) AQ Community Catalog
Interfaces: Client Apps <- GEOSS Clearinghouse; GEOSS Clearinghouse <- AQ Community Catalog

Initial Status and Preconditions


1. GEOSS User is looking for information of value to the task at hand
2. Client applications (AQ Portal and WebApps) are available to query the GEOSS Clearinghouse ("done")
3. GEOSS Clearinghouses are successfully populated with metadata from the GEOSS Registry and harvested from the AQ ComCat (not automatic)
4. GEOSS Clearinghouses have an API or CSW 2.0.2 interface for client access (ESRI and USGS: both API and CSW; Compusult: CSW only)

Evolution


  1. Client apps have a query interface (CSW 2.0.2 or OpenSearch API) to the GEOSS Clearinghouse (TBD)
  2. Client apps present the user with search criteria based on the queryable properties of the selected catalogs
    1. Simple keyword and area of interest/box search
    2. Advanced parameter searches (Organization, Time ...; Region/place of interest; Community Catalogs to be searched, SBA; Keyword in abstract, title, full text; Resource type (e.g. service, workflow, document, client app, portlet, alert, etc.); Other)
    3. More specific Earth-Observation criteria (Row/path; Collection; Cloud cover; Subsetting; Ordering/delivery)
    4. Community-specific search (Thesaurus, Cluster & Pattern matching)
    5. Current Clearinghouse supports: Title; Record type; Full text; Abstract; Identifier; Bbox; Return format (not really, not all)
  3. The result set is returned and presented to the user with options to
    • Display total number of results
    • Display basic information about each result (e.g. thumbnail, title, abstract, organization, etc)
    • Expand results to display additional information (e.g. full metadata record) from Community Catalog
    • Group or sort results in categories (e.g. by type of resource, by SBA, etc)
    • Select resources of interest for evaluation and/or use.
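
The presentation options listed above (total count, grouping, sorting) could be sketched roughly as follows. This is only an illustration: the record fields (`title`, `type`, `sba`) and the function name `summarize_results` are assumptions, not part of any Clearinghouse API.

```python
from collections import defaultdict


def summarize_results(records, group_key="type", sort_key="title"):
    """Return (total, groups): groups maps a category to its sorted records."""
    groups = defaultdict(list)
    for record in records:
        # Records missing the grouping field fall into an "unknown" bucket.
        groups[record.get(group_key, "unknown")].append(record)
    for members in groups.values():
        members.sort(key=lambda r: r.get(sort_key, "").lower())
    return len(records), dict(groups)


# Hypothetical records, for illustration only.
sample = [
    {"title": "Smoke forecast", "type": "service", "sba": "Health"},
    {"title": "AQ handbook", "type": "document", "sba": "Health"},
    {"title": "Aerosol map", "type": "service", "sba": "Climate"},
]

total, by_type = summarize_results(sample)
print("Total results:", total)
for category, members in sorted(by_type.items()):
    print(category, [m["title"] for m in members])
```

Passing `group_key="sba"` would group the same records by SBA instead of by resource type.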


 Post Condition

The Client application has retrieved the necessary metadata to present the GEOSS User with information on discovered resources matching the user search criteria for further evaluation and/or use.



ToDos:
  • API to query Clearinghouse for list of Catalog Components
  • Clearinghouse Client software to cascaded search


User perspective of searching for data via the GEO and Community portals
Test searches and successes/limitations with Portals, Clearinghouses and/or ISO metadata content.
Green indicates a search that returns results as desired; Yellow indicates a search that returns results but not as desired; Red indicates a failed search either due to the limitations with the search client or due to limitations in the available metadata.
Test searches for: | Compusult Portal | ESA Portal | ESRI Portal | ESRI Clearinghouse API - CSW | ESRI Clearinghouse API - REST | USGS Clearinghouse API | ISO metadata | Semantic Search within AQ Portal
Services from the AQ Catalog Air Quality Catalog
(0 record) [Note: The portal goes blank if the keyword is searched as a phrase, i.e. with double or single quotes around it.]

 

"Air Quality Catalog"

(0 record)

 

"Air Quality Catalog"

(0 record)
    http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=&type=&collection_name=Air%20Quality%20Community%20Catalog&rawtext=&abstract=&identifier=&bbox=&format=atom&.submit.x=72&.submit.y=8
* catalog info included in metadata distributor section
* also include tag for Air Quality Community Catalog in keywords
 
Data service related to air quality Air Quality

(10 records)


[Note: The portal goes blank if the keyword is searched as a phrase, i.e. with double or single quotes around it.]
"Air Quality"(1of2)

"Air Quality" (2of2)

(total 14 records)
"Air Quality"

(8 records)
  http://geoss.esri.com/geoportal/rest/find/document?searchText=%22air%20quality%22&contains=exact&f=georss
  fails because 'air quality' is not in ISO documents
* ISO includes topic categories that all portals search on - Atmosphere and Climatic
 
Data services from OMI sensor OMI

(0 record)
OMI
(1 record)
OMI
(3 records)
  http://geoss.esri.com/geoportal/rest/find/document?searchText=title:omi&contains=exact&f=georss

http://geoss.esri.com/geoportal/rest/find/document?searchText=omi&contains=exact&f=georss

  No OMI Services registered in AQ Com Cat right now  
Data from the AIRNow monitoring network AIRNow

(6 records)
AIRNow

(2 records)
AIRNow

(4 records)
  http://geoss.esri.com/geoportal/rest/find/document?searchText=title:airnow&contains=exact&f=georss

http://geoss.esri.com/geoportal/rest/find/document?searchText=airnow&contains=exact&f=georss

http://geoss.esri.com/geoportal/rest/find/document?searchText=abstract:airnow&contains=exact&f=georss

http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=AIRNOW&type=&collection_name=&rawtext=&abstract=&identifier=&bbox=&format=HTML&.submit.x=72&.submit.y=8
   
CALPUFF smoke forecast model output CALPUFF

(0 record)
CALPUFF

(0 record)
CALPUFF (1 record) Get Record by ID CSW Service

http://geoss.esri.com/geoportal/csw202/discovery?request=GetRecordByID&service=CSW&version=2.0.2&id={9973F01C-0C13-45E2-BBA2-243AFB554987}

http://geoss.esri.com/geoportal/rest/find/document?searchText=title:CALPUFF&contains=exact&f=georss

http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=NGC_CALPUFF&type=&collection_name=&rawtext=&abstract=&identifier=&bbox=&format=HTML&.submit.x=72&.submit.y=8
   
WMS records  WMS         (224 records)
 not found.
 WMS        (42 records)
     http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=&type=&collection_name=&rawtext=WebMapService&abstract=&identifier=&bbox=&format=HTML&.submit.x=0&.submit.y=0
   
 WMS air
 WMS air     (4 records)
 not found.
 WMS air   (76 records)
     
WCS Records   WCS         (127 records)
 not found.
 WCS         (23 records)
      http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=&type=&collection_name=&rawtext=WebCoverageService&abstract=&identifier=&bbox=&format=HTML&.submit.x=0&.submit.y=0
   
 WCS air
 WCS air    (2 records)
 not found.
 WCS air   (55 records)
     
 All Services and Components registered available through Clearinghouse            http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=&type=&collection_name=&rawtext=c431cdd2204f&abstract=&identifier=&bbox=&format=HTML&.submit.x=0&.submit.y=0
   
 Community Catalog
not found.
 "Community Catalog"          (2 records)  not found.      http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=&title=&type=&collection_name=&rawtext=%22community%20catalog%22&abstract=&identifier=&bbox=&format=HTML&.submit.x=0&.submit.y=0
 Community Portal
not found.
 "Community Portal"             (2 records)  not found.
 Wildfires "Wildfire"   (4 records)
 "Wildfires"    (16 records)  "Wildfire"  (3 records)          
Smoke  "Smoke"    (4 records)
 "Smoke"        (3 records)   "Smoke"   (5 records)          
Particulate matter
 "Particulate matter"        (2 records)
 not found.  "Particulate matter"       (2 records)
         
 air quality monitoring networks
 "monitoring network"      (2 records)

 "mornitoring"  (1033 records)

"networks"      (7 records)

 "monitoring network"     (4 records)          
 air quality and satellites
 not found.
 "satelite"     (4 records) "satellite"  (86 records)
 Wildfires in Southern California in late October 2007
 not found.
 not found.  "Southern California"  (2 records)
 Smoke in Southern California in late October 2007  not found.
 Smoke in Southern California     (1388 records)   "Southern California"  (2 records)          
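
The USGS Clearinghouse test URLs in the table all share the same query-string fields, so they can be assembled programmatically. The sketch below is a minimal illustration rather than a documented client. The field names are copied from the URLs above; the helper name `ch_query_url` is hypothetical, and leaving out the `.submit.x`/`.submit.y` form-button coordinates is an assumption that has not been verified against the service.

```python
from urllib.parse import urlencode

CH_QUERY = "http://clearinghouse.awcubed.com/cgi-bin/ch_query"

# Search fields as they appear in the test URLs above.
FIELDS = ("keyword", "title", "type", "collection_name",
          "rawtext", "abstract", "identifier", "bbox")


def ch_query_url(fmt="HTML", count=20, start_index=0, **criteria):
    """Build a ch_query search URL; unspecified fields are sent empty."""
    unknown = set(criteria) - set(FIELDS)
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    params = [("cmd", "search"), ("count", count), ("startIndex", start_index)]
    params += [(name, criteria.get(name, "")) for name in FIELDS]
    params.append(("format", fmt))
    return CH_QUERY + "?" + urlencode(params)


# Full-text search for WMS records, as in the "WMS records" row.
print(ch_query_url(rawtext="WebMapService"))
# Title search returning Atom instead of HTML.
print(ch_query_url(title="AIRNOW", fmt="atom"))
```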



1) Search the GEOSS Clearinghouses for services that were harvested via the AQ Community Catalog. The AQ Community Catalog was registered in the GEOSS Component and Service Registry as a Catalog WAF. The Clearinghouse finds catalogs through the registry. 

1a) Search the GEO Portals (Compusult, ESA, ESRI) for initial discovery queries. 
Each GEO portal has different query and return capabilities. The GEO portals were compared to find the minimum set of fields that the metadata must include in order to be searchable by each portal. The table below illustrates the mapping between CSW:record, our metadata, and the USGS and ESRI Clearinghouse search/return fields.


Metadata Mapping to Clearinghouse

 
1b) Search the GEOSS Clearinghouses (ESRI, FGDC, Compusult) from the AQ Community Portal for a more specific AQ search. 
  • Currently able to search ESRI clearinghouse via REST interface
    • user enters search term
    • example query:  http://geoss.esri.com/geoportal/rest/find/document?searchText=OMI&contains=exact&max=20
    • REST API documentation: AQ Community Portal
    • query can be run with following qualifiers (per ESRI):
      • uuid
      • fileIdentifier
      • dateModified
      • geometry
      • anytext
      •  title
      • abstract
      • keywords
      • body
      • contentType
      • dataTheme
    • Evaluating best method to find AQ services
    • Results returned as RSS (HTML also available)
    • Link to metadata document included in RSS results
      • Considering making secondary query of metadata document
    • Need to add bbox query
  • Currently researching the API for the other two clearinghouses (FGDC, Compusult)
    • Would like to compare results of the same search across all three clearinghouses
  • Look at ESRI CSW interface
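
Because the REST results come back as RSS with a link to each metadata document, a client could pull out those links for the secondary query mentioned above. The following is a rough sketch only: it assumes a plain RSS 2.0 layout (`channel/item/title` and `link`), and the sample feed and its URLs are invented for illustration rather than captured from the ESRI service.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, shaped like generic RSS 2.0; not real service output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search results</title>
    <item>
      <title>Example record A</title>
      <link>http://example.org/metadata/a.xml</link>
    </item>
    <item>
      <title>Example record B</title>
      <link>http://example.org/metadata/b.xml</link>
    </item>
  </channel>
</rss>"""


def extract_metadata_links(feed_xml):
    """Return (title, link) pairs for every item that carries a link."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            results.append((title, link))
    return results


for title, link in extract_metadata_links(SAMPLE_FEED):
    print(title, "->", link)
```

Each extracted link could then be fetched and inspected for fields the REST interface does not expose as qualifiers.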


1c) Identify successes and limitations in GEOSS Clearinghouses metadata searching (both in metadata stored in GEOSS registry and in search tools)
  • Does it make sense for AQ Community Portal to search all three clearinghouses?
    • should we evaluate all three then pick best one?
    • if searching all three, how to maintain usability with large search result sets?
      • if it's too slow will anyone want to use it?
  • Can the portal access the full metadata record and search over the record? For example if AQ has an instrument or platform field in the ISO record, could we find those records and then perform an additional search on instrument? 
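
One way to reason about searching all three clearinghouses is a cascaded search that merges results and keeps the result set manageable. The sketch below is an assumption-laden illustration: each backend is a caller-supplied function returning records with `identifier` and `title` fields (hypothetical names), and the stub backends stand in for real clearinghouse clients that do not exist yet.

```python
def cascaded_search(term, backends, max_results=50):
    """Query each backend, merge by identifier, and cap the merged list.

    backends maps a clearinghouse name to a callable(term) -> list of dicts.
    Returns (results, failures); failures records backends that raised.
    """
    merged = {}
    failures = {}
    for name, search in backends.items():
        try:
            records = search(term)
        except Exception as exc:  # one failing clearinghouse should not sink the search
            failures[name] = str(exc)
            continue
        for record in records:
            entry = merged.setdefault(record["identifier"], {**record, "sources": []})
            if name not in entry["sources"]:
                entry["sources"].append(name)
    # Records found in more clearinghouses come first, then alphabetical by title.
    ranked = sorted(merged.values(), key=lambda r: (-len(r["sources"]), r["title"]))
    return ranked[:max_results], failures


# Stub backends for illustration only.
def stub_a(term):
    return [{"identifier": "id-1", "title": "Record one"},
            {"identifier": "id-2", "title": "Record two"}]


def stub_b(term):
    return [{"identifier": "id-2", "title": "Record two"}]


def stub_down(term):
    raise TimeoutError("no response")


results, failures = cascaded_search("OMI", {"A": stub_a, "B": stub_b, "C": stub_down})
for record in results:
    print(record["title"], record["sources"])
print("failed:", failures)
```

This version queries the backends one after another; if response time turns out to be the limiting factor, the same merge step could be fed from parallel requests.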
2) Identify air quality specific/customized metadata searches that might be appropriate from the AQ Community Portal
  • Are there metadata elements used by the air quality community (and not commonly used across other communities) that are not easily searched using the GEOSS Portals and GEOSS Clearinghouse APIs and that would require air quality specific searches (for example, through the AQ Community Portal)? 
Modified by Gianni Sotis 20090210:
Why is there a split here between GEO Portal and CRH?
The CRH will be accessed through the GEO Portal.
At least the word "directly" could be removed.
Response by Stefan Falke 20090210
Removed 'directly' and replaced with 'AQ Community Portal'
Also moved this discussion to Comments section of page below


Semantic Search

Semantic search is an emerging technology that uses Semantic Web technologies, such as ontology & reference engines, to address problems that traditional keyword-based search cannot solve. For example, keyword-based search cannot (1) use domain knowledge, (2) support scientific understanding in queries, or (3) support the intelligent reasoning and semantic modeling accumulated in scientific domains. By reasoning over a highly interconnected network of data and ontologies, semantic search engines are intended to understand a user's query at both the syntactic and the semantic level and to return more scientifically relevant answers.

To address these problems, semantic search is usually tied to a specific science domain, such as Earth science or Air Quality. We have been working closely with domain experts in water cycle communities to build a knowledge base using ontology-represented knowledge for domain search in projects such as WECHO (http://eie.esipfed.org/c/portal/layout?p_l_id=PUB.1.428). Based on the ontology, we are developing rule-based inference algorithms on top of existing reasoning tools to provide semantic navigation for end users. A prototype that employs these techniques is available through the ESIP portal (eie.esipfed.org).
The semantic search engine also connects to multiple major Earth science catalogs, such as GOS, NCDC, GCMD, ECHO, and ESG.

In this AIP work, we are working together to improve the semantic search engine so that it combines reasoning and initial ranking algorithms to sort search results from the AQ catalog (WAF, Z39.50, & CSW) and other connected catalogs for Air Quality resources. The targets are to (1) develop a simple CSW client for the AQ portal hosted by Stefan, (2) increase the resource volume by adding other catalogs, and (3) add the previously developed semantic search capabilities to the simple client. The objective is to improve the search experience for the Air Quality community by leveraging knowledge gained through Air Quality communities. The knowledge base will be expanded through close collaboration with Air Quality domain experts, using the SWEET ontology framework and existing ontologies accumulated through the ESIP portal effort.

 

ESRI API 

Hmmm... Not really in our case. Below is a list of elements supported in our REST interface:

 

Base URL: http://geoss.esri.com/geoportal/rest/find/document

 

Tweak the search by using one or more of the following parameters. Combinations are ‘and-ed’ together:

 

bbox - bounding box; defines the bounding box of the query. Each record found must have its envelope completely enclosed within the defined bounding box. The bounding box is defined by two pairs of coordinates representing the west-south and east-north corners of the envelope, separated by a comma (,). Each corner is defined by a pair of values, longitude-latitude, separated by a comma (,). Default: -180,-90,180,90 (entire world).

Example: .../rest/find/document?bbox=0,0,20,30

 

spatialRel - spatial relationship. Possible values: esriSpatialRelWithin (the metadata envelope has to fit completely within the request bounding box), esriSpatialRelOverlaps (the metadata envelope has to at least overlap the request bounding box). Default: esriSpatialRelWithin. Used in conjunction with the bbox parameter.

Example: .../rest/find/document?spatialRel=esriSpatialRelOverlaps
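
The difference between the two spatialRel values can be illustrated with plain envelope arithmetic. This is a sketch of the semantics as described above, not the server's implementation: the function names are hypothetical, envelopes that merely touch are counted as matching (an assumption), and envelopes crossing the antimeridian are not handled.

```python
def parse_bbox(text):
    """Parse 'west,south,east,north' into a tuple of floats."""
    west, south, east, north = (float(v) for v in text.split(","))
    if west > east or south > north:
        raise ValueError(f"bbox corners out of order: {text}")
    return west, south, east, north


def is_within(envelope, query):
    """esriSpatialRelWithin: the envelope lies completely inside the query box."""
    w, s, e, n = envelope
    qw, qs, qe, qn = query
    return qw <= w and qs <= s and e <= qe and n <= qn


def overlaps(envelope, query):
    """esriSpatialRelOverlaps: the envelope shares at least some area or edge."""
    w, s, e, n = envelope
    qw, qs, qe, qn = query
    return w <= qe and qw <= e and s <= qn and qs <= n


query = parse_bbox("0,0,20,30")
inside = (5.0, 5.0, 10.0, 10.0)
straddling = (15.0, 5.0, 25.0, 10.0)
outside = (30.0, 40.0, 50.0, 60.0)

for name, env in [("inside", inside), ("straddling", straddling), ("outside", outside)]:
    print(name, "within:", is_within(env, query), "overlaps:", overlaps(env, query))
```

With the default esriSpatialRelWithin, the straddling envelope would be excluded; with esriSpatialRelOverlaps it would be returned.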

 

searchText - text to be searched within metadata.

Example: .../rest/find/document?searchText=Congo

 

contains - true to search for any word specified in searchText attribute, false to perform exact search. Used in conjunction with searchText parameter. Also accepts any of the values defined in ISearchFilterKeyword.KeySearchTextOptions. Default: true

Example: .../rest/find/document?contains=false

 

contentType - search metadata for the specific content type. Accepts names defined in SearchEngineCSW.AimsContentTypes. Default: none

Example: .../rest/find/document?contentType=downloadableData

 

dataCategory - search metadata for the specific data category. Accepts any set of the following keywords separated by comma (,): farming, biota, boundaries, climatologyMeteorologyAtmosphere, economy, elevation, environment, geoscientificInformation, health, imageryBaseMapsEarthCover, intelligenceMilitary, inlandWaters, location, oceans, planningCadastre, society, structure, transportation, utilitiesCommunication. Default: empty set

Example: .../rest/find/document?dataCategory=economy,elevation

 

after - metadata updated after a certain date given as 'yyyy-mm-dd'

Example: .../rest/find/document?after=2006-01-01

 

before - metadata updated before certain date given as 'yyyy-mm-dd'

Example: .../rest/find/document?before=2006-01-01

 

orderBy - sort parameter. Accepts any of the values defined in SearchFilterSort.OptionsSort. Default: SearchFilterSort.OptionsSort.dateDescending.

Example: .../rest/find/document?orderBy=dateAscending

 

max - maximum number of records in the feed. Default: 10.

Example: .../rest/find/document?max=5

 

geometryType - defines how spatial data will be represented in the feed. Possible values are: esriGeometryPoint, esriGeometryPolygon, esriGeometryBox. Default: esriGeometryPolygon.

Example: .../rest/find/document?geometryType=esriGeometryPoint

 

f - output format. Possible values are: georss, kml, html, htmlfragment. Default: georss. If htmlfragment is selected, a body-less HTML snippet will be generated.

Example: .../rest/find/document?f=kml

 

style - style URL. Array of URLs of the Cascading Style Sheet (*.css) files used when the html format is chosen. URLs are separated by comma (,).

Example: .../rest/find/document?f=html&style=http://<host>:<port>/<context>/main.css

 

target - links target. Possible values are: blank, parent, self, top (just like HTML "target" attribute except no leading underscore '_'). Default: blank. It affects every link generated in GEORSS feed.

Example: .../rest/find/document?target=self
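
Since the parameters above are and-ed together, a client can combine them into one request URL. The helper below is a minimal sketch: the function name `find_document_url` is hypothetical, the parameter names and the `f` values are taken from the list above, and values for other parameters are passed through unchecked because their full sets of accepted values are not given here.

```python
from urllib.parse import urlencode

BASE_URL = "http://geoss.esri.com/geoportal/rest/find/document"

# Parameter names as listed in the ESRI description above.
ALLOWED = {"bbox", "spatialRel", "searchText", "contains", "contentType",
           "dataCategory", "after", "before", "orderBy", "max",
           "geometryType", "f", "style", "target"}
FORMATS = {"georss", "kml", "html", "htmlfragment"}


def find_document_url(**params):
    """Combine REST search parameters into one and-ed query URL."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if params.get("f", "georss") not in FORMATS:
        raise ValueError(f"unsupported output format: {params['f']}")
    # Keep ',' and ':' readable, as in the example URLs on this page.
    return BASE_URL + "?" + urlencode(sorted(params.items()), safe=",:")


print(find_document_url(searchText="OMI", bbox="0,0,20,30", max=5))
print(find_document_url(searchText="title:omi", contains="exact", f="georss"))
```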


USGS Search Info - Conversations between Archie Warnock and Erin Robinson

Collection Field: 

From: Archie Warnock <warnock@awcubed.com>
Date: Mon, Mar 9, 2009 at 4:13 PM
Subject: Re: Another weird character
To: Erin Robinson <emr1@wustl.edu>


Erin Robinson wrote:
Another quick question: You have a 'collection field' in the return - is that title coming from the GEOSS component registry component name? We think with the combination of keywords and this collection field we will be able to search/find our records and related records with no problems. 

It's not really intended to be searchable because it's not consistent. Right now, it's a text string that I supply here.  If there turns out to be consistent usage in the service registry records (not the component registry), I will probably try to use it but right now I'm just taking the easy way out.


**** Resolution was to include the catalog ID in the metadata records so that we can find them regardless of whether USGS or others happen to tag on collection/component information. We can search for the component ID in the clearinghouse and find our GEOSS-registered components and all data access services too. 

 

Keyword Field:

Erin Robinson wrote:
I noticed that the keyword: Air Quality Community Catalog doesn't get harvested as a keyword, but it does return in the harvested record - is there something wrong with that term? It is identical in format to the Data Access Service keyword and that is harvested every time. 

I'll have to check.  I thought I'd been grabbing repeated values of the DescriptiveKeyword elements but it's possible that my parser is missing it.

*** Resolution: Fixed


Record Type: (March 6)
Erin Robinson wrote:
One other quick thing - is there a field in ISO 19115 that you pull for record type?

When I import the record, I look at the metadata standard element and version to try to figure it out.  Actually, I just look at those to find the string '19115' and if that's there, I assume what it is.  Depending on how we wind up interpreting things, that could change.  Right now, I'm kinda uninterested in what the metadata record is pointing to - it's all I can do to keep track of the metadata records themselves.

*** Resolution: Wait and Check again to see if there is a way to connect record type to associated service. 
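
The record-type heuristic described in the reply above amounts to a substring check on the metadata standard name and version. A minimal sketch, with a hypothetical function name and hypothetical return labels (the actual clearinghouse code is not shown on this page):

```python
def guess_record_type(standard_name, standard_version=""):
    """Label a record as ISO 19115 if '19115' appears in its standard name or version."""
    combined = f"{standard_name or ''} {standard_version or ''}"
    return "ISO 19115" if "19115" in combined else "unknown"


print(guess_record_type("ISO 19115:2003/19139", "1.0"))
print(guess_record_type("FGDC-STD-001-1998"))
```

Note that this says nothing about what the record points to (a service, a dataset, a document), which is the limitation raised in the resolution above.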

Service type:
Erin Robinson wrote:
Our WMS records <http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=Data+Access+Service&title=&type=&collection_name=&rawtext=WebMapService&abstract=&identifier=&bbox=&format=HTML&.submit.x=66&.submit.y=1> have the string. The kind of neat thing is that if you search full text for WebMapService, the return is a few other records that are using the WMS URN and a lot of WCS records include the URN for WCS. I "faked" the service type as a queryable parameter in the search interface we've copied from you: http://capita.wustl.edu/AQ_CommCat_Client_Ed.htm

Yeah, cool!

I don't track service type on a record-by-record basis since for some (like FGDC records) it doesn't make sense.  Not everything in the clearinghouse is a service record.  That's a clever hack, though.

Raw text: 
Erin Robinson wrote:
Hey Archie - in the search:

http://clearinghouse.awcubed.com/cgi-bin/ch_query?cmd=search&count=20&startIndex=0&keyword=Data+Access+Service&title=&type=&collection_name=&rawtext=&abstract=&identifier=&bbox=&format=HTML&.submit.x=0&.submit.y=0


What does rawtext= mean? What can I set it to? And for format, what formats can it return? We'd like to get to the harvested XML in either one or two steps. 

rawtext is what I use for full-text search.  I take the XML record and strip out all of the tags and attributes.  What's left is the full text of the document (the Registry records are an exception since most of their useful text is actually in attributes).  The value should be whatever search terms you want.

It can return (right now) HTML, Atom and an internal XML result format (that's pretty easy to use).  Plain text and Dublin Core are on the to-do list.

The two-step to get the harvested XML is to use the basic search to find the local document URN and then use "cmd=present&uuid=NNNNN".  I'm still tweaking the interface, though - better (when I get it running) to use the CSW one or OpenSearch.  Sheesh... I should write a manual someday ;-)

*** Resolution - follow up to see if we can have multiple raw text searches. 
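
The two ideas in the reply above, full text obtained by stripping tags and attributes, and the two-step retrieval of the harvested XML, can be sketched as follows. This is an illustration under assumptions: the sample record is invented, the function names are hypothetical, and the `present` URL uses only the `cmd` and `uuid` parameters quoted in the email, without confirming whether the service needs anything else.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CH_QUERY = "http://clearinghouse.awcubed.com/cgi-bin/ch_query"


def raw_text(xml_record):
    """Drop all tags and attributes, keeping only the element text."""
    root = ET.fromstring(xml_record)
    # itertext() yields text nodes only; split/join collapses whitespace runs.
    return " ".join(" ".join(root.itertext()).split())


def present_url(uuid):
    """Step two: request the harvested record for an identifier found by a search."""
    return CH_QUERY + "?" + urlencode({"cmd": "present", "uuid": uuid})


# Invented record for illustration; the attribute value is not part of the raw text.
sample = '<record id="42"><title>AIRNow ozone</title><abstract>Hourly   data</abstract></record>'
print(raw_text(sample))
print(present_url("NNNNN"))  # NNNNN is the placeholder used in the email
```

This also shows why the Registry records are an exception: text stored in attributes (like `id="42"` here) disappears from the stripped full text.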

 