This master's project will be part of the CEMAGREF project called Knowledge for Organic Farming and Innovative System (KOFIS). KOFIS is a French Semantic Web portal (Meilender et al. 2011) dedicated to sustainable agriculture. The main component of a Semantic Web portal is a tailored semantic resource used to annotate the elements of the web portal. This kind of resource is defined as an ontology: a formal, explicit specification of a shared conceptualization (Gruber 1993). The W3C has proposed several formats to store ontologies: RDF, OWL and SKOS.
The main objective of this master's project is to propose a design method to build several ontologies for the KOFIS system. The design method will be based on transformations of other resources such as the AGROVOC thesaurus, biological taxonomies and official databases. KOFIS can import ontologies in OWL format to annotate web pages. The project will therefore propose a method for designing OWL ontologies from thesauri (Assem et al. 2004; Hepp & Bruijn 2007). This method should be dedicated to the agricultural domain.
The student will have to study the different types of resources available in agriculture. He/she will also have to define, for each type, the elements of interest for the KOFIS system. Based on his/her observations, he/she should propose a method for extracting these elements from each resource, and a method for arranging and structuring them appropriately following ontology design patterns.
For example, AGROVOC is the best-known multilingual terminological resource on agriculture worldwide. As a multilingual thesaurus, it is used by librarians for indexing documents (Wielinga et al. 2001). The AGROVOC thesaurus provides three types of relationships:
broader/narrower relations represent any hierarchical relation, such as location, subset, instance-of or part-of.
used-for relations group keywords with similar meanings and point to the keyword validated by librarians.
see-also relations represent any other relationship that librarians may find useful.
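The three relation types listed above can be sketched as a small data structure. The following is a minimal illustration, not the KOFIS implementation: the `Term` class, the `to_triples` helper and the sample entry are hypothetical, and the mapping of thesaurus relations to SKOS properties (broader to skos:broader, used-for to skos:altLabel, see-also to skos:related) is an assumed, commonly used convention rather than something prescribed by this project.

```python
from dataclasses import dataclass, field

# Assumed mapping from thesaurus relation types to SKOS properties.
SKOS_PROPERTY = {
    "broader": "skos:broader",    # hierarchical link to a more general term
    "used_for": "skos:altLabel",  # non-preferred synonym of the validated keyword
    "see_also": "skos:related",   # any other associative link
}


@dataclass
class Term:
    """One validated thesaurus keyword with its three kinds of relations."""
    label: str
    broader: list = field(default_factory=list)
    used_for: list = field(default_factory=list)
    see_also: list = field(default_factory=list)


def to_triples(term):
    """Return (subject, property, object) tuples describing one term."""
    triples = [(term.label, "skos:prefLabel", term.label)]
    for relation, prop in SKOS_PROPERTY.items():
        for target in getattr(term, relation):
            triples.append((term.label, prop, target))
    return triples


# Hypothetical entry, for illustration only (not copied from AGROVOC).
wheat = Term("wheat", broader=["cereals"], used_for=["bread wheat"], see_also=["flour"])
triples = to_triples(wheat)
for triple in triples:
    print(triple)
```

Note that a single broader link may hide several distinct meanings (subset, part-of, instance-of), which is why such a direct mapping is only a starting point for ontology design.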
As a basis for building an ontology, AGROVOC has several drawbacks:
AGROVOC covers large domains of agriculture in several languages. The first goal will be to build smaller, domain-oriented and language-specific vocabularies (Wielinga et al. 2001).
Because AGROVOC is a thesaurus, the relations between keywords are very broad and may be incomplete or incorrect (Soergel et al. 2004). The second goal of this project will be to enrich the previous vocabularies with very specialized relationships already defined in OWL ontologies (such as the has pest relationship). This step can reuse specialized taxonomies or databases (Ponzetto & Navigli 2009).
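The two goals above can be sketched as follows. This is a minimal illustration under stated assumptions: the `THESAURUS` dictionary, the concept identifiers, the `PEST_PAIRS` table and both function names are hypothetical and do not reflect the actual AGROVOC data model. The first function extracts a domain-oriented, single-language vocabulary by following narrower links from a chosen root; the second adds a specialized relation taken from an external source, keeping only statements whose subject belongs to the extracted vocabulary.

```python
from collections import deque

# Hypothetical mini-thesaurus: concept id -> narrower concepts and labels per language.
THESAURUS = {
    "c_cereals": {"narrower": ["c_barley", "c_oats"], "labels": {"en": "cereals", "fr": "céréales"}},
    "c_barley": {"narrower": ["c_winter_barley"], "labels": {"en": "barley", "fr": "orge"}},
    "c_winter_barley": {"narrower": [], "labels": {"en": "winter barley", "fr": "orge d'hiver"}},
    "c_oats": {"narrower": [], "labels": {"en": "oats", "fr": "avoine"}},
    "c_potatoes": {"narrower": [], "labels": {"en": "potatoes", "fr": "pommes de terre"}},
}

# Hypothetical (crop, pest) pairs, as could be read from a specialized database.
PEST_PAIRS = [("c_barley", "c_aphids"), ("c_potatoes", "c_colorado_beetle")]


def extract_subvocabulary(thesaurus, root, lang):
    """Collect the root and every concept reachable through narrower links, in one language."""
    selected = {}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        if concept in selected or concept not in thesaurus:
            continue  # skip concepts already visited or unknown to the thesaurus
        entry = thesaurus[concept]
        selected[concept] = entry["labels"].get(lang)
        queue.extend(entry["narrower"])
    return selected


def enrich_with_relation(selected, pairs, relation):
    """Type external pairs with a specialized relation, keeping subjects found in the vocabulary."""
    return [(subject, relation, obj) for subject, obj in pairs if subject in selected]


cereals_en = extract_subvocabulary(THESAURUS, "c_cereals", "en")
pest_links = enrich_with_relation(cereals_en, PEST_PAIRS, "hasPest")
print(cereals_en)
print(pest_links)
```

Calling `extract_subvocabulary(THESAURUS, "c_cereals", "fr")` would give the French counterpart of the same vocabulary, which matches the idea of producing one vocabulary per domain and per language.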
By the end of the project, we would like to build several OWL ontologies from the AGROVOC thesaurus, in English and in French (Oltramari & Stellato 2008; Aguado-de-Cea et al. 2010). Another perspective of this project is to publish these ontologies on the web using Semantic Web technology (linked data).
KOFIS context:
In the context of the “Knowledge for Organic Farming and Innovative System” (KOFIS) project, two different tools, the Drupal Content Management System (Drupal CMS) and Semantic MediaWiki (SMW), are used to build a knowledge management system named KOFIS.
In KOFIS, Drupal CMS is an open space where different kinds of users can create blogs and forums on agricultural topics. The content of blogs and forums concerns problems faced by farmers in their crop production. Drupal uses tags as an annotation vocabulary for content classification and searching. Tags form a taxonomy. The hierarchical relation of a Drupal taxonomy is an informal parent-child relation (childOf), which means that one tag is more specific or more general than the other. The taxonomy can be enriched with terms from the AGROVOC thesaurus, without taking into account the hierarchy of terms in the thesaurus. In any case, the hierarchy of the Drupal taxonomy can be modified.
In KOFIS, SMW is a closed space in which only information approved by experts is stored. SMW uses categories and properties for annotation. Categories form a taxonomy. The hierarchical relation of the SMW taxonomy is the formal subClassOf relation. This taxonomy is the backbone of the domain ontology. Categories, properties and annotated pages are considered respectively as classes, properties and individuals in the domain ontology of SMW.
Users intend to use the information stored in SMW to help them search for solutions to the problems stored in Drupal. Consequently, the information stored in SMW is queried from Drupal: a query composed of Drupal tags is used to retrieve pages annotated with SMW categories.
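The retrieval step described above can be sketched as follows. This is only an illustration under several assumptions: the category hierarchy, the page titles and the function names are hypothetical, tags are matched to categories by a case-insensitive comparison of labels, and a page is returned when one of its categories equals or specializes (through subClassOf) a category named by a tag. The actual KOFIS matching strategy is not specified here.

```python
# Hypothetical SMW category hierarchy: category -> direct superclass (subClassOf).
SUBCLASS_OF = {"Aphid": "Pest", "Pest": "Organism", "Wheat": "Crop"}

# Hypothetical annotated SMW pages: page title -> categories used as annotations.
PAGES = {
    "Controlling aphids with ladybirds": {"Aphid"},
    "Pest monitoring basics": {"Pest"},
    "Wheat sowing dates": {"Wheat"},
}


def ancestors(category):
    """Return the category itself followed by all of its superclasses."""
    chain = []
    while category is not None and category not in chain:
        chain.append(category)
        category = SUBCLASS_OF.get(category)
    return chain


def pages_for_tags(tags):
    """Return titles of pages whose categories equal or specialize a tag."""
    wanted = {tag.lower() for tag in tags}
    hits = []
    for title, categories in PAGES.items():
        # A page matches if any of its categories, or any superclass of them, is named by a tag.
        if any(a.lower() in wanted for c in categories for a in ancestors(c)):
            hits.append(title)
    return sorted(hits)


results = pages_for_tags(["pest"])
print(results)
```

Because the Drupal hierarchy is informal while the SMW hierarchy is a formal subClassOf relation, this sketch expands the query only on the SMW side.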
References:
Aguado-de-Cea, G. et al., 2010. Rivière or Fleuve? Modelling Multilinguality in the Hydrographical Domain. In Proceedings of the 1st International Workshop on the Multilingual Semantic Web (MSW). USA: CEUR Workshop Proceedings, pp. 21-28.
Assem, M. et al., 2004. A Method for Converting Thesauri to RDF/OWL. In The Semantic Web – ISWC 2004. LNCS. Berlin, Heidelberg: Springer, pp. 17-31. Available at: http://www.springerlink.com/index/10.1007/978-3-540-30475-3_3 [Accessed May 17, 2011].
Gruber, T., 1993. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), pp. 199-220.
Hepp, M. & Bruijn, J., 2007. GenTax: A Generic Methodology for Deriving OWL and RDF-S Ontologies from Hierarchical Classifications, Thesauri, and Inconsistent Taxonomies. In The Semantic Web: Research and Applications. LNCS. 4th European Semantic Web Conference, ESWC 2007. Austria: Springer Berlin Heidelberg, pp. 129-144. Available at: http://www.springerlink.com/index/10.1007/978-3-540-72667-8_11 [Accessed May 5, 2011].
Meilender, T. et al., 2011. Les moteurs de wikis sémantiques : un état de l’art. In RNTI. Extraction et gestion des connaissances (EGC’2011). Brest, France: Hermann, pp. 575-580.
Oltramari, A. & Stellato, A., 2008. Enriching Ontologies with Linguistic Content: an Evaluation Framework. In Ontolex Workshop. Marrakech, Morocco.
Ponzetto, S.P. & Navigli, R., 2009. Large-scale taxonomy mapping for restructuring and integrating Wikipedia. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI). California, USA, pp. 2083-2088. Available at: http://portal.acm.org/citation.cfm?id=1661778 [Accessed May 18, 2011].
Soergel, D. et al., 2004. Reengineering Thesauri for New Applications: The AGROVOC Example. Journal of Digital Information, 4(4).
Wielinga, B.J. et al., 2001. From thesaurus to ontology. In Proceedings of the international conference on Knowledge capture - K-CAP 2001. Victoria, British Columbia, Canada: ACM, pp. 194-201. Available at: http://portal.acm.org/citation.cfm?doid=500737.500767 [Accessed August 24, 2011].
Experience in programming and design is required. Good knowledge of design patterns is mandatory.
Advanced programming skills in PHP, Java, XML and XSLT; advanced design skills in databases and UML.
The applicant should be fluent in English or French.
The applicant should be well organized and motivated.
Good knowledge of Semantic Web technologies: OWL, RDFS, RDF, SKOS, SPARQL. Good knowledge of ETL technologies. Experience in a linked data or web of data project will be appreciated.
The applicant will work with different kinds of domain experts, so good communication skills will be appreciated.
Location: Cemagref, Clermont-Ferrand, France.
Email contact: catherine.roussey at cemagref.fr
The salary will be around 436.05 euros per month.
The working hours are 35 hours per week.
Duration: 4 to 6 months (to be discussed).
At the end of the master's project, the candidate may apply for a PhD position offered by Cemagref on the same subject.
Applicants should send their CV, a letter of motivation, their school reports for the last two years, and an example of one of their project reports to catherine.roussey at cemagref.fr