Semantic wikis distributed on Peer-to-Peer networks
Beginning: 1st December 2006
Defense: 9th November 2010
Director: Pascal MOLLI
Co-director: Hala SKAF-MOLLI
Thesis subject
Web 2.0 has shown the importance of collaborative systems: it has turned communities of strangers into communities of collaborators. Many social tools are now available, such as wikis, blogs, instant messaging and video-conferencing systems. These tools are used by large communities and produce a large amount of information. The Semantic Web aims to structure this information so that machines can process it, enabling more precise searches and more efficient navigation. Many social tools are evolving by integrating Semantic Web technologies such as RDF, RDFS and OWL. Wikis follow this trend and are evolving towards semantic wikis such as Semantic MediaWiki, SweetWiki, IkeWiki and OntoWiki.
The introduction of Semantic Web technologies in social tools raises a number of problems:
Scalability and fault tolerance are already difficult problems for these systems, and the introduction of Semantic Web technologies makes them even more acute. The scalability and fault tolerance of semantic wikis are clearly an open problem.
Maintaining content quality is a general problem of collaborative systems. Content quality depends on how the members of the community coordinate. With the introduction of Semantic Web technologies, semantic wikis become both collaborative systems and ontology engineering systems. Supporting the processes of ontology acquisition and maintenance is an important part of ontology engineering. Supporting these processes is clearly an open problem in the context of semantic wikis.
The optimistic replication model makes it possible to address scalability, fault tolerance and the representation of processes. This model considers N sites replicating shared objects. An object is modified on a site by applying local operations. These operations are then propagated to the other sites, where they are received and integrated. The system is correct if it respects properties such as causality and eventual consistency. The resulting system scales because the load can be distributed over different sites, and the failure of one site does not cause the failure of the entire system. Controlling the propagation of operations between sites makes it possible to represent processes.
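The model described above can be illustrated with a minimal Python sketch. It is not the algorithm developed in the thesis: the class `Site`, the function `propagate` and the sample triples are hypothetical names chosen for illustration, and the replicated object is assumed to be an add-only set of annotation triples, a case in which operations commute so that replicas converge regardless of integration order.

```python
class Site:
    """One replica in a hypothetical optimistic replication setup."""

    def __init__(self, name):
        self.name = name
        self.triples = set()   # replicated object: a set of annotation triples
        self.outbox = []       # local operations not yet propagated

    def add_triple(self, triple):
        # A local operation is applied immediately, without coordination.
        op = ("add", triple)
        self._apply(op)
        self.outbox.append(op)

    def integrate(self, op):
        # A remote operation is integrated when it is received.
        self._apply(op)

    def _apply(self, op):
        kind, triple = op
        if kind == "add":
            self.triples.add(triple)


def propagate(source, targets):
    """Send every pending operation of `source` to each site in `targets`."""
    for op in source.outbox:
        for target in targets:
            target.integrate(op)
    source.outbox.clear()


# Two sites edit concurrently, then exchange their operations.
s1, s2 = Site("s1"), Site("s2")
s1.add_triple(("Paris", "capitalOf", "France"))
s2.add_triple(("Nancy", "locatedIn", "France"))
propagate(s1, [s2])
propagate(s2, [s1])
```

Because each site applies its own operations first and exchanges them later, a site that is unreachable only delays propagation; the other sites keep accepting edits. Restricting which targets are passed to `propagate` is one simple way to picture how controlling propagation can represent a process.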
The general contribution of this thesis is to show how the optimistic replication model can be instantiated in the context of semantic wikis. We consider a setting in which the number of sites is unknown and varies, as in peer-to-peer networks. As a collaborative system, an optimistic replication system is correct if it respects the CCI consistency model (Causality, Convergence, Intention):
Causality preservation: the execution of operations must respect the happened-before relation defined by Lamport,
Convergence: when all the replicas have received all the operations, they hold the same document,
Intention preservation: the effect of executing an operation must be the same on all sites, and its execution must not change the effect of independent operations.
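The causality criterion above can be made concrete with a small Python sketch. It uses vector clocks to delay an operation until everything that happened before it has been delivered. This is a standard textbook technique, not necessarily the mechanism used in the thesis; the class `CausalSite` and its methods are hypothetical, and clocks are stored as dictionaries so that sites need not be known in advance. Duplicate messages are not handled.

```python
from collections import defaultdict


class CausalSite:
    """Delivers operations in an order compatible with happened-before."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = defaultdict(int)  # site id -> number of delivered ops
        self.delivered = []            # payloads, in delivery order
        self.pending = []              # received but not yet deliverable

    def generate(self, payload):
        # A local operation is stamped with a copy of the updated clock.
        self.clock[self.site_id] += 1
        self.delivered.append(payload)
        return (self.site_id, dict(self.clock), payload)

    def _is_ready(self, op):
        sender, stamp, _ = op
        # It must be the next operation expected from its sender...
        if stamp.get(sender, 0) != self.clock[sender] + 1:
            return False
        # ...and everything the sender had seen must already be delivered.
        return all(count <= self.clock[site]
                   for site, count in stamp.items() if site != sender)

    def receive(self, op):
        self.pending.append(op)
        progress = True
        while progress:  # one delivery may unblock buffered operations
            progress = False
            for candidate in list(self.pending):
                if self._is_ready(candidate):
                    sender, stamp, payload = candidate
                    self.clock[sender] = stamp[sender]
                    self.delivered.append(payload)
                    self.pending.remove(candidate)
                    progress = True
```

For example, if a site receives an annotation of a page before the operation that created the page, the annotation stays in `pending` until the creation arrives; both are then delivered in causal order.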
Developing a peer-to-peer semantic wiki based on optimistic replication raises several questions. How can the new data type of semantic wikis (wiki pages and annotations) be maintained and replicated? How can changes to these data be propagated and synchronized? And how can the CCI consistency model be ensured on the replicated data?
Replication approaches in the P2P Semantic Web focus on sharing semantic data and do not handle collaborative editing. In the context of distributed systems, existing synchronization algorithms do not guarantee the CCI consistency model on a data type that combines text and semantic annotations.
The challenge of this thesis is to propose an instantiation of the optimistic replication model for semantic wikis. Instantiating the model on this data type requires defining the editing operations on the data and their intentions, proposing mechanisms to propagate them, and adapting synchronization algorithms to ensure CCI consistency on a data type combining text and semantic annotations.
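To give a rough idea of a data type that combines text and semantic annotations, the following Python sketch models a wiki page as a list of lines with two editing operations, and derives annotation triples from the text. It is an assumption-laden illustration, not the data model defined in the thesis: the class `WikiPage`, its methods and the sample properties are hypothetical, and the inline `[[property::value]]` notation is borrowed from Semantic MediaWiki.

```python
import re
from dataclasses import dataclass, field

# Inline annotation in the Semantic MediaWiki style: [[property::value]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


@dataclass
class WikiPage:
    """A page whose annotations are embedded in its text (hypothetical model)."""
    title: str
    lines: list = field(default_factory=list)

    def insert_line(self, position, text):
        # Editing operation on the text; it may add annotations as a side effect.
        self.lines.insert(position, text)

    def delete_line(self, position):
        # Deleting a line also removes the annotations it contained.
        del self.lines[position]

    def annotations(self):
        # Each annotation yields a (page title, property, value) triple.
        return [(self.title, prop.strip(), value.strip())
                for line in self.lines
                for prop, value in ANNOTATION.findall(line)]


page = WikiPage("Paris")
page.insert_line(0, "Paris is the capital of [[capitalOf::France]].")
```

In this sketch a single text operation changes both the document and the set of triples, which is why the intention of an operation has to be defined over text and annotations together.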