Legislation and Regulation

on the Semantic Web

June 17, 2019, Université de Montréal, Canada

in conjunction with ICAIL

Call for Participation

This workshop will bring together academics, lawyers, government administrators, legal service providers, and corporate researchers to analyze and discuss a shared, linked corpus of legislation and regulation that is represented in a machine-readable form, e.g. using Semantic Web/Knowledge Graph technologies. With such technologies, data is published in a standardized format such as Resource Description Framework (RDF), Web Ontology Language (OWL), Extensible Markup Language (XML), or others. Such formats facilitate automated querying, linking, and inferencing over the internet.

The goal of working with a shared corpus on the Semantic Web is that individual researchers can use whatever tools, formats, approaches, or theories they wish, yet still have a common basis for discussion, linking, and accessing resources. Over the course of the workshop (before, during, and after), efforts will be made to integrate the contributions and demonstrate the utility of a prototype legislative/regulatory Semantic Web. The shared corpus will be a relatively small, coherent, multi-jurisdictional set of related articles of international relevance, as selected by the workshop organisers (see Shared Corpus below).

The workshop is very timely. There have been recent advances in natural language processing of legal texts, developments of legal XMLs (e.g., LegalDocML and LegalRuleML), creations of corpora of annotated texts (e.g., GDPR, financial regulations, smoking regulations), and formations of international online networks of collaborators (e.g., Better Rules, World Legal Information Institutes) which aim to create machine-readable legislation and regulation. Yet there has not been a forum dedicated to the task of developing a shared, linked, diverse corpus that tests the fundamental goals of the Semantic Web. This workshop addresses this gap in the research and development community.

Topics for the workshop:

  • Ontologies
  • XML, JSON, RDF, etc.
  • Controlled natural language
  • Translation from natural language into a formal language
  • Methodologies and tools for translation
  • Interpretative issues of the texts
  • Querying
  • Inference
  • Practical matters related to how to promote and realise the LegRegSW vision
  • Abstract goals and position papers on machine-readable, linkable legislation/regulation
  • Specific suggestions and position papers, e.g. a high level annotation language

Invited Speaker:

Pierre-Paul Lemyre, Director of Business Development, Lexum.com

Accepted papers:

Wolfgang Alschner and Peter Zachar, University of Ottawa, Canada

    • Corpus Analysis of Canadian Federal Regulations

Enrico Francesconi, Institute of Legal Information Theory and Techniques, Italy, and Publications Office of the European Union, Luxembourg

    • Semantic Annotation for Reasoning on Legal Provisions and Norms: a GDPR regulation exercise

Mirna El Ghosh and Habib Abdulrab, National Institute of Applied Sciences of Rouen, France

    • Towards a Well-Founded Legal Domain Ontology for the Protection of Personal Data Grounded on the Unified Foundational Ontology

Guido Governatori, Silvano Colombo Tosatto, Gabriela Ferraro, Nick van Beest, Mohammad Badiul Islam, Ho-Pun Lam, Francesco Olivieri and Regis Riveret, The Commonwealth Scientific and Industrial Research Organisation and Data61, Australia

    • Extracting Rules from Legal Texts: Challenges

Adrian Kelly, Inland Revenue Department, New Zealand

    • A Computer Language Model for Digitising New Zealand Statute Law

Monica Palmirani, University of Bologna, Italy

    • Legal Knowledge Extraction from Legal Document: GDPR Use-Case

Daniela Piana, University of Bologna, Italy

    • Frames of Case Laws as Bottom Up Avenues toward Graph Technology in a Multiple Sourced Legal System

Mark Stodder and Grant Vergottini, Xcential Legislative Technologies, United States

    • Xcential's approach to machine-readable law

Important Dates

Submission due date: May 20, 2019

Accept/Reject Notification: May 22, 2019

Suitability of submissions for the workshop will be evaluated by members of the organising and program committees.

Paper Submission Format

We welcome submissions of titles and abstracts (up to 400 words) that address the workshop description and topics (above). Abstracts should indicate what machine-readable resources have been created, discuss those resources (e.g. the tools and techniques that were applied to the corpus), and provide links to them. See Workshop Format for further information about machine-readable resources.

The Organising Committee will review the abstracts for appropriateness for the workshop. Authors of accepted abstracts will be invited to present at the workshop.

Note that virtual presentation and participation are also welcome.

See the note below on Publication subsequent to the workshop.

Workshop Format

Prior to the workshop, participants should make their machine-readable resources accessible on the web, queryable by others on the web, and (optimally) downloadable. Resources can be in formats such as XML, RDF, or JSON. Accessible and queryable on the web means that the files can be accessed by, e.g., an API, SPARQL, or a JSON query tool.
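
As an illustration of what web-queryable access might look like for an RDF resource, the following minimal sketch builds (but does not send) a query request in the style of the SPARQL 1.1 Protocol, using only the Python standard library. The endpoint address and the ex: vocabulary are hypothetical placeholders and do not refer to any resource provided by the workshop.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: replace with the address of a real SPARQL service.
ENDPOINT = "https://example.org/legregsw/sparql"

# Hypothetical vocabulary: the ex: prefix and its terms are placeholders.
QUERY = """
PREFIX ex: <https://example.org/legregsw/vocab#>
SELECT ?article ?title
WHERE {
  ?article a ex:Article ;
           ex:title ?title .
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL query request via HTTP GET."""
    # The SPARQL 1.1 Protocol passes the query text in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Against a real endpoint, passing the request to urllib.request.urlopen would be one way to retrieve the results; an API or a JSON query tool would serve the same purpose for other resource formats.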

In the morning session, there will be short presentations of the reports and demos from participants about their work on the shared corpus and their approach.

In the afternoon session, there will be open discussion by participants about linking, querying, and inferencing with respect to the shared corpus and relative to the contributed resources. Participants will outline a plan for further work and collaboration.

Practical Information

The workshop is held in conjunction with the International Conference on AI and Law (ICAIL), June 17-21, 2019, Montréal, Canada. See the ICAIL conference for registration, travel, accommodation, and program information.

Links to Tools and Resources

Social Media

Twitter: #legregsw


Publication

If there is sufficient interest by workshop participants, the organising committee will seek a publication outlet. The presentation and discussion at the workshop should provide an opportunity to further develop the work individually and collaboratively.

Organising Committee

  • Adam Wyner, Swansea University, Law and Computer Science
    • First point of contact: a.z.wyner@swansea.ac.uk
  • Adeline Nazarenko, University of Paris 13, LIPN
  • Francois Levy, University of Paris 13, LIPN
  • Matt Lynch, Parliamentary Counsel Office, Scottish Government
  • Enrico Francesconi, Italian National Research Council (ITTIG-CNR), Publications Office of the EU
  • Monica Palmirani, University of Bologna, Legal Studies

Shared Corpus

The main purpose of the workshop is to create and discuss as much common material as possible: a 'coherent', machine-readable, web-accessible, linkable, and queryable analysis of the normative contents of a body of legal text.

To accomplish this, we have created a small, scoped, shared corpus. With such a small, scoped, shared corpus, participants can better appreciate the commonalities/differences of alternative analyses and tool application along with issues arising from integration and access. Such an approach is intended to contribute towards work of greater breadth, depth, completeness, and connectedness. Our choice of shared corpus is pragmatic and intended to serve the scientific purposes of the meeting.

To suit the main purpose of the workshop, the Organising Committee (OC) has selected a text based on the following guidelines:

  • a topic of widespread, common interest;
  • small and narrowly scoped;
  • extensible in terms of scale, related works, languages, and jurisdictions;
  • in a commonly used language;
  • normative content.

Therefore, we propose to create a shared corpus:

  • EU legislation on Data Protection
  • only selected passages
  • in English
  • articles that bear on norms
  • divided into what is out of scope and what is of primary, secondary, and tertiary interest.

While the focus is first and foremost on the shared primary corpus, discussions are welcome about the other levels, the selections, and alternatives.

Should participants wish to work not with the primary (or secondary or tertiary) corpus but with some other corpus, such discussion is also welcome, provided they explain why they cannot or do not want to work with the shared primary corpus.

The document we propose for the shared primary corpus is drawn from:


Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance)

Document 32016R0679

The online source text is:


To create the corpus, we have selected chapters and articles from the source text. The intention is that the text facilitates discussion about analysis and linking rather than being representative of the whole legislation. A principled selection is impractical (other choices could be made) and ill-formed (the legislation per se comes as a whole); more to the point, a principled edit is not essential to the current technical agenda. Selections of any sort are likely to give rise to issues of analysis, processing, interpretation, or dependencies, which are valid topics of discussion at the meeting.

We have made the corpus available in two forms: the primary corpus in a text file (link below), or a roll-your-own corpus. To indicate our selections from the source text and to support a roll-your-own corpus, we have marked the table of contents with the following indications:

*0* out of scope - not to be included in the analysis of the text.

*1a* primary - the first portions of text to be analysed. This is a section of articles from Chapter 3 of the legislation.

*1* additional material that can be considered after *1a*.

*2* secondary - where the primary corpus is treated, the secondary corpus can be added.

*3* tertiary - where the secondary corpus is treated, the tertiary corpus can be added.

After working on *1a*, researchers are invited to add additional sections. One should preferably follow the order above. Definitions might be particularly important. Moreover, to create as much common analysis as possible, we advise a top-down methodology, working from the top of the document downwards.

The primary corpus in a text file contains only the *1a* components. Caveat: any errors, such as missing sections, are the responsibility of Adam Wyner.

One can use the Table of Contents with Indications to create the other corpora.
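
One way to roll your own corpus from the marked table of contents is sketched below in Python. It assumes, hypothetically, that each table-of-contents line starts with one of the indications above followed by a heading; the actual file layout may differ, and the sample entries are placeholders rather than the organisers' selections.

```python
import re

# Tiers in the order the text recommends adding them; *0* is never included.
TIER_ORDER = ["1a", "1", "2", "3"]

# Assumed line format (hypothetical): "*indication* heading".
MARKED_LINE = re.compile(r"^\*(0|1a|1|2|3)\*\s+(.*\S)\s*$")

# Placeholder entries only; these are NOT the organisers' actual selections.
SAMPLE_TOC = """\
*0* Placeholder chapter A
*1a* Placeholder article B
*1* Placeholder article C
*2* Placeholder article D
*3* Placeholder article E
"""


def select_entries(toc_text: str, up_to: str) -> list[str]:
    """Return headings whose tier is at or before `up_to` in TIER_ORDER."""
    wanted = set(TIER_ORDER[: TIER_ORDER.index(up_to) + 1])
    selected = []
    for line in toc_text.splitlines():
        match = MARKED_LINE.match(line.strip())
        # Keep the heading only if its indication is among the wanted tiers.
        if match and match.group(1) in wanted:
            selected.append(match.group(2))
    return selected


print(select_entries(SAMPLE_TOC, "1a"))
print(select_entries(SAMPLE_TOC, "2"))
```

The selected headings keep their document order, which fits the advised top-down methodology; the corresponding article texts would then be copied from the source text by hand or by a further script.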