Information Integration 2020/2021

Lectures.

Teacher. Marco Console (console at diag dot uniroma1 dot it)

For whom is this course. Information Integration is a 3-credit section of the course Large Scale Data Management for the students of the Master in Engineering of Computer Science (School of Engineering) of Sapienza Università di Roma.

Prerequisites. A good knowledge of Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design) and some notions of mathematical logic. Relevant notions will be surveyed during the course.

Course goals. Information integration is the problem of combining heterogeneous data residing at different sources and providing the user with a unified view of these data. Integrating heterogeneous data is a fundamental problem in real-world applications and raises exciting issues, both theoretical and practical. In this course we will introduce the theoretical foundations of information integration and present an overview of the body of research work carried out in the area. Special attention will be devoted to the following aspects: logical foundations of information integration, architectures for information integration, modeling an information integration application, ontology-based data access and integration, processing queries in information integration, data exchange, and reasoning on queries.
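
As a rough illustration of what "a unified view" over heterogeneous sources can mean, here is a minimal sketch in Python. Everything in it is an assumption made for the example and does not come from the course material: the two sources, their schemas, the global relation Employee(name, dept, salary), and the function names are all hypothetical. The global relation is defined as a view (a join) over the sources, and the user query is posed only against that view.

```python
# Hypothetical source 1: relational-style rows describing employees.
source_hr = [
    {"emp_id": 1, "full_name": "Ada Rossi", "dept": "R&D"},
    {"emp_id": 2, "full_name": "Luca Bianchi", "dept": "Sales"},
]

# Hypothetical source 2: CSV-like rows (name, salary) with a different schema.
source_payroll = [
    ("Ada Rossi", 52000),
    ("Luca Bianchi", 41000),
]


def unified_view(hr, payroll):
    """Build the global relation Employee(name, dept, salary) as a join of the sources."""
    salary_by_name = {name: salary for name, salary in payroll}
    return [
        {
            "name": row["full_name"],
            "dept": row["dept"],
            # Missing payroll data is kept as None rather than guessed.
            "salary": salary_by_name.get(row["full_name"]),
        }
        for row in hr
    ]


def query_high_earners(view, threshold):
    """A user query written against the global schema only, not against the sources."""
    return sorted(
        e["name"]
        for e in view
        if e["salary"] is not None and e["salary"] > threshold
    )


employees = unified_view(source_hr, source_payroll)
names = query_high_earners(employees, 50000)
print(names)
```

The user of query_high_earners never needs to know how many sources exist or how they are structured. Joining on names is a simplification chosen for brevity; real sources usually require more careful record matching.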

Course Material. Lecture notes (slides) and recorded lectures will be made available on the Moodle page of this course: https://elearning.uniroma1.it/enrol/index.php?id=12918

Exam. There are two options for the exam:

  • Students study and present a research paper assigned by the instructor.

  • Students carry out a group project (1-2 students) on the topics covered in the course. There are two possibilities for the project:

    • Choose a set of data sources with data relevant to a certain phenomenon (for example, open data published online, or data taken from a database, CSV files, or XLS files known to the student), and develop a data integration or data exchange application using these data sources (and any tool selected by the student).

    • Choose a coordinated project with the section on Big Data Management taught by Prof. Lembo. To give an idea of possible coordinated projects, here is an (incomplete) list:

      1. The part related to Information Integration can be an ETL system integrating interesting data sources into a data warehouse using a chosen tool, and the part on Big Data Management can be an application of OLAP operations over the integrated data in order to carry out interesting analyses.

      2. The part related to Big Data Management can be the setup of heterogeneous data sources, including NoSQL data sources, and the part on Information Integration can be a system integrating such data sources using a chosen tool.

      3. The part related to Information Integration can be a system integrating interesting data sources into a NoSQL source (such as a graph database), and the part on Big Data Management can be the implementation of queries over the integrated data.

Previous Editions. Information Integration 2019/20. Taught by Prof. Maurizio Lenzerini. (http://users.diag.uniroma1.it/~lenzerin/home/?q=node/76)