Research Projects

Introduction

The importance of data management today makes it necessary to find new paradigms for extracting information from the huge data repositories maintained by companies. Three aspects are crucial in this regard. First, data volumes are usually very large: companies store terabytes of data that must be processed in ever shorter amounts of time. Second, the quality of the data is essential, since it makes it possible to draw sound and significant conclusions from the information obtained from those repositories. Third, the capability to extract information from the huge amounts of processed data allows the different actors to obtain the information required for each business process. There are different solutions and research efforts addressing these issues. Business Intelligence, Master Data Management and Customer Data Integration are three examples of areas in which companies and research institutions invest significant effort.

My Main Research Topics

A Short Description of My Main Research Topics

Throughout my research career I have focused on performance, exploration and quality in data management, particularly for large data volumes. I investigate new data structures, algorithms, methods and applications in the area of Data Management that make it easier to manipulate large amounts of data.

First, while at DAMA-UPC, I worked on maximizing performance in Graph Database Management Systems. In this area, we proposed a new way to approach data storage and management. Information tends to be organized in large networks where not only the data about certain entities is important, but also the relationships between those entities. Examples include social networks, biochemical research on complex organisms, communication networks, etc. We proposed a new and sound system that allows for the efficient manipulation of data in large networks. This type of system poses new research challenges to be explored. Second, we study performance in Relational Database Management Systems. Most real-world data is still organized following the traditional relational model. In this situation, it is essential to return reliable and fast answers to user queries on complex databases.

I have also focused on Relational Database Management Systems and on data privacy. The sections below briefly describe our main research projects.

Graph Databases

Data tends to be organized in huge data networks!

The volume of data handled by any organization today is constantly growing. The analysis of this data plays an increasingly important role in the decision-making of large enterprises and in the study of various fields, academic and non-academic, that have an impact on improving life in the society in which we live.

The future of information organization points clearly toward organizing data naturally in the form of large graphs and networks, where the various entities are represented by a set of nodes and their relationships are expressed by a set of edges that connect them.
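As a rough illustration of the node-and-edge view described above, the following minimal Python sketch stores entities as nodes and relationships as labeled edges. It is not the system developed at DAMA-UPC; the `Graph` class, its method names, and the sample social-network data are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph (hypothetical): nodes are entities, edges are relationships."""

    def __init__(self):
        self.nodes = {}                # node id -> dict of attributes
        self.edges = defaultdict(set)  # node id -> set of (label, neighbor id)

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, label, dst):
        # Store the relationship in both directions (undirected edge).
        self.edges[src].add((label, dst))
        self.edges[dst].add((label, src))

    def neighbors(self, node_id, label=None):
        # Return neighbor ids in sorted order, optionally filtered by edge label.
        linked = self.edges.get(node_id, ())
        return sorted(n for (l, n) in linked if label is None or l == label)


# Illustrative data only: a very small social network.
g = Graph()
g.add_node("ana", kind="person")
g.add_node("bob", kind="person")
g.add_node("upc", kind="organization")
g.add_edge("ana", "knows", "bob")
g.add_edge("ana", "works_at", "upc")

friends_of_ana = g.neighbors("ana", label="knows")
```

A query such as `g.neighbors("ana")` follows edges directly from a node instead of joining tables, which is the kind of access pattern that graph-oriented storage is meant to make efficient on large networks.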

These are the three topics I have been focusing my attention on:

Relational Database Management Systems

Managing Large Amounts of Relational Data!

Rapid technological evolution is the philosopher's stone for success in most of today's businesses. The computer era is no longer a novelty. Every company, organization or industry has a digital library in which it stores data that is important to its business.

The use of Relational Database Management Systems (RDBMSs) as powerful tools to store, modify and access data in a database is widespread worldwide. The complexity of RDBMSs ranges from the simplest applications, designed for home use or for small companies with modest information storage requirements, like Microsoft Access, to very complex and sophisticated RDBMSs, such as DB2 UDB, Oracle or Microsoft SQL Server, used in critical situations where the huge amount of data to be manipulated requires advanced techniques to improve performance.

However, the amount of data to be stored and manipulated is growing rapidly and continuously, beyond the capabilities of current hardware and software, which jeopardizes the acceptable performance of RDBMSs.

These are the topics I have been focusing my attention on:

Performance Aspects of Data Privacy and Anonymization for Very Large Datasets

When Size Matters

With the increase in available public data sources and the interest in analyzing them, privacy issues are moving to the eye of the storm in many applications. The vast amount of data collected on human beings and organizations as a result of cyberinfrastructure advances, or collected by statistical agencies, for instance, has made traditional ways of protecting social science data obsolete. This has given rise to different techniques aimed at tackling this problem and to the analysis of limitations in such environments. The growing accessibility of high-capacity storage devices makes it possible to keep more detailed information from many areas. While this enriches the information and the conclusions extracted from the data, it poses a serious problem for most previous work on privacy, which has focused on quality and paid little attention to performance. In our group we explore data privacy and anonymization requirements in the context of high performance and the management of very large data volumes (e.g. algorithms and structures for efficient data management, parallel or distributed systems, etc.).

These are the main topics I have been focusing my attention on: