Publications

A Schema Integration Approach for Big Data Analysis. 

Authors: Souad Amghar, Safae Cherdal and Salma Mouline

Journal: Ingénierie des Systèmes d'Information 

April 2023

Abstract:

"A huge volume of data is analyzed by organizations to understand their clients and improve their services. In many cases, these data are stored separately in different database systems and need to be integrated before being used in analysis tools or prediction applications. One of the main tasks of data integration process is the definition of the global schema. Defining a global schema in the context of NoSQL systems is a demanding task since it necessitates dealing with a variety of issues, including the lack of local schemas, data model heterogeneity, and semantic heterogeneity. To address these challenges, this work aims to automatically define the global schema of a set of databases stored in heterogeneous NoSQL systems. The main contributions of this work are presented in three phases: (1) Schema extraction where we define the local schemas using a unified representation. (2) Schema matching in which we propose a hybrid approach to find matching attributes between the local schemas. (3) Schema integration where we define the global schema using the schema matching results. A Covid-19 use case as well as other benchmarks are presented in this paper to evaluate the results of the proposed approach and illustrate its effectiveness."


Storing, Preprocessing and Analyzing Tweets: Finding the suitable NoSQL System

Authors: Souad Amghar, Safae Cherdal and Salma Mouline

Journal: International Journal of Computers and Applications

November 2020

Abstract:

"In the past few years, Tweets have been widely used to perform Big Data analysis. However, the incredible amount of data captured by Twitter needs to be stored for further processing which may be a challenging task for many database systems. NoSQL is a generation of databases that aim to handle a large volume of data. However there is a large set of NoSQL systems, each has its own characteristics. Consequently choosing the suitable NoSQL system to handle Tweets is challenging. Based on these motivations, this work is carried out to find the suitable NoSQL system to manage Tweets. This paper presents the requirements of managing Tweets and provides a detailed comparison of five NoSQL systems namely, Redis, Cassandra, MongoDB, Couchbase and Neo4j regarding these requirements. The five NoSQL systems are compared in a real scenario where we collect and analyze 1.000.000 Tweets. The chosen scenario enables to evaluate not only the performance of the read and write operations, but also other requirements related to Tweets management such as scalability, analysis tools support and analysis languages support. The obtained results show that Couchbase is the most suitable NoSQL systems for managing Tweets."


Data Integration and NoSQL Systems: A State of the Art 

Authors: Souad Amghar, Safae Cherdal and Salma Mouline

Conference: The 4th International Conference On Big Data and Internet of Things (BDIoT 2019) October 23-24, 2019, Tangier-Tetuan, Morocco

Abstract:

"Data Integration is one of the older research problems in database area. It aims to combine data stored in different data sources and provide a unified view of data to the user. With the spread of a new generation of database systems, called NoSQL systems, data integration becomes more challenging, since we have to integrate data stored in systems that implement different data models and query languages. Inspired by these motivations, we provide in this paper an overview of data integration challenges and solutions in the context of NoSQL systems."


SBGN2HFPN transformation of SBGN-PD into Petri Nets illustrated on the Glycolysis pathway 

Authors: Safae Cherdal, Salma Mouline and Souad Amghar

Journal: International Journal of Intelligent Engineering and Systems

October 2018

Abstract:

"Systems Biology Graphical Notation (SBGN) is a standard graphical language for representing biological and biochemical processes and interactions. Even that SBGN Process Description - an SBGN sub-language - facilitates biological systems representation, it allows only qualitative descriptions of metabolic pathways and does not provide quantitative analytic environment. However such descriptions are essential for behavioural analysis. This analysis can be possible using computational formalisms such as Hybrid Functional Petri Net (HFPN) which is a Petri net extension dedicated to study and verify biopathways. However, biologists use generally graphical notations such as SBGN. To address this paradox, we propose, in this paper, an SBGN-PD to HFPN transformation based on Model Driven Engineering (MDE). Furthermore, we illustrate this transformation on the Glycolysis pathway and verify the transformation resulting HFPN model by comparing it with the existing HFPN model through simulation."


Which NoSQL Database for IoT Applications?

Authors: Souad Amghar, Safae Cherdal and Salma Mouline

Conference: 2018 International Conference on Selected Topics in Mobile and Wireless Networking (MoWNeT), June 20, 2018

Abstract:

"A large amount of data are generated every moment by connected objects creating Internet of Things (IoT). These data are difficult to handle using traditional databases leading to use NoSQL databases. These latter have achieved a large popularity thanks to their high performance, flexibility in scaling, and high availability. But, which NoSQL database is the most suitable for IoT applications? In this paper, we discuss the main requirements of IoT data management, and we compare five of the most popular NoSQL databases namely, Redis, Cassandra, MongoDB, Couchbase and Neo4j, in accordance with IoT data management requirements, in order to find the most suitable NoSQL database for IoT applications."