Publications
A Schema Integration Approach for Big Data Analysis
Authors: Souad Amghar, Safae Cherdal and Salma Mouline
Journal: Ingénierie des Systèmes d'Information
April 2023
Abstract:
"A huge volume of data is analyzed by organizations to understand their clients and improve their services. In many cases, these data are stored separately in different database systems and need to be integrated before being used in analysis tools or prediction applications. One of the main tasks of the data integration process is the definition of the global schema. Defining a global schema in the context of NoSQL systems is a demanding task since it necessitates dealing with a variety of issues, including the lack of local schemas, data model heterogeneity, and semantic heterogeneity. To address these challenges, this work aims to automatically define the global schema of a set of databases stored in heterogeneous NoSQL systems. The main contributions of this work are presented in three phases: (1) Schema extraction, where we define the local schemas using a unified representation. (2) Schema matching, in which we propose a hybrid approach to find matching attributes between the local schemas. (3) Schema integration, where we define the global schema using the schema matching results. A Covid-19 use case as well as other benchmarks are presented in this paper to evaluate the results of the proposed approach and illustrate its effectiveness."
Storing, Preprocessing and Analyzing Tweets: Finding the suitable NoSQL System
Authors: Souad Amghar, Safae Cherdal and Salma Mouline
Journal: International Journal of Computers and Applications
November 2020
Abstract:
"In the past few years, Tweets have been widely used to perform Big Data analysis. However, the incredible amount of data captured by Twitter needs to be stored for further processing, which may be a challenging task for many database systems. NoSQL is a generation of databases that aim to handle a large volume of data. However, there is a large set of NoSQL systems, each with its own characteristics. Consequently, choosing the suitable NoSQL system to handle Tweets is challenging. Based on these motivations, this work is carried out to find the suitable NoSQL system to manage Tweets. This paper presents the requirements of managing Tweets and provides a detailed comparison of five NoSQL systems, namely Redis, Cassandra, MongoDB, Couchbase and Neo4j, regarding these requirements. The five NoSQL systems are compared in a real scenario where we collect and analyze 1,000,000 Tweets. The chosen scenario makes it possible to evaluate not only the performance of the read and write operations, but also other requirements related to Tweets management such as scalability, analysis tools support and analysis languages support. The obtained results show that Couchbase is the most suitable NoSQL system for managing Tweets."
You can find the Tweets used in this work here
Data Integration and NoSQL Systems: A State of the Art
Authors: Souad Amghar, Safae Cherdal and Salma Mouline
Conference: The 4th International Conference on Big Data and Internet of Things (BDIoT 2019), October 23-24, 2019, Tangier-Tetuan, Morocco
Abstract:
"Data Integration is one of the older research problems in the database area. It aims to combine data stored in different data sources and provide a unified view of data to the user. With the spread of a new generation of database systems, called NoSQL systems, data integration becomes more challenging, since we have to integrate data stored in systems that implement different data models and query languages. Inspired by these motivations, we provide in this paper an overview of data integration challenges and solutions in the context of NoSQL systems."
You can request the full text from here
SBGN2HFPN transformation of SBGN-PD into Petri Nets illustrated on the Glycolysis pathway
Authors: Safae Cherdal, Salma Mouline and Souad Amghar
Journal: International Journal of Intelligent Engineering and Systems
October 2018
Abstract:
"Systems Biology Graphical Notation (SBGN) is a standard graphical language for representing biological and biochemical processes and interactions. Although SBGN Process Description, an SBGN sub-language, facilitates the representation of biological systems, it allows only qualitative descriptions of metabolic pathways and does not provide a quantitative analytic environment. However, such descriptions are essential for behavioural analysis. This analysis is possible using computational formalisms such as Hybrid Functional Petri Net (HFPN), which is a Petri net extension dedicated to studying and verifying biopathways. However, biologists generally use graphical notations such as SBGN. To address this paradox, we propose, in this paper, an SBGN-PD to HFPN transformation based on Model Driven Engineering (MDE). Furthermore, we illustrate this transformation on the Glycolysis pathway and verify the resulting HFPN model by comparing it with the existing HFPN model through simulation."
You can find the full text here
Which NoSQL Database for IoT Applications?
Authors: Souad Amghar, Safae Cherdal and Salma Mouline
Conference: 2018 International Conference on Selected Topics in Mobile and Wireless Networking (MoWNeT), June 20, 2018
Abstract:
"Large amounts of data are generated every moment by connected objects creating the Internet of Things (IoT). These data are difficult to handle using traditional databases, leading to the use of NoSQL databases. The latter have achieved great popularity thanks to their high performance, flexibility in scaling, and high availability. But which NoSQL database is the most suitable for IoT applications? In this paper, we discuss the main requirements of IoT data management, and we compare five of the most popular NoSQL databases, namely Redis, Cassandra, MongoDB, Couchbase and Neo4j, in accordance with IoT data management requirements, in order to find the most suitable NoSQL database for IoT applications."
You can request the full text from here