Nowadays, data are moving from legacy systems to distributed, heterogeneous environments with multi-featured sources (e.g., IoT devices, sensors). The possibly hidden characteristics of federated data need to be considered to make proper clustering decisions in such environments. In this study, we focus on unsupervised rather than supervised learning: unsupervised learning can freely exploit relations among federated datasets because it does not rely on external information, whereas supervised learning confines its analytical model to class labels, which may mask important features in federated environments. Current clustering techniques do not produce satisfactory results in distributed heterogeneous environments, but they have the potential to deliver promising outcomes if the data-analytics design procedure is adapted to draw inferences about distributed data through advanced clustering algorithms. One promising candidate for delivering an optimal solution in a federated heterogeneous environment using unsupervised learning is collaborative clustering. Collaborative clustering may gain an edge in the speed and quality of analytics through feature federation, compared with legacy and competing methodologies. It runs clustering algorithms locally and then collaborates by exchanging information about the local findings remotely. Hence, collaborative clustering searches for hidden common structure in a federated distributed environment to empower organizations with the right decisions while maintaining privacy and confidentiality.
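As a minimal sketch of this local-clustering-plus-exchange loop, assume two sites that each run k-means locally and share only their centroids, never the raw data; the blending rule and all names below are illustrative assumptions, not the exact collaboration protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_kmeans(X, centroids, iters=10):
    """Plain Lloyd iterations run entirely on-site; raw data never leaves."""
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids

# Two federated sites hold different (private) views of a common phenomenon.
sites = [rng.normal(loc=c, scale=1.0, size=(300, 2)) for c in ([0, 0], [0.5, 0.5])]
cents = [rng.normal(size=(3, 2)) for _ in sites]

for _ in range(5):  # collaboration rounds
    cents = [local_kmeans(X, c) for X, c in zip(sites, cents)]
    # Exchange step: sites share only their centroids ("findings"), then pull
    # their own centroids toward the consensus. For brevity we assume centroid
    # indices are aligned across sites; a real system would match them first.
    consensus = np.mean(cents, axis=0)
    cents = [0.7 * c + 0.3 * consensus for c in cents]

print(np.round(cents[0], 2))  # site 0's centroids after collaboration
```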
Cloud computing systems are currently composed of large numbers of relatively inexpensive computers, interconnected by standard IP routers and supported by stock disk drives. However, many demanding applications have now reached a fundamental limit in their ability to scale out using traditional machines. Future performance improvements will derive from the use of high-end specialized equipment in addition to standard hardware: GPUs, FPGAs, programmable routers, and advanced storage technologies. In this context, we investigate: (i) how cloud providers may offer such extremely heterogeneous hardware to their users; and (ii) how cloud customers may make use of these heterogeneous resources to run their applications with the best possible price-performance tradeoff. In contrast to regular resource scheduling, cross-resource scheduling consists of provisioning groups of resources with inter-resource constraints, such as the available bandwidth between resources. It uses its knowledge of the physical layout of the cloud to translate such requests into specific requests for individual resources. As such, it sits at the border between the Infrastructure-as-a-Service layer (which makes individual resources available as a service) and the Platform-as-a-Service layer (which needs sets of resources to execute user applications).
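The toy sketch below illustrates what a cross-resource request with inter-resource bandwidth constraints might look like, and how a scheduler could check a candidate placement against its knowledge of the physical layout; the request format, resource kinds, and rack-level bandwidth map are all assumptions for illustration, not the actual interface studied here:

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str            # e.g. "gpu", "fpga", "ssd" (illustrative kinds)

@dataclass
class CrossResourceRequest:
    resources: list
    # pairwise constraint: (name_a, name_b) -> minimum bandwidth in Gbit/s
    min_bandwidth: dict = field(default_factory=dict)

def feasible(request, placement, fabric_bw):
    """Check a candidate placement (resource name -> rack) against the
    provider's knowledge of rack-to-rack bandwidth (fabric_bw)."""
    for (a, b), need in request.min_bandwidth.items():
        if fabric_bw[(placement[a], placement[b])] < need:
            return False
    return True

req = CrossResourceRequest(
    resources=[Resource("train", "gpu"), Resource("decode", "fpga")],
    min_bandwidth={("train", "decode"): 40},
)
fabric = {("rack1", "rack1"): 100, ("rack1", "rack2"): 25,
          ("rack2", "rack1"): 25, ("rack2", "rack2"): 100}
print(feasible(req, {"train": "rack1", "decode": "rack1"}, fabric))  # True
print(feasible(req, {"train": "rack1", "decode": "rack2"}, fabric))  # False
```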
Nowadays, social networks have become popular in a wide variety of areas of society. Since the massive and complex data of social networks are generally highly connected, they can naturally be modeled as a graph. In this context, graph databases are emerging as a promising technology to manage social networks while providing many advantages: flexible modeling of complex data, efficient data traversals on the graph, processing of massive data while avoiding costly join operations, the use of graph-theory algorithms, etc. In addition, fuzzy sets provide interesting properties for flexible querying of databases. A fuzzy set models a gradual property that can be integrated as a preference in a query addressed to a database. Fuzzy queries extend Boolean logic by taking into account the tolerance and gradualness in users' intentions. Since fuzzy sets have mainly been studied in the context of relational and object-oriented databases, their study in the context of graph databases constitutes a new line of research. My research thus aims at achieving fuzzy querying of graph databases in social networks.
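As a toy illustration of the intended flavor of querying, the sketch below evaluates the fuzzy preference "young" over the friends of a user in a small adjacency-list graph; the trapezoidal membership function, its thresholds, and the data model are illustrative, not a graph-database API:

```python
def mu_young(age, full=25, zero=40):
    """Degree in [0, 1] to which `age` satisfies 'young' (trapezoidal shape:
    fully satisfied up to `full`, not at all beyond `zero`)."""
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)

people = {"ana": 22, "bob": 31, "carl": 45, "dee": 27}
friends = {"ana": ["bob", "dee"], "bob": ["ana", "carl"], "dee": ["ana"]}

# Fuzzy query: "young friends of ana", ranked by membership degree rather
# than filtered by a crisp Boolean condition.
answers = sorted(((f, mu_young(people[f])) for f in friends["ana"]),
                 key=lambda t: -t[1])
print(answers)  # [('dee', 0.866...), ('bob', 0.6)]
```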
My post-doctoral research focused on two main areas: P2P streaming and resource allocation.
We proposed a peer-to-peer system for streaming user-generated live video. Peers are arranged in levels so that video is delivered at about the same time to all peers in the same level, and peers in a higher level watch the video before those in a lower level. We encoded the video bitstream with rateless codes and used trees to transmit the encoded symbols. Trees are constructed to minimize the transmission rate at the source while maximizing the number of served peers and guaranteeing on-time delivery and reliability at the peers. We formulated this objective as a height-bounded spanning forest problem with nodal capacity constraints and computed a solution using a heuristic polynomial-time algorithm. We conducted ns-2 simulations to study the trade-off between used bandwidth and video quality for various packet loss rates and link latencies. We also developed a real application to evaluate our proposal under real-life conditions with end users.
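To convey the shape of the problem, here is a greedy sketch in the spirit of a height-bounded tree construction with out-degree (upload capacity) constraints; it is not the exact polynomial-time heuristic used in our work:

```python
from collections import deque

def build_tree(source, peers, capacity, max_height):
    """Greedy sketch: attach peers breadth-first under per-node out-degree
    (upload capacity) and height constraints."""
    parent, height = {}, {source: 0}
    frontier = deque([source])
    todo = list(peers)
    while frontier and todo:
        node = frontier.popleft()
        if height[node] >= max_height:
            continue  # children of this node would violate the height bound
        while capacity[node] > 0 and todo:
            child = todo.pop(0)
            parent[child] = node
            height[child] = height[node] + 1
            capacity[node] -= 1
            frontier.append(child)
    return parent, todo  # todo holds peers we failed to place

cap = {"src": 2, "a": 2, "b": 1, "c": 0, "d": 0, "e": 0}
tree, unserved = build_tree("src", ["a", "b", "c", "d", "e"], cap, max_height=2)
print(tree)      # {'a': 'src', 'b': 'src', 'c': 'a', 'd': 'a', 'e': 'b'}
print(unserved)  # [] -> all peers served within the height bound
```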
In a multioverlay live video sharing service consisting of multiple independent peer-to-peer live video streaming systems, a user can simultaneously watch multiple live video streams. A major challenge for such services is the interoverlay bandwidth competition problem, which is to find an upload bandwidth allocation among the overlays each peer has subscribed to. So far, no solution has been proposed in the literature for the important case where the overall system is underprovisioned, that is, when peers do not have enough upload bandwidth to ensure distribution of the videos at full quality. We showed that an allocation of upload resources that minimizes the wastage of resources (i.e., minimizes the upload bandwidth allocated to overprovisioned overlays) can be computed in polynomial time. We then presented a generic model that allows the design of different strategies for managing the resource deficit in underprovisioned systems. Finally, we provided relevant simulation results demonstrating the gains in video quality resulting from the implementation of our solutions.
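The toy allocation below illustrates the wastage criterion at the system level: overlays never receive more upload bandwidth than they demand, and in the underprovisioned case one simple deficit-management strategy, a proportional split (one instance of the generic model), is applied; the function and the numbers are illustrative:

```python
def allocate(total_supply, demand):
    """demand: overlay -> upload bandwidth needed for full-quality streaming.
    No overlay ever gets more than it needs (zero wastage); when the system
    is underprovisioned, the deficit is here spread proportionally, which is
    only one strategy among those the generic model admits."""
    need = sum(demand.values())
    if total_supply >= need:
        return dict(demand), total_supply - need  # full quality; surplus idle
    scale = total_supply / need
    return {o: d * scale for o, d in demand.items()}, 0.0

alloc, surplus = allocate(90, {"ov1": 40, "ov2": 60})
print(alloc)    # {'ov1': 36.0, 'ov2': 54.0} -> both overlays at 90% quality
print(surplus)  # 0.0 -> no upload bandwidth is wasted
```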
Virtual environments (VEs) are 3-D virtual worlds in which a huge number of participants play roles and interact with their surroundings through virtual representations called avatars. VEs are traditionally supported by a client/server architecture. However, centralized architectures can lead to a bottleneck at the server due to high communication and computation overhead during peak loads. Thus, P2P overlay networks are emerging as a promising architecture for VEs. However, exploiting P2P schemes in VEs is not straightforward, and several challenging issues related to data distribution and state consistency must be considered.
One of the key aspects of P2P-based VEs is the logical platform, consisting of connectivity, communication and data architectures, on which the VE is based. The connectivity architecture is the overlay topology structure, which defines how peers are connected to each other. The communication architecture is the routing protocol defining how peers can exchange messages, while the data architecture defines how data are distributed over the logical overlay. The design of these architectures has a significant influence on the performance and scalability of VEs.
We first proposed a connectivity architecture based on the fact that a user sees only a portion of the virtual world, commonly known as the Area of Interest (AOI), where she/he performs actions (i.e., moving around, manipulating objects, communicating with other users, etc.). It is essential to dynamically organize the overlay network with respect to users' positions in the virtual world, by having each user connect only to the set of neighboring users lying in her/his vicinity, and to ensure that a user only receives state-update messages for events happening within her/his AOI. Towards this end, the well-known Delaunay Triangulation can be used and is widely accepted as a basic construct to provide connectivity between VE users based on virtual proximity. However, a Delaunay Triangulation suffers from high maintenance costs, as it is subject to a high connection-change rate due to continuous user movement in the virtual world. Therefore, we proposed a connectivity architecture based on a new triangulation algorithm, Relaxed Triangulation, that provides overlay network connectivity to support P2P VEs while dramatically decreasing maintenance overhead by reducing the number of connection changes caused by user insertion and movement.
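For concreteness, the Delaunay baseline that the Relaxed Triangulation improves on can be sketched as follows: triangulate avatar positions and keep, for a given user, only the Delaunay neighbors within the AOI radius. The use of scipy here is an assumed tool for illustration, and the Relaxed Triangulation algorithm itself is not reproduced:

```python
import numpy as np
from scipy.spatial import Delaunay

def aoi_neighbors(positions, user, aoi_radius):
    """Delaunay neighbors of `user` restricted to its AOI radius."""
    tri = Delaunay(positions)
    # vertex_neighbor_vertices encodes, per vertex, its adjacent vertices
    indptr, indices = tri.vertex_neighbor_vertices
    neigh = indices[indptr[user]:indptr[user + 1]]
    dist = np.linalg.norm(positions[neigh] - positions[user], axis=1)
    return neigh[dist <= aoi_radius]

pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.4, 0.4]])
print(aoi_neighbors(pos, user=0, aoi_radius=2.0))  # e.g. [1 2 4]
```

Note that every avatar movement may change this triangulation, which is precisely the maintenance cost the Relaxed Triangulation is designed to reduce.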
In P2P-based systems, since global knowledge of the system is not available to the participating peers, routing decisions must be made based only on peers' local information. In this context, an important feature of the Delaunay Triangulation is that it supports greedy routing, which guarantees that a message is delivered to its destination. However, greedy routing over general overlay topologies, including the Relaxed Triangulation, may suffer from the local optimum problem, where a message gets stuck at a node, called a local optimum, unable to reach its final destination. To overcome this shortcoming, we then proposed a new message routing protocol as a communication architecture that guarantees message delivery over the Relaxed Triangulation using only peers' local knowledge of their neighboring environment. Our solution thus provides guaranteed message delivery over a flexible overlay structure with low maintenance cost.
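The sketch below shows plain greedy routing and how a local optimum manifests: forwarding stops when no neighbor is strictly closer to the destination. The recovery mechanism that guarantees delivery over the Relaxed Triangulation is deliberately omitted; the overlay below is a made-up example:

```python
import math

def greedy_route(start, dest_pos, positions, neighbors):
    """Forward to the neighbor strictly closer to the destination; stop
    (local optimum) when no neighbor improves on the current node."""
    def d(n):
        return math.dist(positions[n], dest_pos)
    path, node = [start], start
    while True:
        closer = [n for n in neighbors[node] if d(n) < d(node)]
        if not closer:
            # delivered only if we ended exactly at the destination;
            # otherwise the message is stuck at a local optimum
            return path, d(node) == 0.0
        node = min(closer, key=d)
        path.append(node)

positions = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (4, 2)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_route(0, positions[3], positions, neighbors))  # ([0, 1, 2, 3], True)
```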
In P2P-based virtual environments, the efficient distribution and management of avatar and object states remains a highly challenging issue, since objects change their states, but less frequently their positions, while avatars frequently change their positions in the VE. These characteristic differences between objects and avatars generally lead P2P VE systems to focus on, and manage, only one of them. We finally addressed this issue and proposed, as a data architecture, a twofold approach to data management in P2P-based VEs. Our system combines a structured P2P overlay used for object state management with a triangulation overlay used for avatar state and group membership management. It avoids overloading a given manager when avatar clustering occurs by using a flexible overlay for avatar management, while preventing unnecessary entity data transfers between managers by assigning objects to managers in a stable way, independently of avatars' positions in the VE.
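A toy version of this split, with purely illustrative names: objects are assigned to managers by hashing their identifiers (stable and independent of where avatars gather), while avatars are assigned by proximity, reduced here to "nearest manager" for brevity:

```python
import hashlib

managers = ["m0", "m1", "m2"]
manager_pos = {"m0": (0, 0), "m1": (10, 0), "m2": (0, 10)}

def object_manager(object_id):
    """Objects: stable, DHT-style hash assignment, so object state never
    migrates when avatars crowd into one region."""
    h = int(hashlib.sha1(object_id.encode()).hexdigest(), 16)
    return managers[h % len(managers)]

def avatar_manager(avatar_pos):
    """Avatars: proximity-based assignment over the triangulation overlay
    (simplified here to the nearest manager by squared distance)."""
    return min(managers,
               key=lambda m: (manager_pos[m][0] - avatar_pos[0]) ** 2
                           + (manager_pos[m][1] - avatar_pos[1]) ** 2)

print(object_manager("sword#42"))  # same answer wherever avatars move
print(avatar_manager((1.0, 8.5)))  # 'm2', the closest manager
```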
Distributed Hash Tables (DHTs) provide a scalable way of indexing shared data items in P2P systems. The scalability of DHTs mainly relies on the mechanism by which the system's load is fairly balanced among all participating nodes. This is usually achieved through a uniform hash function that randomly maps data items to nodes in the DHT. While this scheme provides item-balancing guarantees, it fails to balance the actual load of the system because it (1) makes the simplifying assumption that all items are equally popular, i.e., that they incur the same query load on their hosting nodes, and (2) assumes that all nodes have comparable capacities.
We addressed this issue by considering a more practical characterization of a system's load, and proposed a load balancing mechanism that takes into account data (un)popularity and node heterogeneity. The notion of load is first redefined in terms of the number of queries per time frame. Load is then balanced by dynamically adjusting the DHT structure so that it best captures node capacities and query-load distributions over data items. We also evaluated our solution in the context of ring-based DHT structures and showed its performance gain over basic DHT load balancing schemes.
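A much-simplified sketch of the underlying idea, with load measured as queries per time frame: an overloaded node sheds its hottest keys to its successor until their capacity-normalized loads roughly match. The actual mechanism adjusts the DHT structure itself rather than migrating individual keys, so everything below is illustrative:

```python
def rebalance_step(keys, load, capacity, node, succ):
    """keys[n]: list of (key, queries_per_frame) hosted by node n.
    Shed the hottest keys from `node` to `succ` while `node` is more
    loaded relative to its capacity than its successor."""
    def norm(n):
        return load[n] / capacity[n]
    keys[node].sort(key=lambda kv: kv[1])  # coldest first, hottest last
    while norm(node) > norm(succ) and keys[node]:
        k, q = keys[node].pop()            # shed the hottest key
        keys[succ].append((k, q))
        load[node] -= q
        load[succ] += q

keys = {"a": [("k1", 50), ("k2", 30), ("k3", 5)], "b": [("k4", 5)]}
load = {"a": 85, "b": 5}
capacity = {"a": 1.0, "b": 1.0}
rebalance_step(keys, load, capacity, "a", "b")
print(load)  # {'a': 35, 'b': 55} after shedding the hottest key, 'k1'
```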