I have researched various cutting-edge areas, including Digital Wellbeing, where I studied the impact of smartphone use, problematic internet use (PIU), and psychological traits on sleep quality. I also developed a Distributed Graph Neural Networks Framework for scalable and efficient processing of large-scale graphs. My work on a Distributed Cloud Platform for CCTV Video Big Data Analytics focused on real-time video analysis for security and surveillance. Additionally, I explored Lifestyle Pattern Mining Using Companion Big Data, developed Graph Compression and Summarization techniques, and implemented expert ranking systems for Online Collaborative Social Q&A Platforms. Furthermore, I have extensive experience in complex event analysis in multi-stream environments, where I integrated and analyzed real-time data from diverse sources, providing insights for smart city applications and security. This diverse experience reflects my expertise in distributed systems, big data analytics, and AI-driven solutions for real-world challenges. The detailed description of some of my research projects is listed below.
Principal Investigator: Professor Dr. Raian Ali
Role: Team Member
January 2021 - December 2025
During my second postdoc at HBKU Qatar, I explored a new research area: Digital Wellbeing. An increasing body of research has examined the potential mental health risks associated with problematic internet use. The growing time people spend on smartphones, along with the wide range of activities available on these devices, has led researchers to investigate their impact on users' mental health and well-being. In the literature, smartphone addiction is defined as excessive smartphone use that interferes with daily life priorities, including sleep.
Sleep is crucial for overall health, daily functioning, and performance. It is a particularly important and potentially modifiable health risk in teens. Sleep deprivation has been linked to numerous physical and mental health issues, such as weakened immune response, mental illness, reduced fertility, impaired cognition, hypertension, and hyperglycemia. Excessive smartphone use can disrupt the sleep-wake cycle, potentially contributing to these health problems.
Personality traits are also believed to influence human behavior, particularly in relation to internet and digital device use. Studies have shown that individuals with problematic smartphone use, characterized by difficulty controlling their internet habits, are more likely to delay sleep. Similarly, using smartphones near bedtime can negatively affect sleep quality. The goal of this project was to investigate how personality traits, problematic internet use, and smartphone usage impact sleep quality.
Figure. An abstract view of e-sleep research.
A significant methodological limitation of the existing literature in this context was its reliance on subjective (self-reported) smartphone usage data. Self-reported technology usage has been shown not to match actual usage. In our work, smartphone usage and sleep parameters were collected objectively from real-world smartphone users.
We investigated the role of objectively recorded smartphone usage and personality traits in sleep quality.
We studied the associations of objectively collected smartphone near-bedtime usage and problematic internet usage with sleep quality parameters.
We investigated whether personality traits (PT) partially or fully mediate the relationship between smartphone usage and sleep quality.
We used objective data on smartphone usage and sleep to investigate whether smartphone usage mediates the relationship between problematic internet usage and sleep quality.
Primarily, our findings can be used to build smart social media and smartphone applications that are more accountable to users’ digital well-being, where personalization is one of the acceptance factors and sleep quality is the main well-being factor.
We suggested that, as in other domains such as drug abuse, intervention and prevention methods can also be tailored to personality for smartphone users.
Our findings showed that smartphone users who score low on conscientiousness and high on neuroticism may be recognized as having a considerably higher risk for poor sleep quality. Thus, in this context, additional features can be added to digital well-being apps.
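The mediation question studied in this project (does a mediator carry part or all of the effect of a predictor on sleep quality?) can be illustrated with a minimal product-of-coefficients sketch. The data below are synthetic, and the variable names and effect sizes are assumptions made for illustration only; they are not the project's data or results, and significance testing (e.g., bootstrapping the indirect effect) is omitted.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation(x, m, y):
    """Split the effect of x on y into a direct part and a part carried by m."""
    a = ols(m, x)[1]                # path x -> m
    _, c_prime, b = ols(y, x, m)    # direct effect of x, and path m -> y
    c = ols(y, x)[1]                # total effect of x on y
    return {"total": c, "direct": c_prime, "indirect": a * b}

# Synthetic data only: names and effect sizes are made up for this sketch.
rng = np.random.default_rng(0)
n = 500
piu = rng.normal(size=n)                                # predictor
usage = 0.6 * piu + rng.normal(size=n)                  # mediator
sleep = -0.5 * usage - 0.1 * piu + rng.normal(size=n)   # outcome

effects = mediation(piu, usage, sleep)
print(effects)
```

In this linear setting the total effect equals the direct effect plus the indirect effect. A direct effect close to zero alongside a non-zero indirect effect would suggest full mediation, while a direct effect that remains non-zero would suggest partial mediation.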
The findings of this project were covered by several media outlets in Qatar, e.g., https://www.qf.org.qa/ar/stories/op-ed-future-designs-for-digital-wellness.
Under this project, we have published the following journal articles, and two more are under review.
Alam, Aftab*, Sameha Al-Shakhsi, Dena Al-Thani, and Raian Ali. "Do near-bedtime usage of smartphones and problematic internet usage impact sleep? A study based on objectively recorded usage data." Behavior & Information Technology (2023): 1-16.
Alam, Aftab*, Sameha Alshakhsi, Dena Al-Thani, and Raian Ali. "The role of objectively recorded smartphone usage and personality traits in sleep quality." PeerJ Computer Science 9 (2023): e1261.
Tauseef Ur Rahman, Zahoor Jan, Aftab Alam*, Vasilis Katos, Raian Ali. Problematic Internet Usage, and smartphone use: their effects on sleep quality across groups using mediation analysis (Journal of Behaviour and Information Technology)
Rahman, Tauseef Ur, Zahoor Jan, Aftab Alam*, and Raian Ali. "Smartphone use and personality: Their effects on sleep quality across groups using mediation analysis." Digital Health 10 (2024): 20552076241295797.
Principal Investigator: Professor Young-Koo Lee
Role: Team lead
January 2021 - December 2025
Sponsor: Institute of Information & Communications Technology Planning & Evaluation, South Korea
Deep learning is effective at detecting hidden patterns in Euclidean data (images, text, videos). But what about applications in which data comes from non-Euclidean domains and is represented as graphs with complex object relationships and interdependencies? This is where Graph Neural Networks (GNNs) offer a solution. Because of the implicit data dependence in graphs, however, it is hard for industry to apply these methods to real-world problems at scale. Furthermore, the graphs are typically large, containing hundreds of millions of nodes and several billion edges. The main drawbacks of existing systems are:
They are unable to scale due to memory capacity and bandwidth constraints between graph stores and workers.
They require extra development of graph stores instead of exploiting mature infrastructures, such as MapReduce, that guarantee good system properties.
They focus on training but ignore the optimization of inference over graphs.
As a result, existing systems are not integrated and do not provide an open-source abstraction to assist developers. Under this project, we will design a Distributed Graph Learning Library (DGLL), a scalable, fault-tolerant, and integrated system with fully functional training and inference for graph machine learning (GML) models and GNNs. Our system will follow a layered design and consist of three main layers: the Graph Data Layer, the Graph Optimization Layer, and the Graph Machine Learning Layer. The first layer will utilize a state-of-the-art distributed file system and a scalable graph data store. The second layer will provide distributed graph processing using graph programming models while optimizing and hiding the underlying complexity of advanced message passing, graph partitioning, and graph sampling techniques. Finally, we will develop graph learning and graph neural network modules on top of the second layer for fast and memory-efficient training and inference.
Figure. An abstract view of the proposed GDLL framework. The first layer of the GDLL framework, GDL, is responsible for graph data management throughout the training lifecycle. In Step-1, the raw data are persisted into the RAW DS and then mapped into graph format for storage in the Graph DS (Step-3). The GOL then loads the graph into distributed memory to compute the subgraphs (Step-4). These subgraphs are indexed in the FHDFS (Step-5) and then retrieved from it by the GLL for distributed training (Step-6).
We proposed a scale-out, share-nothing architecture-based distributed GNN framework.
We proposed the notion of F-HDFS to improve the read-write performance of HDFS in the context of k-hop-based large-scale GNNs training.
We addressed issues in distributed k-hop-based GNN training, e.g., subgraphs of varying size, under a parameter-server setting.
This work is one of the first studies to demonstrate how to develop an industrial-scale GNN system.
The optimized version of HDFS, i.e., F-HDFS, can be used by industrial-scale GNN systems for efficient subgraph reading and writing during GNN training.
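The k-hop subgraphs mentioned above can be illustrated with a minimal single-machine sketch. It uses a plain breadth-first search over a toy adjacency list; the function name and the data are hypothetical, and it does not reproduce the framework's distributed implementation or its HDFS-based storage.

```python
from collections import deque

def k_hop_subgraph(adj, seed, k):
    """Return the nodes and edges within k hops of seed (breadth-first search)."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    nodes = set(depth)
    # Keep only edges whose endpoints both lie inside the k-hop neighborhood.
    edges = {(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes}
    return nodes, edges

# Toy directed adjacency list (hypothetical data).
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
nodes, edges = k_hop_subgraph(adj, seed=0, k=2)
print(sorted(nodes), sorted(edges))
```

The size of the result depends on the neighborhood of each seed node, which is why subgraphs of varying size arise in k-hop-based training.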
Under this project, we have successfully published the following paper.
Van, D. T. T., Khan, M. N., Afridi, T. H., Ullah, I., Alam, A., & Lee, Y. K. (2022). GDLL: A Scalable and Share Nothing Architecture Based Distributed Graph Neural Networks Framework. IEEE Access, 10, 21684-21700
Principal Investigator: Professor Young-Koo Lee
Role: Team lead
January 2017 - December 2021
Sponsor: Institute of Information & Communications Technology Planning & Evaluation, South Korea
During my Ph.D., I worked on shared-nothing, distributed computing solutions for big data analytics in the cloud, i.e., designing industrial-scale distributed computing architectures, and subsequently addressed the research issues and challenges identified in the designed video big data system. I proposed a reference architecture for video big data analytics in the cloud called the Cloud-based Video Analytics System (CVAS) and a distributed data curation framework called TORNADO. An abstract view and a short description of the proposed CVAS and TORNADO are given below.
Figure 1. Lambda CVAS: A reference architecture for real-time and batch intelligent video analytics in the cloud. We propose a distributed, layered, service-oriented, and lambda-style-inspired reference architecture for large-scale intelligent video analytics in the cloud. The base layer of the proposed architecture, i.e., the video big data curation layer, is based on the notion of Intermediate Results (IR) orchestration, which can play a significant role in optimizing the intelligent video analytics pipeline. The Video Data Processing Layer (VDPL) is in charge of pre-processing the raw videos, extracting the significant features, and passing them to the Video Data Mining Layer (VDML). The VDML is responsible for producing high-level semantic results from the features generated by the VDPL. The Knowledge Curation Layer (KCL) deploys a video ontology and creates knowledge based on the higher-level features obtained from the VDML. In an application scenario, the VDPL, VDML, and KCL are pipelined in a specific context and become an Intelligent Video Analytics (IVA) service to which TORNADO users can subscribe video data sources under the IVAaaS paradigm.
Figure 2. TORNADO is composed of six components, i.e., Real-time Video Stream Acquisition and Synchronization (RVSAS), Immediate Structured Big Data Store (ISBDS), Distributed Persistent Big Data Storage (DPBDS), ISBDS Representation and Mapping (ISBRM), TORNADO Business Logic, and TORNADO Web Services. The RVSAS component provides interfaces and acquires large-scale video streams from device-independent video stream sources for on-the-fly processing. The video stream sources are synchronized based on the user identification and the timestamp of the video stream generation. The streams are then queued in the form of mini-batches in a distributed stream buffer for RIVA. The RIVA may vary in the context of a business domain, cross-linked with the video stream sources and the user's profile. RIVA services are deployed in a cluster of computers for distributed video stream processing to extract value for contextual decision-making. The video stream queued in the form of mini-batches can be accessed using Video Stream Consumer Services (VSCS). During RIVA service pipelining, the IR is maintained through the Intermediate Results Manager (IR-Manager). Similarly, RVSAS is equipped with a Lifelong Video Stream Monitor (LVSM) to provide a push-based notification response to the client with the help of a publish-subscribe mechanism. The extracted values (features and anomalies) and the actual video streams are then persisted into ISBDS and DPBDS, respectively.
We proposed a distributed, layered, service-oriented, and lambda-style-inspired [29] reference architecture for large-scale IVA in the cloud. For each layer of the proposed Cloud-based Video Analytics System (CVAS), we detailed the available big data technology alternatives.
We developed and optimized high-level abstraction on top of big data technologies for video big data analytics that hides the underlying complexity of the big data stack.
We resolved the data curation issues throughout the life cycle of the machine learning-based video analytics pipeline by developing distributed data management modules for both real-time and offline analytics.
Under the proposed TORNADO, we proposed a unified scale-out middleware called IR-Middleware to address issues such as intermediate results dimensionality management (high-level and low-level features) and machine learning pipeline orchestration and optimization in the cloud.
Under this project, we proposed IntelliBVR, a knowledge curation framework for video big data intelligent search, retrieval, and complex event analysis.
We proposed a distributed, layered, service-oriented, and lambda-style-inspired reference architecture for large-scale real-time and offline video analytics in the cloud. Under the proposed architecture, we performed a thorough investigation of scalable traditional video analytics and deep learning techniques and tools on distributed infrastructures, along with popular computer vision benchmark datasets. This work helps industries, researchers, and practitioners design such systems.
The proposed IR-Middleware shows how to optimize the machine learning pipeline in the cloud and has an impact on both processing time and resource utilization in the cloud. The IR-Middleware could be utilized by industrial solutions.
The proposed IntelliBVR knowledge model shows the research community how to exploit semantic web technologies for complex event analysis in a multi-video-stream environment.
For video big data analytics in the cloud, a higher-level abstraction was developed on top of distributed frameworks, i.e., Apache Spark, Kafka, HDFS, and HBase. The developed higher-level APIs were made available on GitHub for researchers and practitioners.
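The idea of reusing intermediate results across analytics pipelines can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class, function, and stage names are hypothetical, the stage functions are placeholders rather than real video analytics, and it is not the IR-Middleware API.

```python
class IRStore:
    """In-memory stand-in for an intermediate-results store (hypothetical)."""

    def __init__(self):
        self._cache = {}
        self.computed = 0  # number of stage executions actually run

    def get_or_compute(self, key, fn, upstream):
        if key not in self._cache:
            self._cache[key] = fn(upstream)
            self.computed += 1
        return self._cache[key]

def run_pipeline(store, video_id, frames, stages):
    """Run named stages in order, reusing results shared with earlier pipelines."""
    data, prefix = frames, ()
    for name, fn in stages:
        prefix += (name,)  # the key covers the whole chain of stages so far
        data = store.get_or_compute((video_id, prefix), fn, data)
    return data

# Placeholder stages operating on toy "frames" (strings).
def extract_features(frames):
    return [len(f) for f in frames]

def detect_objects(features):
    return [x for x in features if x > 3]

def count_events(features):
    return sum(features)

store = IRStore()
frames = ["frame-a", "fb", "frame-ccc"]
objects = run_pipeline(store, "cam1", frames,
                       [("features", extract_features), ("objects", detect_objects)])
events = run_pipeline(store, "cam1", frames,
                      [("features", extract_features), ("events", count_events)])
print(objects, events, store.computed)
```

Both pipelines share the feature-extraction stage, so it runs once and its result is reused by the second pipeline. This is the kind of saving in processing time and resources that intermediate results orchestration aims for.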
Further, under this project, we identified research gaps and listed several open research issues and challenges for the research community. Such issues are summarized in the following table.
Under this project, we have published the following journal and conference papers.
Aftab Alam, Irfan Ullah, and Young-Koo Lee. "Video Big Data Analytics in the Cloud: A Reference Architecture, Survey, Opportunities, and Open Research Issues." IEEE Access 8 (2020): 152377-152422.
Aftab Alam, and Young-Koo Lee. "TORNADO: Intermediate results orchestration-based service-oriented data curation framework for intelligent video big data analytics in the cloud." Sensors 20.12 (2020): 3581.
Uddin, Md Azher, Aftab Alam, Nguyen Anh Tu, Md Siyamul Islam, and Young-Koo Lee. "SIAT: A distributed video analytics framework for intelligent video surveillance." Symmetry 11, no. 7 (2019): 911.
Khan, Muhammad Numan, Aftab Alam, and Young-Koo Lee. "FALKON: Large-Scale Content-Based Video Retrieval Utilizing Deep-Features and Distributed In-memory Computing." In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 36-43. IEEE, 2020.
Alam, Aftab, Muhammad Numan Khan, Jawad Khan, and Young-Koo Lee. "IntelliBVR-Intelligent large-scale video retrieval for objects and events utilizing distributed deep-learning and semantic approaches." In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 28-35. IEEE, 2020.
Alam, Aftab, et al. "Architecture for Intelligent CCTV Cloud Platform." 한국정보과학회 학술발표논문집 (2016): 152-154.
Alam, Aftab, et al. "Distributed Data Model for Managing Large-Scale Video Data in the Cloud Using Hadoop-HBase." 한국정보과학회 학술발표논문집 (2017): 79-81.
Principal Investigator: Professor Young-Koo Lee
Role: Team member
March 2015 - March 2020
Sponsor: Institute of Information & Communications Technology Planning & Evaluation, South Korea
Individuals use smartphones for a variety of purposes, such as photography, schedule planning, and playing games, in addition to the core tasks of calling and short messaging. These services generate personal data. The personal data on a user's smartphone therefore reflects his or her interests, and this information can be used for various personalized services. Under this project, we developed a personalized application for mining the lifestyle patterns of a smartphone user. The proposed app uses the user's personal photograph collection, i.e., the day-to-day photos taken with the smartphone, to recognize scenes (called objects of interest in our work). These are then mined to discover lifestyle patterns. Modeling the data as graphs is effective in preserving lifestyle behavior over time, and the graph-modelled lifestyle data enabled us to apply a variety of graph-mining techniques for pattern discovery. To demonstrate the effectiveness of our proposal, we developed a prototype system for LPaMI that implements its end-to-end pipeline, and we conducted an extensive evaluation of the various phases of LPaMI using different real-world datasets. The output of LPaMI can be used in a variety of application areas, such as trip and food recommendations and shopping. The outcome of this project was:
Figure. Pipeline of Lifestyle Pattern Mining from personal image collections in a smartphone.
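As a rough illustration of graph-modelled lifestyle data, the sketch below builds a weighted co-occurrence graph from scene labels and keeps the frequent pairs. It assumes that scene recognition has already produced one list of labels per day; the labels, function names, and support threshold are hypothetical, and the actual LPaMI graph model and mining algorithms may differ.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(days):
    """Weighted undirected graph: edge weight = number of days two scenes co-occur."""
    edges = Counter()
    for scenes in days:
        for u, v in combinations(sorted(set(scenes)), 2):
            edges[(u, v)] += 1
    return edges

def frequent_patterns(edges, min_support):
    """Scene pairs seen together on at least min_support days, strongest first."""
    kept = [(pair, w) for pair, w in edges.items() if w >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Hypothetical scene labels recognized in each day's photos.
days = [
    ["cafe", "office", "gym"],
    ["cafe", "office"],
    ["park", "cafe"],
    ["office", "gym", "cafe"],
]
graph = build_cooccurrence_graph(days)
patterns = frequent_patterns(graph, min_support=2)
print(patterns)
```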
Principal Investigator: Professor Young-Koo Lee
Role: Team member
March 2016 - March 2018
Sponsor: National Research Foundation of Korea (NRF)
Large graphs, such as social networks, web graphs, and biological networks, are complex and difficult to process and visualize. Motivated by these issues, Toivonen et al. [1] proposed models and sequential algorithms for compressing weighted graphs. Because the process is sequential, the proposed compression algorithm is expensive in terms of computation time. Weighted graph compression can be made faster by adopting parallel processing techniques. In this work, we adopted a parallel processing technique for the weighted graph compression problem: we select multiple nodes for merging at once and use various graph clustering algorithms to avoid overlap between the nodes handled by different threads. To evaluate the proposed method, we carried out a series of tests on real networks and performed extensive experiments on parallel graph summarization using different graph clustering algorithms. Our results demonstrate its effectiveness for parallel graph compression on real networks.
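The merge step of weighted graph compression can be sketched as follows. This is a sequential illustration, not the parallel algorithm of this project. It assumes one common weighting choice, where a superedge carries the mean weight over all node pairs between two groups and missing edges count as 0, and it ignores edges inside a group. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def compress_weighted_graph(edges, groups):
    """Merge nodes into supernodes and average edge weights between groups."""
    members = defaultdict(list)
    for node, group in groups.items():
        members[group].append(node)

    totals = defaultdict(float)
    for (u, v), w in edges.items():
        gu, gv = groups[u], groups[v]
        if gu != gv:  # edges inside a supernode are ignored in this sketch
            totals[tuple(sorted((gu, gv)))] += w

    # Divide by the number of node pairs between the groups (missing edges = 0).
    return {
        pair: total / (len(members[pair[0]]) * len(members[pair[1]]))
        for pair, total in totals.items()
    }

# Toy undirected weighted graph and a grouping, e.g., from a clustering step.
edges = {("a", "c"): 1.0, ("b", "c"): 0.5, ("a", "d"): 0.5, ("c", "d"): 2.0}
groups = {"a": "A", "b": "A", "c": "C", "d": "C"}
compressed = compress_weighted_graph(edges, groups)
print(compressed)
```

Because the groups are disjoint, each pair of groups can be processed independently, which is the property that makes this kind of merging amenable to parallel execution.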
Supervisor: Professor Dr. Shah Khusro
Role: Lead
Duration: March 2010 - March 2014
My first research project focused on researching and developing a user reputation model for social question-and-answer (SQA) systems. I introduced a dynamic points-based user reputation model (as shown in the given equation), which leverages the graph structure of the SQA community and takes user ratings and social network analysis as input. The impact weight of each relationship and user rating depends on the current levels of both the asker and the answerer and on the difficulty level of the question.
The static nature of traditional points-based user reputation models creates a situation where equal points are assigned to all users, regardless of their differing abilities and expertise. As a result, these models fail to account for factors such as question difficulty, asker level, and answerer level. The development of the proposed system, which integrates social networking, social Q&A, and our dynamic user reputation model, addresses some of these shortcomings, leading to increased user satisfaction. The novel and key elements of the proposed model include using a dynamic points-based system and the combination of two approaches for calculating user reputation: social network analysis and user ratings. Social network analysis evaluates the impact of relationships between participating users, while user ratings provide a direct assessment of a user's reputation based on their previous experience.
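The contrast between static and dynamic points can be made concrete with a small sketch. The weighting below is purely illustrative and is NOT the published reputation formula: the function name, the level scale, and the way levels and difficulty are combined are all assumptions chosen only to show points that depend on question difficulty, asker level, and answerer level.

```python
def dynamic_points(rating, difficulty, asker_level, answerer_level, max_level=10):
    """Illustrative dynamic points (hypothetical weighting, not the published model).

    A rating counts for more when the question is difficult and when the asker
    ranks above the answerer, and for less in the opposite case.
    """
    level_weight = 1.0 + (asker_level - answerer_level) / max_level
    return rating * difficulty * level_weight

# Same rating and difficulty, different asker and answerer levels.
novice_answers_expert = dynamic_points(rating=4, difficulty=2, asker_level=8, answerer_level=3)
expert_answers_novice = dynamic_points(rating=4, difficulty=2, asker_level=3, answerer_level=8)
print(novice_answers_expert, expert_answers_novice)
```

A static points-based model would award the same points in both cases, whereas here the outcome changes with the levels of the participants.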
The outstanding points of this research work compared with peers are:
We put forward a novel social community structure and a dynamic points-based user reputation model that accounts for the difficulty level of the question and the expertise of the asker and answerer while performing system-bounded activities.
Research impact on international development
Under this project, we investigated relevant literature, and analysed several state-of-the-art user reputation models, while identifying numerous open research issues and challenges for the research community.
The proposed model is impactful and can play a significant role in ranking users in online social question-and-answer systems.
Achievements
Under this work, the following journal articles have been published.
Aftab Alam, Shah Khusro, Irfan Ullah, and Muhammad Shuaib Karim. "Confluence of social network, social question and answering community, and user reputation model for information seeking and experts generation." Journal of Information Science 43, no. 2 (2017): 260-274.
Khusro, Shah, Aftab Alam, and Shah Khalid. "Social question and answer sites: the story so far." Program (2017).