Research Area
Our vision is to eventually reach full autonomic computing by developing AI-based system management techniques. Managing large-scale distributed systems and cloud systems to achieve a high degree of availability is extremely challenging, and researchers have worked for a long time to advance this field. Now that a set of advanced AI techniques is available, it is a good time for systems researchers to apply them to many previously difficult problems in system management. We are getting closer every day to the state of autonomic computing!
In this topic, we try to quickly find the root cause of errors, failures, or anomalies in target applications and fix them. This requires sophisticated data collection and data analytics using state-of-the-art AI techniques. We also perform deep monitoring of various data processing applications to understand their observed internal behavior and to explain why they behave that way. Through these lines of work, we aim to advance the Self-Healing and Self-Configuration capabilities of Autonomic Computing. These are some of the specific topics we work on.
Searching for the root cause and solutions from online forums using multi-modal heterogeneous DL (Deep Learning) models
Detecting and fixing mis-configurations
Understanding the performance characteristics of the eBPF monitoring technique
Introspection of NoSQL databases to determine the cause of slow performance
NoSQL database performance problem diagnosis using system call traces
NoSQL database shared resource accounting
Causal execution path discovery
Composition of performance models using execution path and resource usage information
Log analysis is an important technique for understanding the behavior of modern, increasingly complex distributed systems, diagnosing problems, and finding their root causes. As distributed systems grow in size and complexity, the volume of logs has grown beyond what humans can consume, and it is impossible to analyze them without automated techniques. This topic aims to build techniques that identify important hidden information in huge amounts of log data in near real time and to apply them to learn how anomalies or problems build up. We are working on the following exciting topics.
Log-based root cause analysis
Log-based misconfiguration detection
Log template discovery problem
Static log template extraction
Log-based execution model building
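As a rough illustration of what log template discovery involves, the sketch below masks variable fields in raw log lines and counts how many lines share each resulting template. It is a simple regex-masking heuristic written for illustration, not the technique developed in this work; the pattern list and the names `VARIABLE_PATTERNS`, `to_template`, and `discover_templates` are assumptions.

```python
import re
from collections import Counter

# Hypothetical patterns for variable fields. Order matters: the more
# specific patterns run before the generic integer pattern.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hexadecimal values
    re.compile(r"\b\d+\b"),                                # plain integers
]


def to_template(line: str) -> str:
    """Replace every variable field in a log line with the wildcard <*>."""
    for pattern in VARIABLE_PATTERNS:
        line = pattern.sub("<*>", line)
    return line


def discover_templates(lines):
    """Count how many non-empty log lines map to each template."""
    return Counter(to_template(line.strip()) for line in lines if line.strip())
```

Given lines such as "Connection from 10.0.0.1:5432 closed after 120 ms" and "Connection from 10.0.0.2:6000 closed after 7 ms", both collapse to the same template, so the counter groups them together. Real logs usually need a tuned pattern list or a learned parser, since fixed regexes miss variables such as file paths or identifiers.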
This topic covers the security aspect of containers. We try to strengthen container security by observing system calls and detecting attacks. We are also designing better ways to build seccomp policies for containers. These are tied to the Self-Protection component of Autonomic Computing.
Container system call exposure quantification
System call analysis of exploit codes
Secure container runtime behavior analysis
Container image vulnerability analysis
HPC container comparison
Container monitoring for security enhancement
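To make the idea of building a seccomp policy from observed system calls concrete, here is a minimal sketch that turns a list of observed system call names into an allowlist profile in the JSON layout used by Docker seccomp profiles (`defaultAction` plus a `syscalls` list). The function name `build_allowlist_profile` and the input list are assumptions for illustration; this is not the policy-building method designed in this work.

```python
import json


def build_allowlist_profile(observed_syscalls):
    """Build an allowlist profile: deny by default, allow only observed calls."""
    names = sorted(set(observed_syscalls))  # deduplicate and order the names
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # any unlisted call fails with an error
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }


def profile_to_json(profile) -> str:
    """Serialize the profile so it can be written to a file."""
    return json.dumps(profile, indent=2)
```

A profile built only from observed calls is as complete as the trace it came from: a system call that the application needs but never issued during observation would be blocked at run time.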
We are interested in the performance aspect of distributed systems and distributed applications, specifically for the class of applications we call 'data systems'. Data systems are computing systems that support data processing and analysis operations. They include traditional databases, data warehouses, and distributed big data processing platforms. This topic covers the Self-Optimization component of the Autonomic Computing concept.
Data Analysis Performance
Data Analysis Accuracy
Performance Measurements
Monitoring for Performance
The goal of this project is to develop a technique that significantly reduces the execution time of IoT streaming queries. Previous studies have focused on performing data sampling in the fog and query execution in the cloud. Our approach is to perform both the data sampling and the query execution within the fog so that almost all traffic to the cloud is eliminated. This can significantly shorten query latency because the data no longer has to travel from the sensor data source all the way to the cloud.
We are formulating this as an optimization problem in which we try to find the optimal nodes for performing both the sampling and the query execution. The search space is huge, so we need to develop intelligent heuristics to reduce it.
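The flavor of this placement problem can be sketched with a toy model. The code below picks one fog node for sampling and one for query execution so that the sensor-to-sampler-to-executor latency is minimal, and optionally prunes the candidates to the `k` nodes nearest the sensor. The cost model, the slot-based capacity, the pruning rule, and all names (`hop`, `best_placement`, `latency`, `slots`) are assumptions for illustration, not the project's actual formulation or heuristic.

```python
from itertools import product


def hop(latency, src, dst):
    """Latency of one hop; staying on the same node costs nothing."""
    return 0 if src == dst else latency[src][dst]


def best_placement(latency, slots, sensor, k=None):
    """Return (sampler, executor, cost) with minimal latency, or None.

    latency: nested dict, latency[a][b] is the delay from a to b.
    slots:   free role slots per fog node; each role uses one slot.
    k:       if given, only the k nodes nearest the sensor are searched.
    """
    # Only nodes with at least one free slot can host a role.
    candidates = sorted(
        (node for node, free in slots.items() if free >= 1),
        key=lambda node: latency[sensor][node],
    )
    if k is not None:
        candidates = candidates[:k]  # heuristic pruning of the search space

    best = None
    for sampler, executor in product(candidates, repeat=2):
        # Co-locating both roles on one node needs two free slots.
        if sampler == executor and slots[sampler] < 2:
            continue
        cost = hop(latency, sensor, sampler) + hop(latency, sampler, executor)
        if best is None or cost < best[2]:
            best = (sampler, executor, cost)
    return best
```

With `k=None` the search is exhaustive over all pairs of nodes, which grows quadratically with the number of fog nodes. A small `k` shrinks the search but can miss the best placement or find none at all, which is the kind of trade-off a search-space heuristic has to manage.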