Extremely Scalable In-Memory Platform Supporting Rich Semantics

One of the main themes of our research group, the in-memory database, began in the early 2000s with the development of P*TIME (Parallel* Transact In-Memory Engine), an enterprise database built at Transact In Memory, a Silicon Valley company founded by Prof. Cha.

Transact In Memory merged with SAP, a leading global business software company, in 2005, and the in-memory technologies of P*TIME served as a cornerstone in the development of SAP HANA, the first distributed enterprise in-memory database system enabling real-time analytics over transactionally integrated row and column stores. Today, SAP and many other companies run ERP, CRM, and business warehouse workloads on HANA.

Building on our high-speed big data processing and analysis technology, we are currently researching and developing platforms that go beyond the traditional database.

The main topics are as follows:

  • Tight integration of AI features with the in-memory database

  • Efficient distributed management and analysis of graph data


Related Projects
Genome Scale Protein Structure Modeling
Multi-Agent System Problems on Edge AI Network


Specification

Extremely Scalable In-Memory Platform

We are researching and developing a next-generation platform that goes beyond the limitations of traditional in-memory databases and big data platforms.

The rich experience in in-memory data management accumulated through the success of SAP HANA is evolving to meet new requirements.

We aim at a platform that can accommodate a variety of data semantics; its main features are as follows.

  • Native storage layouts optimized for each data semantic. This not only lets semantic algorithms run efficiently but also enables them to take advantage of appropriate compression (see the sketch after this list).

  • A variety of semantic algorithms and libraries for data exploration, analysis, preprocessing, ML/DL model training, and more.

  • A distributed platform that provides a scalable environment by optimizing data partitioning and placement.
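
As an illustration of the first point, the sketch below uses hypothetical C++ types (not the platform's actual API) to contrast two native layouts behind a common store interface: a dictionary-encoded column for relational data and a CSR adjacency list for graph data, each suited to its own algorithms and compression.

```cpp
// Minimal sketch (hypothetical types): one store interface, two semantic-specific layouts.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Common interface shared by all semantic-specific stores.
struct SemanticStore {
    virtual ~SemanticStore() = default;
    virtual std::string semantic() const = 0;
    virtual std::size_t size() const = 0;  // number of stored entries
};

// Relational semantic: dictionary-encoded column, a common in-memory compression scheme.
struct ColumnStore : SemanticStore {
    std::vector<std::string> dictionary;            // distinct values
    std::unordered_map<std::string, uint32_t> ids;  // value -> code
    std::vector<uint32_t> codes;                    // one code per row

    void append(const std::string& value) {
        auto [it, inserted] = ids.emplace(value, dictionary.size());
        if (inserted) dictionary.push_back(value);
        codes.push_back(it->second);
    }
    std::string semantic() const override { return "relational/column"; }
    std::size_t size() const override { return codes.size(); }
};

// Graph semantic: compressed sparse row (CSR) adjacency, the layout graph traversals expect.
struct GraphStore : SemanticStore {
    std::vector<std::size_t> offsets{0};  // offsets[v]..offsets[v+1] index into neighbors
    std::vector<uint32_t> neighbors;

    void addVertex(const std::vector<uint32_t>& adj) {
        neighbors.insert(neighbors.end(), adj.begin(), adj.end());
        offsets.push_back(neighbors.size());
    }
    std::string semantic() const override { return "graph/CSR"; }
    std::size_t size() const override { return offsets.size() - 1; }  // vertex count
};

int main() {
    ColumnStore country;
    for (auto v : {"KR", "US", "KR", "DE"}) country.append(v);

    GraphStore g;
    g.addVertex({1, 2});  // vertex 0 -> {1, 2}
    g.addVertex({2});     // vertex 1 -> {2}
    g.addVertex({});      // vertex 2 -> {}

    std::vector<const SemanticStore*> stores = {&country, &g};
    for (const SemanticStore* s : stores)
        std::cout << s->semantic() << ": " << s->size() << " entries\n";
}
```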


Distributed Deep Neural Network Computation on Large Graphs

Graph neural networks (GNNs) have become prevalent and are widely applied to real-world problems such as recommendation systems and social network analysis.

However, current graph computation frameworks either support only traditional iterative algorithms such as PageRank or are not capable of executing in distributed environments.

We aim to bridge distributed computing and graph computation by advancing graph partitioning methods on top of our in-memory storage engine.
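
The sketch below (a hypothetical C++ example, not our actual partitioner) illustrates why partitioning matters for distributed graph workloads: vertices are assigned to workers by a simple hash, and every edge whose endpoints land on different workers becomes a cut edge, i.e. a feature exchange a distributed GNN layer must pay for. Better partitioning methods aim to reduce exactly this cut.

```cpp
// Minimal sketch (hypothetical): hash-based vertex partitioning and the resulting edge cut.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using Edge = std::pair<uint32_t, uint32_t>;

// Assign a vertex to one of `numWorkers` partitions.
uint32_t owner(uint32_t vertex, uint32_t numWorkers) {
    return vertex % numWorkers;  // stand-in for a real partitioning method
}

int main() {
    const uint32_t numWorkers = 2;
    const std::vector<Edge> edges = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {1, 3}};

    // Local adjacency per worker; cut edges require cross-worker messages.
    std::vector<std::vector<Edge>> local(numWorkers);
    std::size_t cut = 0;
    for (const Edge& e : edges) {
        uint32_t src = owner(e.first, numWorkers);
        uint32_t dst = owner(e.second, numWorkers);
        local[src].push_back(e);  // stored with the source vertex's owner
        if (src != dst) ++cut;    // remote neighbor: feature exchange needed
    }

    for (uint32_t w = 0; w < numWorkers; ++w)
        std::cout << "worker " << w << " holds " << local[w].size() << " edges\n";
    std::cout << "cut edges (cross-worker communication): " << cut << "\n";
}
```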


Unified Version Management for Heterogeneous Semantics

Data comes in many forms depending on its semantic, and it is updated over time.

Our platform provides a unified way to manage versions of these disparate kinds of semantic data.

Data stores for heterogeneous semantic data are implemented on top of a common data structure that provides a basic API for data management.

This data structure provides generalized version management and a logging/recovery mechanism.

This bottom-up implementation of the data store enables lightweight management of various semantic data and keeps the platform's functionality extensible.
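
A minimal sketch of this idea is shown below, using hypothetical C++ types rather than the platform's actual data structure: a generic versioned container that any semantic-specific store could build on, with append-only versions, snapshot reads, and a simplified redo log for recovery.

```cpp
// Minimal sketch (hypothetical): generic version management usable by any semantic store.
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

template <typename Payload>
class VersionedStore {
public:
    // Write a new version and return its version number.
    uint64_t put(const Payload& value) {
        uint64_t v = ++latest_;
        versions_.push_back({v, value});
        redoLog_.push_back("PUT v" + std::to_string(v));  // simplified log record
        return v;
    }

    // Read the newest version visible at `snapshot` (an MVCC-style read).
    std::optional<Payload> get(uint64_t snapshot) const {
        for (auto it = versions_.rbegin(); it != versions_.rend(); ++it)
            if (it->version <= snapshot) return it->value;
        return std::nullopt;
    }

    const std::vector<std::string>& redoLog() const { return redoLog_; }

private:
    struct Entry { uint64_t version; Payload value; };
    uint64_t latest_ = 0;
    std::vector<Entry> versions_;       // append-only version chain
    std::vector<std::string> redoLog_;  // replayed on recovery
};

int main() {
    // The same structure can hold relational rows, graph adjacency, documents, ...
    VersionedStore<std::string> doc;
    doc.put("v1 of a document");
    uint64_t snap = doc.put("v2 of a document");
    doc.put("v3 of a document");

    if (auto value = doc.get(snap)) std::cout << *value << "\n";  // prints the v2 payload
    std::cout << "log records: " << doc.redoLog().size() << "\n";
}
```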

Tight Integration of ML/DL with In-Memory Data Management

We embrace both the advantages of C++ as a low-level language and the rich libraries provided by the Python community.

Seamless integration of the C++-implemented engine with Python modules not only provides users with a flexible application development environment, but also minimizes data duplication across pipelines spanning data collection, preprocessing, and ML/DL model development, and makes it possible to apply real-time data.
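
A minimal sketch of such an integration is shown below, assuming a pybind11 binding (the actual binding mechanism may differ): an engine-managed C++ column is exposed to Python as a NumPy array without copying, so preprocessing and model code operate directly on the engine's memory instead of a duplicated buffer.

```cpp
// Minimal sketch (assumes pybind11; `Column` is a toy stand-in for an engine-managed column).
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <vector>

namespace py = pybind11;

// Toy stand-in for an in-memory column of doubles owned by the engine.
class Column {
public:
    explicit Column(std::size_t n) : data_(n) {
        for (std::size_t i = 0; i < n; ++i) data_[i] = static_cast<double>(i);
    }
    double* data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

PYBIND11_MODULE(engine, m) {
    py::class_<Column>(m, "Column")
        .def(py::init<std::size_t>())
        // Zero-copy view: the returned NumPy array borrows the column's memory;
        // passing `self` as the base keeps the Column alive while the array exists.
        .def("as_numpy", [](py::object self) {
            auto& col = self.cast<Column&>();
            return py::array_t<double>({col.size()},      // shape
                                       {sizeof(double)},  // strides
                                       col.data(),
                                       self);             // base / owner
        });
}

// Python usage (illustrative):
//   import engine
//   col = engine.Column(1_000_000)
//   x = col.as_numpy()   # no copy: a NumPy view over the engine's memory
//   x[:10] += 1.0        # preprocessing happens in place
```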