Research

Current Projects

Smart Agent-Based Modeling

Smart agents are intelligent, adaptive, computational entities. Humans are the canonical smart agents, but the advent of foundation models, whose language, vision, and reasoning abilities emulate human behavior, lets us extend the concept of smart agents to agent-based modeling (ABM). This extension leads to smart agent-based modeling (SABM). Unlike traditional ABM, SABM uses foundation models as agents and formulates models in natural language. We employ SABM to investigate natural processes in fields such as economics and behavioral science. We believe that SABM offers a more nuanced and realistic approach to understanding natural systems.

Selected Publications

Data Lake Management

With governments promoting open data and industry adopting data lake solutions, there are more opportunities to obtain large numbers of tables from data lakes and use them to enrich local data. We study several fundamental problems of data lake management, including data cleaning, data integration, and data augmentation. For example, our solutions can identify and suggest useful attributes to data science engineers.

Selected Publications

Data Science Methods for Digital Humanities

Data science methods, leveraging computational techniques and analytical tools, are increasingly vital in digital humanities, spanning disciplines such as sociology, psychology, history, and education. By employing statistical analysis, machine learning, and data visualization, researchers can uncover patterns and insights in large datasets, ranging from historical documents to social media trends. This interdisciplinary approach enables a deeper analysis of cultural, social, and historical phenomena, transforming traditional humanities research into a more dynamic and data-driven field.

Selected Publications

Similarity Query Processing

A similarity query finds similar objects in one or more datasets. It is an important operation in many applications, such as entity matching, plagiarism detection, and image retrieval. We target a variety of data types (sets, strings, high-dimensional data, etc.) and develop efficient query processing methods.
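As a concrete illustration of what a similarity query computes, here is a minimal sketch of a threshold-based search over sets using Jaccard similarity. The similarity function, the function names `jaccard` and `similarity_search`, and the brute-force scan are all assumptions chosen for illustration; they are not the methods developed in this project, which aim to avoid comparing the query against every record.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def similarity_search(query, dataset, threshold):
    """Return (index, similarity) for every record whose similarity
    to the query is at least the threshold (naive linear scan)."""
    results = []
    for i, record in enumerate(dataset):
        s = jaccard(query, record)
        if s >= threshold:
            results.append((i, s))
    return results


# Hypothetical toy data: each record is a set of tokens.
dataset = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
matches = similarity_search({"a", "b", "c"}, dataset, threshold=0.5)
```

With this toy data, `matches` contains the first two records and excludes the third, which shares no tokens with the query.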

Selected Publications

- Generic Algorithms -

- High-Dimensional Data -

- Sets -

- Strings -

Past Projects

Trajectory Analysis in Road Networks

Querying and retrieving moving-object trajectories in road networks is becoming increasingly important because trajectories are key data in modern data-driven automotive applications, such as autonomous cars, cloud car navigation systems, and intelligent transportation systems. Our goal is to address fundamental problems of trajectory analysis in road networks.

Selected Publications

Query Autocompletion

Autocompletion is an interactive feature that automatically completes an input, reducing typing effort. It has been used in search engines, input methods (IMEs), integrated development environments (IDEs), and mobile applications. We develop novel autocompletion techniques that deliver high-quality suggestions efficiently for various online services.
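To make the basic operation concrete, the following is a minimal sketch of exact prefix completion over a sorted vocabulary using binary search. It is a textbook baseline, not one of the techniques developed in this project; the function name `complete`, the parameter `k`, and the sample vocabulary are hypothetical.

```python
import bisect


def complete(prefix, sorted_vocab, k=5):
    """Return up to k entries of sorted_vocab that start with prefix.

    sorted_vocab must be sorted in ascending order, so all entries
    sharing a prefix form one contiguous run that binary search can locate.
    """
    start = bisect.bisect_left(sorted_vocab, prefix)
    suggestions = []
    for entry in sorted_vocab[start:]:
        if not entry.startswith(prefix):
            break  # past the contiguous run of matches
        suggestions.append(entry)
        if len(suggestions) == k:
            break
    return suggestions


# Hypothetical vocabulary of past queries.
vocab = sorted(["data lake", "data cleaning", "database",
                "graph search", "dataset"])
top3 = complete("data", vocab, k=3)
```

This baseline returns suggestions in lexicographic order and requires an exact prefix match; ranking suggestions by quality or tolerating typing errors would need more than this sketch provides.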

Selected Publications

Graph Structural Search

Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, and pattern recognition. A fundamental and critical query primitive is searching for structures in a large collection of graphs. We develop efficient methods to process advanced structural search in graph databases.

Selected Publications