The recent success of AI and Machine Learning is largely due to the availability of Big Data and of the infrastructure to scale computation. As a result, integrating AI and Machine Learning with Big Data management techniques has become inevitable. The Data Intelligence lab studies this integration in both directions. First, we explore Machine Learning techniques that can improve Big Data management. Second, we explore the Big Data management techniques needed throughout the Machine Learning lifecycle. Below is a short introduction to the Big Data and AI integration topics that we are currently interested in.

Large-scale Data Labeling

As Deep Learning becomes more prevalent, the bottleneck of Machine Learning shifts from feature engineering to data collection. We investigate large-scale (semi-)automatic data labeling techniques (e.g., data programming) that meet the needs of today's Machine Learning applications.
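
To make the idea of data programming concrete, here is a minimal sketch in Python. Users write small heuristic "labeling functions" that each vote on a label or abstain, and the votes are combined into a weak label. Everything in it is an illustrative assumption: the spam/ham task, the three heuristics, and the function names are hypothetical, and the simple majority vote stands in for the learned weighting of labeling functions that data programming systems typically use.

```python
from collections import Counter

# Label values for a hypothetical spam-detection task (assumption).
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_link(text):
    """Heuristic: messages containing a URL are probably spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages mentioning a prize are probably spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    """Heuristic: very short messages are probably harmless."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def weak_label(text):
    """Combine labeling-function votes by unweighted majority vote.

    Returns ABSTAIN when no function votes or when the top votes tie.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]
```

Calling weak_label on each unlabeled message yields a noisy training set without manual annotation. Messages on which the heuristics abstain or conflict stay unlabeled and can be routed to human annotators.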

Automatic and Actionable Model Analysis

As Machine Learning is used more widely, analyzing a trained model can be a daunting task for users who lack expertise in Machine Learning or engineering. We investigate techniques that automatically identify problematic data slices, where a model performs poorly, and that suggest concrete action items.
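
As a rough illustration of what identifying problematic slices can mean, the following Python sketch groups validation examples by single feature values and reports the groups whose accuracy falls well below the overall accuracy. It is a simplified sketch under stated assumptions, not the lab's method: the function name find_weak_slices, the min_size and min_gap thresholds, and the restriction to single-feature slices are all hypothetical choices, and a fuller approach would also consider feature combinations and statistical significance.

```python
from collections import defaultdict


def find_weak_slices(rows, correct, min_size=2, min_gap=0.2):
    """Find single-feature slices where accuracy trails the overall accuracy.

    rows:     list of dicts mapping feature name -> categorical value
    correct:  list of booleans, True where the model predicted correctly
    min_size: ignore slices with fewer examples than this (assumed threshold)
    min_gap:  required accuracy drop relative to overall (assumed threshold)

    Returns (overall_accuracy, slices), where slices is a list of
    (feature, value, accuracy, size) tuples sorted by ascending accuracy.
    """
    overall = sum(correct) / len(correct)

    # (feature, value) -> [number correct, number of examples]
    stats = defaultdict(lambda: [0, 0])
    for row, ok in zip(rows, correct):
        for feature, value in row.items():
            entry = stats[(feature, value)]
            entry[0] += int(ok)
            entry[1] += 1

    weak = []
    for (feature, value), (n_ok, n) in stats.items():
        accuracy = n_ok / n
        if n >= min_size and overall - accuracy >= min_gap:
            weak.append((feature, value, accuracy, n))

    weak.sort(key=lambda item: item[2])
    return overall, weak
```

Each reported slice can then be paired with an action item. One plausible example is collecting or labeling more examples for that slice, then retraining and checking whether the gap closes.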