The recent success of AI / Machine Learning is largely due to the availability of Big data and the infrastructure to scale computation. As a result, the integration of Big data and AI has become inevitable. The Data Intelligence lab performs research on this integration in both directions. First, we explore Machine Learning techniques that can improve Big data management. Second, we explore Big data management techniques that are needed throughout the Machine Learning lifecycle.

A key application of Big data and AI integration is manufacturing, where "smart" factories are becoming more automated and more reliant on Machine Learning for product quality. Interestingly, the more automated a factory becomes, the more data it produces. We believe our research is in a sweet spot, because analyzing this data becomes increasingly critical as automation grows.

Here is a general introduction to our current research.

Large-scale Data Collection

As Deep Learning becomes more prevalent, the bottleneck of Machine Learning shifts from feature engineering to data collection. We investigate large-scale (semi-)automatic data labeling techniques (e.g., data programming) that meet the needs of today's Machine Learning applications.
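
To give a feel for data programming, here is a minimal sketch in Python. Instead of labeling each example by hand, a user writes small heuristic "labeling functions" that each vote on a label or abstain, and the votes are combined into a weak label. Everything in the sketch is an illustrative assumption rather than our actual system: the sensor field names, the thresholds, and the label constants are hypothetical, and the simple majority vote stands in for the learned label model that data programming systems typically use to weigh the labeling functions.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical label constants for a factory quality-control task.
ABSTAIN = -1
OK = 0
DEFECT = 1

Record = Dict[str, float]
LabelingFunction = Callable[[Record], int]


# Each labeling function encodes one noisy heuristic and may abstain.
# The field names and thresholds below are made up for illustration.
def lf_high_temperature(r: Record) -> int:
    return DEFECT if r["temperature"] > 90.0 else ABSTAIN


def lf_low_vibration(r: Record) -> int:
    return OK if r["vibration"] < 0.2 else ABSTAIN


def lf_pressure_range(r: Record) -> int:
    return OK if 1.0 <= r["pressure"] <= 2.0 else DEFECT


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_high_temperature,
    lf_low_vibration,
    lf_pressure_range,
]


def weak_label(record: Record,
               lfs: List[LabelingFunction] = LABELING_FUNCTIONS) -> int:
    """Combine labeling-function votes by majority; abstain on ties.

    Majority voting is a simplification: it treats every labeling
    function as equally reliable.
    """
    votes = [v for v in (lf(record) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # no clear majority
    return ranked[0][0]
```

Calling `weak_label` on every unlabeled record produces a noisy training set. Records on which the heuristics disagree evenly, or on which all of them abstain, stay unlabeled and can be routed to a human.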


Automatic and Actionable Model Analysis

As Machine Learning is used more widely, analyzing a trained model can be a daunting task for users who do not have expertise in Machine Learning or engineering. We investigate techniques for automatically identifying problematic data slices where models perform poorly and providing concrete action items.
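
The idea of a problematic data slice can be sketched in a few lines of Python. The sketch below groups validation examples by each feature value, compares the error rate of every group with the overall error rate, and reports the groups that are clearly worse. It is a simplified illustration under stated assumptions: the function name `find_problem_slices`, the `min_size` and `margin` parameters, and the restriction to single-feature slices are all hypothetical choices, and a plain error-rate gap stands in for the statistical significance and effect-size checks that a careful analysis would need.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Slice = Tuple[str, str, int, float]  # (feature, value, size, error rate)


def find_problem_slices(rows: List[Dict[str, str]],
                        errors: List[int],
                        min_size: int = 2,
                        margin: float = 0.2) -> List[Slice]:
    """Return single-feature slices whose error rate exceeds the overall
    error rate by at least `margin`, worst slice first.

    rows   : categorical feature values for each validation example
    errors : 1 if the model was wrong on that example, else 0
    """
    if not rows or len(rows) != len(errors):
        raise ValueError("rows and errors must be non-empty and aligned")

    overall = sum(errors) / len(errors)

    # (feature, value) -> [example count, error count]
    stats = defaultdict(lambda: [0, 0])
    for row, err in zip(rows, errors):
        for feature, value in row.items():
            entry = stats[(feature, value)]
            entry[0] += 1
            entry[1] += err

    found = []
    for (feature, value), (count, err_count) in stats.items():
        rate = err_count / count
        # Ignore tiny slices and slices that are not clearly worse.
        if count >= min_size and rate - overall >= margin:
            found.append((feature, value, count, rate))

    found.sort(key=lambda s: s[3], reverse=True)
    return found
```

Each reported slice can then be turned into an action item, for example collecting or relabeling more data for that slice before retraining.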