The DREAM Lab works on a diverse range of topics, including probabilistic databases, privacy-preserving data analysis, mining and analysis of social networks and graph data, secure database architectures, database auditing, data stream processing, sensor data management, flash-based database management, provenance, causality, reverse data management, and diversity and fairness.
Project Description
Vector indexes are commonly used in information retrieval, relational databases, and generative AI to efficiently select semantically relevant text passages or images using natural language queries based on embedding vectors generated by AI models. Distributed vector indexes can scale storage to very large datasets, but they do not scale search throughput as effectively because servers search their local shards without coordination with other servers. This project will simulate a distributed vector index that adaptively coordinates search based on global intermediate search results.
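To make the idea of coordinating on global intermediate results concrete, here is a minimal simulation sketch in Python. It is not the project's design; every name and modeling choice in it is an assumption made for illustration. Each shard groups its vectors into clusters with a centroid and radius, and a cluster is skipped when its lower-bound distance to the query exceeds the current k-th best distance. In the uncoordinated mode each shard knows only its own k-th best distance; in the coordinated mode all shards share one global top-k, and sharing is assumed to be instantaneous and free, which a real system would not get. Cost is counted as the number of point-distance computations.

```python
import heapq
import math
import random


def make_shards(num_shards, clusters_per_shard, points_per_cluster, dim, rng):
    """Build synthetic shards; each shard is a list of (centroid, radius, points)."""
    shards = []
    for _ in range(num_shards):
        clusters = []
        for _ in range(clusters_per_shard):
            center = [rng.uniform(0.0, 10.0) for _ in range(dim)]
            points = [[c + rng.gauss(0.0, 0.3) for c in center]
                      for _ in range(points_per_cluster)]
            centroid = [sum(col) / len(points) for col in zip(*points)]
            radius = max(math.dist(centroid, p) for p in points)
            clusters.append((centroid, radius, points))
        shards.append(clusters)
    return shards


def search(shards, query, k, coordinated):
    """Return (sorted top-k distances, number of point-distance computations)."""
    # Max-heaps of negated distances: one shared heap if coordinated,
    # otherwise a private heap per shard.
    shared = []
    heaps = [shared if coordinated else [] for _ in shards]

    # Each shard plans to visit its clusters in ascending order of the
    # triangle-inequality lower bound max(0, d(query, centroid) - radius).
    # The 1e-9 slack guards against floating-point rounding.
    plans = [
        sorted(((max(0.0, math.dist(query, c) - r - 1e-9), pts)
                for c, r, pts in clusters), key=lambda item: item[0])
        for clusters in shards
    ]

    cursors = [0] * len(shards)
    active = set(range(len(shards)))
    cost = 0
    while active:
        # Round-robin: every active shard scans at most one cluster per round.
        for s in sorted(active):
            heap = heaps[s]
            if cursors[s] == len(plans[s]):
                active.discard(s)
                continue
            lower, pts = plans[s][cursors[s]]
            threshold = -heap[0] if len(heap) == k else math.inf
            if lower > threshold:
                # All remaining clusters have larger lower bounds, so stop.
                active.discard(s)
                continue
            cursors[s] += 1
            for p in pts:
                cost += 1
                d = math.dist(query, p)
                if len(heap) < k:
                    heapq.heappush(heap, -d)
                elif d < -heap[0]:
                    heapq.heapreplace(heap, -d)

    # Merge step: the shared heap already holds the answer when coordinated;
    # otherwise combine the per-shard results.
    candidates = shared if coordinated else [x for h in heaps for x in h]
    return sorted(-x for x in candidates)[:k], cost


if __name__ == "__main__":
    rng = random.Random(0)
    shards = make_shards(4, 20, 50, 8, rng)
    query = [rng.uniform(0.0, 10.0) for _ in range(8)]
    _, independent_cost = search(shards, query, 10, coordinated=False)
    _, coordinated_cost = search(shards, query, 10, coordinated=True)
    print("independent:", independent_cost, "coordinated:", coordinated_cost)
```

Under these assumptions both modes return the same exact top-k distances, and the coordinated mode never computes more distances than the uncoordinated one, because the shared threshold is at least as tight as any shard's local threshold. A fuller simulation would also need to model communication delay and the cost of exchanging intermediate results.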
Learning Objectives:
Understand how to use AI to search through unstructured data.
Simulate a distributed vector index and measure its performance.
Analyze, interpret, and discuss the results.
Skills to learn:
Knowledge of programming; basic knowledge of C/C++