Scaling vector databases to multi-GPU systems

Project 2024-25

Project Title: Efficient Vector Databases for Large-Scale Datasets

Professor: Marco Serafini

Lab/Research Group: DREAM Lab

Project Description

This project provides a practical foundation for understanding and solving real-world challenges in large-scale data management and distributed databases.

The project develops techniques for managing and querying large datasets with vector databases. The aim is to explore methods that let a vector database efficiently handle data volumes exceeding the main memory capacity of a single server.
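As a rough illustration of the kind of technique in scope, the sketch below splits a small set of vectors across shards, runs an exact nearest-neighbor search on each shard, and merges the partial results into a global top-k. It is only a minimal single-process sketch under stated assumptions: the function names (partition, local_top_k, search), the round-robin partitioning, and the brute-force Euclidean search are all hypothetical choices made for this example, not the design the project prescribes. In a real system each shard would live on a separate server or GPU and would typically use an approximate index.

```python
import heapq
import math


def partition(vectors, num_shards):
    """Assign each (id, vector) pair to a shard round-robin by position.

    Hypothetical helper: a real system might partition by clustering instead.
    """
    shards = [[] for _ in range(num_shards)]
    for i, item in enumerate(vectors):
        shards[i % num_shards].append(item)
    return shards


def local_top_k(shard, query, k):
    """Exact k nearest neighbors within one shard, as (distance, id) pairs."""
    scored = ((math.dist(vec, query), vec_id) for vec_id, vec in shard)
    return heapq.nsmallest(k, scored)


def search(shards, query, k):
    """Query every shard, then merge the partial results into a global top-k."""
    partials = [local_top_k(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


# Toy data: (id, vector) pairs. Assumed values, for illustration only.
data = [
    (0, (0.0, 0.0)),
    (1, (1.0, 1.0)),
    (2, (5.0, 5.0)),
    (3, (0.9, 1.2)),
    (4, (9.0, 0.0)),
]
shards = partition(data, num_shards=2)
results = search(shards, query=(1.0, 1.0), k=2)
nearest_ids = [vec_id for _, vec_id in results]
```

Because each shard returns its own k best candidates, the merged result equals the exact global top-k regardless of how the vectors were partitioned; the cost is that every query touches every shard, which is one of the trade-offs a partitioning scheme can try to reduce.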

Learning Objectives:

Students will gain hands-on experience with distributed systems, vector databases, and large-scale data management. By the end of the project, they will have developed a prototype distributed vector database capable of managing datasets beyond the memory limits of a single server while maintaining high query performance and reliability.

Skills needed:

- Understanding of vector databases and their applications

- Knowledge of distributed systems and data partitioning techniques

- Experience with parallel computing and network communication

- Hands-on experience with large-scale data processing and optimization techniques
