Developing the infrastructure and systems that support advances in AI/ML and improve AI capability.
We research ways to improve the computing infrastructure that supports machine learning (ML) computations. These improvements include making training and inference less data- and energy-intensive, as well as more secure and trustworthy, without compromising the traditional goals of ML systems, namely the quality of the outputs and the performance of the computations. For example, one of our projects, HyperJump, accelerates hyperparameter optimization by integrating a risk-modelling component that predicts and skips configurations that are unlikely to perform well, achieving severalfold speedups over traditional methods in a variety of deep learning and kernel-based learning tasks. This makes model training more efficient, which is particularly beneficial in scenarios with limited computational capacity.
Key researchers: Rodrigo Rodrigues, Paolo Romano
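The core idea behind HyperJump, skipping hyperparameter configurations that a risk model predicts are unlikely to beat the current best, can be sketched in a few lines. Everything below is an illustrative assumption, not the project's actual algorithm: the toy objective, the linear-extrapolation "risk model", and the `risk_margin` parameter are all made up for this example.

```python
def train_step(config, step):
    """Toy objective (hypothetical): loss decays toward a config-dependent floor."""
    lr, width = config
    floor = abs(lr - 0.1) + 1.0 / width
    return floor + 1.0 / (step + 1)

def expected_final_loss(config):
    """Cheap stand-in risk model: extrapolate the loss floor from two early steps."""
    l1, l2 = train_step(config, 1), train_step(config, 3)
    # Linear extrapolation of the downward trend toward its asymptote.
    return l2 - (l1 - l2)

def risk_based_search(configs, budget=50, risk_margin=0.05):
    """Evaluate configs at full budget, but skip those the risk model
    predicts will miss the incumbent by more than risk_margin."""
    best_cfg, best_loss = None, float("inf")
    for cfg in configs:
        if expected_final_loss(cfg) > best_loss + risk_margin:
            continue  # unpromising: skip the expensive full-budget run
        loss = train_step(cfg, budget)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

The speedup comes from replacing most full-budget evaluations with two cheap early measurements; only configurations whose predicted outcome is competitive pay the full training cost.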
Several members of our Unit use ML as a tool to improve the way the systems that comprise today's computing infrastructure are built. One example is improving today's cloud infrastructure by using ML algorithms to estimate the performance and monetary cost of a computation that a user wishes to outsource, when it runs on different hardware platforms or system configurations (e.g., varying the processor type, the number of cores, or the amount of RAM). This paves the way for a new interface to cloud systems in which users are freed from reasoning about low-level resource management.
Key researchers: Rodrigo Rodrigues, Paolo Romano
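The selection step this enables can be sketched as follows: given per-configuration predictions of runtime and cost, pick the cheapest configuration that still meets the user's deadline. The catalogue, prices, and the Amdahl's-law-style runtime predictor below are illustrative assumptions; in the research described above, the predictor would be a learned model rather than an analytical formula.

```python
# Hypothetical catalogue of cloud configurations: (name, cores, price per hour).
CONFIGS = [
    ("small",  2, 0.10),
    ("medium", 4, 0.20),
    ("large", 16, 0.90),
]

def predict_runtime_hours(work_core_hours, cores, serial_fraction=0.1):
    """Amdahl's-law-style runtime estimate: a stand-in for a learned model."""
    return work_core_hours * (serial_fraction + (1 - serial_fraction) / cores)

def cheapest_config(work_core_hours, deadline_hours):
    """Return the (name, cost) of the cheapest config meeting the deadline."""
    best = None
    for name, cores, price in CONFIGS:
        runtime = predict_runtime_hours(work_core_hours, cores)
        if runtime > deadline_hours:
            continue  # predicted to miss the deadline
        cost = runtime * price
        if best is None or cost < best[1]:
            best = (name, cost)
    return best
```

With accurate predictions, the user states only the job and a deadline; the system reasons about processor counts and prices on their behalf, which is exactly the low-level resource management the new interface hides.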