A central challenge in modern AI is that large language models (LLMs) are extremely expensive to serve. My recent work focuses on designing and building efficient systems for LLM inference. I develop prediction-augmented algorithms that optimize latency, throughput, and cost. My research also explores model-informed systems, which use internal signals from the model itself as a feedback loop that lets the serving system optimize its own execution in real time.
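As a minimal illustration of why output-length predictions help, the Python sketch below compares first-come-first-served (FCFS) ordering with shortest-predicted-job-first (SPJF) ordering. It is a toy model under loud assumptions, not the method of any paper listed here: all requests arrive at time 0 and are served one at a time at one time unit per generated token, whereas real LLM serving uses batching and preemption. `Request`, `spjf_order`, `mean_completion_time`, and the predicted lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str
    predicted_len: int  # hypothetical predictor output, in tokens
    true_len: int       # actual output length, unknown to the scheduler


def spjf_order(requests):
    """Shortest-predicted-job-first: serve requests by increasing predicted length.
    sorted() is stable, so ties keep their arrival order."""
    return sorted(requests, key=lambda r: r.predicted_len)


def mean_completion_time(order):
    """Toy model: all requests arrive at time 0 and are served back to back
    (no batching), one time unit per generated token."""
    now, total = 0, 0
    for r in order:
        now += r.true_len
        total += now
    return total / len(order)


# Toy workload listed in arrival (FCFS) order.
reqs = [Request("a", 500, 480), Request("b", 20, 25), Request("c", 60, 70)]
print(mean_completion_time(reqs))              # FCFS
print(mean_completion_time(spjf_order(reqs)))  # SPJF
```

On this toy workload, FCFS gives a mean completion time of 520 time units and SPJF about 232, even though the predictions are not exact; the long request no longer delays the two short ones.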
Selected Publications:
Fast Inference for Augmented Large Language Models (NeurIPS 2025) [openreview]
Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher.
Don't Stop Me Now: Embedding Based Scheduling for LLMs (ICLR 2025) [openreview]
Rana Shahout, Eran Malach, Chunwei Liu, Weifan Jiang, Minlan Yu, Michael Mitzenmacher.
Queueing, Predictions, and Large Language Models: Challenges and Open Problems (Stochastic Systems 2025) [paper]
Michael Mitzenmacher, Rana Shahout.
Faster, Cheaper, Just as Good: Cost- and Latency-Constrained Routing for LLMs (ICLR workshop 2025) [openreview]
Javid Lakha, Minlan Yu, Rana Shahout.
Prefix and Output Length-Aware Scheduling for Efficient Online LLM Inference (ICLR workshop 2025) [openreview]
Iñaki Arango, Ayush Noori, Yepeng Huang, Rana Shahout, Minlan Yu.
Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing (CIDR 2025)
Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Rana Shahout, Gerardo Vitagliano.
Preprints:
From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing [arXiv]
Rana Shahout, Colin Cai, Yilun Du, Minlan Yu, Michael Mitzenmacher.
Intra-Request Branch Orchestration for Efficient LLM Reasoning [arXiv]
Weifan Jiang, Rana Shahout, Yilun Du, Michael Mitzenmacher, Minlan Yu.
Classical algorithms are typically designed for the worst case and do not adapt to patterns in real-world data. My work explores how predictions from machine learning models can improve the performance of traditional algorithms. I design learning-augmented algorithms for fundamental problems in streaming and scheduling that adapt to workload dynamics while preserving theoretical guarantees and remaining robust to prediction errors.
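One common way to make a streaming algorithm learning-augmented is to pair a sketch with a heavy-hitter oracle, in the spirit of learned Count-Min sketches. The Python code below is a minimal illustration under that assumption, not the algorithm from the papers listed here: `is_heavy` stands in for a trained predictor, and `CountMin`, `LearnedFrequencyEstimator`, and their parameters are hypothetical names.

```python
import hashlib


class CountMin:
    """Count-Min sketch: `depth` rows of `width` counters. Estimates never undercount."""

    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; blake2b keeps results stable across runs.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += 1

    def estimate(self, item):
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


class LearnedFrequencyEstimator:
    """Items that a hypothetical oracle predicts to be heavy get exact counters
    (at most `max_exact` of them); all other items share a Count-Min sketch."""

    def __init__(self, is_heavy, width=64, depth=3, max_exact=16):
        self.is_heavy = is_heavy  # assumed deterministic: same answer per item
        self.max_exact = max_exact
        self.exact = {}
        self.cms = CountMin(width, depth)

    def add(self, item):
        if item in self.exact:
            self.exact[item] += 1
        elif self.is_heavy(item) and len(self.exact) < self.max_exact:
            self.exact[item] = 1
        else:
            self.cms.add(item)

    def estimate(self, item):
        # With a deterministic oracle, all of an item's counts live in one place,
        # so the estimate never undercounts.
        if item in self.exact:
            return self.exact[item]
        return self.cms.estimate(item)
```

If the oracle is wrong, mispredicted heavy items simply fall back to the sketch and mispredicted light items only waste exact slots, so estimates still never undercount; good predictions make the heavy items exact and leave fewer collisions in the sketch.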
Selected Publications:
SkipPredict: When to Invest in Predictions for Scheduling (NeurIPS 2024) [openreview]
Rana Shahout, Michael Mitzenmacher.
Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams (IEEE ICNP 2024) [IEEE]
Rana Shahout, Michael Mitzenmacher.
Learning-Augmented Frequency Estimation in Sliding Windows (IEEE ICNP workshop 2024) [IEEE]
Rana Shahout, Ibrahim Sabek, Michael Mitzenmacher.
Modern data-intensive systems must process massive, continuously arriving data streams under tight resource constraints. My doctoral research focused on developing core methods for analyzing these streams when storing all the data is infeasible. I designed compact data structures called sketches to efficiently handle refined queries (e.g., over specific time intervals or data subsets) and to optimize memory consumption in fluctuating environments.
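The classic Misra-Gries summary shows the basic trade-off behind sketches: it keeps at most k-1 counters no matter how long the stream is, yet still bounds the error of every frequency estimate. This is a textbook structure offered as a minimal illustration in Python, not one of the sketches from the papers listed here; `misra_gries` and `estimate` are hypothetical names.

```python
def misra_gries(stream, k):
    """Misra-Gries summary using at most k-1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those reaching zero.
            # Each such step discards k occurrences in total (k-1 stored + the new item).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def estimate(summary, item):
    """Estimated frequency; items without a counter are reported as 0."""
    return summary.get(item, 0)


stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
summary = misra_gries(stream, k=4)
print(summary)
```

For a stream of length n, every reported count lies between the true frequency minus n/k and the true frequency, so any item occurring more than n/k times is guaranteed to keep a counter.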
Selected Publications:
Distributed Recoverable Sketches (OPODIS 2024) [paper]
Diana Cohen, Roy Friedman, Rana Shahout.
Geometric Sketch: an Inflatable and Shrinkable Sketch (AINA 2024) [paper]
Dvir Biton, Roy Friedman, Rana Shahout.
Sketching the Path to Efficiency: Lightweight Learned Cache Replacement (OPODIS 2023) [paper]
Rana Shahout, Roy Friedman.
Together is Better: Heavy Hitters Quantile Estimation (ACM SIGMOD 2023) [ACM]
Rana Shahout, Roy Friedman, Ran Ben Basat.
Box Queries over Multidimensional Streams (Information Systems 2022)
Roy Friedman, Rana Shahout*
* The authors are listed alphabetically
CELL: Counter Estimation for Per-flow Traffic in Streams and Sliding Windows (ICNP 2021) [IEEE]
Rana Shahout, Roy Friedman, Dolev Adas.
Box Queries over Multidimensional Streams (DEBS 2021) [ACM]
Roy Friedman, Rana Shahout*
* The authors are listed alphabetically
Stream Frequency Over Interval Queries (VLDB 2019) [VLDB]
Ran Ben Basat, Roy Friedman, Rana Shahout*
* The authors are listed alphabetically
Frequent Elements on Query Defined Ranges (INFOCOM 2018) [IEEE]
Ran Ben Basat, Rana Shahout, Roy Friedman.