My Ph.D. research focused on developing contextual bandit (CB) algorithms for adaptive experimentation, often leveraging tools from causal inference and machine learning. Many of the questions we studied arose from challenges we encountered when deploying contextual bandits in a charitable-giving setting [arXiv]. At a high level, a CB algorithm effectively describes an entire machine learning pipeline: regressing rewards, tracking estimation quality, quantifying uncertainty, and collecting valuable data without hurting user experience. We studied questions at every step of this pipeline.
What loss should the regression step in CB algorithms minimize? Currently, most tractable contextual bandit algorithms rely on squared-loss minimization to estimate the expected reward at each context. We argue that, for decision-making, what really matters is estimating heterogeneous treatment effects (HTEs): the differences in expected rewards between actions, rather than the rewards themselves. Based on this intuition, we show clear theoretical (and some empirical) benefits of HTE estimation via R-loss minimization over squared-loss minimization. [AISTATS 2023]
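To make the distinction concrete, here is a minimal sketch of R-loss (R-learner-style) HTE estimation on simulated logged bandit data. The specifics (two arms, ridge nuisance models, known logging propensities) are illustrative assumptions, not the setup from the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical logged bandit data: contexts X, binary actions a,
# rewards r, and logging propensities p = P(a=1 | x), known to us
# because we logged them ourselves.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
p = np.full(n, 0.5)                   # uniform logging policy
a = rng.binomial(1, p)
tau_true = X[:, 0]                    # true treatment effect (simulation only)
r = X @ rng.normal(size=d) + a * tau_true + rng.normal(size=n)

# Step 1: nuisance estimate m_hat(x) ~ E[r | x] (in practice, cross-fit).
m_hat = Ridge().fit(X, r).predict(X)

# Step 2: R-loss regression. Minimizing
#   sum_i ( (r_i - m_hat(x_i)) - (a_i - p_i) * tau(x_i) )^2
# over tau is equivalent to a weighted regression of pseudo-outcomes on x:
w = (a - p) ** 2                      # regression weights
z = (r - m_hat) / (a - p)             # pseudo-outcomes
tau_model = Ridge().fit(X, z, sample_weight=w)

# tau_model.predict(x) now directly estimates the treatment effect, i.e.
# the quantity that determines which arm is better, rather than the two
# expected rewards separately.
```

The appeal of this decomposition is that `tau_model` targets the arm-to-arm difference directly, so error in modeling the baseline reward `m_hat` is less likely to contaminate the decision-relevant quantity.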
How good is the estimated model above? This often depends on the bias and variance of the function class used for estimation. Since the bias is unknown, prior work has often assumed it to be zero. We developed methods that account for model-class bias [AISTATS 2021, ICML 2021], along with procedures for model selection [AISTATS 2024]. Building on these insights, we developed a (surprisingly simple) data-driven approach to estimate model quality without requiring the class complexity as input [arXiv].
How do we quantify uncertainty at any given context? Under a continuous context distribution, the probability of observing any specific context is zero, so it is statistically impossible to construct a valid, non-trivial confidence interval at every context. To overcome this issue, prior work (e.g., LinUCB) assumes that the true conditional expected reward of each action is a sufficiently well-structured function (e.g., linear) of the context; for arbitrary function classes, UCB-style confidence intervals can be unhelpfully wide. This unnecessarily restricts the function classes we can work with.
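For intuition on what that structural assumption buys, below is a minimal sketch of a LinUCB-style confidence interval for a single arm; the regularization `lam` and exploration scale `alpha` here are illustrative choices.

```python
import numpy as np

# A LinUCB-style interval, valid only under the linear assumption
# E[r | x] = x @ theta. (lam and alpha are illustrative choices.)
def linucb_interval(X_logged, r_logged, x_new, lam=1.0, alpha=1.0):
    d = X_logged.shape[1]
    A = X_logged.T @ X_logged + lam * np.eye(d)   # regularized design matrix
    theta_hat = np.linalg.solve(A, X_logged.T @ r_logged)
    center = x_new @ theta_hat
    # The width shrinks in directions the logged contexts have explored,
    # even at contexts never observed exactly, because linearity shares
    # information across the whole context space.
    width = alpha * np.sqrt(x_new @ np.linalg.solve(A, x_new))
    return center - width, center + width
```

Without such a structural assumption, no interval of this kind can shrink at an individual (zero-probability) context, which is exactly the obstruction described above.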
We take a different approach, proposing a new form of uncertainty quantification that doesn't require the above assumptions! We develop conformal arm sets (CASs), which specify a set of arms at each context for a given risk level. As in conformal prediction, a CAS may not contain the optimal arm at every single context, but it does contain the optimal arm for a fraction (determined by the associated risk level) of randomly sampled contexts. [NeurIPS 2023]
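As a rough illustration of the flavor of such set-valued guarantees, here is a conformal-prediction-style construction: calibrate a threshold on held-out logged data, then keep every arm whose predicted reward is close to the best prediction. This is a hypothetical sketch (`f_hat`, the residual score, and the `2 * q` margin are my illustrative choices), not the CAS procedure from the paper.

```python
import numpy as np

def calibrate_q(f_hat, cal_data, alpha=0.1):
    """Split-conformal threshold: the (1 - alpha) quantile of absolute
    residuals |r - f_hat(x, a)| over held-out logged triples (x, a, r)."""
    scores = np.sort([abs(r - f_hat(x, a)) for x, a, r in cal_data])
    k = int(np.ceil((1 - alpha) * (len(scores) + 1))) - 1
    return scores[min(k, len(scores) - 1)]

def arm_set(f_hat, x, arms, q):
    """Keep every arm whose predicted reward is within 2*q of the best;
    a larger q (more model uncertainty) yields a larger, more cautious set."""
    preds = np.array([f_hat(x, a) for a in arms])
    return [a for a, pred in zip(arms, preds) if pred >= preds.max() - 2 * q]
```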
What context-dependent distribution over actions should we use to collect more data? Depending on the goal (e.g., simple- or cumulative-regret minimization), we describe a family of algorithms that achieve near-optimal guarantees using the above uncertainty quantification. [NeurIPS 2023]
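One hypothetical member of such a family, built on the `arm_set` sketch above: exploit the model's pick inside the set most of the time, and otherwise explore uniformly within the set, so exploration automatically concentrates where the set (i.e., the uncertainty) is large. This illustrates the idea only; the paper's algorithms and their guarantees differ by objective.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_action(f_hat, x, arms, q, explore=0.1):
    # Restrict all randomization to the conformal arm set (reusing
    # arm_set from the sketch above); arms outside it are never played.
    candidates = arm_set(f_hat, x, arms, q)
    if rng.random() < explore:
        # Explore: uniform over the plausible arms only.
        return candidates[rng.integers(len(candidates))]
    # Exploit: the model's best guess among the plausible arms.
    preds = [f_hat(x, a) for a in candidates]
    return candidates[int(np.argmax(preds))]
```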
My early research was with Siddharth Barman in the area of Algorithmic Game Theory and Computational Social Choice. In particular, I worked on algorithms for fair decision-making in multi-agent resource allocation, developing state-of-the-art algorithms that search the space of fair/efficient allocations in a computationally efficient manner. It has been very rewarding to see our results and techniques taught in tutorials (e.g., [1], [2]) and used in several follow-up papers.