Recommender systems filter the content on digital platforms, providing personalised suggestions to each user. The central challenge is to learn each user's tastes from the feedback they give on consumed items, usually in the form of ratings. My postdoctoral research explores whether preferences can be learned from a different mode of feedback: comparisons. Such data is both more abundant (a user's click can be seen as a preference for one item over the others shown) and often more informative, providing fine-grained orderings that discrete ratings obscure. Within the learning-from-comparisons framework, I have worked on both offline and online recommender systems.
Given that each user compares only a few pairs of items, is it possible to learn personalised preferences for each user? Inspired by prior work on matrix completion, we adopt a low-rank utility model and combine it with the Bradley-Terry model, in which users choose the item with the higher (noisy) utility. The resulting maximum likelihood problem is non-convex, making it challenging to solve. Despite this challenge, we prove that applying gradient descent to the loss function recovers the true parameters. We show this by first establishing structural properties in the infinite-data case and then, using the matrix Bernstein inequality, extending the analysis to the finite-sample setting. The main takeaway from this work is that learning from comparisons is practically viable in recommender systems. This paper was accepted at ICML 2025; you can find the preprint here.
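In sketch form, the model combines a logistic (Bradley-Terry) response with a low-rank constraint on the utility matrix; the notation below is illustrative, chosen here for exposition:

```latex
% User u compares items i and j; X* is the n x m matrix of utilities,
% assumed low-rank, and sigma is the logistic function.
\[
\Pr\left[\, u \text{ prefers } i \text{ over } j \,\right]
  = \sigma\!\left(X^{\star}_{ui} - X^{\star}_{uj}\right),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}},
\qquad
\operatorname{rank}(X^{\star}) = r \ll \min(n, m).
\]
% The estimator minimises the (non-convex) negative log-likelihood over
% rank-r matrices, where D is the set of observed comparisons in which
% user u preferred item i to item j:
\[
\widehat{X} \;\in\; \arg\min_{\operatorname{rank}(X) \le r}
  \; - \sum_{(u,\,i,\,j) \in \mathcal{D}} \log \sigma\!\left(X_{ui} - X_{uj}\right).
\]
```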
The aforementioned work does not address how to collect comparison data effectively. In practical systems, the recommender engine can also choose which comparison questions to ask the user. In follow-up work, we develop a new contextual bandit model for such recommender systems. Every time the user consumes a new item, the platform asks them to compare it with an item they have consumed in the past. This choosing-from-history model imposes constraints on the choice queries, complicating the analysis. We design a computationally efficient algorithm for this model and prove performance guarantees comparable to those of state-of-the-art bandit algorithms. The critical step of our analysis is to show that, even with a small amount of exploration, the user's prior experience is rich enough to support meaningful comparisons. As a result, our theoretically derived algorithm is intuitive and interpretable. This work is under review.
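To make the interaction pattern concrete, here is a simplified sketch of one round, written with a generic linear-utility feature model and an optimism (UCB-style) selection rule purely for illustration; the names and the rule are placeholders, not the algorithm analysed in the paper.

```python
import numpy as np

def pick_comparison(theta_hat, A_inv, x_new, history):
    """One round of the choosing-from-history interaction (illustrative).

    theta_hat : current estimate of the user's preference vector
    A_inv     : inverse of a regularised design matrix, used for confidence widths
    x_new     : feature vector of the item the user has just consumed
    history   : feature vectors of the items the user consumed in the past
    """
    best_item, best_score = None, -np.inf
    for x_old in history:
        z = x_new - x_old                  # feature difference of the candidate pair
        bonus = np.sqrt(z @ A_inv @ z)     # exploration bonus (confidence width)
        score = z @ theta_hat + bonus      # optimism in the face of uncertainty
        if score > best_score:
            best_item, best_score = x_old, score
    return best_item                       # past item to offer alongside x_new
```

The key structural point is visible in the sketch: the candidate set is restricted to `history`, so the quality of exploration depends on how rich the user's past consumption is.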
My earlier work on learning from comparison data assumes that humans make choices according to the Bradley-Terry model. My current research critically examines this assumption. In our UAI 2025 paper, we collected human response data through online surveys. Using a custom-built statistical test, we demonstrated for the first time that humans do not make similarity judgements according to this model. I am currently designing models that capture context effects: the systematic distortion of perceived utility based on the set of items offered or on prior experiences. Such richer choice models can potentially capture human preferences much better, as they account for effects that simpler models ascribe to noise. Next, I aim to test these new models on the task of LLM alignment via human preference data, where popular methods rely on the Bradley-Terry model.
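As one illustration of the contrast (the notation is mine, and this particular form is a known family from the discrete-choice literature rather than necessarily the model we will adopt): a plain multinomial logit assigns choice probabilities from item utilities alone, whereas a context-aware variant lets the offered set distort perceived utilities.

```latex
% Plain multinomial logit: the probability of choosing item i from the
% offered set S depends only on i's own utility u_i.
\[
\Pr[\, i \mid S \,] = \frac{e^{u_i}}{\sum_{j \in S} e^{u_j}}.
\]
% One way to encode context effects: every other offered item j shifts
% the perceived utility of i by a pairwise term c_{ij}.
\[
\Pr[\, i \mid S \,] =
  \frac{\exp\big(u_i + \sum_{j \in S \setminus \{i\}} c_{ij}\big)}
       {\sum_{k \in S} \exp\big(u_k + \sum_{j \in S \setminus \{k\}} c_{kj}\big)}.
\]
```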
Blockchains are peer-to-peer systems that maintain an ever-growing, tamper-proof ledger. This ledger forms the basis of many applications, ranging from cryptocurrencies to NFTs. Every blockchain is powered by an underlying consensus protocol that keeps the peers in agreement about the ledger. The design and analysis of secure, efficient consensus protocols (layer 1) has been an active area of research in recent years. In parallel, there is ongoing research into tools that help blockchains scale to the speed and volume required by today's web-based services (layer 2). Last but not least, developers are actively building application platforms suited to blockchains (layer 3). My research has spanned all three layers.
A major thrust of my research has been analyzing the security of blockchains under adverse network conditions. In particular, I focused on the longest-chain protocol, the backbone of cryptocurrencies like Bitcoin and Ethereum.
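For context, the protocol's fork-choice rule is simple to state: adopt the longest chain you have seen and mine on its tip. A minimal sketch, counting blocks for simplicity (proof-of-work deployments measure chain length in total work):

```python
def fork_choice(known_chains):
    """Longest-chain rule: adopt the chain with the most blocks.

    known_chains : the chains this node has received so far,
                   each represented as a list of blocks
    Returns the adopted chain and the tip block to mine on.
    """
    best = max(known_chains, key=len)
    return best, best[-1]
```

The security question my work addresses is whether this rule still yields agreement when blocks propagate with random or arbitrary delays.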
My first work on this theme studies the longest-chain protocol under a communication model in which delays are random, i.i.d., and potentially unbounded. This establishes the protocol's security under network conditions that are more general and realistic than the bounded-delay model assumed in prior analyses.
My second work outlines the design of an add-on gadget that can reinforce the security of the longest-chain protocol under arbitrary network delays. The work reveals nuances in the blockchain CAP theorem, an impossibility result in distributed consensus.
I have also worked towards improving the efficiency of blockchains. My focus has been on the following two directions:
Improving storage efficiency: A stateless client, built using cryptographic accumulators, vastly reduces the storage requirements of blockchain users without compromising security: the client keeps only a short accumulator commitment and verifies membership proofs, as sketched after this list. Our work presents a principled design for such clients and improves the efficiency of an open-source project, UTREEXO.
Improving transaction throughput: Payment channel networks are a layer-2 solution that improves transaction throughput and reduces fees on blockchains. I am currently developing new routing and pricing protocols for such networks, and I am also studying the fundamental limits of routing in them.
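Below is a minimal sketch of the verification step such a stateless client performs, assuming a Merkle-tree accumulator: the client stores only a short root commitment, and a full node supplies the element along with a path of sibling hashes. The interface is simplified for exposition and is not UTREEXO's actual API.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_membership(root: bytes, leaf: bytes, path: list[tuple[bytes, str]]) -> bool:
    """Check a Merkle membership proof against the stored root.

    root : the accumulator commitment the stateless client keeps (32 bytes)
    leaf : the element (e.g. an unspent transaction output) being proved
    path : sibling hashes from leaf to root, each tagged 'L' or 'R'
           according to which side the sibling sits on
    """
    node = _h(leaf)
    for sibling, side in path:
        node = _h(sibling + node) if side == "L" else _h(node + sibling)
    return node == root
```

The storage saving comes from this asymmetry: the client keeps a constant-size root, while proofs grow only logarithmically in the number of elements.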
On the applied front, I have worked on two projects, both arising in the context of supply chain management. In both, a blockchain replaces a trusted intermediary among distrustful or competing agents.
Collaborative shipping: We built a platform that helps individual shippers bundle shipments traveling over the same segment.
Inventory pooling: We designed a multi-agent reinforcement learning framework for independent businesses to control a joint inventory.