The Sharpe Ratio (SR) is a critical quantity for characterizing financial time series, as it jointly accounts for the reward and the volatility of a stock/portfolio through its variance. Deriving online algorithms for optimizing the SR is particularly challenging, since even offline policies incur constant regret with respect to the best expert (Even-Dar et al., 2006). This paper focuses on optimizing the regularized squared SR (RSSR). We consider two settings for the RSSR: regret minimization (RM) and best arm identification (BAI). For RM, we propose a novel multi-armed bandit (MAB) algorithm called UCB-RSSR for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR and, based on it, establish regret guarantees for UCB-RSSR, showing that its regret evolves as O(log n) for the two-armed bandit case over a horizon n. We also consider a fixed-budget setting for well-known BAI frameworks, namely sequential halving and successive rejects, and propose the SHVV, SHSR, and SuRSR algorithms, deriving upper bounds on the error probability of each. We demonstrate that UCB-RSSR outperforms the only other known SR-optimizing bandit algorithm, U-UCB (Cassel et al., 2023), and establish its efficacy against benchmarks derived from the GRA-UCB and MVTS algorithms. We further demonstrate the performance of the proposed BAI algorithms across multiple setups. Our research highlights that the proposed algorithms will find extensive applications in risk-aware portfolio management problems.
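A minimal sketch of the UCB-RSSR idea described above: an index policy that adds an exploration bonus to the empirical regularized squared Sharpe ratio of each arm. The regularizer value, the generic sqrt(log t / n) bonus, and the function names are illustrative assumptions, not the paper's path-dependent confidence radius.

```python
import numpy as np

def rssr(mean, var, reg=1.0):
    """Regularized squared Sharpe ratio: mu^2 / (var + reg).
    The regularizer keeps the index stable when the sample
    variance is near zero (this exact form is an assumption)."""
    return mean**2 / (var + reg)

def ucb_rssr_sketch(arms, horizon, reg=1.0):
    """Hypothetical UCB-style loop on the empirical RSSR.
    `arms` is a list of callables, each returning one reward sample.
    Returns the pull count of each arm."""
    k = len(arms)
    rewards = [[] for _ in range(k)]
    # Pull each arm once to initialize the estimates.
    for i, arm in enumerate(arms):
        rewards[i].append(arm())
    for t in range(k, horizon):
        index = []
        for i in range(k):
            r = np.asarray(rewards[i])
            est = rssr(r.mean(), r.var(), reg)
            # Generic sqrt(log t / n) exploration bonus -- a placeholder
            # for the paper's path-dependent concentration bound.
            bonus = np.sqrt(2.0 * np.log(t + 1) / len(r))
            index.append(est + bonus)
        best = int(np.argmax(index))
        rewards[best].append(arms[best]())
    return [len(r) for r in rewards]
```

Under this sketch, an arm with high mean and low variance (high RSSR) accumulates most of the pulls as the horizon grows, which is the qualitative behavior the O(log n) regret bound describes.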
This paper focuses on selecting the arm with the highest variance from a set of independent arms. Specifically, we consider two settings: (i) a regret setting, which penalizes pulls of arms that are suboptimal in terms of variance, and (ii) a fixed-budget BAI setting, which evaluates an algorithm's ability to identify the highest-variance arm after a fixed number of pulls. We develop a novel online algorithm called UCB-VV for the regret setting and show that its regret for bounded rewards is upper-bounded as O(log n), where n is the horizon. By deriving a matching lower bound on the regret, we show that UCB-VV is order optimal. For the fixed-budget BAI setting, we propose the SHVV algorithm and show that the upper bound on its error probability decays exponentially in the budget at a rate governed by a problem-complexity term, matching the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance. Leveraging the same, we derive a concentration inequality for the empirical Sharpe ratio (SR) of sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that UCB-VV consistently outperforms epsilon-greedy across different sub-optimality gaps, though it is surpassed by VTS, which exhibits the lowest regret albeit without theoretical guarantees. We also illustrate the superior performance of SHVV in the fixed-budget setting against uniform sampling under six different setups. Finally, we conduct a case study empirically evaluating UCB-VV and SHVV on call option trading over stocks generated using geometric Brownian motion (GBM).
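The fixed-budget procedure above can be sketched as standard sequential halving with elimination by empirical variance. The even budget split across rounds and the tie-handling below are generic sequential-halving conventions assumed for illustration, not the paper's exact constants or analysis.

```python
import numpy as np

def shvv_sketch(arms, budget):
    """Sequential-halving sketch that keeps the top half of the
    surviving arms by sample variance each round, returning the
    index of the arm believed to have the highest variance.
    `arms` is a list of callables, each returning one sample."""
    active = list(range(len(arms)))
    samples = {i: [] for i in active}
    rounds = int(np.ceil(np.log2(len(arms))))
    for _ in range(rounds):
        # Split the budget evenly across rounds and surviving arms.
        pulls = max(1, budget // (len(active) * rounds))
        for i in active:
            samples[i].extend(arms[i]() for _ in range(pulls))
        # Eliminate the lower half by empirical variance.
        active.sort(key=lambda i: np.var(samples[i]), reverse=True)
        active = active[: max(1, len(active) // 2)]
    return active[0]
```

With K arms the loop runs ceil(log2 K) rounds, so each surviving arm receives progressively more samples, concentrating the budget on the hardest comparisons; this is the mechanism behind the exponentially decaying error probability.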
Publications:
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits," in Machine Learning (Springer). Preprint here: arXiv
S. Khurshid, G. Ghatak, and M. S. Abdulla, "Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification," submitted to IEEE Transactions on Signal Processing. Preprint here: arXiv
Poster Presentation:
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Risk-Adjusted Decision-Making: Sharpe Ratio Maximization Bandit," IEEE ISIT-24, Athens, Greece. (Recent Results Session)
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits." ACML-24, Hanoi, Vietnam.
S. Khurshid, Mohammad Taha Shah, and G. Ghatak, "Sharpe Ratio-Optimized Thompson Sampling for Risk-Aware Online Learning," NeurIPS-25, San Diego, USA.
Work In Progress