The Sharpe Ratio (SR) is a critical quantity for characterizing financial time series, as it jointly accounts for the reward and the volatility of a stock/portfolio through its variance. Deriving online algorithms for optimizing the SR is particularly challenging, since even offline policies incur constant regret with respect to the best expert (Even-Dar et al., 2006). This paper focuses on optimizing the regularized squared SR (RSSR). We consider two settings for the RSSR: regret minimization (RM) and best arm identification (BAI). For RM, we propose a novel multi-armed bandit (MAB) algorithm called UCB-RSSR for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR and, based on it, establish regret guarantees for UCB-RSSR, showing that its regret evolves as O(log n) for the two-armed bandit case over a horizon n. We also consider a fixed-budget setting for well-known BAI frameworks, namely sequential halving and successive rejects, and propose the SHVV, SHSR, and SuRSR algorithms, deriving upper bounds on the error probability of each. We demonstrate that UCB-RSSR outperforms the only other known SR-optimizing bandit algorithm, U-UCB (Cassel et al., 2023), and establish its efficacy against benchmarks derived from the GRA-UCB and MVTS algorithms. We further demonstrate the performance of the proposed BAI algorithms across multiple setups. Our research highlights that the proposed algorithms will find extensive applications in risk-aware portfolio management problems.
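A minimal sketch of the UCB-RSSR idea described above: an index policy that adds an exploration bonus to the empirical regularized squared Sharpe ratio of each arm. The regularizer value, the generic sqrt(log t / n) bonus, and the function names are illustrative assumptions, not the paper's path-dependent confidence radius.

```python
import numpy as np

def rssr(mean, var, reg=1.0):
    """Regularized squared Sharpe ratio: mu^2 / (var + reg).
    The regularizer keeps the index stable when the sample
    variance is near zero (this exact form is an assumption)."""
    return mean**2 / (var + reg)

def ucb_rssr_sketch(arms, horizon, reg=1.0):
    """Hypothetical UCB-style loop on the empirical RSSR.
    `arms` is a list of callables, each returning one reward sample.
    Returns the pull count of each arm."""
    k = len(arms)
    rewards = [[] for _ in range(k)]
    # Pull each arm once to initialize the estimates.
    for i, arm in enumerate(arms):
        rewards[i].append(arm())
    for t in range(k, horizon):
        index = []
        for i in range(k):
            r = np.asarray(rewards[i])
            est = rssr(r.mean(), r.var(), reg)
            # Generic sqrt(log t / n) exploration bonus -- a placeholder
            # for the paper's path-dependent concentration bound.
            bonus = np.sqrt(2.0 * np.log(t + 1) / len(r))
            index.append(est + bonus)
        best = int(np.argmax(index))
        rewards[best].append(arms[best]())
    return [len(r) for r in rewards]
```

Under this sketch, an arm with high mean and low variance (high RSSR) accumulates most of the pulls as the horizon grows, which is the qualitative behavior the O(log n) regret bound describes.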
This paper focuses on selecting the arm with the highest variance from a set of independent arms. Specifically, we consider two settings: (i) a regret setting, which penalizes pulls of arms that are suboptimal in terms of variance, and (ii) a fixed-budget BAI setting, which evaluates an algorithm's ability to identify the highest-variance arm after a fixed number of pulls. We develop a novel online algorithm called UCB-VV for the regret setting and show that its regret for bounded rewards is upper-bounded as O(log n), where n is the horizon. By deriving a matching lower bound on the regret, we show that UCB-VV is order optimal. For the fixed-budget BAI setting, we propose the SHVV algorithm and show that the upper bound on its error probability decays exponentially in the budget at a rate governed by a problem-complexity term, matching the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance. Leveraging the same, we derive a concentration inequality for the empirical Sharpe ratio (SR) of sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that UCB-VV consistently outperforms epsilon-greedy across different sub-optimality gaps, though it is surpassed by VTS, which exhibits the lowest regret albeit without theoretical guarantees. We also illustrate the superior performance of SHVV in the fixed-budget setting against uniform sampling under six different setups. Finally, we conduct a case study empirically evaluating UCB-VV and SHVV on call option trading over stocks generated using geometric Brownian motion (GBM).
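The fixed-budget procedure above can be sketched as standard sequential halving with elimination by empirical variance. The even budget split across rounds and the tie-handling below are generic sequential-halving conventions assumed for illustration, not the paper's exact constants or analysis.

```python
import numpy as np

def shvv_sketch(arms, budget):
    """Sequential-halving sketch that keeps the top half of the
    surviving arms by sample variance each round, returning the
    index of the arm believed to have the highest variance.
    `arms` is a list of callables, each returning one sample."""
    active = list(range(len(arms)))
    samples = {i: [] for i in active}
    rounds = int(np.ceil(np.log2(len(arms))))
    for _ in range(rounds):
        # Split the budget evenly across rounds and surviving arms.
        pulls = max(1, budget // (len(active) * rounds))
        for i in active:
            samples[i].extend(arms[i]() for _ in range(pulls))
        # Eliminate the lower half by empirical variance.
        active.sort(key=lambda i: np.var(samples[i]), reverse=True)
        active = active[: max(1, len(active) // 2)]
    return active[0]
```

With K arms the loop runs ceil(log2 K) rounds, so each surviving arm receives progressively more samples, concentrating the budget on the hardest comparisons; this is the mechanism behind the exponentially decaying error probability.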
Publications:
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits," in Machine Learning (Springer). Preprint here: arXiv
S. Khurshid, G. Ghatak, and M. S. Abdulla, "Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification," submitted to IEEE Transactions on Signal Processing. Preprint here: arXiv
Poster Presentation:
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Risk-Adjusted Decision-Making: Sharpe Ratio Maximization Bandit," IEEE ISIT-24, Athens, Greece. (Recent Results Session)
S. Khurshid, M. S. Abdulla, and G. Ghatak, "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits." ACML-24, Hanoi, Vietnam.
S. Khurshid, Mohammad Taha Shah, and G. Ghatak, "Sharpe Ratio-Optimized Thompson Sampling for Risk-Aware Online Learning," NeurIPS-25, San Diego, USA.
Work In Progress