List of papers

Tags: [QFIN] - quantitative finance, [RL] - reinforcement learning, [EC] - evolutionary computation, [ML] - machine learning

Journal papers

Gašperov, Bruno, and Zvonko Kostanjčar. "Market making with signals through deep reinforcement learning." IEEE Access 9 (2021): 61611-61622. [QFIN] [RL]

ABSTRACT: Deep reinforcement learning has recently been successfully applied to a plethora of diverse and difficult sequential decision-making tasks, ranging from the Atari games to robotic motion control. Among the foremost such tasks in quantitative finance is the problem of optimal market making. Market making is the process of simultaneously quoting limit orders on both sides of the limit order book of a security with the goal of repeatedly capturing the quoted spread while minimizing the inventory risk. Most of the existing analytical approaches to market making tend to be predicated on a set of strong, naïve assumptions, whereas current machine learning-based approaches either resort to crudely discretized quotes or fail to incorporate additional predictive signals. In this paper, we present a novel framework for market making with signals based on model-free deep reinforcement learning, addressing these shortcomings. A new state space formulation incorporating outputs from standalone signal generating units, as well as a novel action space and reward function formulation, are introduced. The framework is underpinned by both ideas from adversarial reinforcement learning and neuroevolution. Experimental results on historical data demonstrate the superior reward-to-risk performance of the proposed framework over several standard market making benchmarks. More specifically, the resulting reinforcement learning agent achieves between 20% and 30% higher terminal wealth than the benchmarks while being exposed to only around 60% of their inventory risks. Finally, an insight into its policy is provided for the sake of interpretability.

Gašperov, Bruno, et al. "Reinforcement learning approaches to optimal market making." Mathematics 9.21 (Special Issue Advances in Markovian Dynamic and Stochastic Optimization Models in Diverse Application Areas) (2021): 2689. [QFIN] [RL]

ABSTRACT: Market making is the process whereby a market participant, called a market maker, simultaneously and repeatedly posts limit orders on both sides of the limit order book of a security in order to both provide liquidity and generate profit. Optimal market making entails dynamic adjustment of bid and ask prices in response to the market maker’s current inventory level and market conditions with the goal of maximizing a risk-adjusted return measure. This problem is naturally framed as a Markov decision process, a discrete-time stochastic (inventory) control process. Reinforcement learning, a class of techniques based on learning from observations and used for solving Markov decision processes, lends itself particularly well to it. Recent years have seen a very strong uptick in the popularity of such techniques in the field, fueled in part by a series of successes of deep reinforcement learning in other domains. The primary goal of this paper is to provide a comprehensive and up-to-date overview of the current state-of-the-art applications of (deep) reinforcement learning focused on optimal market making. The analysis indicated that reinforcement learning techniques provide superior performance in terms of the risk-adjusted return over more standard market making strategies, typically derived from analytical models.

Gašperov, Bruno, and Zvonko Kostanjčar. "Deep Reinforcement Learning for Market Making Under a Hawkes Process-Based Limit Order Book Model." IEEE Control Systems Letters 6 (2022): 2485-2490. [QFIN] [RL]

ABSTRACT: The stochastic control problem of optimal market making is among the central problems in quantitative finance. In this paper, a deep reinforcement learning-based controller is trained on a weakly consistent, multivariate Hawkes process-based limit order book simulator to obtain market making controls. The proposed approach leverages the advantages of Monte Carlo backtesting and contributes to the line of research on market making under weakly consistent limit order book models. The ensuing deep reinforcement learning controller is compared to multiple market making benchmarks, with the results indicating its superior performance with respect to various risk-reward metrics, even under significant transaction costs.

Jakobovic, Domagoj, Marko Đurasević, Stjepan Picek, and Bruno Gašperov. "ECF: A C++ Framework for Evolutionary Computation." (2023). Accepted for publication in SoftwareX. [EC]

ABSTRACT:  Metaheuristics have been shown to be efficient techniques for addressing a wide range of complex optimization problems. Developing flexible, reliable, and efficient frameworks for evolutionary computation metaheuristics is of great importance. With this in mind, ECF - Evolutionary Computation Framework, a versatile open-source framework for evolutionary computation written in C++, was developed. In addition to a wide range of efficiently implemented algorithms, it offers a variety of genotypes, parallelism with MPI, plug-and-play components, predefined problems, a configurable environment, as well as seamless integration between its components. By combining user-friendliness and customizability, ECF caters to both novice users and experienced practitioners. Its versatility and performance have been demonstrated through extensive applications to various continuous and combinatorial optimization problems. This paper delves into the framework’s key features, provides practical usage examples, highlights the impact of ECF, and outlines the plans for its future development.

Conference proceedings

Gašperov, Bruno, Marko Đurasević, and Domagoj Jakobović. "Leveraging More of Biology in Evolutionary Reinforcement Learning." (2023) Accepted for publication in International Conference on the Applications of Evolutionary Computation (Part of EvoStar) 2024. [EC] [RL]

ABSTRACT:  In this paper, we survey the use of additional biologically inspired mechanisms, principles, and concepts in the area of evolutionary reinforcement learning (ERL). While recent years have witnessed the emergence of a swath of metaphor-laden approaches, many merely echo old algorithms through novel metaphors. Simultaneously, numerous promising ideas from evolutionary biology and related areas, ripe for exploitation within evolutionary machine learning, remain in relative obscurity. To address this gap, we provide a comprehensive analysis of innovative, often unorthodox approaches in ERL that leverage additional bio-inspired elements. Furthermore, we pinpoint research directions in the field with the largest potential to yield impactful outcomes and discuss classes of problems that could benefit the most from such research.

Gašperov, Bruno, Marko Đurasević, and Domagoj Jakobović. "Finding Near-Optimal Portfolios With Quality-Diversity." (2023) Accepted for publication in International Conference on the Applications of Evolutionary Computation (Part of EvoStar) 2024. [EC] [QFIN]

ABSTRACT:  The majority of standard approaches to financial portfolio optimization (PO) are based on the mean-variance (MV) framework. Given a risk aversion coefficient, the MV procedure yields a single portfolio that represents the optimal trade-off between risk and return. However, the resulting optimal portfolio is known to be highly sensitive to the input parameters, i.e., the estimates of the return covariance matrix and the mean return vector. It has been shown that a more robust and flexible alternative lies in determining the entire region of near-optimal portfolios. In this paper, we present a novel approach for finding a diverse set of such portfolios based on quality-diversity (QD) optimization. More specifically, we employ the CVT-MAP-Elites algorithm, which is scalable to high-dimensional settings with potentially hundreds of behavioral descriptors and/or assets. The results highlight the promising features of QD as a novel tool in PO.

Carlet, Claude, Marko Đurasević, Bruno Gašperov, Domagoj Jakobović, Luca Mariot, Stjepan Picek. "A New Angle: On Evolving Rotation Symmetric Boolean Functions." (2023) Accepted for publication in International Conference on the Applications of Evolutionary Computation (Part of EvoStar) 2024. [EC] 

ABSTRACT:  Rotation symmetric Boolean functions represent an interesting class of Boolean functions as they are relatively rare compared to general Boolean functions. At the same time, the functions in this class can have excellent properties, making them interesting for various practical applications. The usage of metaheuristics to construct rotation symmetric Boolean functions is a direction that has been explored for almost twenty years. Despite that, there are very few results considering evolutionary computation methods. This paper uses several evolutionary algorithms to evolve rotation symmetric Boolean functions with different properties. Despite using generic metaheuristics, we obtain results that are competitive with prior work relying on customized heuristics. Surprisingly, we find that bitstring and floating point encodings work better than the tree encoding. Moreover, evolving highly nonlinear general Boolean functions is easier than rotation symmetric ones.

Bauman, Tessa, Sven Goluža, Bruno Gašperov, and Zvonko Kostanjčar. "Deep Reinforcement Learning for Goal-Based Investing Under Regime-Switching." Northern Lights Deep Learning Conference 2024. November 2023. [QFIN] [RL]

ABSTRACT:  Goal-based investing focuses on helping investors achieve specific financial goals, shifting away from the volatility-based risk paradigm. While numerous methods exist for this type of problem, the majority of them struggle to properly capture the non-stationary dynamics of real-world financial markets. This paper introduces a novel deep reinforcement learning framework for goal-based investing that addresses market non-stationarity through prompt reactions to regime switches. It relies on the integration of regime probability estimates directly into the state space. The experimental results indicate that the proposed method significantly outperforms several benchmarks commonly used in goal-based investing.

Gašperov, Bruno, and Marko Đurasević. "On evolvability and behavior landscapes in neuroevolutionary divergent search." (2023). GECCO 2023: Proceedings of the Genetic and Evolutionary Computation Conference. July 2023. 1203–1211. [EC] [RL]

ABSTRACT:  Evolvability refers to the ability of an individual genotype (solution) to produce offspring with mutually diverse phenotypes. Recent research has demonstrated that divergent search methods, in particular novelty search, promote evolvability by implicitly creating selective pressure for it. The main objective of this paper is to provide a novel perspective on the relationship between neuroevolutionary divergent search and evolvability. In order to achieve this, we first modify several types of walks from the literature on fitness landscape analysis. Following this, the interplay between neuroevolutionary divergent search and evolvability under varying amounts of evolutionary pressure and under different diversity metrics is investigated. To this end, experiments are performed on Fetch Pick and Place, a robotic arm task. Moreover, the study in particular sheds light on the structure of the genotype-phenotype mapping (the behavior landscape). Finally, a novel definition of evolvability that takes into account the evolvability of offspring and is appropriate for use with discretized behavior spaces is proposed, together with a Markov chain-based estimation method for it.

Gašperov, Bruno, Marko Đurasević, and Domagoj Jakobović. "A Search For Nonlinear Balanced Boolean Functions by Leveraging Phenotypic Properties." (2023) GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation. July 2023. 2047-2055. [EC]

ABSTRACT:  In this paper, we consider the problem of finding perfectly balanced Boolean functions with high non-linearity values. Such functions have extensive applications in domains such as cryptography and error-correcting coding theory. We provide an approach for finding such functions by a local search method that exploits the structure of the underlying problem. Previous attempts in this vein typically focused on using the properties of the fitness landscape to guide the search. We opt for a different path in which we leverage the phenotype landscape (the mapping from genotypes to phenotypes) instead. In the context of the underlying problem, the phenotypes are represented by Walsh-Hadamard spectra of the candidate solutions (Boolean functions). We propose a novel selection criterion, under which the phenotypes are compared directly, and test whether its use increases the convergence speed when compared to a competitive fitness function used in the literature. The results reveal promising convergence speed improvements for Boolean functions of sizes N=6 to N=9.

Bauman, Tessa, Bruno Gašperov, Stjepan Begušić, and Zvonko Kostanjčar. "Deep Reinforcement Learning for Robust Goal-Based Wealth Management." In IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 69-80. Cham: Springer Nature Switzerland, 2023. [QFIN] [RL]

ABSTRACT:  Goal-based investing is an approach to wealth management that prioritizes achieving specific financial goals. It is naturally formulated as a sequential decision-making problem as it requires choosing the appropriate investment until a goal is achieved. Consequently, reinforcement learning, a machine learning technique appropriate for sequential decision-making, offers a promising path for optimizing these investment strategies. In this paper, a novel approach for robust goal-based wealth management based on deep reinforcement learning is proposed. The experimental results indicate its superiority over several goal-based wealth management benchmarks on both simulated and historical market data.

Gašperov, Bruno, et al. "Adaptive rolling window selection for minimum variance portfolio estimation based on reinforcement learning." 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). IEEE, 2020. [QFIN] [RL]

ABSTRACT:  When allocating wealth to a set of financial assets, portfolio optimization techniques are used to select optimal portfolio allocations for given investment goals. Among benchmark portfolios commonly used in modern portfolio theory, the global minimum variance portfolio is becoming increasingly popular with investors due to its relatively good performance, which stems from both the low-volatility anomaly and the avoidance of the estimation of first moments, i.e., mean returns. However, estimates of minimum variance portfolio weights significantly depend on the size of the rolling window used for estimation, especially considering the non-stationarity of the underlying market dynamics. In this paper, we use a model-free policy-based reinforcement learning framework in order to directly and adaptively determine the optimal size of the rolling window. Training is done on a subset of stocks traded on the NYSE. The resulting agent achieves superior performance when compared against multiple benchmarks, including those with fixed rolling window sizes.

Preprints

Gašperov, Bruno. "Novelty Search And Quality-diversity For Portfolio Optimization And Beyond." [QFIN] [EC]

ABSTRACT:  In this paper, we provide an overview of exploration algorithms, especially novelty search and quality diversity, as they may apply to financial portfolio optimization and related problems in the field of quantitative finance. These algorithms partly or fully rely on the idea of divergent search, where the goal is to seek novel unexplored solutions, as opposed to simply maximizing the explicitly stated objective function. Given their favorable exploration properties, they are particularly powerful in the context of evolutionary reinforcement learning. As such, they have recently been successfully applied to a variety of domains, including procedural generation of video game levels and learning robot movements. However, their applications in portfolio optimization, and in quantitative finance in general, remain largely unexplored. We argue that this intersection presents significant untapped potential, primarily due to natural connections between divergent search and several concepts central to portfolio optimization, such as diversification, individuality of risk preferences, and high sensitivity to market regimes. Finally, we outline several research avenues that may pave the way towards more efficient leveraging of exploration algorithms in portfolio optimization and beyond.

Other

Gašperov, Bruno, et al. "Prediction of Cashflow Timing and Patterns in International Bank Accounts". STAges de REcherche BEI-EIB final report. (2021) [ML]

ABSTRACT:  In this project, we have designed and implemented a machine learning-based model capable of generating intraday cashflow timing predictions of sufficient accuracy. More specifically, we have opted for a random forest, an ensemble of decision trees. The results indicate that the proposed model has predictive power, with random forests lending themselves particularly well to the underlying problem. Moreover, by incorporating prediction intervals into the model that enable quantifying the reliability of the generated predictions, we have provided a probabilistic perspective on the problem.

Theses

Gašperov, Bruno. Deep reinforcement learning for market making with time-varying order arrival intensities. Doctoral dissertation. University of Zagreb. Faculty of Electrical Engineering and Computing. Department of Electronic Systems and Information Processing, 2022. [QFIN] [RL]

ABSTRACT (EXTENDED): Market making is a problem of the optimal placement of limit orders on both sides of the limit order book with the goal of maximizing the trader’s terminal wealth while minimizing the related risks. Such risks particularly include inventory, execution, latency, adverse selection, and model uncertainty risks. Especially salient is the inventory risk, arising from the fluctuations in the value of the asset held in the market maker’s inventory, which is typically non-zero, since it depends on when and whether the placed orders get executed. Consequently, effective market making requires dynamic adaptation to changes in the current inventory level and other relevant market and market maker-related variables. The underlying problem of stochastic optimal control can be naturally cast as a discrete Markov Decision Process (MDP). Existing analytical approaches to market making tend to be predicated upon a set of naïve assumptions and are ill-suited to market making on order-driven markets as they fail to consider the discreteness of the limit order book in general. Moreover, they do not factor in the market microstructure dynamics, especially the time variability of order arrival intensities. Promisingly, methods based on (deep) reinforcement learning are known to lend themselves well to solving problems formulated as MDPs and hence offer a potential alternative to tackling market making. Moreover, considering that the model of the market maker's environment is typically unknown, model-free deep reinforcement learning methods, capable of learning directly from data without any explicit modeling of the underlying dynamics or prior knowledge, are of pivotal importance. Bearing this in mind, as well as the shortcomings of the current approaches, in this thesis novel model-free deep reinforcement learning methods for market making on order-driven markets with time-varying order arrival intensities are proposed.
The first method is based on two standalone supervised learning-based signal generating units and a deep reinforcement learning unit for market making that exploits the generated signals. Special attention is paid to demands on the sufficient granularity of the resulting market making policies and to the methods' robustness to variations in the market microstructure dynamics. To this end, a procedure for training market making agents robust to such variations, based on adversarial reinforcement learning, is also proposed. Moreover, an evaluation framework for testing the proposed method with respect to the interpretability and the risk-adjusted return metrics is proposed. The second method is concerned with market making under a weakly consistent, multivariate Hawkes process-based LOB model. The experimental results are discussed, analyzed, and juxtaposed against the results of several market making benchmarks. It is found that the proposed methods outperform the benchmarks with respect to multiple risk-adjusted reward performance metrics.

Gašperov, Bruno. Signaling in generalized Spence two-job market models and its effect on social welfare. Master thesis. Wirtschaftsuniversität Wien (Vienna University of Economics and Business). Institute for Statistics and Mathematics, 2016.

ABSTRACT:  The overall goal of this thesis is to analytically study the role of education as signaling in Spencian job market models. We begin by using the job market model that consists of two markets, one informationally symmetric and the other asymmetric, to investigate the effect of signaling on the adverse selection problem (lemons problem). Then we generalize the Spence two-job market model in two realistic ways: first, by assuming under/over-qualification of workers and random choice of the missing workforce, and second, by assuming that certain workers are misinformed about their own productivity. We study the effect of signaling on individual and social welfare in such a generalized model. The emphasis is put on the conditions needed for individual and social interests to coincide. Our results confirm that, even with unproductive education, signaling may improve social welfare by enabling optimal job allocation. Finally, we discuss the limitations of the model and recommend some possible avenues for further research.