Title: Option Exercise Games and the q Theory of Investment
Abstract: Firms should be able to respond to their competitors’ strategies over time. Back and Paulsen (2009) thus advocate using closed-loop equilibria to analyze classic real-option exercise games but point out difficulties in defining closed-loop equilibria and characterizing the solution. We define closed-loop equilibria and derive a continuum of them in closed form. These equilibria feature either linear or nonlinear investment thresholds. In all closed-loop equilibria, firms invest faster than in the open-loop equilibrium of Grenadier (2002). We confirm the conjecture of Back and Paulsen (2009) that their closed-loop equilibrium (with a perfectly competitive outcome) is the one with the fastest investment and that in all other closed-loop equilibria firms earn strictly positive profits. This is joint work with Zhaoli Jiang and Neng Wang.
Title: Chain or Channel? Payment Optimization with Heterogeneous Flow
Abstract: Payment-channel networks (PCNs) such as the Lightning Network enable off-chain payments secured by the channels' balances as alternatives to on-chain transactions. This paper solves the optimal channel management problem for two agents who pay each other arbitrarily distributed amounts. Agents optimally choose the channel's size and whether to make each payment on-chain or on-channel, depending on their current balance. With unidirectional flows, payments below some balance-dependent chain amount happen on-channel while others happen on-chain. As the balance falls below the reserve level, payments are always made on-channel if feasible. Below the refill level, the channel is reset to its initial state. Symmetric bidirectional flows entail distinct chain thresholds and reset levels for both directions, but channels may last indefinitely. Asymmetric flows lead to more complex optimal policies, in which both, either, or neither party resets the channel. The paper characterizes optimal channels and payment policies, describing an algorithm to obtain them, given payments' frequency and distribution.
Title: Weighted Quantile Regression
Abstract: Quantile regression is a powerful tool for robust modeling in linear regression, particularly useful when data are contaminated by outliers or heavy-tailed error distributions. However, traditional quantile regression can suffer from high sampling variance, especially when error density is low at the targeted quantile. To mitigate this, weighted or composite quantile regression methods have been introduced, which incorporate multiple quantile levels in the loss function to improve stability. Despite their advantages, these methods are often criticized for their computational demands.
In this talk, we present a novel approach to weighted quantile regression that allows for arbitrary quantile weights, extending beyond finite support, and addresses the computational challenges of previous methods. By leveraging optimal monotone couplings, we develop an efficient algorithm for estimating model coefficients, significantly reducing computational burden. We also characterize the asymptotic distribution of the resulting estimators using rank test theory.
We demonstrate that with a Gaussian quantile weighting scheme, our estimator uniformly outperforms ordinary least squares across all error distributions with finite Fisher information. Additionally, we show a deeper connection between weighted quantile regression and convex loss function estimation, proving that each convex loss function arising from maximum likelihood estimation corresponds uniquely to a quantile weight. Consequently, the proposed method achieves the asymptotic Cramer-Rao lower bound for variance, equaling the efficiency of maximum likelihood estimators for specific error laws.
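As a point of reference for the abstract above, the following is a minimal sketch of the standard check (pinball) loss and of a composite objective that sums it over finitely many quantile levels. It assumes the common composite formulation with one shared slope vector and a separate intercept per level; it is not the coupling-based algorithm of the talk, and the function names are hypothetical.

```python
import numpy as np

def check_loss(residuals, tau):
    """Average pinball loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.mean(residuals * (tau - (residuals < 0))))

def weighted_quantile_loss(y, X, beta, intercepts, taus, weights):
    """Weighted sum of check losses over several quantile levels.

    Assumption: all levels share the slope vector `beta`, and each level
    tau_k has its own intercept b_k (the usual composite formulation).
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    return sum(
        w * check_loss(y - fitted - b, tau)
        for w, b, tau in zip(weights, intercepts, taus)
    )
```

With a single level tau = 0.5 and unit weight, the objective reduces to half the mean absolute deviation, which is the median-regression loss.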
Title: Non-Standard Dynamic Utility Maximization
Abstract: This talk will survey recent research on non-standard dynamic utility maximization, where the utility function may not be concave or increasing. Examples include (1) payoffs from option-type managerial compensation or from equity-linked life insurance contracts, (2) nonconcave utility functions used in behavioral economics, (3) the goal problems in household finance, (4) dynamic mean-variance analysis, and (5) median and quantile maximization. The latter two also have time inconsistency issues.
Title: Data-Driven Sequential Sampling for Tail Risk Mitigation
Abstract: Given a finite collection of stochastic alternatives, we study the problem of sequentially allocating a fixed sampling budget to identify the optimal alternative with a high probability, where the optimal alternative is defined as the one with the smallest value of extreme tail risk. We particularly consider a situation where these alternatives generate heavy-tailed losses whose probability distributions are unknown and may not admit any specific parametric representation. In this setup, we propose data-driven sequential sampling policies that maximize the rate at which the likelihood of falsely selecting suboptimal alternatives decays to zero. We rigorously demonstrate the superiority of the proposed methods over existing approaches, which is further validated via numerical studies.
Title: Corporate Investment and Savings Demand
Abstract: Why do some firms save more than others? We revisit this important, widely studied question by developing a tractable continuous-time capital-accumulation model for financially constrained firms facing costly external equity. In addition to including the standard building blocks for a q theory of investment (Hayashi, 1982, Econometrica; Abel and Eberly, 1994, AER): capital accumulation, capital adjustment costs, and a persistent stochastic productivity process, we assume that instantaneous profits (cashflows generated by productive capital) are stochastic. This assumption is a key difference between our model and widely used quantitative corporate finance models, e.g., Hennessy and Whited (2007) and Riddick and Whited (2009), in which instantaneous profits (conditional on productivity and capital stock) are deterministic.
Firm value depends on capital stock, cash balance, and productivity. Using the homogeneity property, we analytically characterize the solution with a variational inequality for the cash-capital ratio and productivity. Importantly, we show that making instantaneous profits stochastic is key to generating quantitatively meaningful cash holdings. Consider the alternative that instantaneous profits are (locally) deterministic. Then the firm’s cash balance at the end of the period would also be deterministic. This substantially reduces its precautionary savings demand, as locally the risk-neutral firm (as long as it is not too cash-strapped) faces little liquidation risk and hence holding cash has limited value. Mathematically, whether (conditional) instantaneous profits are stochastic determines whether the second-order derivative of firm value with respect to cash appears in the Bellman equation, and economically, this term has a first-order effect on precautionary savings demand. This is joint work with Xavier Giroud (Columbia University), Ling Qin (ShanghaiTech University), and Neng Wang (Cheung Kong Business School).
Title: Equity Valuation Without DCF
Abstract: We propose using discounted alphas rather than discounted cash flows (DCF) to estimate the fundamental cash-flow value of an individual stock. Our novel approach builds on prior research on short-term abnormal returns (alphas) while eliminating the need for stock-level cost of equity estimates that are necessary for DCF. We document several new empirical findings on asset valuations and firm-level costs of equity, including that the market values of firm equity are ‘almost efficient’ by Black’s (1986) definition and that long-term discount rates do vary substantially across firms.
Title: Oligopolistic Market Equilibrium and the Role of Noise Observability
Abstract: We develop a continuous time equilibrium model of an oligopolistic market in which the insider's optimal strategy may include a nonzero martingale component derived from observed noise. This challenges the standard assumption of absolute continuity. To clarify this, we study a sequence of models: a standard discrete time framework, a noise observable variant, and their continuous time limits under different strategy classes. We show that insider access to noise information determines whether the continuous time limit permits martingale components. Our results highlight a structural link between discrete and continuous time under imperfect competition and the role of noise observability in shaping strategic trading behavior.
Title: Global maximum principle for optimal control of stochastic Volterra equations with singular kernels
Abstract: In this talk, we consider optimal control problems of stochastic Volterra equations (SVEs) with singular kernels, where the control domain is not necessarily convex. We establish a global maximum principle by means of the spike variation technique. To do so, we first show a Taylor type expansion of the controlled SVE with respect to the spike variation, where the convergence rates of the remainder terms are characterized by the singularity of the kernels. Next, assuming additional structure conditions for the kernels, we convert the variational SVEs appearing in the expansion to their infinite dimensional lifts. Then, we derive first and second order adjoint equations of the form of infinite dimensional backward stochastic evolution equations (BSEEs), and provide a necessary condition for a given control process to be optimal.
Title: Breaking the Dimensional Barrier: A Pontryagin-Guided Direct Policy Optimization for Continuous-Time Multi-Asset Portfolio
Abstract: Solving large-scale, continuous-time portfolio optimization problems involving numerous assets and state-dependent dynamics has long been challenged by the curse of dimensionality. Traditional dynamic programming and PDE-based methods, while rigorous, typically become computationally intractable beyond a few state variables (~3-6 limit in prior studies). To overcome this critical barrier, we introduce the Pontryagin-Guided Direct Policy Optimization (PG-DPO) framework. PG-DPO leverages Pontryagin's Maximum Principle (PMP) and backpropagation-through-time (BPTT) to guide neural network policies, handling exogenous states without dense grids. This PMP-guided approach holds potential for a broad class of sufficiently regular continuous-time control problems. Crucially, our computationally efficient "Two-Stage" variant exploits rapidly stabilizing BPTT costate estimates, converting them into near-optimal Pontryagin controls after only a short warm-up, significantly reducing training overhead. This enables a breakthrough in scalability: numerical experiments show PG-DPO successfully tackles problems with dimensions previously considered far out of reach (up to 50 assets and 10 state variables). The framework delivers near-optimal policies, offering a practical and powerful alternative for high-dimensional continuous-time portfolio choice.
Title: Mean-Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
Abstract: We study continuous-time mean-variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes, yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black-Scholes markets without factors, we further devise a baseline algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm into four variants, and carry out an extensive empirical study to compare their performance, in terms of a host of common metrics, with a large number of widely used portfolio allocation strategies on S&P 500 constituents. The results demonstrate that the continuous-time RL strategies are consistently among the best, especially in a volatile bear market, and decisively outperform the model-based continuous-time counterparts by significant margins. This is joint work with Yilie Huang and Xun Yu Zhou.
Title: Dynamic Mean-Variance Efficient Fractional Kelly Portfolios in a Stochastic Volatility Model
Abstract: In this paper, we improve the mean-variance efficiency of the fractional Kelly strategy, a popular investment strategy in the market, by minimizing the variance of the return of a portfolio with as high expected return as the fractional Kelly strategy. In view of time inconsistency arising from the mean-variance criterion, we consider so-called equilibrium portfolio strategies that can be consistently implemented by the investor. We derive the equilibrium portfolio strategy in closed form and show that it always leads to a smaller variance of return than the fractional Kelly strategy. By calibrating the model parameters to market data, we show that the reduction of variance achieved by the equilibrium portfolio can be economically significant. We also show in an out-of-sample test that the equilibrium portfolio strategy outperforms the fractional Kelly strategy under various performance measures.
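For readers unfamiliar with the benchmark strategy, here is a minimal sketch of the fractional Kelly weight. It assumes a single risky asset with constant drift `mu`, volatility `sigma`, and risk-free rate `r`, which is simpler than the stochastic volatility model of the abstract; the function names are hypothetical.

```python
def kelly_weight(mu, r, sigma, fraction=1.0):
    """Fraction of wealth in the risky asset: fraction * (mu - r) / sigma^2.

    fraction = 1.0 is the full Kelly (log-optimal) weight under the
    constant-coefficient assumption; fraction < 1 is a fractional Kelly rule.
    """
    return fraction * (mu - r) / sigma**2

def log_growth_rate(weight, mu, r, sigma):
    """Expected growth rate of log wealth for a constant risky weight."""
    return r + weight * (mu - r) - 0.5 * weight**2 * sigma**2

full = kelly_weight(0.08, 0.02, 0.2)       # full Kelly weight
half = kelly_weight(0.08, 0.02, 0.2, 0.5)  # half Kelly weight
```

Under these assumptions the half Kelly rule gives up some expected log growth in exchange for a quarter of the variance of the full Kelly position's log return.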
Title: Roughness in finance via Schauder Representation
Abstract: This presentation will explain two distinct concepts for measuring the roughness of financial data. We first introduce the idea of the p-th variation of a real-valued continuous function along a general class of refining partition sequences. We demonstrate that the finiteness of the p-th variation of a given path is closely linked to the finiteness of the ℓp-norm of the coefficients along a Schauder basis, analogous to how the Hölder exponent relates to the ℓ∞-norm of the Schauder coefficients. This result establishes an isomorphism between the space of Hölder continuous functions with finite (generalized) p-th variation along a given partition sequence and a subclass of infinite-dimensional matrices, equipped with an appropriate norm, in the spirit of Ciesielski.
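The first roughness notion above can be illustrated numerically. The sketch below computes the sum of p-th powers of increments along the dyadic partition of an interval, which is one particular refining partition sequence; the talk allows a more general class. The function name and the requirement that `path` accept a NumPy array are assumptions made for the example.

```python
import numpy as np

def pth_variation(path, p, level, t0=0.0, t1=1.0):
    """Sum of |increment|^p of `path` over the dyadic partition of [t0, t1].

    `path` must be a vectorized function of time; the partition has
    2**level equal intervals.
    """
    grid = np.linspace(t0, t1, 2**level + 1)
    increments = np.diff(path(grid))
    return float(np.sum(np.abs(increments) ** p))

# For the smooth path x(t) = t, the sum with p = 1 stays at 1,
# while the sum with p = 2 shrinks like 2**(-level).
linear_p1 = pth_variation(lambda t: t, 1, 5)
linear_p2 = pth_variation(lambda t: t, 2, 5)
```

Tracking how this sum behaves as `level` grows, for different values of p, gives a rough empirical handle on the variation index of a sampled path.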
Title: MarketGANs: Multivariate time-series market data augmentation with GANs
Abstract: In this study, we propose a generative neural network for augmenting multi-dimensional time-series data in stock markets. Our model, termed MarketGAN, is based on TCN-GANs conditioned on factors, characteristics of individual assets, and macroeconomic variables. We evaluate the performance of our model from three perspectives: marginal distributions, cross-sectional correlations, and inter-temporal behaviors. Finally, we demonstrate an application of MarketGAN in the context of portfolio optimization.
Title: A Reduced Form Approach to Liquidity Provision in Automated Market Making
Abstract: We consider a method of liquidity provision in a decentralized exchange of two tokens. In particular, we look at constant product automated market makers with a concentrated liquidity mechanism. The wealth position of a liquidity provider is then determined by her position in token pools as well as transaction fees paid by arbitrageurs up to time t. We study the problem of selecting a profitable interval for liquidity concentration when the true (exchange) price follows a geometric Brownian motion.
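To make the concentrated liquidity mechanism concrete, the sketch below computes the token holdings of a position with liquidity `liquidity` on a price interval [`p_low`, `p_high`], using the standard constant-product formulas with virtual reserves. Fees are ignored, the price is quoted as units of token Y per unit of token X, and the function names are hypothetical; the model in the talk may be parameterized differently.

```python
import math

def position_amounts(liquidity, price, p_low, p_high):
    """Token amounts (x, y) held by a concentrated liquidity position.

    Outside the interval the position is entirely in one token, which is
    captured by clamping the price to [p_low, p_high].
    """
    sqrt_p = math.sqrt(min(max(price, p_low), p_high))
    x = liquidity * (1.0 / sqrt_p - 1.0 / math.sqrt(p_high))
    y = liquidity * (sqrt_p - math.sqrt(p_low))
    return x, y

def position_value(liquidity, price, p_low, p_high):
    """Mark-to-market value of the position in units of token Y (no fees)."""
    x, y = position_amounts(liquidity, price, p_low, p_high)
    return x * price + y
```

For example, with unit liquidity on the interval [1, 4], the position holds only token X at price 1 and only token Y at price 4, and a mix of both in between.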
Title: On the stability of Lipschitz continuous control problems and its application to reinforcement learning
Abstract: In this talk, I will explore the stability properties of the Hamilton–Jacobi–Bellman (HJB) equation in the context of model-free reinforcement learning. The focus is on optimal control problems under Lipschitz continuous control policies, where we analyze how the associated value functions behave as the Lipschitz constraint varies. By leveraging the connection between Lipschitz-constrained and classical optimal control problems, we provide both theoretical and empirical insights into the convergence and robustness of value functions. I will also introduce a general reinforcement learning framework tailored for Lipschitz continuous settings, and share results from numerical experiments that benchmark our method against existing approaches.
Title: Optimal consumption and portfolio rules with dynamic adjustment of consumption bounds
Abstract: We develop a comprehensive model to derive optimal consumption-investment choices for agents aiming to sustain spending power over an infinite horizon. Our model accounts for agents constrained by costly adjustable spending bounds and evaluates utility based on both current and future minimum allowable consumption levels, as well as actual consumption. Higher minimum levels assure investors that their consumption will not fall below a certain threshold, thereby generating utility from this anticipated security. However, adjusting these minimum levels incurs utility costs, leading to a non-trivial trade-off. The problem is transformed into an optimal switching problem through its conversion to a dual problem. The optimal switching problem is analytically characterized as a two-dimensional double obstacle problem. Our explicit solutions reveal an empirically consistent consumption pattern and U-shaped portfolio choices, implying that investors exhibit loss aversion toward changes in consumption. This is joint work with Junkee Jeon (Kyung Hee University) and Kexin Chen (The Hong Kong Polytechnic University).
Title: Optimal Recursive Utility Maximization with Debt-to-Income Limits
Abstract: We study a continuous-time optimal consumption and portfolio selection problem when an economic agent with recursive utility faces stochastic income and debt-to-income (DTI) borrowing limits. The recursive utility setup with time-varying borrowing constraints yields novel implications for optimal investment and marginal propensity to consume (MPC). We find that the optimal portfolio's dependency on the elasticity of intertemporal substitution (EIS) arises specifically due to borrowing constraints, regardless of constant investment opportunities. Our model generates results consistent with the MPC heterogeneity reported in the recent empirical literature. We also provide a novel testable implication that, particularly when constrained, active stock traders exhibit higher MPCs than individuals not engaged in stock trading. Additionally, we make a technical contribution by developing a new transform to address problems associated with recursive utility.
Title: Mean-Variance Efficient Prediction of Asset Returns
Abstract: Markowitz laid the foundation of portfolio theory through the mean-variance optimization (MVO) framework. However, the effectiveness of MVO is contingent on the precise estimation of expected returns, variances, and covariances of asset returns, which are typically uncertain. Machine learning models are becoming useful in estimating uncertain parameters, and such models are trained to minimize prediction errors, such as mean squared errors (MSE), which treat prediction errors uniformly across assets. Recent studies have pointed out that this approach would lead to suboptimal decisions and proposed Decision-Focused Learning (DFL) as a solution, integrating prediction and optimization to improve decision-making outcomes. While studies have shown DFL's potential to enhance portfolio performance, the detailed mechanisms of how DFL modifies prediction models for MVO remain unexplored. This study aims to investigate how DFL adjusts stock return prediction models to optimize decisions in MVO, addressing the question: "MSE treats the errors of all assets equally, but how does DFL reduce errors of different assets differently?" Answering this will provide crucial insights into optimal asset return prediction for constructing efficient portfolios.
Title: Monotone Curve Estimation via Convex Duality
Abstract: A principal curve serves as a powerful tool for uncovering underlying structures of data through 1-dimensional smooth and continuous representations. On the basis of optimal transport theories, this paper introduces a novel principal curve framework constrained by monotonicity with rigorous theoretical justifications. We establish statistical guarantees for our monotone curve estimate, including expected empirical and generalized mean squared errors, while proving the existence of such estimates. These statistical foundations justify adopting the popular early stopping procedure in machine learning to implement our numeric algorithm with neural networks. Comprehensive simulation studies reveal that the proposed monotone curve estimate outperforms competing methods in terms of accuracy when the data exhibits a monotonic structure. Moreover, through two real-world applications on future prices of copper, gold, and silver, and avocado prices and sales volume, we underline the robustness of our curve estimate against variable transformation, further confirming its effective applicability for noisy and complex data sets. We believe that this monotone curve-fitting framework offers significant potential for numerous applications where monotonic relationships are intrinsic or need to be imposed.
Title: Designing funding rates for perpetual swaps in cryptocurrency markets: A BSDE approach
Abstract: In cryptocurrency markets, a significant challenge for perpetual swap issuers is ensuring that the perpetual swap price remains aligned with the underlying asset value. This paper addresses this issue by exploring the relationship between funding rates and perpetual swap prices. Given specific funding rates, we uniquely determine the price and replicating portfolio of perpetual swaps through an arbitrage argument. Our findings indicate that by appropriately designing funding rates, the perpetual swap can be pegged to the underlying asset value. Additionally, we provide approximate funding rates for practical applications and investigate the difference between the original funding rates and the approximate funding rates. To achieve these results, our study employs path-dependent infinite-horizon backward stochastic differential equations (BSDEs) in conjunction with arbitrage pricing theory. Our main results are derived by establishing the existence and uniqueness of solutions to these BSDEs and developing the corresponding Feynman-Kac formula. This work was conducted in collaboration with Jaehyun Kim.
Title: Scaling limits for multi-period distributionally robust optimization problems
Abstract: In this talk, we examine the scaling limit of multi-period distributionally robust optimization (DRO) via a semigroup approach. Each step involves a worst-case maximization over distributions in a Wasserstein ball around a reference process, and the multi-period problem arises through sequential composition. When the Wasserstein ball’s radius scales linearly with time, we show that the scaling limit of the multi-period DRO yields a strongly continuous monotone semigroup on Cb. Furthermore, we show that its infinitesimal generator is equal to the generator associated with the non-robust scaling limit plus an additional perturbation term induced by the Wasserstein uncertainty. As an application, when the reference process follows an Itô process, we show that the viscosity solution of the associated nonlinear PDE coincides with the value of a continuous-time stochastic differential game. This is based on joint work with Max Nendel (U. Waterloo), Ariel Neufeld (NTU Singapore) and Alessandro Sgarabottolo (LMU Munich).
Title: Stochastic Income and Optimal Policies: A New Analysis
Abstract: This paper provides a new analysis of optimal consumption and investment policies with stochastic income. Our analysis yields a procedure for solving the Bellman equation based on a dimension reduction scheme that we develop. In particular, we provide an interpretation of the reduced problem based on the probabilistic approach. The value function and optimal strategies are all explicitly characterized, and analytic comparative statics are provided, which helps to clarify much of the economics in the paper. We also develop a numerical algorithm for policy iteration, supported by a convergence theorem, and solve the Bellman equation as a sequence of solutions to ordinary differential equations. Finally, the decision to invest more or less in equity in the stock market in the presence of stochastic income is influenced, to a large extent, by the levels of risk aversion and the income-to-wealth ratio. This is joint work with Alain Bensoussan (UT Dallas), Adannah Duruoha (UT Dallas), and Viswanath Ramakrishna (UT Dallas).
Title: Existence of optimal contract for principal-agent problem with quadratic cost function
Abstract: With recent advances in the mathematics community, the continuous-time principal-agent problem, a special case of the Stackelberg game, can be reformulated as a classical stochastic control problem. However, the existence of an optimal contract remains an open question. In the Markovian setting, this issue reduces to the existence of a classical solution to the associated Hamilton-Jacobi-Bellman (HJB) equation. The main technical difficulty arises from the degeneracy of the HJB equation. In this work, we consider the case where the agent’s effort cost function is quadratic. By exploiting the specific structure of the problem, we construct a classical solution. Furthermore, based on this result, we illustrate the principal’s contract and the agent’s optimal effort through numerical analysis.
Title: Renewable energy investment under stochastic interest rate with regime-switching volatility
Abstract: We examine the impact of the interest rate and its characteristics, such as long run mean and instantaneous variance risk (VR), on renewable energy investments in the power sector. The model has stochastic electricity price, stochastic interest rate, and variance regime switches. We show that an increase in the interest rate, while generally increasing the value of a power project, can have a non-monotone effect if the subsidy is sufficiently large. VR increases (reduces) the project value in the high variance regime, if the subsidy is sufficiently large (low). Under a fixed price contract, value declines and it is optimal to delay investment following an increase in the interest rate. The model helps to explain the US offshore industry experience in 2023.
Title: Rational Expectations Equilibrium with Endogenous Information Acquisition Time
Abstract: In this talk, we establish equilibrium in the presence of heterogeneous information. In particular, there is an insider who receives a private signal, an uninformed agent with no private signal, and a noise trader with semi price-inelastic demand. The novelty is that we allow the insider to decide (optimally) when to acquire the private signal. This endogenizes the entry time and stands in contrast to the existing literature which assumes the signal is received at the beginning of the period. Allowing for optimal entry also enables us to study what happens before the insider enters with private information, and how the possibility for future information acquisition both affects current asset prices and creates demand for information related derivatives. Results are valid in continuous time, when the private signal is a noisy version of the assets’ terminal payoff, and when the quality of the signal depends on the entry time.
Title: Understanding the Commodity Futures Term Structure Through Signatures
Abstract: Signature methods have successfully been used as a tool for feature extraction in statistical learning methods, notably in mathematical finance. The specific reason for their success is often much less clear, beyond general hand-waving about path dependence. This presentation aims to explain their success in a particular task, namely classifying commodity futures markets according to storability. We provide a regular perturbation of the signature of the futures term structure in terms of the convenience yield and identify the volatility of the convenience yield as the major discriminant.
Title: Portfolio Selection in Contests
Abstract: In an investment contest with incomplete information, a finite number of agents dynamically trade assets with idiosyncratic risk and are rewarded based on the relative ranking of their terminal portfolio values. We explicitly characterize a symmetric Nash equilibrium of the contest and rigorously verify its uniqueness. The connection between the reward structure and the agents' portfolio strategies is examined. A top-heavy payout rule results in an equilibrium portfolio return distribution with high positive skewness, which suffers from a large likelihood of poor performance. Risky asset holding increases when competition intensifies in a winner-takes-all contest. This talk is based on joint work with Yumin Lu.
Title: PreFER: Interactive Robo-Advisor with scoring mechanism
Abstract: Instead of asking a client to specify his risk preference or learning it from his investment choice, we propose an inverse reinforcement learning (IRL) framework to learn his risk preference or the reward function by scoring. Specifically, the robo-advisor requests the client to score unadopted investment advice and extracts information from adopted ones. We develop the IRL through discrete-time Predictable Forward Exploratory Reward (PreFER) processes, where the exploration is regularized by Tsallis entropy. By interpreting the score as the acceptance probability of a piece of advice, the preference learning becomes an inverse problem of finding the exploratory investment distribution of the client for a given investment distribution recommended by the robo-advisor and an acceptance probability in the context of the acceptance-rejection method proposed by von Neumann. Demonstrations are made for both classes of CARA and CRRA utilities. We prove that the density function of the optimal exploratory control attains its maximum at the classic optimal strategy in the absence of exploration. In addition, as long as the scores are consistent in ordering, bias in the scores does not affect the identification of the client’s risk aversion given a sufficiently large number of interactions. The PreFER process further predicts the risk preference at the next time point from the one just learned, leading to an aggregation of learning power. This is joint work with Yuwei Wang.
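The acceptance-rejection method referred to above can be sketched generically as follows. A proposal draw x is accepted with probability f(x) / (M g(x)), where f is the target density, g the proposal density, and M a bound with f <= M g. This is the textbook sampler, not the PreFER learning procedure; the function name and the Beta(2, 2) example are assumptions chosen for illustration.

```python
import random

def rejection_sample(target_pdf, propose, proposal_pdf, bound, n, seed=0):
    """Draw n samples from target_pdf by acceptance-rejection.

    Requires target_pdf(x) <= bound * proposal_pdf(x) for all x.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = propose(rng)
        accept_prob = target_pdf(x) / (bound * proposal_pdf(x))
        if rng.random() < accept_prob:
            samples.append(x)
    return samples

# Example: target density 6x(1-x) on [0, 1] (Beta(2, 2)), uniform proposal,
# and bound 1.5, which is the maximum of the target density.
draws = rejection_sample(
    target_pdf=lambda x: 6.0 * x * (1.0 - x),
    propose=lambda rng: rng.random(),
    proposal_pdf=lambda x: 1.0,
    bound=1.5,
    n=20000,
)
```

In the abstract's interpretation, the role of the acceptance probability is played by the client's score, and the inverse problem is to recover the target distribution from the proposal and the observed acceptance probabilities.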
Title: Optimal portfolio selection with VaR and portfolio insurance constraints under the rank-dependent expected utility theory
Abstract: We investigate two optimal portfolio selection problems for a rank-dependent utility investor who needs to manage his risk exposure: one with a single Value-at-Risk (VaR) constraint and the other with joint VaR and portfolio insurance constraints. The two models generalize existing models under expected utility theory and behavioral theory. The martingale method, quantile formulation, and relaxation method are used to obtain explicit optimal solutions. We have specifically identified an equivalent condition under which the VaR constraint is effective. A numerical analysis is carried out to demonstrate theoretical results, and additional financial insights are presented. We find that, in bad market states, the risk of the optimal investment outcome is reduced when compared to existing models without or with one constraint. This talk is based on joint work with Hui Mi from Nanjing Normal University.
Title: The Reversal of the Price-Liquidity Relationship in the OTC market
Abstract: In a search-based trading model, we uncover how assets with identical fundamentals can exhibit a liquidity premium or its reversal. As trading focuses on the more liquid asset, liquidity naturally diverges. When buyers dominate, a liquidity premium arises, reflecting immediacy versus trading gains. Conversely, increased selling prompts sellers to seek higher prices for illiquid assets, reversing the liquidity premium. Our model unifies liquidity spread dynamics in normal and crisis periods, suggesting that supporting the illiquid market paradoxically benefits the liquid market, correcting the reversed liquidity premium. This is joint work with Jaewon Choi, Jungsuk Han, and Sean Shin.
Title: Continuous-time q-learning for jump-diffusion models under Tsallis entropy
Abstract: This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization. In contrast to the Shannon entropy, the general form of Tsallis entropy renders the optimal policy not necessarily a Gibbs measure, and the Lagrange and KKT multipliers naturally arise from the constraints that ensure the learnt policy is a probability density function. As a consequence, the characterization of the optimal policy using the q-function also involves a Lagrange multiplier. In response, we establish the martingale characterization of the q-function under Tsallis entropy and devise two q-learning algorithms, depending on whether or not the Lagrange multiplier can be derived explicitly. In the latter case, we need to consider different parameterizations of the optimal q-function and the optimal policy and update them alternately in an Actor-Critic manner. We also study two financial applications, namely, an optimal portfolio liquidation problem and a non-LQ control problem. Interestingly, the optimal policies under Tsallis entropy regularization can be characterized explicitly in these examples, and they are distributions concentrated on compact supports. The satisfactory performance of our q-learning algorithms is illustrated in each example.
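As background for the entropy comparison above, one common form of the Tsallis entropy of a density is recalled below. The notation is an assumption made here for illustration (index p rather than the conventional q, to avoid a clash with the q-function; a policy density \pi over actions a), and the paper's own normalization may differ.

```latex
% Tsallis entropy of index p > 0, p \neq 1, for a density \pi:
H_p(\pi) = \frac{1}{p-1}\left(1 - \int \pi(a)^{p}\,\mathrm{d}a\right),
% with the Shannon (differential) entropy recovered in the limit:
\lim_{p \to 1} H_p(\pi) = -\int \pi(a)\,\ln \pi(a)\,\mathrm{d}a .
```

With the Shannon limit, the entropy-regularized optimizer takes the exponential (Gibbs) form; for a general index the abstract notes that this need not hold, which is where the multipliers enforcing nonnegativity and normalization enter.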
Gu-gyum Ha (Sogang University)
Title: The Obstacle Problem Arising from the American Chooser Option
Abstract: We study the obstacle problem associated with the American chooser option. The obstacle is given by the maximum of an American call option and an American put option, which, in turn, can be expressed as the maximum of the solutions to the corresponding obstacle problems. This structure makes the obstacle problem particularly challenging and non-trivial. Using theoretical analysis, we overcome these difficulties and establish the existence and uniqueness of a strong solution. Furthermore, we rigorously prove the monotonicity and continuity of the free boundary arising from the obstacle problem.
Jaehyun Kim (Seoul National University)
Title: Long-term decomposition of robust pricing kernels under G-expectation
Abstract: This study develops a BSDE method for the long-term decomposition of pricing kernels under the G-expectation framework. We establish the existence, uniqueness, and regularity of solutions to three types of quadratic G-BSDEs: finite-horizon G-BSDEs, infinite-horizon G-BSDEs, and ergodic G-BSDEs. Moreover, we explore the Feynman–Kac formula associated with these three types of quadratic G-BSDEs. Using these results, a pricing kernel is uniquely decomposed into four components: an exponential discounting component, a transitory component, a symmetric G-martingale, and a decreasing component that captures the volatility uncertainty of the G-Brownian motion. Furthermore, these components are represented through a solution to a PDE. This study extends previous findings obtained under a single fixed probability framework in [Hansen and Scheinkman, 2009], [Hansen, 2012] and [Qin and Linetsky, 2018] to the G-expectation context. This research was conducted in collaboration with Hyungbin Park.
Jongjin Park (Seoul National University)
Title: Long time asymptotics for HJB equations
Abstract: This work investigates the relationship between Hamilton-Jacobi-Bellman (HJB) equations and ergodic-type elliptic eigenvalue problems. We establish the existence and uniqueness of the associated critical eigenpair and analyze the asymptotic behavior of solutions to the HJB equation over time. As an application, we consider the utility maximization problem in an incomplete market. We demonstrate the fund separation property for long-term optimal investments by analyzing the corresponding ergodic-type elliptic eigenvalue problems. This research was conducted in collaboration with Hyungbin Park and Stephan Sturm.