Research Summary
In our laboratory, we primarily study the statistical analysis of stochastic processes and their applications to financial data. Stochastic processes mathematically model phenomena that change randomly over time. We mainly deal with the following stochastic processes:
Brownian Motion: Models the irregular movement of particles and is one of the most fundamental and important continuous-time stochastic processes.
Stochastic Differential Equations: Equations that describe a broad class of stochastic processes with continuous paths, including Brownian motion as a special case. They are widely used as stock price models.
Jump-Diffusion Processes: Stochastic processes obtained by adding discontinuous variations to stochastic differential equations, helpful in understanding sudden price movements in financial markets.
Point Processes (Integer-valued Stochastic Processes): Count the occurrences of specific events within a period. They are used as models for buy and sell orders in the stock market, as well as for earthquake occurrences.
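As a rough illustration of how the processes listed above relate to one another, the following sketch simulates a jump-diffusion on a time grid. It assumes constant coefficients and normally distributed jump sizes; the function name and all parameter values are hypothetical choices made for this example, not models used by the laboratory. Setting `lam=0` leaves a Brownian motion with drift, and the per-step jump counts form a simple (Poisson) point process.

```python
import numpy as np

def simulate_jump_diffusion(x0, mu, sigma, lam, jump_std, T, n, rng):
    """Simulate dX = mu dt + sigma dW + dJ on [0, T] with n grid steps.

    J is a compound Poisson process with rate lam and N(0, jump_std^2)
    jump sizes (an illustrative assumption).
    Returns (path, counts): path has n + 1 points, counts has n entries.
    """
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)      # Brownian increments
    counts = rng.poisson(lam * dt, size=n)         # number of jumps in each step
    # The sum of k i.i.d. N(0, s^2) jumps is N(0, k * s^2).
    jumps = rng.normal(0.0, 1.0, size=n) * jump_std * np.sqrt(counts)
    increments = mu * dt + sigma * dW + jumps
    path = x0 + np.concatenate(([0.0], np.cumsum(increments)))
    return path, counts

rng = np.random.default_rng(0)
path, counts = simulate_jump_diffusion(
    x0=0.0, mu=0.05, sigma=0.2, lam=5.0, jump_std=0.1, T=1.0, n=1000, rng=rng
)
```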
In the statistical analysis of stochastic processes, we study parametric and non-parametric estimation methods. "High-frequency data" in the stock market, which records every transaction of a stock within a day, has a complex observational structure, and we also study statistical methods that account for this structure. We construct maximum likelihood and Bayesian estimation methods and derive their theoretical properties, such as asymptotic optimality, meaning that the estimator attains the smallest asymptotic variance among all estimation methods.
Regarding the application to financial data, we analyze data from the Tokyo Stock Exchange by fitting stochastic differential equations and jump-diffusion processes as stock price models for high-frequency data, and we use neural networks to learn stock price models from high-frequency data. Additionally, we model the information on buy and sell orders in the stock market (order book information) with self-exciting point processes and conduct research on predicting future order arrivals.
Modeling and Analysis of High-Frequency Data Using Stochastic Processes and Neural Networks
Modeling and Statistical Analysis of High-Frequency Data
High-frequency data, such as records of all intraday stock transactions, contains far more information than lower-frequency data such as daily data, and is therefore expected to yield more accurate predictions of variance and covariance. However, analyzing high-frequency data is challenging because of its large volume and the following distinctive observational structures:
- Asynchronous Observations: Since stock prices are observed only when new transactions occur, the observation times for multiple securities do not match. This discrepancy makes it difficult to estimate the covariance of stock price movements. This issue is referred to as "asynchronous observations."
- Market Microstructure Noise: When high-frequency data is modeled with stochastic differential equations, models that assume the solution of the equation is observed directly do not match empirical results. Models in which the observations are contaminated by additional noise are more plausible. This noise is referred to as "market microstructure noise," and methods are needed for estimating the variance and covariance of the latent process in its presence.
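To make the asynchronous-observation issue concrete, here is a minimal sketch of one well-known covariation estimator for non-synchronized data, the Hayashi-Yoshida estimator. It is named here only as an illustration, since the text above does not say which estimators the laboratory uses. The estimator sums products of increments over all pairs of observation intervals that overlap, so no interpolation onto a common grid is needed. This version does not correct for market microstructure noise, and the simulated data at the end is purely hypothetical.

```python
import numpy as np

def hayashi_yoshida(tx, x, ty, y):
    """Hayashi-Yoshida covariation estimate from two asynchronously observed paths.

    tx, ty: increasing observation times; x, y: (log-)prices observed at those times.
    Builds a len(x)-1 by len(y)-1 overlap matrix, so it suits moderate sample sizes.
    """
    tx, x, ty, y = (np.asarray(a, dtype=float) for a in (tx, x, ty, y))
    dx, dy = np.diff(x), np.diff(y)
    # overlap[i, j] is True when (tx[i], tx[i+1]] and (ty[j], ty[j+1]] intersect.
    left = np.maximum(tx[:-1, None], ty[None, :-1])
    right = np.minimum(tx[1:, None], ty[None, 1:])
    overlap = left < right
    return float(np.sum(dx[:, None] * dy[None, :] * overlap))

# Toy data: two correlated Brownian motions sampled at different random times.
rng = np.random.default_rng(1)
n, rho = 20_000, 0.5
grid = np.linspace(0.0, 1.0, n + 1)
z = rng.normal(size=(2, n)) * np.sqrt(1.0 / n)
w1 = np.concatenate(([0.0], np.cumsum(z[0])))
w2 = np.concatenate(([0.0], np.cumsum(rho * z[0] + np.sqrt(1 - rho**2) * z[1])))
ix = np.sort(rng.choice(n + 1, size=500, replace=False))
iy = np.sort(rng.choice(n + 1, size=500, replace=False))
hy = hayashi_yoshida(grid[ix], w1[ix], grid[iy], w2[iy])
```

When both series share the same observation times, only matching intervals overlap and the estimate reduces to the usual realized covariance.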
Modeling with Neural Networks
In addition to the complexity of its observational structure, high-frequency data also exhibits various characteristics in the underlying stock price model (e.g., intraday seasonality, volatility clustering, increased correlation in the tails of the distribution). Because high-frequency data is so abundant, learning these structures with neural networks is an effective approach. The coefficients of the stochastic differential equations are parameterized by neural networks, and the stock price model is learned through maximum likelihood estimation.
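The following sketch shows what such an objective could look like in the simplest setting. It assumes a one-dimensional process observed without noise on an equally spaced grid, uses a Gaussian quasi-likelihood based on the Euler approximation in place of the exact likelihood, and represents the drift and the log-diffusion coefficient by small one-hidden-layer networks. The network architecture, function names, and toy data are all assumptions made for illustration. In practice the loss would be minimized with an automatic-differentiation framework, which is not shown here.

```python
import numpy as np

def mlp(params, x):
    """One-hidden-layer network R -> R, applied elementwise to the array x."""
    w1, b1, w2, b2 = params
    h = np.tanh(x[:, None] * w1[None, :] + b1[None, :])   # shape (n, hidden)
    return h @ w2 + b2                                    # shape (n,)

def init_params(hidden, rng):
    """Random initial weights for the hypothetical network above."""
    return (rng.normal(size=hidden), np.zeros(hidden),
            0.1 * rng.normal(size=hidden), 0.0)

def neg_quasi_loglik(drift_params, diff_params, x, dt):
    """Negative Gaussian quasi-log-likelihood of the Euler approximation
    X_{t+dt} | X_t ~ N(X_t + b(X_t) dt, sigma(X_t)^2 dt)."""
    x0, dx = x[:-1], np.diff(x)
    b = mlp(drift_params, x0)
    log_sigma = mlp(diff_params, x0)      # the network outputs log sigma, so sigma > 0
    var = np.exp(2.0 * log_sigma) * dt
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (dx - b * dt) ** 2 / var)

# Toy data: Euler path of dX = -X dt + 0.5 dW (illustrative only).
rng = np.random.default_rng(0)
dt, n = 0.01, 2000
x = np.zeros(n + 1)
for i in range(n):
    x[i + 1] = x[i] - x[i] * dt + 0.5 * np.sqrt(dt) * rng.normal()

loss = neg_quasi_loglik(init_params(8, rng), init_params(8, rng), x, dt)
```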
Modeling and Statistical Analysis of Stock Order Book Information Using Point Processes
We are researching the modeling of pre-trade buy and sell orders (limit order book) using point processes to elucidate the mechanism of stock price formation. The order book information is modeled with a self-exciting point process known as the Hawkes process, in which each jump temporarily raises the frequency of subsequent jumps and can thereby trigger a chain reaction of jumps. This effectively captures the cluster structure observed in the stock market.
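The self-exciting mechanism can be illustrated with a short simulation. The sketch below assumes a univariate Hawkes process with an exponential kernel, so that the intensity is `mu + sum(alpha * exp(-beta * (t - t_i)))` over past event times `t_i`, and generates event times with Ogata's thinning algorithm. The kernel shape and the parameter values are assumptions for this example; real order book models are typically multivariate.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Simulate event times on [0, T] by Ogata's thinning algorithm for
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    events = []
    t = 0.0
    excite = 0.0    # self-exciting part of the intensity just after time t
    while True:
        # Between events the intensity only decays, so its current value is an upper bound.
        lam_bar = mu + excite
        w = rng.exponential(1.0 / lam_bar)          # candidate waiting time
        t_new = t + w
        if t_new > T:
            break
        excite_new = excite * np.exp(-beta * w)     # decay over the waiting time
        if rng.uniform() * lam_bar <= mu + excite_new:
            events.append(t_new)                    # candidate accepted as an event
            excite_new += alpha                     # each event raises the intensity
        t, excite = t_new, excite_new
    return np.array(events)

events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, T=100.0,
                         rng=np.random.default_rng(0))
```

With `alpha / beta < 1` the process is stable. Values closer to 1 produce longer bursts of events, which is the clustering behavior described above.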
By modeling order book information with the Hawkes process and estimating the model's parameters from actual market data, it is possible to predict the future arrival frequency of buy and sell orders. This enables the determination of optimal execution strategies for large volume stock transactions.
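As a sketch of the estimation and prediction step, the code below evaluates the log-likelihood of a univariate Hawkes process with an exponential kernel (an assumed, simplified model) and the conditional intensity at a future time. Parameter estimates would be obtained by maximizing this log-likelihood with a numerical optimizer, which is not shown. The event times and parameter values are hypothetical.

```python
import numpy as np

def hawkes_loglik(mu, alpha, beta, events, T):
    """Log-likelihood on [0, T] for lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta (t - t_i))."""
    events = np.asarray(events, dtype=float)
    r = 0.0            # r = sum over earlier events of exp(-beta * (t_i - t_j))
    log_sum = 0.0
    prev = None
    for t in events:
        if prev is not None:
            r = np.exp(-beta * (t - prev)) * (1.0 + r)   # recursive update of the sum
        log_sum += np.log(mu + alpha * r)                # log-intensity at the event time
        prev = t
    # Integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - events)))
    return log_sum - compensator

def hawkes_intensity(mu, alpha, beta, events, t):
    """Conditional intensity at time t given the events strictly before t."""
    events = np.asarray(events, dtype=float)
    past = events[events < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

order_times = [0.4, 0.5, 0.55, 2.1, 2.2, 4.0]    # hypothetical order arrival times
ll = hawkes_loglik(mu=1.0, alpha=0.5, beta=1.0, events=order_times, T=5.0)
rate_now = hawkes_intensity(mu=1.0, alpha=0.5, beta=1.0, events=order_times, t=5.0)
```

The intensity evaluated just after the last observation gives the expected arrival rate of the next orders under the fitted model.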
The property known as Local Asymptotic Mixed Normality (LAMN) is crucial when discussing the optimality of estimation methods for unknown parameters in statistical models. LAMN means that, in a shrinking neighborhood of the true parameter, the behavior of the likelihood function converges to that of a "mixed normal distribution." When LAMN holds, it yields a lower bound on the asymptotic variance, which measures the estimation error, of any parameter estimator. Estimators that achieve this lower bound are optimal in terms of asymptotic variance.
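In symbols, one common formulation of LAMN reads as follows. This is a sketch: the notation (true parameter `\theta_0`, scalar rate `r_n` tending to zero, local direction `u`, and the random quantities `\Delta_n`, `\Gamma_n`) is introduced here for illustration and does not appear in the original text.

```latex
\log \frac{dP^{n}_{\theta_0 + r_n u}}{dP^{n}_{\theta_0}}
  = u^{\top} \Delta_n - \frac{1}{2}\, u^{\top} \Gamma_n u + o_{P}(1),
\qquad
(\Delta_n, \Gamma_n) \xrightarrow{\;d\;} \bigl(\Gamma^{1/2} N,\ \Gamma\bigr),
```

where `N` is a standard normal vector independent of the random positive definite matrix `\Gamma`. When `\Gamma` is deterministic, this reduces to ordinary local asymptotic normality. Under this expansion, the mixed normal law of `\Gamma^{-1/2} N` serves as the benchmark limit distribution for `r_n^{-1}(\hat{\theta}_n - \theta_0)`.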
In the case of stochastic processes, LAMN has been demonstrated for several models, and the optimality of estimators, such as maximum likelihood-type estimators and Bayes-type estimators, has been shown. To prove LAMN, it is necessary to identify the asymptotic behavior of the likelihood function. For statistical models based on stochastic differential equations, Malliavin calculus, which is a calculus framework for Brownian motion, is utilized.
Examples of Student Research Themes
Modeling of Stock Order Book Information Using Self-Exciting Point Processes and Model Selection
Modeling stock order book information with Hawkes processes, selecting models using the Bayesian information criterion, and examining the validity of these methods through theoretical analysis and numerical simulation.
Testing Methods for Covariation in Asynchronous Observations of Stochastic Differential Equation Models
Considering the issue of "asynchronous observations" that appear in high-frequency data, estimating and testing the covariation between two stocks from the data.
Non-Convex Optimization Using Reversible Diffusion Processes
In Langevin dynamics for non-convex optimization, using a stochastic differential equation model known as a reversible diffusion process to improve convergence in cases where there are multiple local solutions.
Learning Diffusion Models with Neural Networks with a Quasi-Likelihood
Improving estimation accuracy for diffusion models with non-constant diffusion coefficients by learning the backward process stochastic differential equation using neural networks with a quasi-likelihood.