Publications

 [ * refers to a supervised student co-author or postdoc co-author; † denotes the corresponding author. ]

Journal Papers

Generative Portfolio Optimization with Attention-Powered Sequential Learning


Chuting SUN, Qi WU, Xing YAN.


Journal of Economic Dynamics and Control. (R&R). [pdf]

We present a dynamic generative factor model that utilizes the Attention-GRU network to learn the parameter dynamics of return distribution, focusing particularly on the tail-side properties of multivariate stock returns. Experimental results on stock data demonstrate that this model leads to portfolio strategies with higher rewards, reduced tail risks, and smaller maximum drawdowns.
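Maximum drawdown, one of the evaluation metrics mentioned above, is the largest peak-to-trough decline of cumulative wealth. The following is a minimal Python sketch. It assumes a strictly positive series of portfolio values, and the input data are made up for illustration.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline of a positive value series, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # running maximum so far
        worst = max(worst, (peak - v) / peak)  # drawdown relative to that peak
    return worst

# Made-up portfolio values for illustration.
wealth = [100, 110, 99, 120, 90, 130]
print(max_drawdown(wealth))  # 0.25, from the peak of 120 down to 90
```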

Neural Learning of Online Consumer Credit Risk

Di WANG, Qi WU† and Wen ZHANG. 

Management Science. (R&R). [arXiv] 

This paper introduces the "NeuCredit" model, a deep learning approach for analyzing consumer credit risk on e-commerce platforms that offer unsecured credit. The study demonstrates the information value of incorporating shopping behavioral data into consumer credit risk assessment. Using a unique dataset from a large e-commerce platform, it provides an interpretable default probability by decomposing risk into subjective, objective, and behavioral components.

Probabilistic Learning of Multivariate Time Series with Temporal Irregularity


Yijun LI*, Cheuk Hang LEUNG*, and Qi WU†.


IEEE Transactions on Pattern Analysis and Machine Intelligence. (under review). [ssrn, pdf]

Multivariate sequential data collected in practice often exhibit nonuniform time intervals and component misalignment. However, if uneven spacing and asynchrony are endogenous characteristics of the data rather than the result of insufficient observation (for example, when observable events themselves arrive at inherently random times or with varying frequencies), then these irregularities carry information about the mechanism that generates the data. This paper proposes an end-to-end trainable Recurrent Flow Network (RFN) to address the limitations of existing approaches in forecasting the joint distribution of such data.

Figures (a) and (b) show sample paths of synchronous (Syn-MTS) and asynchronous (Asyn-MTS) multivariate time series. Solid dots mark the observed data points. Although the time intervals between consecutive observations are unevenly spaced in both cases, the component observations of the Syn-MTS sample path are always aligned. In contrast, in the Asyn-MTS case, no observation time has complete observations. This shows that uneven spacing originates at the univariate level, while asynchrony arises exclusively in the multivariate context.

Figure: Transaction prices and times of the Tencent stock and its options with moneyness levels 0.96 (in the money), 1.00 (at the money), and 1.10 (out of the money) from 01/Dec/2014 to 31/Dec/2017. A "missing" value means there are no transaction records in that 3-minute interval.

Figure: The framework of RFNs for Asyn-MTS with three component variables $X^1_t$, $X^2_t$, $X^3_t$. The solid dots are observations, and different colors refer to different variables. In the marginal block, each variable has its own set of hidden states $h^1(t)$, $h^2(t)$, $h^3(t)$; the hidden states of a particular variable are updated only when that variable has an observation. In the multivariate block, the time-$t_k$ base distribution parameters $\mu_{t_k}$ and $\Sigma_{t_k}$ are functions of the hidden states $h_{t_{k-}} = [h^1_{t_{k-}}; h^2_{t_{k-}}; h^3_{t_{k-}}]$ and, additionally, of the component variables $X^{-d}_{t_k} := [x^1_{t_k}, \dots, x^{d-1}_{t_k}]$.
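The update rule in the marginal block can be sketched in a few lines of Python. A variable's hidden state changes only when that variable is observed. The scalar tanh cell, its weights, and the linear map to the mean are hypothetical placeholders chosen for illustration, not the RFN's actual recurrent units or flow layers. The covariance $\Sigma_{t_k}$ and the conditioning on $X^{-d}_{t_k}$ are omitted.

```python
import math

def toy_cell(h, x, w_h=0.5, w_x=1.0):
    """Hypothetical scalar recurrent update standing in for a per-variable recurrent cell."""
    return math.tanh(w_h * h + w_x * x)

def marginal_block(events, num_vars):
    """Update each variable's hidden state only at that variable's own observation times.

    events: (time, variable_index, value) tuples sorted by time. Time stamps are ignored
    by this toy cell; a fuller model could also feed in the elapsed time since the
    variable's last observation.
    """
    hidden = [0.0] * num_vars
    for _time, d, x in events:
        hidden[d] = toy_cell(hidden[d], x)  # unobserved variables keep their previous state
    return hidden

def toy_mean(hidden, weights):
    """Hypothetical linear map from the concatenated hidden states to a base-distribution mean."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

# Asynchronous observations of three variables (made-up values).
events = [(0.3, 0, 1.0), (0.7, 2, -0.5), (1.1, 0, 0.2), (1.6, 1, 0.8)]
h = marginal_block(events, num_vars=3)
mu = toy_mean(h, weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```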

Representation Balancing with Decomposed Patterns for Treatment Effect Estimation

Y.Y. Huang*, S.Y. Wang*, C.H. Leung*, Q. Wu†, D.D. Wang and Z.X. Huang.

Transactions on Machine Learning Research. (under review). [openreview]

Estimating treatment effects from observational data is subject to a covariate shift problem incurred by selection bias. Recent research has sought to mitigate this problem by balancing the distribution of representations between the treated and control groups. The rationale behind this is that counterfactual estimation relies on (1) preserving the predictive power of factual outcomes and (2) learning balanced representations. However, there is a trade-off between achieving these two objectives. In this paper, we propose a novel model, DIGNet, which is designed to capture the patterns that contribute to outcome prediction (task 1) and representation balancing (task 2), respectively. Specifically, we derive a theoretical upper bound that links the concept of propensity confusion to representation balancing, and further transform the balancing Patterns into Decompositions of Individual propensity confusion and Group distance minimization (PDIG) to capture more effective balancing patterns. Moreover, we suggest decomposing proxy features into Patterns of Pre-balancing and Balancing Representations (PPBR) to preserve patterns that are beneficial for outcome modeling. Extensive experiments confirm that PDIG and PPBR follow different pathways to achieve the same goal of improving treatment effect estimation. We hope our findings offer heuristics for investigating the factors that influence the generalization of representation balancing models in counterfactual estimation.
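Group distance minimization penalizes the discrepancy between the distributions of treated and control representations. This sketch assumes one common, simple choice of distance, which is not necessarily the one used in DIGNet: the maximum mean discrepancy with a linear kernel. That distance reduces to the squared Euclidean distance between group means. The representations below are made up.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def linear_mmd(treated, control):
    """Squared distance between group means of representations (MMD with a linear kernel)."""
    mt, mc = mean_vector(treated), mean_vector(control)
    return sum((a - b) ** 2 for a, b in zip(mt, mc))

# Hypothetical 2-D representations phi(x) for treated and control units.
treated = [[1.0, 0.0], [3.0, 2.0]]
control = [[0.0, 0.0], [2.0, 0.0]]
print(linear_mmd(treated, control))  # 2.0: group means (2, 1) and (1, 0)
```

A linear kernel only matches first moments. Richer kernels or Wasserstein distances are the usual alternatives when higher-order differences matter.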

Robust Orthogonal Machine Learning of Treatment Effects

Yiyan HUANG*, Cheuk Hang LEUNG*, Qi WU, and Xing YAN. 

IEEE Transactions on Pattern Analysis and Machine Intelligence. (under review). [arXiv]

Causal learning is key to obtaining stable predictions and answering what-if questions in decision-making. In causal learning, a central task is to estimate the average treatment effect (ATE) from observational data. Double/Debiased Machine Learning (DML) is one of the prevalent methods for estimating the ATE. However, DML estimators can suffer from an error-compounding issue and even give extreme estimates when the propensity scores are close to 0 or 1. Previous studies have mitigated this issue through empirical tricks such as propensity score trimming, yet none of the existing works solves it from a theoretical standpoint. In this paper, we propose a Robust Causal Learning (RCL) method to offset the deficiencies of DML estimators. Theoretically, the RCL estimators i) satisfy the (higher-order) orthogonal condition and are as consistent and doubly robust as the DML estimators, and ii) eliminate the error-compounding issue. Empirically, comprehensive experiments show that: i) the RCL estimators give more stable estimates of the causal parameters than DML; ii) the RCL estimators outperform traditional estimators and their variants when different machine learning models are applied, on simulated and benchmark datasets as well as on a mimic consumer credit dataset generated by a WGAN.
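The error-compounding problem is easiest to see in the doubly robust (AIPW) score that standard DML estimators of the ATE average over units. This sketch assumes the nuisance predictions are already available, with cross-fitting omitted. It implements the standard DML score, not the RCL estimator. A propensity score near 0 or 1 turns a small outcome residual into a very large contribution.

```python
def aipw_ate(y, t, mu0, mu1, e):
    """Average of the doubly robust (AIPW) score, the standard DML score for the ATE.

    y: observed outcomes; t: binary treatments (0/1);
    mu0, mu1: fitted outcome predictions under control and treatment;
    e: fitted propensity scores P(T = 1 | X).
    Cross-fitting (estimating the nuisances out of fold) is assumed to have been done already.
    """
    scores = [
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e)
    ]
    return sum(scores) / len(scores)

# Two made-up units with identical residuals; only the first unit's propensity changes.
y, t, mu0, mu1 = [1.2, 0.1], [1, 0], [0.0, 0.0], [1.0, 1.0]
print(aipw_ate(y, t, mu0, mu1, e=[0.5, 0.5]))   # about 1.1
print(aipw_ate(y, t, mu0, mu1, e=[0.01, 0.5]))  # about 10.9: the 1/e weight amplifies a residual of 0.2
```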

A Unified Domain Adaptation Framework with Distinctive Divergence Analysis

Z.R. Yuan*, X.X. Hu*, Q. Wu, S.M. Ma*, C.H. Leung*, X. Shen, and Y.Y. Huang*.

Transactions on Machine Learning Research. (2023) [openreview]

Unsupervised domain adaptation enables knowledge transfer from a labeled source domain to an unlabeled target domain by aligning the learned features of both domains. The idea is theoretically supported by the generalization bound analysis in Ben-David et al. (2007), which specifies the applicable task (binary classification) and designates a specific distribution divergence measure. Although most distribution-aligning domain adaptation models seek theoretical grounds in this particular bound analysis, they do not actually satisfy its stringent conditions. In this paper, we bridge this long-standing theoretical gap in the literature by providing a unified generalization bound. Our analysis accommodates both classification and regression tasks and most commonly used divergence measures; more importantly, it can theoretically recover a large number of previous models. In addition, we identify the key differences among the distribution divergence measures underlying the diverse models and conduct a comprehensive, in-depth comparison of the commonly used divergence measures. Based on the unified generalization bound, we propose new domain adaptation models that achieve transferability through domain-invariant representations, and we conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights are helpful in guiding the future design of distribution-aligning domain adaptation algorithms.

Deep into The Domain Shift: Transfer Learning through Dependence Regularization


S.M. Ma*, Z.R. Yuan*, Q. Wu†, Y.Y. Huang*, X.X. Hu*, C.H. Leung*, D.D. Wang and Z.X. Huang.


IEEE Transactions on Neural Networks and Learning Systems. (2023) [arXiv, ssrn]

Classical domain adaptation methods regularize the overall distributional discrepancy between labeled source-domain features and unlabeled target-domain features, without distinguishing between marginals and dependence structures. This paper introduces an approach that separately measures differences in the internal dependence structure and in the marginals. By optimizing the relative weights between them, the proposed regularization strategy enhances transferability and focuses attention on the crucial areas of divergence.

Figure: Visualization of three 2-D Gaussian distributions. The KL divergence between $P^X$ and $P^Y$ and that between $P^Z$ and $P^Y$ are the same, namely 1/2. However, $P^X$ differs from $P^Y$ in the second marginal distribution, whereas $P^Z$ differs from $P^Y$ in the covariance matrix. This example illustrates that a single divergence measure based on the joint distribution alone cannot distinguish whether distributional differences come from the marginals or from the dependence structure.
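The claim in the caption can be checked with the closed-form KL divergence between Gaussians. The figure's exact parameters are not stated, so the values below are assumptions chosen to give a KL of 1/2. $P^Y = \mathcal{N}(0, I)$. $P^X$ shifts the second mean by one. $P^Z$ has correlation $\rho = \sqrt{1 - e^{-1}}$, and its KL is taken as $\mathrm{KL}(P^Z \| P^Y)$. Subtracting the marginal KLs from the joint KL only roughly illustrates how marginal and dependence differences can be separated. It is not the regularizer proposed in the paper.

```python
import math

def kl_gauss_2d(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) for 2-D Gaussians, using explicit 2x2 algebra."""
    (a, b), (c, d) = cov1
    det1 = a * d - b * c
    inv1 = [[d / det1, -b / det1], [-c / det1, a / det1]]
    det0 = cov0[0][0] * cov0[1][1] - cov0[0][1] * cov0[1][0]
    trace = sum(inv1[i][k] * cov0[k][i] for i in range(2) for k in range(2))  # tr(inv1 @ cov0)
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    quad = sum(diff[i] * inv1[i][j] * diff[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2 + math.log(det1 / det0))

def kl_gauss_1d(m0, v0, m1, v1):
    """KL(N(m0, v0) || N(m1, v1)) for univariate Gaussians with variances v0, v1."""
    return 0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + math.log(v1 / v0))

identity = [[1.0, 0.0], [0.0, 1.0]]
rho = math.sqrt(1 - math.exp(-1))              # assumed, so that the joint KL equals 1/2
P_Y = ([0.0, 0.0], identity)
P_X = ([0.0, 1.0], identity)                   # differs from P_Y only in the 2nd marginal
P_Z = ([0.0, 0.0], [[1.0, rho], [rho, 1.0]])   # same marginals as P_Y, different covariance

for name, (mu, cov) in [("P_X", P_X), ("P_Z", P_Z)]:
    joint = kl_gauss_2d(mu, cov, *P_Y)
    marginals = sum(kl_gauss_1d(mu[i], cov[i][i], P_Y[0][i], P_Y[1][i][i]) for i in range(2))
    print(name, round(joint, 6), round(marginals, 6), round(joint - marginals, 6))
# P_X 0.5 0.5 0.0   (the whole difference sits in a marginal)
# P_Z 0.5 0.0 0.5   (the whole difference sits in the dependence structure)
```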

Counter-cyclical Margins for Options Portfolios

Xing YAN* and Qi WU†. 

Journal of Economic Dynamics and Control. 146. 104572 (2022) [ssrn]

We propose a counter-cyclical initial margin model for option portfolios. Our model exploits the intrinsic netting within a given portfolio of European options and outputs a constant upper bound on the maximum possible loss. This feature would allow option clearinghouses and regulators to gauge the tightest margin levels that are stable. We compare our model with the scenario-based SPAN model and the sensitivity-based SIMM model in terms of netting efficiency and procyclicality. Using SPX options and interest rate swaptions as examples, we quantify the minimum amount of additional margin needed to make them fully counter-cyclical. We then show how to strike a balance between risk sensitivity and counter-cyclicality, if needed, by flexibly mixing our model with a prevailing risk-sensitive margin model.

Capturing Deep Tail Risk via Sequential Learning of Quantile Dynamics

Xing YAN* and Qi WU†. 

Journal of Economic Dynamics and Control. 109. 103771 (2019) [ssrn]

This paper develops a conditional quantile model that can learn long-term and short-term memories of sequential data. It builds on sequential neural networks but outputs interpretable dynamics, and it consistently outperforms the GARCH family, as well as models using filtered historical simulation, conditional extreme value theory, and dynamic quantile regression, in Value-at-Risk forecasts. Applying the model to asset return time series across eleven asset classes, using historical data from the 1960s to 2018, reveals and confirms that conditional quantiles of asset returns have persistent sources of risk that do not come from those responsible for volatility clustering.
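Conditional quantile (VaR) forecasts like those described above are commonly fitted and evaluated with the pinball loss, and backtested by counting VaR exceedances. The sketch below shows both. It assumes VaR is expressed as a lower return quantile at level tau (for example tau = 0.05). These are generic evaluation tools, not the paper's model.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss for predicting the tau-quantile q when the outcome is y."""
    diff = y - q
    return max(tau * diff, (tau - 1) * diff)

def exceedance_rate(returns, var_forecasts):
    """Fraction of periods in which the realized return falls below the forecast VaR quantile."""
    hits = sum(1 for r, v in zip(returns, var_forecasts) if r < v)
    return hits / len(returns)

# Made-up data: 5% VaR forecasts expressed as lower return quantiles.
returns = [0.01, -0.04, 0.00, -0.01]
var_5pct = [-0.02] * 4
print(exceedance_rate(returns, var_5pct))  # 0.25; a well-calibrated 5% VaR should give about 0.05 over a long sample
print(pinball_loss(-0.03, -0.02, 0.05))    # penalty when the realized return falls below the forecast quantile
```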

Persistence and Procyclicality in Margin Requirements

Paul Glasserman and Qi WU. 

Management Science. Vol.64, No.12. 5705 - 5724. (2018) [ssrn]

Margin requirements for derivative contracts serve as a buffer against the transmission of losses through the financial system by protecting one party to a contract against default by the other party. However, if margin levels are proportional to volatility, then a spike in volatility leads to potentially destabilizing margin calls in times of market stress. Risk-sensitive margin requirements are thus procyclical in the sense that they amplify shocks. We use a GARCH model of volatility and a combination of theoretical and empirical results to analyze how much higher margin levels need to be to avoid procyclicality while reducing counterparty credit risk. Our analysis compares the tail decay of conditional and unconditional loss distributions under comparable stable and risk-sensitive margin requirements. Greater persistence and burstiness in volatility lead to a slower decay in the tail of the unconditional distribution and a higher buffer needed to avoid procyclicality. The tail decay drives other measures of procyclicality as well. Our analysis points to important features of price time series that should inform “antiprocyclicality” measures but are missing from current rules.
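The procyclicality mechanism described above can be sketched with a GARCH(1,1) recursion in which the risk-sensitive margin is a fixed multiple of conditional volatility. Several elements are illustrative assumptions, not the paper's calibration: the parameters, the 2.33 multiplier (roughly the one-sided 99% standard normal quantile), and the naive stable benchmark based on the unconditional variance.

```python
import math
import random

def simulate_garch_margins(omega, alpha, beta, n, z=2.33, seed=0):
    """Simulate GARCH(1,1) returns and a risk-sensitive margin proportional to conditional volatility.

    sigma2_t = omega + alpha * r_{t-1}**2 + beta * sigma2_{t-1};  margin_t = z * sigma_t.
    """
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    returns, margins = [], []
    for _ in range(n):
        sigma = math.sqrt(sigma2)
        margins.append(z * sigma)
        r = sigma * rng.gauss(0.0, 1.0)
        returns.append(r)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, margins

# Hypothetical, highly persistent parameters (alpha + beta close to 1).
returns, margins = simulate_garch_margins(omega=1e-6, alpha=0.08, beta=0.9, n=2500)
stable_margin = 2.33 * math.sqrt(1e-6 / (1 - 0.08 - 0.9))  # naive stable benchmark
print(max(margins) / stable_margin)  # how far the risk-sensitive margin spikes above it
```

Raising alpha + beta toward 1 (more persistence) typically produces larger and longer-lasting margin spikes in such simulations.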

Series Expansion of the SABR Joint Density

Qi WU. 

Mathematical Finance. Vol.22, No.2. 310 - 345. (2012)  [ssrn]

Under the SABR stochastic volatility model, pricing and hedging contracts that are sensitive to forward smile risk (e.g., forward-starting options and barrier options) requires the joint transition density. This paper provides asymptotic closed-form representations of the joint transition density.
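For reference, the standard SABR dynamics (Hagan et al., 2002) and the joint transition density of the forward and its volatility are written below in conventional notation. The symbols are the usual textbook ones and may differ from those used in the paper.

```latex
\begin{aligned}
dF_t &= \alpha_t F_t^{\beta}\, dW_t, \\
d\alpha_t &= \nu\, \alpha_t\, dZ_t, \qquad d\langle W, Z \rangle_t = \rho\, dt, \\
p(t, f, a;\, T, F, A)\, dF\, dA &= \mathbb{P}\big(F_T \in dF,\ \alpha_T \in dA \,\big|\, F_t = f,\ \alpha_t = a\big).
\end{aligned}
```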

Forward and Future Implied Volatility

Paul Glasserman and Qi WU. 

International Journal of Theoretical and Applied Finance. Vol.14, No.03. 407 - 432. (2011) [ssrn]

We find that model-based forward volatility extracts predictive information about future implied volatility better than a standard "model-free" measure of forward volatility and better than spot implied volatility. The improvement in out-of-sample forecasting accuracy gained from model-based forward volatility is greatest at longer forecasting horizons.
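A common "model-free" definition of forward volatility treats total implied variance $\sigma^2 T$ as additive across maturities. It is assumed here for illustration, and the paper's exact benchmark may differ. A minimal sketch with hypothetical inputs:

```python
import math

def forward_vol(sigma1, t1, sigma2, t2):
    """Forward implied volatility over [t1, t2] from spot implied vols, assuming additive total variance."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    fwd_var = (sigma2 ** 2 * t2 - sigma1 ** 2 * t1) / (t2 - t1)
    if fwd_var < 0:
        raise ValueError("total implied variance must not decrease with maturity")
    return math.sqrt(fwd_var)

# Hypothetical inputs: 20% implied vol to 1 year, 25% to 2 years.
print(forward_vol(0.20, 1.0, 0.25, 2.0))  # about 0.29
```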

Conference Papers

DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series

C.Q. Wang*, Y.J. Li*, X.Q. Sun*, Q. Wu†, D.D. Wang and Z.X. Huang. 

IJCAI 2023. (link).

Conventional interpretation of recurrent networks focuses on the importance of variables. This paper takes a simple approach to improving the interpretability of LSTMs by distinguishing between the instantaneous influence of newly arriving data and the long-term effects of historical data. By decomposing the hidden state h(t) into a linear combination of the past information h(t-1) and the fresh information h(t)-h(t-1), one obtains the instantaneous influence and the long-term effect of each feature. The linear, regression-style form of this decomposition makes the explanation transparent and clear.
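The identity behind the decomposition, $h(t) = h(t-1) + [h(t) - h(t-1)]$, can be illustrated with a single linear readout $y_t = w^\top h(t)$. This simplified sketch uses hypothetical weights and states and is not DeLELSTM's full per-feature construction. It only shows how a linear output splits exactly into a long-term part driven by $h(t-1)$ and an instantaneous part driven by the fresh information.

```python
def decompose_output(w, h_prev, h_curr):
    """Split a linear readout y_t = w . h_t into long-term and instantaneous parts.

    Uses the identity h_t = h_{t-1} + (h_t - h_{t-1}):
      long-term part     = w . h_{t-1}
      instantaneous part = w . (h_t - h_{t-1})
    """
    long_term = sum(wi * hp for wi, hp in zip(w, h_prev))
    instantaneous = sum(wi * (hc - hp) for wi, hc, hp in zip(w, h_curr, h_prev))
    return long_term, instantaneous

w = [0.5, -1.0]        # hypothetical readout weights
h_prev = [0.2, 0.4]    # hypothetical hidden state at t-1
h_curr = [0.6, 0.1]    # hypothetical hidden state at t
lt, inst = decompose_output(w, h_prev, h_curr)
print(lt, inst)        # about -0.3 and 0.5; their sum equals w . h_curr = 0.2 up to rounding
```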

Towards Balanced Representation Learning for Credit Policy Evaluation 

Y.Y. Huang*, C.H. Leung*, S.M. Ma*, Z.R. Yuan*, Q. Wu†, S.Y. Wang*, D.D. Wang and Z.X. Huang. 

AISTATS 2023. [ssrn, pdf]

Covariate balancing aims to mitigate selection bias in the representation space to obtain domain-invariant features, but in the process it loses domain-discriminative information. This paper introduces a doubly robust objective leveraging the treatment and outcome predictions, serving as a prerequisite for covariate balancing to mitigate the over-balancing issue. In addition, we investigate how to improve treatment effect estimation by exploiting the unconfoundedness assumption.

A Unified Perspective on Regularization and Perturbation in Differentiable Subset Selection

Xiangqian Sun*, Cheuk Hang LEUNG*, Yijun Li*, and Qi Wu†. 

AISTATS 2023. [ssrn, pdf]

Subset selection identifies a subgroup of items from a given collection to achieve specific goals. Researchers typically choose between the regularization method and the perturbation method to make the selection operator differentiable so that it can be used in end-to-end learning frameworks. This paper unifies these two schemes through a probabilistic interpretation of regularization relaxation. We construct concrete examples to show the generic connection between these two relaxations, and we evaluate the perturbed selector as well as the regularized selector on two tasks: the maximum entropy sampling problem and the feature selection problem.
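The simplest instance of the regularization/perturbation connection is single-item selection (k = 1). Under i.i.d. standard Gumbel perturbations, the expected one-hot argmax equals softmax(θ) (the Gumbel-max trick). That is also the entropy-regularized argmax at temperature 1. The sketch below checks this numerically. It illustrates a known identity, not the paper's general subset construction.

```python
import math
import random

def softmax(theta, temperature=1.0):
    """Entropy-regularized argmax: maximizes <p, theta> + temperature * H(p) over the simplex."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp((t - m) / temperature) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel(rng):
    """Standard Gumbel sample via inverse transform, -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def perturbed_argmax(theta, n_samples=50_000, seed=0):
    """Monte Carlo estimate of E[one_hot(argmax(theta + Gumbel noise))]."""
    rng = random.Random(seed)
    counts = [0] * len(theta)
    for _ in range(n_samples):
        noisy = [t + gumbel(rng) for t in theta]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_samples for c in counts]

theta = [1.0, 0.0, -1.0]
print(softmax(theta))           # regularized selector
print(perturbed_argmax(theta))  # perturbed selector; agrees up to Monte Carlo error
```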

Robust Causal Learning for the Estimation of Average Treatment Effects

Y.Y. Huang*, C.H. Leung*, X. Yan*, Q. Wu†, S.M. Ma*, Z.R. Yuan*, D.D. Wang and Z.X. Huang. 

IJCNN 2022. (Oral) [arXiv, ssrn]

The Double/Debiased Machine Learning (DML) method aims to remove the impact of covariates to obtain unbiased estimates of treatment effects. The essence of DML lies in the use of two possibly misspecified models: one predicting the treatment variable from the covariates and another predicting the outcome variable from the covariates. The problem is that existing DML estimators are prone to an error-compounding issue when the propensity scores are close to zero or one. This paper proposes a new estimator that is just as consistent and doubly robust as existing DML estimators but is not susceptible to this problem.

Moderately-Balanced Representation Learning for Treatment Effects with Orthogonality Information

Y.Y. Huang*, C.H. Leung*, S.M. Ma*, Q. Wu†, D.D. Wang and Z.X. Huang. 

PRICAI 2022. [arXiv, ssrn]

This paper proposes a moderately-balanced representation learning (MBRL) framework based on recent covariate-balanced representation learning methods and orthogonal machine learning theory. The framework protects the representation from being over-balanced via multi-task learning. Simultaneously, MBRL incorporates noise orthogonality information in the training and validation stages to achieve better ATE estimation.

The Causal Learning of Retail Delinquency 

Y.Y. Huang*, C.H. Leung*, X. Yan*, Q. Wu†, N.B. Peng, D.D. Wang and Z.X. Huang. 

AAAI 2021. [arXiv, ssrn]

The paper proposes a deep learning model called "NeuCredit" that incorporates shopping behavioral data into consumers' spending and payment records for the assessment of consumer credit risk on e-commerce platforms. The model captures heterogeneous serial dependences and nonlinear cross-sectional interactions among different time-evolving features. It decomposes the predicted delinquency probability into three components: the subjective risk, indicating consumers' willingness to repay; the objective risk, indicating their ability to repay; and the behavioral risk, indicating consumers' behavioral differences.

Memory-Gated Recurrent Networks 

Y.Q. Zhuang*, Q. Wu†, N.B. Peng, M. Dai, J. Zhang* and H. Wang.

AAAI 2021. [arXiv, ssrn]

At the center of multivariate sequential learning is the extraction of dependencies in data. These datasets often exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). This paper constructs a recurrent network architecture with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory.

Risk and Return Prediction for Pricing Portfolios of Non-performing Consumer Credit 

S.Y. Wang*, X. Yan*, B.Q. Zheng, H. Wang, W.L. Xu, N.B. Peng and Q. Wu†. 

ICAIF 2021. [arXiv, ssrn] 

The rise of fintech lending has increased the need to trade portfolios of non-performing consumer credit loans for risk transfer. However, analyzing and pricing these portfolios poses technical challenges, and research in this area is limited. This paper proposes a bottom-up architecture that models the repayment rate distribution of individual loans and the overall repayment rate distribution of the portfolio. To address these challenges, it employs simultaneous quantile regression, R-copula, and Gaussian one-factor copula models. This approach represents the first successful adoption of a bottom-up system for analyzing credit portfolio risks in real-world consumer loan business tasks.
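Of the components listed above, the Gaussian one-factor copula is the most standard. The sketch below shows how it turns loan-level repayment probabilities into a portfolio-level distribution. It simplifies each loan to a repay/no-repay outcome with an assumed probability. The paper's quantile-regression and R-copula components are not shown.

```python
import math
import random
from statistics import NormalDist, pstdev

def simulate_repayment_rates(probs, rho, n_sims=5_000, seed=0):
    """Simulate portfolio repayment rates under a one-factor Gaussian copula.

    Loan i repays in a scenario if its latent variable
    X_i = sqrt(rho) * M + sqrt(1 - rho) * eps_i   (M, eps_i iid N(0, 1))
    falls below Phi^{-1}(p_i). Each loan therefore repays with marginal probability p_i,
    while the common factor M correlates outcomes across loans.
    """
    rng = random.Random(seed)
    thresholds = [NormalDist().inv_cdf(p) for p in probs]
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    rates = []
    for _ in range(n_sims):
        m = rng.gauss(0.0, 1.0)  # common (systematic) factor for this scenario
        repaid = sum(1 for c in thresholds if a * m + b * rng.gauss(0.0, 1.0) < c)
        rates.append(repaid / len(probs))
    return rates

probs = [0.3] * 50  # hypothetical: 50 loans, each repaying with probability 0.3
for rho in (0.0, 0.2):
    rates = simulate_repayment_rates(probs, rho)
    print(rho, round(sum(rates) / len(rates), 3), round(pstdev(rates), 3))
```

The mean repayment rate stays near 0.3 for both correlation levels, while the positive factor loading widens the portfolio distribution. That widening is what matters for pricing tail scenarios.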

Cross-sectional Learning of Extremal Dependence among Financial Assets

Xing YAN*, Qi WU† and Wen ZHANG. 

NeurIPS 2019. [arXiv, ssrn, pdf]

We propose a novel probabilistic model to facilitate the learning of multivariate tail dependence among multiple financial assets. Our method allows one to construct, from known random vectors (e.g., standard normal), sophisticated joint heavy-tailed random vectors featuring not only distinct marginal tail heaviness but also a flexible tail dependence structure. The novelty lies in modeling the pairwise tail dependence between any two dimensions separately from their correlation: it varies according to its own parameter rather than the correlation parameter, which is an essential advantage over many commonly used methods such as the multivariate t or elliptical distributions. The model is also intuitive to interpret, tractable, and simpler to sample from than the copula approach.

Parsimonious Quantile Regression of Asymmetrically Heavy-tailed Financial Return Series

Xing YAN*, Weizhong ZHANG, Lin MA, Wei LIU and Qi WU†. 

NeurIPS 2018. [ssrn, pdf]

We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures both the time-varying characteristics and the asymmetric heavy-tail property of financial time series. It combines the merits of a popular sequential neural network model, LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures the serial dependence of each higher moment individually, rather than just that of volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles, or VaR, from our model outperform those of the GARCH family. Furthermore, the proposed approach does not suffer from quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.

Book Chapters

Procyclicality in Sensitivity-Based Margin Requirements

Paul Glasserman and Qi WU. 

"Margin in Derivatives Trading". Risk Books. Chapter 15. 293 - 309. (2018)

This paper shows that the industry-standard model for initial margin in the non-cleared market, the Standard Initial Margin Model (SIMM), although it includes features to reduce procyclicality, is nevertheless subject to procyclicality through the dependence of price sensitivities on market conditions. The degree of procyclicality varies across contract types and market conditions. Anticipating potential margin spikes requires regular liquidity stress testing and would benefit from greater transparency in the updating of model parameters.

Working Papers

Surge pricing finds equilibrium prices in periods of excessive demand or scarce supply. Effective or not, it is a business risk when riders perceive spiking prices as exploitation of people's emergencies. This paper studies subsidy policies that avoid the downside of surge pricing while accommodating fluctuations in demand. We show that the reusability of driver supply presents a hidden capacity. Tapping it wisely through supply-side subsidies provides a non-pricing alternative to current pricing policies, with no need to either raise prices or recruit new drivers. We use a queueing model together with a Stackelberg game to analyze how to optimally subsidize a reusable pool of driver supply, both myopically and in the long run. Given that drivers are self-interested, and given a specific structure of base trip fares, our analysis shows that injecting a healthy dose of myopic subsidy into the matching process reverses unfavorable decisions by individual drivers. In aggregate, the induced supply multiplier effect significantly boosts vehicle circulation and effective demand. On the other hand, we also show that the impact of myopic subsidies on long-run throughput is not monotonic. There is a physical limit to how much of this hidden capacity the platform can tap: however large the incentive budget, the capacity is intrinsically constrained by the spatial structure of base trip fares and the distribution of customer travel distances.

Invariant learning methods try to find a predictor that is invariant across several environments and have become popular for out-of-distribution (OOD) generalization. However, when environments do not naturally exist in the data, practitioners have to define them manually. Environment partitioning, which algorithmically splits the whole training dataset into environments, significantly influences the performance of invariant learning, yet it has been left undiscussed. A good environment partitioning method can bring invariant learning to applications with more general settings and improve its performance. We propose to split the dataset into several environments by finding low-correlated data subsets. Both theoretical interpretations and algorithmic details are presented in the paper. Through experiments on both synthetic and real data, we show that our Decorr method achieves outstanding performance, while some other partitioning methods may lead to poor, even below-ERM, results under the same IRM training scheme.

We develop explicit asymptotic expansions of portfolio Value-at-Risk (VaR) and portfolio Expected Shortfall (ES) for a large family of multivariate elliptical distributions. The family includes distributions of exponential type, such as Kotz distributions, and of power type, such as the multivariate Student t-distribution. Our results imply that the difference between portfolio ES and VaR depends on the tail heaviness of the joint asset return distribution. For assets exhibiting exponential tail decay, the ratio between ES and VaR tends to one asymptotically, whereas for assets exhibiting power-type tail decay, the portfolio ES remains strictly larger than its VaR. The amount of risk reduction achieved by merging subportfolios into a single portfolio depends solely on the dispersion of the joint asset return distribution.
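The role of tail heaviness can be illustrated in one dimension. For a standard normal loss (an exponential-type tail), ES/VaR shrinks toward one as the confidence level rises. For a Pareto loss with tail index alpha > 1 (a power-type tail), ES/VaR equals alpha/(alpha - 1) at every level. This univariate sketch uses textbook formulas and is not the paper's multivariate elliptical expansion.

```python
from statistics import NormalDist

def normal_es_var_ratio(p):
    """ES/VaR at confidence level p for a standard normal loss (exponential-type tail)."""
    nd = NormalDist()
    var = nd.inv_cdf(p)
    es = nd.pdf(var) / (1 - p)  # E[L | L > VaR] for a standard normal loss L
    return es / var

def pareto_es_var_ratio(alpha):
    """ES/VaR for a Pareto loss with tail index alpha > 1 (power-type tail); constant in p."""
    return alpha / (alpha - 1)

for p in (0.99, 0.999, 0.9999):
    print(p, round(normal_es_var_ratio(p), 4))  # decreases toward 1 as p increases
print("Pareto, alpha = 3:", pareto_es_var_ratio(3.0))  # 1.5 at every confidence level
```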

We present a stochastic-volatility, short rate term structure model, which extends the classic multi-factor Hull-White model. This model is designed to fit the swaption implied volatility cube and to incorporate the two-curve modeling paradigm. The model exhibits non-Gaussian forward swap rates whose distributions are parameterized across the dimensions of the volatility cube: underlying tenor, option strike and option expiration. To facilitate rapid model calibration, we establish suitable asymptotic expressions for the bond prices. Furthermore, we derive an effective SABR dynamics for each forward swap rate. Finally, we use the mean field approximation to match the effective SABR parameters corresponding to each swaption to the market levels. 

BERT-based Financial Sentiment Index and LSTM-based Stock Return Predictability

J.Z.G. Hiew, X. Huang, H. Mou, D. Li, Q. WU, and Y.B. Xu. [arXiv]

Traditional sentiment construction in finance relies heavily on the dictionary-based approach, with a few exceptions using simple machine learning techniques such as the Naive Bayes classifier. Because the current literature has not yet taken advantage of rapid advances in natural language processing, we construct a text-based sentiment index using BERT, the well-known pre-trained model developed by Google. The index covers three actively traded individual stocks in the Hong Kong market that are also heavily discussed on Weibo.com. First, we demonstrate that applying BERT significantly improves financial sentiment analysis compared with existing models. Second, we combine it with the two other methods commonly used to build sentiment indices in the financial literature, namely the option-implied and market-implied approaches. This yields a more general and comprehensive framework for financial sentiment analysis. Using an LSTM, which captures nonlinear mappings, we further provide convincing evidence of individual stock return predictability. This differs markedly from the dominant econometric methods in sentiment influence analysis, which are all linear regressions in nature.