Publications without abstracts can be found here.
Leung, M. F., Chan, K. W. and Shao, X. (2025+).
Online Generalized Method of Moments for Time Series.
Under review.
Abstract: Online learning has gained popularity in recent years due to the urgent need to analyse large-scale streaming data, which can be collected in perpetuity and are serially dependent. This motivates us to develop the online generalized method of moments (OGMM), an explicitly updated estimation and inference framework in the time series setting. The OGMM inherits many properties of offline GMM, such as its broad applicability to many problems in econometrics and statistics, natural accommodation for over-identification, and achievement of semiparametric efficiency under temporal dependence. As an online method, the key gain relative to offline GMM is the vast improvement in time complexity and memory requirement. Building on the OGMM framework, we propose improved versions of online Sargan--Hansen and structural stability tests following recent work in econometrics and statistics. Through Monte Carlo simulations, we observe encouraging finite-sample performance in online instrumental variables regression, online over-identifying restrictions test, online quantile regression, and online anomaly detection. Interesting applications of OGMM to stochastic volatility modelling and inertial sensor calibration are presented to demonstrate the effectiveness of OGMM.
Ma, T. T. and Chan, K. W. (2025+)
Self-normalized jump test for high-frequency data.
Under review.
Abstract: Jump detection based on the high-frequency log-price of stock data is crucial for decision-making in the financial market. Existing jump detection tests commonly admit the form of a jump estimator divided by a consistent normalizer, resulting in a pivotal limiting distribution in the absence of jumps. However, empirical studies have shown that this approach often suffers from a severely inflated type-I error rate. In this paper, a new approach for constructing jump-robust self-normalizers is proposed. The self-normalizer is constructed based on difference variates of several multipower variations of observations that are increasingly spaced away from each other. It converges to a non-degenerate distribution that is proportional to the variance of the jump estimator. The optimal differencing scheme that achieves the highest power is shown to be universal for any given order of multipower in our framework. The proposed tests control the type-I error rate more accurately than the non-self-normalized counterparts.
Lee, C. M. and Chan, K. W. (2025+)
Difference-based variance estimators with repeated measurements.
Under review.
Abstract: In this paper, we formulate a general differencing framework for variance estimation across a range of settings. We demonstrate that conventional difference-based noise variance estimators cannot achieve the desired bias-correcting power in nonparametric regression with repeated measurements. A new high-order bias-corrected differencing scheme, adapted to repeated measurements, is proposed by interlacing inter-group and intra-group differencing. The theoretical properties of the new sequences and estimators are studied. Our proposals are particularly efficient in finite samples and under high signal-to-noise ratio scenarios, where asymptotic convergence has not yet fully taken effect, due to their strong bias-correcting power.
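For context, the conventional first-order difference-based estimator that the abstract above takes as its starting point can be sketched as follows. This is the classical Rice-type estimator for one measurement per design point, not the repeated-measurement scheme proposed in the paper; the function names and the simulated sine-curve data are illustrative assumptions.

```python
import math
import random


def rice_variance_estimate(y):
    """Classical first-order difference-based noise variance estimate.

    Differencing neighbouring observations cancels a smooth mean function,
    so no regression fit is needed: sum of squared differences / (2(n-1)).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i] - y[i - 1]) ** 2 for i in range(1, n)) / (2.0 * (n - 1))


def sample_variance(y):
    """Naive sample variance, which mixes signal variation with noise."""
    n = len(y)
    mean = sum(y) / n
    return sum((v - mean) ** 2 for v in y) / (n - 1)


def simulate_noisy_curve(n, noise_sd, seed):
    """Hypothetical test data: y_i = sin(2*pi*i/n) + Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2.0 * math.pi * i / n) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


if __name__ == "__main__":
    y = simulate_noisy_curve(n=5000, noise_sd=0.5, seed=1)
    print("difference-based estimate:", rice_variance_estimate(y))
    print("naive sample variance:    ", sample_variance(y))
```

With a true noise variance of 0.25, the difference-based estimate should land close to 0.25, while the naive sample variance is inflated by the variation of the sine curve itself.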
Ma, T. T. and Chan, K. W. (2025+)
Jittered estimators and tests of jumps under infill asymptotics.
Under review.
Abstract: Detecting the jumps of a stochastic process with high-frequency observations is crucial in econometrics. In this paper, we introduce a new concept of autobipower variation as a lagged version of classical bipower variation, and propose a general class of jump test statistics that are averages of $\ell$ autobipower variations jittered by the optimally scaled quadpower variation as a control variate. An intriguing finding is that the asymptotic power of the jump test is universally maximized in the proposed class if a fixed number $\ell=3$ of lagged autobipower variations is used. The optimal scale for the jitter is also free of any data-dependent nuisance parameter. Hence, the proposed jump test is a tuning-free procedure. Theoretically, we derive, in closed form, the asymptotic variance of the estimator of the sum of squared jump magnitudes used in the proposed test statistic under an infill asymptotic framework, and show that it is uniformly smaller than the benchmark estimator based on standard bipower variation. It, therefore, achieves a promising power.
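To make the building blocks in the abstract above concrete, here is a minimal sketch of classical realized bipower variation and a lagged variant. The lagged form (pairing each absolute return with the one `lag` steps earlier) is an assumption about what "a lagged version" means; the abstract does not give the exact definition, and the jittering and averaging steps of the proposed test are not reproduced. All names and the simulated data are hypothetical.

```python
import math
import random


def realized_variance(returns):
    """Sum of squared returns; picks up both diffusion and jumps."""
    return sum(r * r for r in returns)


def lagged_bipower_variation(returns, lag=1):
    """(pi/2) * sum_i |r_i| * |r_{i-lag}|.

    lag=1 is the classical bipower variation, which is robust to rare jumps.
    lag>1 is an ASSUMED form of a lagged ("auto") bipower variation.
    """
    if lag < 1 or lag >= len(returns):
        raise ValueError("lag must satisfy 1 <= lag < len(returns)")
    return (math.pi / 2.0) * sum(
        abs(returns[i]) * abs(returns[i - lag]) for i in range(lag, len(returns))
    )


def simulate_returns(n, sigma, jump_size, seed):
    """Hypothetical data: Gaussian returns on [0, 1] plus one mid-sample jump."""
    rng = random.Random(seed)
    returns = [rng.gauss(0.0, sigma / math.sqrt(n)) for _ in range(n)]
    returns[n // 2] += jump_size
    return returns


if __name__ == "__main__":
    r = simulate_returns(n=20000, sigma=1.0, jump_size=0.5, seed=7)
    rv = realized_variance(r)
    bv = lagged_bipower_variation(r, lag=1)
    print("realized variance:", rv)
    print("bipower variation:", bv)
    print("estimated squared-jump contribution:", rv - bv)
```

In this simulation the integrated variance is 1 and the squared jump is 0.25, so the difference between realized variance and bipower variation should be roughly 0.25.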
Gong, H. and Chan, K. W. (2025+).
Sensitivity Analysis for Non-ignorable missing data via Generalized Estimating Equation.
Under review.
Abstract: Many incomplete-data statistical inference procedures are developed under the missing at random (MAR) assumption. However, the MAR assumption has been criticized as overly strong for real-data problems, and is unverifiable by using observed data. To handle data that are missing not at random, sensitivity analysis has been proposed to investigate how conclusions change if the unverifiable MAR assumption is violated to a certain degree. This article proposes a new framework called multiple sensitivity models (MSM) for performing general parameter estimation with the generalized estimating equation (GEE) method. Given user-specified sensitivity parameters, a range of estimators is derived by solving the roots of the bounds of MSM-assisted GEEs. Furthermore, we derive a representation for the proposed estimator so that it can be decomposed into several simpler estimators. It allows us to investigate the impact of different missing patterns. An asymptotically valid percentile bootstrap confidence region is also proposed. Theoretical justification is provided together with empirical evidence, which verifies the usefulness of the proposed sensitivity analysis.
Hui, L. L. and Chan, K. W. (2025+).
Rematching estimators for average treatment effects.
To appear in Statistica Sinica.
doi: 10.5705/ss.202024.0306
Abstract: Matching estimators are widely applied in practice for their great intuitive appeal. However, simple matching estimators with a fixed number of matches ($M_0$) are generally inefficient. In this article, we propose matching estimators with a variable number of matches to gain efficiency via rematching. Rather than increasing $M_0$ to gain precision, which introduces an increase in bias, the key is to rematch the treated units from the opposite direction to utilize unmatched control units. Our rematching estimators are applicable to both the average treatment effect and its counterpart for the treated population. The proposed rematching estimators are proven asymptotically valid and uniformly more efficient than matching estimators with the same $M_0$. Simulation results confirm that the proposed rematching estimators substantially improve the simple matching estimators in finite samples. As an empirical illustration, we apply the estimators proposed in this article to the National Supported Work data.
Leung, M. F. and Chan, K. W. (2025+).
Principles of Statistical Inference in Online Problems.
To appear in Bernoulli.
arXiv: https://arxiv.org/abs/2209.05399
website: https://www.e-publications.org/ims/submission/BEJ/user/submissionFile/65105?confirm=4f765502
Abstract: To investigate a dilemma of statistical and computational efficiency faced by long-run variance estimators, we propose a decomposition of kernel weights in a quadratic form and some online inference principles. These proposals allow us to characterize efficient online long-run variance estimators. Our asymptotic theory and simulations show that this principle-driven approach leads to online estimators with a uniformly lower mean squared error than all existing works. We also discuss practical enhancements such as mini-batch and automatic updates to handle fast streaming data and optimal parameter tuning. Beyond variance estimation, we consider the proposals in the context of online quantile regression, online change point detection, Markov chain Monte Carlo convergence diagnosis, and stochastic approximation. Substantial improvements in computational cost and finite-sample statistical properties are observed when we apply our principle-driven variance estimator to original and modified inference procedures.
Liu, X. and Chan, K. W. (2024+).
Positive-definite Converging Kernel Estimation of Long-run Variance.
To appear in Journal of Business & Economic Statistics.
doi: 10.1080/07350015.2024.2432945
Abstract: Kernel estimators have been popular for decades in long-run variance estimation. To minimize the loss of efficiency measured by the mean-squared error in important aspects of kernel estimation, we propose a novel class of converging kernel estimators with three major properties: (a) the optimal bandwidth choice is model-free; (b) positive-definiteness is ensured through a principle-driven aggregation technique with no loss of theoretical efficiency; and (c) potentially misspecified prewhitening models and transformations of the time series do not harm the asymptotic efficiency. A shrinkage prewhitening transformation is proposed for more robust finite-sample performance. The estimator has a positive bias that diminishes with the sample size so that it is more conservative compared with the typically negatively biased classical estimators. The proposal improves upon standard kernel functions and can be well generalized to the multivariate case. We discuss its performance through simulation results and a real-data application in the forecast breakdown test.
Yu, S. Y., Chan, K. W., Lim, K., Siu, I. C. H., Wong, R. H. L., Wan, I. Y. P. (2024)
Lower Recurrence Rate After Surgical Treatment for Primary Spontaneous Pneumothorax Using a Digital Chest Drainage System.
To appear in Innovations: Technology and Techniques in Cardiothoracic and Vascular Surgery.
doi: 10.1177/15569845241272153
Abstract: Objective: This study assessed the impact of digital chest drainage systems for patients undergoing video-assisted thoracoscopic surgery (VATS) pleurodesis for primary spontaneous pneumothorax (PSP) as compared with conventional chest drainage. Methods: A retrospective analysis of patients who underwent VATS pleurodesis for PSP was conducted. The primary outcome was pneumothorax recurrence, while secondary outcomes included time to mobilization, degree of lung expansion, drainage duration, and length of hospital stay. These measures were expressed as average treatment effect and subsequently compared after propensity score adjustment. Results: In total, 125 consecutive patients over a 64-month period were analyzed, with 55 patients in the digital drainage system group and 70 patients in the conventional drainage system group. After propensity score adjustment, the use of a digital drainage system was significantly associated with earlier mobilization (−2.22 days, P < 0.001) and lower rate of recurrence (−11.2%, P = 0.049). Conclusions: The digital drainage system facilitated earlier postoperative free mobilization and resulted in lower pneumothorax recurrence rates.
Leung, C. W. D. and Chan, K. W. (2024).
Testing for Variance Changes under Varying Mean and Serial Correlation.
To appear in Statistica Sinica.
doi: 10.5705/ss.202023.0238
Abstract: Detection of variance change points is statistically difficult when the data exhibit a varying mean structure and autocorrelation. Existing variance change point tests either require the assumption of mean constancy or sacrifice testing power due to serial dependence. This article addresses these problems by proposing a trend-robust and autocorrelation-efficient variance change point test via a differencing approach. This approach removes the mean effect without fitting the mean function. It also allows the test to retrieve the reduced power due to serial dependence. We prove that the optimal difference-based test should minimize the long-run coefficient of variation of the sample second moment of the noises instead of the long-run variance in the presence of serial dependence. The optimal solution can be efficiently computed by fractional quadratic programming. The asymptotic relative efficiency under a local alternative hypothesis is derived. A rate-optimal long-run variance estimator is also proposed. It is proven to be doubly robust against varying mean and variance change points.
Chan, K. W. & Yau, C. Y. (2024).
Asymptotically Constant Risk Estimator of the Time-average Variance Constant.
Biometrika, 111, 825-842.
doi: https://doi.org/10.1093/biomet/asae003
Abstract: Estimation of the time-average variance constant is important for statistical analyses involving dependent data. This problem is difficult as it relies on a bandwidth parameter. Specifically, the optimal choices of the bandwidths of all existing estimators depend on the estimand itself and another unknown parameter that is very difficult to estimate. Thus, optimal variance estimation is unachievable. In this paper, we introduce a concept of converging flat-top kernels for constructing variance estimators whose optimal bandwidths are free of unknown parameters asymptotically and hence can be computed easily. We prove that the new estimator has an asymptotically constant risk and is locally asymptotically minimax.
Ip, M. F. and Chan, K. W. (2024).
Inference of Coarsened Time Series via Generalized Method of Moments.
Journal of Time Series Analysis, 45, 823-846.
doi: https://doi.org/10.1111/jtsa.12740
Abstract: We study statistical inference procedures in coarsened time series through the generalized method of moments. A new model for the coarsened time series via multiple potential outcomes is proposed. It can be naturally extended for inferring multi-variate coarsened time series. We show that this framework generates a general class of estimators. It neatly generalizes the classical Horvitz–Thompson estimator for handling coarsened time series data. Asymptotic properties, including consistency and limiting distribution, of the proposed estimators are investigated. Estimators of the optimal weight matrix and the long-run covariance matrix are also derived. In particular, confidence intervals of the mean function of the potential outcome as a function of coarsening index can be constructed. A real-data application on air quality in the USA is investigated.
Cheng, C. H. and Chan, K. W. (2024)
A General Framework For Constructing Locally Self-Normalized Multiple-Change-Point Tests.
Journal of Business & Economic Statistics, 42, 719–731.
doi: 10.1080/07350015.2023.2231041
Abstract: We propose a general framework to construct self-normalized multiple-change-point tests with time series data. The only building block is a user-specified single-change-detecting statistic, which covers a large class of popular methods, including the cumulative sum process, outlier-robust rank statistics, and order statistics. The proposed test statistic does not require robust and consistent estimation of nuisance parameters, selection of bandwidth parameters, nor pre-specification of the number of change points. The finite-sample performance shows that the proposed test is size-accurate, robust against misspecification of the alternative hypothesis, and more powerful than existing methods. Case studies of the Shanghai-Hong Kong Stock Connect turnover are provided.
To, H. K. and Chan, K. W. (2024).
Mean Stationarity Test in Time Series: A Signal Variance-based Approach.
Bernoulli, 30, 1231-1256.
doi: 10.3150/23-BEJ1630
Abstract: Inference of mean structure is an important problem in time series analysis. Various tests have been developed to test for different mean structures, for example, the presence of structural breaks, and parametric mean structures. However, many of them are designed for handling specific mean structures, and may lose power upon violation of such structural assumptions. In this paper, we propose a new mean stationarity test built around the signal variance. The proposed test is based on a super-efficient estimator which could achieve a convergence rate faster than $\sqrt{n}$. It can detect non-constancy of the mean function under serial dependence. It is shown to have promising power, especially in detecting hardly noticeable oscillating structures. The proposal is further generalized to test for smooth trend structures and relative signal variability.
Yu, S. Y., Chan, K. W., Tsui, C. O., Chan, S. and Thung, K. H. (2023).
Non-Steroidal Anti-inflammatory Drugs Reduce Pleural Adhesion in Human: Evidence from Redo Surgery.
Scientific Reports, 13, 14578.
doi: 10.1038/s41598-023-41680-7
Abstract: Non-steroidal anti-inflammatory drugs (NSAIDs) reduced pleural adhesion in animal studies, but their effect on humans had not been studied. A retrospective study was carried out for patients with solitary pulmonary nodules without a pre-operative tissue diagnosis positive for malignancy. The impact of the use of NSAIDs after stage one wedge resection was assessed by the degree of pleural adhesions encountered during second-stage, redo completion lobectomy. From April 2016 to March 2022, 50 consecutive patients meeting the inclusion criteria were included, and 44 patients were selected for analysis after exclusion (Treatment group with NSAID: N = 27; Control group without NSAID: N = 17). The preoperative characteristics and the final tumor pathologies were similar between the groups. The use of NSAID was significantly associated with lower risk of severe pleural adhesions and complete pleural symphysis (risk difference = −29%, p = 0.03). After controlling for the effects of tumor size and chest drain duration, only the use of NSAID was statistically associated with the lowered risk of severe pleural adhesions and complete pleural symphysis. No statistically significant effects of NSAID on operative time (p = 0.86), blood loss (p = 0.72), and post-operative length of stay (p = 0.72) were demonstrated. In humans, NSAIDs attenuated the formation of pleural adhesions after pleural disruptions. Physicians and surgeons should avoid the use of NSAIDs when pleural adhesion formation is the intended treatment outcome.
Chan, K. W. (2022).
Optimal Difference-based Variance Estimators in Time Series: A General Framework.
Annals of Statistics, 50, 1376–1400.
doi: 10.1214/21-AOS2154
Abstract: Variance estimation is important for statistical inference. It becomes nontrivial when observations are masked by serial dependence structures and time-varying mean structures. Existing methods either ignore or sub-optimally handle these nuisance structures. This paper develops a general framework for the estimation of the long-run variance for time series with nonconstant means. The building blocks are difference statistics. The proposed class of estimators is general enough to cover many existing estimators. Necessary and sufficient conditions for consistency are investigated. The first asymptotically optimal estimator is derived. Our proposed estimator is theoretically proven to be invariant to arbitrary mean structures, which may include trends and a possibly divergent number of discontinuities.
Chan, K. W. (2022).
General and Feasible Tests with Multiply-Imputed Datasets.
Annals of Statistics, 50, 930–948.
doi: 10.1214/21-AOS2132
Abstract: Multiple imputation (MI) is a technique especially designed for handling missing data in public-use datasets. It allows analysts to perform incomplete-data inference straightforwardly by using several already imputed datasets released by the dataset owners. However, the existing MI tests require either a restrictive assumption on the missing-data mechanism, known as equal odds of missing information (EOMI), or an infinite number of imputations. Some of them also require analysts to have access to restrictive or nonstandard computer subroutines. Besides, the existing MI testing procedures cover only Wald’s tests and likelihood ratio tests but not Rao’s score tests; therefore, these MI testing procedures are not general enough. In addition, the MI Wald’s tests and MI likelihood ratio tests are not procedurally identical, so analysts need to resort to distinct algorithms for implementation. In this paper, we propose a general MI procedure, called stacked multiple imputation (SMI), for performing Wald’s tests, likelihood ratio tests and Rao’s score tests by a unified algorithm. SMI requires neither EOMI nor an infinite number of imputations. It is particularly feasible for analysts as they just need to use a complete-data testing device for performing the corresponding incomplete-data test.
Chan, K. W. & Meng, X.-L. (2022).
Multiple Improvements of Multiple Imputation Likelihood Ratio Tests.
Statistica Sinica, 32, 1489–1514.
doi: 10.5705/ss.202019.0314
Abstract: Multiple imputation (MI) inference handles missing data by imputing the missing values $m$ times, and then combining the results from the $m$ complete-data analyses. However, the existing method for combining likelihood ratio tests (LRTs) has multiple defects: (i) the combined test statistic can be negative, but its null distribution is approximated by an $F$-distribution; (ii) it is not invariant to re-parametrization; (iii) it fails to ensure monotonic power owing to its use of an inconsistent estimator of the fraction of missing information (FMI) under the alternative hypothesis; and (iv) it requires nontrivial access to the LRT statistic as a function of parameters instead of data sets. We show, using both theoretical derivations and empirical investigations, that essentially all of these problems can be straightforwardly addressed if we are willing to perform an additional LRT by stacking the $m$ completed data sets as one big completed data set. This enables users to implement the MI LRT without modifying the complete-data procedure. A particularly intriguing finding is that the FMI can be estimated consistently by an LRT statistic for testing whether the $m$ completed data sets can be regarded effectively as samples coming from a common model. Practical guidelines are provided based on an extensive comparison of existing MI tests. Issues related to nuisance parameters are also discussed.
Chan, K. W. (2022).
Mean-structure and Autocorrelation Consistent Covariance Matrix Estimation.
Journal of Business & Economic Statistics, 40, 201–215.
doi: 10.1080/07350015.2020.1796397
Abstract: We consider estimation of the asymptotic covariance matrix in nonstationary time series. A nonparametric estimator that is robust against unknown forms of trends and possibly a divergent number of change points (CPs) is proposed. It is algorithmically fast because neither a search for CPs, estimation of trends, nor cross-validation is required. Together with our proposed automatic optimal bandwidth selector, the resulting estimator is both statistically and computationally efficient. It is, therefore, useful in many statistical procedures, for example, CP detection and construction of simultaneous confidence bands of trends. Empirical studies on four stock market indices are also discussed.
Yu, P., Chan, K. W., Lau, R., Wan, I., Chen, G. and Ng, C. (2021).
Uniportal vs. Multiportal Video-Assisted Thoracic Surgery for Major Lung Resection: Fewer Incisions, Less Immunochemokine Disturbances.
Scientific Reports, 11, 1–6.
doi: 10.1038/s41598-021-89598-2
Abstract: Multiportal video-assisted thoracic surgery (VATS) for major lung resection causes less immunochemokine production compared to thoracotomy. Whether uniportal VATS is similarly associated with lower early postoperative circulating levels of immunochemokines compared to multiportal VATS has not been studied. Selected patients who received uniportal or multiportal VATS major lung resection were recruited. Blood samples were collected preoperatively and on postoperative days 1 and 3 for enzyme linked immunosorbent assay of serum levels of Tissue Inhibitor of Metalloproteinase (TIMP)-1, Insulin Growth Factor Binding Protein (IGFBP)-3, and Matrix Metalloproteinase (MMP)-9. Linear mixed-effects models were used to analyze the effects of uniportal VATS on the postoperative circulating chemokine levels. From March 2014 to April 2017, 68 consecutive patients who consented to the prospective study and received major lung resection by either uniportal VATS (N = 29) or multiportal VATS (N = 39) were identified. Uniportal VATS major lung resection was associated with lower post-operative levels of TIMP-1 and MMP-9 compared to multiportal VATS after controlling for the effects of the corresponding baseline level and the time of follow-up measurement. No difference was observed for the level of IGFBP-3. Fewer immunochemokine disturbances were observed after uniportal VATS major lung resection compared to multiportal VATS.
Chan, K. W. & Yau, C. Y. (2017).
High Order Corrected Estimator of Asymptotic Variance with Optimal Bandwidth.
Scandinavian Journal of Statistics, 44, 866–898.
doi: 10.1111/sjos.12279
Abstract: Estimation of the time-average variance constant (TAVC), which is the asymptotic variance of the sample mean of a dependent process, is of fundamental importance in various fields of statistics. For frequentists, it is crucial for constructing confidence intervals for the mean and serves as a normalizing constant in various test statistics. For Bayesians, it is widely used for evaluating effective sample size and conducting convergence diagnosis in Markov chain Monte Carlo methods. In this paper, by considering high order corrections to the asymptotic biases, we develop a new class of TAVC estimators that enjoys optimal $\mathcal{L}^2$-convergence rates under different degrees of the serial dependence of stochastic processes. The high order correction procedure is applicable to estimation of the so-called smoothness parameter, which is essential in determining the optimal bandwidth. Comparisons to existing TAVC estimators are comprehensively investigated. In particular, the proposed optimal high order corrected estimator has the best performance in terms of mean squared error.
Chan, K. W. & Yau, C. Y. (2017).
Automatic Optimal Batch Size Selection for Recursive Estimators of Time-Average Covariance Matrix.
Journal of the American Statistical Association, 112, 1076-1089.
doi: 10.1080/01621459.2016.1189337
Abstract: The time-average covariance matrix (TACM) $\bm{\Sigma}:=\sum_{k\in\mathbb{Z}}\bm{\Gamma}_k$, where $\bm{\Gamma}_k$ is the auto-covariance function, is an important quantity for the inference of the mean of an $\mathbb{R}^d$-valued stationary process ($d\geq 1$). This paper proposes two recursive estimators for $\bm{\Sigma}$ with optimal asymptotic mean square error (AMSE) under different strengths of serial dependence. The optimal estimator involves a batch size selection, which requires knowledge of a smoothness parameter $\bm{\Upsilon}_{\beta}:=\sum_{k\in\mathbb{Z}} |k|^{\beta} \bm{\Gamma}_k$, for some $\beta$. This paper also develops recursive estimators for $\bm{\Upsilon}_{\beta}$. Combining these two estimators, we obtain a fully automatic procedure for optimal on-line estimation for $\bm{\Sigma}$. Consistency and convergence rates of the proposed estimators are derived. Applications to confidence region construction and Markov Chain Monte Carlo convergence diagnosis are discussed.
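The estimand $\bm{\Sigma}=\sum_{k\in\mathbb{Z}}\bm{\Gamma}_k$ defined in the abstract above can be illustrated in the scalar case ($d=1$) with a plain offline Bartlett-kernel estimator. This is a standard textbook estimator with a hand-picked bandwidth, swapped in for illustration; it is not the recursive estimator or the automatic batch size selector proposed in the paper. The function names, the AR(1) test process and the bandwidth of 50 are assumptions.

```python
import random


def bartlett_long_run_variance(x, bandwidth):
    """Offline Bartlett-kernel estimate of sum_k Gamma_k for a scalar series.

    Uses weights 1 - k/(bandwidth + 1) on sample autocovariances at lags
    1..bandwidth; the bandwidth is supplied by the user, not selected here.
    """
    n = len(x)
    if not 0 <= bandwidth < n:
        raise ValueError("bandwidth must satisfy 0 <= bandwidth < len(x)")
    mean = sum(x) / n
    centered = [v - mean for v in x]

    def autocov(k):
        # Sample autocovariance at lag k with the usual 1/n normalisation.
        return sum(centered[t] * centered[t - k] for t in range(k, n)) / n

    total = autocov(0)
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1.0)
        total += 2.0 * weight * autocov(k)
    return total


def simulate_ar1(n, phi, seed):
    """Hypothetical test data: X_t = phi * X_{t-1} + N(0, 1) innovation."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


if __name__ == "__main__":
    x = simulate_ar1(n=20000, phi=0.5, seed=1)
    # For this AR(1), the true value is 1 / (1 - phi)^2 = 4.
    print("estimate:", bartlett_long_run_variance(x, bandwidth=50))
```

Because this estimator revisits the whole series, its cost grows with the sample size, which is the limitation that recursive (online) estimators are designed to avoid.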
Chan, K. W. & Yau, C. Y. (2016).
New Recursive Estimators of The Time-average Variance Constant.
Statistics and Computing, 26, 609-627.
doi: 10.1007/s11222-015-9548-7
Abstract: Estimation of the time-average variance constant (TAVC) of a stationary process plays a fundamental role in statistical inference for the mean of a stochastic process. Wu (2009) proposed an efficient algorithm to recursively compute the TAVC with $O(1)$ memory and computational complexity. In this paper, we propose two new recursive TAVC estimators that can compute the TAVC estimate with $O(1)$ computational complexity. One of them is uniformly better than Wu's estimator in terms of asymptotic mean squared error (MSE) at a cost of slightly higher memory complexity. The other preserves the $O(1)$ memory complexity and is better than Wu's estimator in most situations. Moreover, the first estimator is nearly optimal in the sense that its asymptotic MSE is $2^{10/3}3^{-2} \fallingdotseq 1.12$ times that of the optimal off-line TAVC estimator.
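As a quick arithmetic check of the efficiency constant quoted in the abstract above (a worked evaluation only, not part of the paper's derivation):

```latex
\[
2^{10/3}\,3^{-2}
  = \frac{2^{3}\cdot 2^{1/3}}{9}
  \approx \frac{8 \times 1.25992}{9}
  \approx \frac{10.0794}{9}
  \approx 1.1199
  \approx 1.12 .
\]
```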