Below are the econometrics seminars scheduled at the School of Economics, University of Sydney. For more information, or if you would like to present in our seminars, please contact our seminar coordinator, Dakyung Seong (dakyung.seong@sydney.edu.au).
The econometrics seminars will be held from 2:00 PM to 3:30 PM in the Social Sciences Building (SSB) at the University of Sydney.
Title: Online Updating for Linear Panel Regressions
Abstract: In this paper, we develop online updating methods for linear panel regression models. Online updating refers to procedures for sequentially updating parameter estimates as new data become available. In practice, the potential size of the dataset or data confidentiality constraints may preclude researchers from storing or accessing the entire dataset. Panel data admit two types of data expansion: (1) the arrival of new units, or (2) the arrival of additional time periods for existing units. We propose online updating procedures for widely used linear panel regression models and show how to update both the regression coefficients and the variance estimates in each case.
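The abstract does not spell out the recursions, but the classical online update for least squares conveys the idea: only cross-product matrices need to be stored, so new rows (new units, or new time periods for existing units, stacked as observations) can be folded in without revisiting old data. A minimal sketch in Python, with all names hypothetical:

```python
import numpy as np

class OnlineOLS:
    """Minimal online OLS sketch: store only X'X, X'y and y'y, never
    the raw data. New panel rows (new units, or new periods for
    existing units) enter as stacked observations."""
    def __init__(self, k):
        self.XtX = np.zeros((k, k))
        self.Xty = np.zeros(k)
        self.yty = 0.0
        self.n = 0

    def update(self, X_new, y_new):
        # adding rows only requires adding their cross-products
        self.XtX += X_new.T @ X_new
        self.Xty += X_new.T @ y_new
        self.yty += y_new @ y_new
        self.n += len(y_new)

    def coef(self):
        return np.linalg.solve(self.XtX, self.Xty)

    def sigma2(self):
        b = self.coef()
        # SSE = y'y - b'X'y, again computed without storing the data
        return (self.yty - b @ self.Xty) / (self.n - len(b))
```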
Title: Estimation and Inference for Fiscal Multipliers Identified with External Instruments
Abstract: Many empirical macroeconomic studies use external instruments to estimate the fiscal multiplier. However, existing research often overlooks potential identification failures that arise when an instrument is correlated with multiple components of the fiscal shock, and the weak instrument problem has received little attention in this literature. In this paper, we propose an Anderson and Rubin (1949)-type test that is applicable to the estimator suggested by Koo et al. (2022), which correctly identifies dynamic causal effects. The test statistic is fully robust to weak instruments. Empirically, we estimate the two-year cumulative government investment multiplier in Korea to be 2.8771.
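For orientation, the classical Anderson and Rubin (1949) statistic in a static linear IV model is sketched below; the test proposed in the paper extends this logic to the Koo et al. (2022) estimator and is not reproduced here. Variable names are hypothetical:

```python
import numpy as np
from scipy import stats

def anderson_rubin(y, X, Z, beta0):
    """Classical AR test of H0: beta = beta0 in y = X beta + u with
    instrument matrix Z; its size is unaffected by weak instruments."""
    n, k = Z.shape
    e = y - X @ beta0                            # residuals under H0
    Pe = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)   # projection onto Z
    ar = (Pe @ Pe / k) / ((e - Pe) @ (e - Pe) / (n - k))
    return ar, 1 - stats.f.cdf(ar, k, n - k)     # statistic, p-value
```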
Title: Measuring the Cost of Latency in Likelihood Inference for Continuous-Time Models with Latent Variables (joint with Chenxu Li and Yuheng Zheng)
Abstract: We establish the asymptotic properties of marginal-information maximum likelihood estimators for discretely-sampled continuous-time diffusion models with latent factors, and compare them to those of full-information estimators. We seek to quantify the cost of latency, which we define as the amount of information lost when a variable is latent instead of observed. We show that the cost of latency depends on the structure of the model, specifically whether the latent variable affects the drift or the diffusion functions. To obtain these results, we employ filtering techniques and develop a new theory of nonlinear continuous-time pseudo-filters based on stochastic partial differential equations. Monte Carlo evidence supports the theoretical findings.
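The stochastic-PDE pseudo-filters of the paper are beyond a short sketch, but the linear-Gaussian benchmark shows how a likelihood is assembled from filtered prediction errors when a factor is latent rather than observed. A hypothetical stand-in with a latent AR(1) factor, a crude discrete-time analogue of a discretely sampled latent diffusion:

```python
import numpy as np

def kalman_loglik(y, phi, q, r):
    """Log-likelihood of y_t = f_t + eps_t where the latent factor
    follows f_t = phi*f_{t-1} + eta_t; q and r are the state and
    measurement noise variances."""
    f, P = 0.0, q / (1 - phi**2)          # stationary initialization
    ll = 0.0
    for yt in y:
        fp, Pp = phi * f, phi**2 * P + q  # predict the latent factor
        S = Pp + r                        # prediction-error variance
        v = yt - fp                       # prediction error
        ll += -0.5 * (np.log(2 * np.pi * S) + v**2 / S)
        K = Pp / S                        # Kalman gain, then update
        f, P = fp + K * v, (1 - K) * Pp
    return ll
```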
Title: Universal Copulas
Abstract: Copulas have emerged over recent decades as a primary statistical tool for modelling dependence between random variables. A copula is classically understood as a cumulative distribution function on the unit hypercube with standard uniform margins; we refer to such distributions as “Sklar’s copulas”, owing to their central role in the decomposition of multivariate distributions established by the celebrated Sklar’s theorem. A standard argument in favour of copula models is that they separate the dependence structure (encoded by the copula) from the marginal behaviour of the individual components. However, this interpretation holds only in the continuous case: outside it, copulas lose their “margin-free” nature, rendering Sklar’s construction unsuitable for modelling dependence between non-continuous variables. In this work, we argue that the notion of a copula need not be confined to Sklar’s framework. We propose an alternative definition, universal copulas, based on a fundamental characterisation of dependence. This new definition agrees with Sklar’s copulas in the continuous case, but yields distinct and more suitable constructions in discrete or mixed settings. Universal copulas retain key properties such as margin-freeness, making them sound and effective beyond the continuous realm. We illustrate their use through examples involving discrete variables and mixed pairs, such as one continuous and one Bernoulli variable.
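The “continuous case only” caveat rests on the probability integral transform: F(X) is uniform on (0,1) exactly when F is continuous. A quick numerical check on simulated data (illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# continuous margin: U = F(X) is Uniform(0,1)
x = rng.standard_normal(100_000)
u = stats.norm.cdf(x)
print(np.quantile(u, [0.25, 0.5, 0.75]))       # ~ [0.25, 0.50, 0.75]

# Bernoulli(0.3) margin: F(X) takes only the values 0.7 and 1.0, so
# Sklar's copula is neither unique nor margin-free here
b = rng.binomial(1, 0.3, 100_000)
print(np.unique(stats.bernoulli.cdf(b, 0.3)))  # [0.7, 1.0]
```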
Title: Network Analysis of ESG and SDG across Legal Origins
Abstract: We explore the relationship between ESG and SDG across legal origins (English, French, German, Scandinavian) using an integrated panel data model with local spillovers, global shocks, and parameter heterogeneity. Applying the CCEX-IV approach and GCM network analysis, we find that civil legal origins display higher direct effects of ESG on SDG, while the common legal origin exhibits stronger spill-in effects. However, network analysis of aggregate ESG and its three pillars indicates that the English legal origin achieves the highest direct effects while remaining a downstream adopter of positive E/S/G influence on ESG. Furthermore, pronounced network spillover heterogeneity is observed within civil legal origins, with the German legal origin acting as the upstream supplier of forward-looking ESG/SDG standards in the network. These findings challenge the binary classification underlying legal origin theory, namely the presumption that one legal origin holds universal superiority over another, which may oversimplify the complex interplay between legal systems and socio-economic relationships.
Title: p-Hacking Instrument Selection
Abstract: Recent evidence suggests that p-hacking is particularly common in IV research. This article posits that selection across multiple available instruments is a likely cause. We study the properties of the 2SLS/GMM estimator when researchers select among multiple instruments to minimize the p-value in either the first or the second stage. Due to the mechanical correlation between the coefficients and standard errors across samples in 2SLS/GMM, i.e., the power asymmetry problem first outlined in Keane and Neal (2023), p-hacking either stage severely exacerbates both size distortions and median bias towards OLS. We show that robust tests such as the Anderson-Rubin and conditional likelihood ratio tests can somewhat alleviate these problems when the second stage is p-hacked, but not when the first stage is p-hacked. Implications for applied IV research are discussed throughout.
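A hypothetical Monte Carlo illustrating the selection mechanism: picking, sample by sample, the instrument with the smallest first-stage p-value makes the selected instrument correlate with the endogenous error, pulling the IV estimate toward OLS. All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, reps, beta = 200, 20, 2000, 0.0   # true causal effect is zero
est = []
for _ in range(reps):
    Z = rng.standard_normal((n, m))     # candidate instruments
    u = rng.standard_normal(n)          # endogeneity source
    x = Z @ np.full(m, 0.1) + u         # weak-ish first stage
    y = beta * x + 0.8 * u + 0.6 * rng.standard_normal(n)
    # p-hack the first stage: keep the instrument with the largest
    # absolute first-stage t-statistic (smallest p-value)
    t = []
    for j in range(m):
        z = Z[:, j]
        g = (z @ x) / (z @ z)
        res = x - g * z
        t.append(abs(g) / np.sqrt(res @ res / (n - 1) / (z @ z)))
    z = Z[:, int(np.argmax(t))]
    est.append((z @ y) / (z @ x))       # just-identified IV estimate
print(np.median(est))  # pulled from 0 toward the biased OLS estimand
```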
Title: High-dimensional forecasting with known knowns and known unknowns
Abstract: Forecasts play a central role in decision making under uncertainty. After a brief review of the general issues, this paper considers ways of using high-dimensional data in forecasting. We consider selecting variables from a known active set (the known knowns) using Lasso and OCMT, and approximating unobserved latent factors (the known unknowns) by various means. This combines sparse and dense approaches to forecasting. We demonstrate the issues involved in variable selection in a high-dimensional setting with an application to forecasting UK inflation at different horizons over the period 2020Q1-2023Q1. The application shows both the power of parsimonious models and the importance of allowing for global variables.
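A stylized sketch of combining the two ingredients (scikit-learn; contemporaneous regressions for brevity, with Lasso standing in for both Lasso and OCMT): selection from the active set handles the known knowns, principal components proxy the known unknowns, and both enter the forecasting regression.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, p, N = 200, 40, 100
X = rng.standard_normal((T, p))     # known active set of predictors
g = rng.standard_normal(T)          # latent factor (a known unknown)
W = np.outer(g, rng.standard_normal(N)) + rng.standard_normal((T, N))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + g + 0.5 * rng.standard_normal(T)

keep = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)   # sparse part
fhat = PCA(n_components=2).fit_transform(W)            # dense part
fit = LinearRegression().fit(np.column_stack([X[:, keep], fhat]), y)
print("selected predictors:", keep)   # typically includes 0 and 1
```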
Title: Robust Regression Discontinuity Extrapolation
Abstract: This paper studies the identification and estimation of regression discontinuity (RD) extrapolation methods that measure policy effects away from the cutoff of the running variable. Our proposed semi-parametric identification strategy uses weaker assumptions than those previously adopted in the literature and, at the same time, enjoys a new robustness property: it reduces to classic nonparametric RD identification as the magnitude of extrapolation goes to zero. We apply the proposed method to extend the empirical analysis of college academic probation in Lindo et al. (2010); it allows us to estimate the effect of academic probation for students not exactly at the probation GPA cutoff.
Title: What Impulse Response Do Instrumental Variables Identify?
Abstract: This paper addresses the limitations of the local projection IV (LP-IV) framework in identifying impulse responses to macroeconomic shocks. We show that an LP-IV estimand may lack a causal interpretation, as it represents a weighted average of component-wise impulse responses, potentially with negative weights. These negative weights arise when the IV and shock components exhibit opposite signs of correlation. To address these limitations, we develop novel identification strategies using additional disaggregated variables or sign restrictions, particularly in the presence of multiple IVs. Our approach combines multiple LP-IV estimands with sign restrictions to construct identified sets for component-wise impulse responses. In contrast, conventional methods such as two-stage least squares (2SLS) fail to identify structural parameters in this context, and the identified set may be unbounded, challenging the validity of Bayesian approaches. We illustrate the practical significance of our findings through a detailed analysis of government spending multipliers and monetary policy shocks, highlighting the importance of accounting for the composite nature of macroeconomic shocks in policy analysis.
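The weighted-average point admits a stylized numerical check: with a composite shock and an instrument whose correlations with the two components have opposite signs, the LP-IV estimand Cov(y,z)/Cov(x,z) puts a negative weight on one response and can fall outside the range of the component-wise responses. All numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
x = e1 + e2                       # observed composite shock
y = 1.0 * e1 + 3.0 * e2           # component-wise responses: 1 and 3
z = 2.0 * e1 - 1.0 * e2 + rng.standard_normal(n)  # opposite signs

lp_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
# weights are a_j var(e_j) / sum: here 2 and -1, so the estimand is
# 2*1 + (-1)*3 = -1, outside the [1, 3] range of the true responses
print(lp_iv)                      # ~ -1.0
```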
Title: Identification of dynamic treatment effects when treatment histories are partially observed
Abstract: This paper proposes a class of methods for identifying and estimating dynamic treatment effects when outcomes depend on the entire treatment path and treatment histories are only partially observed. We advocate an approach, which we refer to as ‘robust’, that identifies path-dependent treatment effects for different mover subpopulations under misspecification of any one of the three models involved (the outcome, propensity score, or missing-data model). Our approach can handle fixed, absorbing, sequential, or simultaneous treatment regimes in which missing treatment histories may obfuscate the identification of causal effects. Numerical experiments demonstrate how the proposed estimator compares to traditional complete-case methods; we find that the missingness-adjusted estimates have negligible bias relative to their complete-case counterparts. As an illustration, we apply the proposed class of adjustment methods to estimate the dynamic effects of COVID-19 on voter turnout in the 2022 U.S. general elections. We find that counties that experienced an above-average number of cases in 2020 and 2021 saw a statistically significant reduction in voter turnout compared to those that did not.
Title: Inference for Linear IV Models with Misspecification and Weak Identification
Abstract: In this paper, we study the inference problem in a linear instrumental variable (IV) regression under misspecification and potential weak identification. Under weak IV, inference is typically conducted via hypothesis testing. We therefore first introduce the concept of a pseudo-true parameter value for a test and investigate the pseudo-true values of the 2SLS-t test, the Anderson-Rubin (AR) test, the Lagrange multiplier (LM) test, and the conditional likelihood ratio (CLR) test. We find that only the 2SLS-t test has a unique, proper pseudo-true value with an economic interpretation in a commonly considered scenario. We then focus on the 2SLS-t pseudo-true value and design a group of tests that are robust to both weak IV and misspecification.
Title: Identification in Many-to-Many Two-Sided Matching without Transfers
Abstract: This paper develops a structural econometric model of many-to-many matching with endogenous link formation under max-min preferences. Agents on both sides of the market form subsets of partners subject to capacity constraints, choosing links to maximize the minimum utility received across all connections. I characterize the limiting behavior of stable matchings as market size grows and establish convergence of conditional choice probabilities to closed-form expressions governed by inclusive value functions. I use degree distributions as a novel source of identification, and show that the degree of each agent follows a binomial distribution with parameters determined by inclusive values, which are in turn pinned down by equilibrium behavior. The results extend existing one-to-one matching frameworks to more complex many-to-many environments.
Title: Flexible Negative Binomial Mixtures for Credible Mode Inference in Heterogeneous Count Data from Finance, Economics and Bioinformatics
Abstract: In several scientific fields, such as finance, economics and bioinformatics, important theoretical and practical issues involve multimodal and asymmetric count data distributions arising from heterogeneity of the underlying population. For accurate approximation of such distributions, we introduce a novel class of flexible mixtures of shifted negative binomial distributions, which accommodates the wide range of shapes commonly seen in these data. We further introduce a convenient reparameterization that is more closely related to a moment interpretation and facilitates both the specification of prior information and Monte Carlo simulation of the posterior. The mixture is estimated by the sparse finite mixture Markov chain Monte Carlo method, which can handle a flexible number of non-empty components. Given loan payment, inflation expectation and DNA count data, we find coherent evidence on the number and location of modes, fat tails and implied uncertainty measures, in contrast to the conflicting evidence obtained from well-known frequentist tests. The proposed methodology may lead to more accurate measures of uncertainty and risk, improving prediction and policy analysis based on multimodal and asymmetric count data.
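A minimal illustration of the kind of object being modelled: a two-component mixture of shifted negative binomials (hypothetical parameters) is bimodal, something no single standard count distribution can capture.

```python
import numpy as np
from scipy import stats

# hypothetical mixture: 0.6*NB(r=20, p=0.8) + 0.4*(25 + NB(r=30, p=0.5))
k = np.arange(90)
pmf = (0.6 * stats.nbinom.pmf(k, 20, 0.8)
       + 0.4 * stats.nbinom.pmf(k - 25, 30, 0.5))   # shifted by 25
# the mixture has two modes, near k = 4 and k = 54
modes = k[1:-1][(pmf[1:-1] > pmf[:-2]) & (pmf[1:-1] > pmf[2:])]
print(modes, pmf.sum())   # pmf mass over this grid is ~1
```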
Title: Estimation of random cycles in persistent time series
Abstract: A number of economic, financial, and climatic time series exhibit persistent cycles, characterised by distinctive dependence patterns and peaks in the spectrum. In this paper, we introduce a class of semiparametric cyclical-memory processes which enable the modelling of random cyclical patterns in stationary and non-stationary time series. We develop the theoretical background and asymptotic estimation theory for the frequency of a cycle, represented by the location of a peak in the spectrum. The estimation procedure is easy to implement and allows for the construction of narrow confidence intervals around the peak location. Monte Carlo simulations confirm the good finite-sample performance of our estimator. We illustrate our method with three empirical applications, uncovering (quasi-)periodic cycles in nominal and real macroeconomic series (US nominal GDP and real industrial production) and in CO2 concentration levels.
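The estimand is easy to picture: the cycle frequency is the location of the periodogram peak. A minimal sketch with a hypothetical noisy cycle of period 40:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
t = np.arange(T)
x = np.cos(2 * np.pi * t / 40) + rng.standard_normal(T)

I = np.abs(np.fft.rfft(x - x.mean()))**2 / T   # periodogram
freqs = np.fft.rfftfreq(T)                     # cycles per observation
jhat = 1 + np.argmax(I[1:])                    # skip frequency zero
print(1 / freqs[jhat])                         # estimated period, ~40
```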
Title: Flexible Covariate Adjustments in Regression Discontinuity Designs
Abstract: Empirical regression discontinuity (RD) studies often use covariates to increase the precision of their estimates. In this paper, we propose a novel class of estimators that use such covariate information more efficiently than the linear adjustment estimators that are currently used widely in practice. Our approach can accommodate a possibly large number of either discrete or continuous covariates. It involves running a standard RD analysis with an appropriately modified outcome variable, which takes the form of the difference between the original outcome and a function of the covariates. We characterize the function that leads to the estimator with the smallest asymptotic variance, and show how it can be estimated via modern machine learning, nonparametric regression, or classical parametric methods. The resulting estimator is easy to implement, as tuning parameters can be chosen as in a conventional RD analysis. An extensive simulation study illustrates the performance of our approach.
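A sketch of the plug-in logic on hypothetical data: estimate an adjustment function of the covariates by any flexible method (cross-fitted, so the fit cannot absorb the treatment jump), subtract it from the outcome, and run a standard local linear RD on the modified outcome. The paper's variance-optimal choice of adjustment function is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 5000
r = rng.uniform(-1, 1, n)               # running variable, cutoff 0
X = rng.standard_normal((n, 5))         # covariates
y = 0.5 * (r >= 0) + 0.3 * r + X @ np.full(5, 0.4) \
    + rng.standard_normal(n)            # true jump = 0.5

# cross-fitted flexible adjustment (ML here; parametric also works)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
m = y - cross_val_predict(rf, X, y, cv=5)   # modified outcome

h = 0.2                                 # crude bandwidth for the sketch
def cutoff_intercept(mask):
    Zm = np.column_stack([np.ones(mask.sum()), r[mask]])
    return np.linalg.lstsq(Zm, m[mask], rcond=None)[0][0]

tau = cutoff_intercept((r >= 0) & (r < h)) \
    - cutoff_intercept((r < 0) & (r > -h))
print(tau)                              # roughly 0.5
```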
Title: Multiple Testing for the Topology of Financial Networks
Abstract: This paper advances the econometric analysis of network connectedness by introducing exact simulation-based inference methods to assess pairwise and aggregated spillover effects among variables within vector autoregression models. While the estimation of connectedness using forecast error variance decompositions is well-established, our contribution lies in the development of hypothesis testing procedures that provide a statistical foundation to formally test the significance of connectedness measures. To address the multiple testing issue, we present detailed algorithms for both single-step and step-down p-value adjustments to effectively control the familywise error rate in finite samples. We also extend our methodology to include cluster-based analysis, thereby broadening the applicability of our approach. Simulation results confirm that our procedures not only ensure the simultaneous finite-sample correctness of the set of inferences but also demonstrate good statistical power. Furthermore, we illustrate our inference procedures through empirical applications that analyze return and volatility spillovers among global stock markets, revealing new insights into financial market linkages.
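The paper's algorithms are tailored to connectedness measures in VARs; the generic resampling-based step-down adjustment they refine (in the Westfall-Young spirit) looks roughly as follows, assuming a matrix of statistics simulated under the joint null is available. A sketch, not the paper's exact procedure:

```python
import numpy as np

def stepdown_pvalues(stat, null_sims):
    """Step-down FWER-adjusted p-values. stat: (m,) observed
    statistics; null_sims: (B, m) statistics simulated under the
    joint null hypothesis."""
    m = len(stat)
    order = np.argsort(stat)[::-1]      # most significant first
    padj = np.empty(m)
    prev = 0.0
    for rank, j in enumerate(order):
        # compare stat[j] to the max over not-yet-rejected hypotheses
        maxnull = null_sims[:, order[rank:]].max(axis=1)
        prev = max(prev, (maxnull >= stat[j]).mean())
        padj[j] = prev                  # running max keeps monotonicity
    return padj
```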
Title: Approximation bounds for conditional expectations and nonparametric regressions: theory and inference
Abstract: This paper proposes a bound approach to nonparametric regression. The object of interest, the conditional expectation E[Y|X], is in general unknown and difficult to identify. In the spirit of the parsimony principle, we employ a simple parametric model to approximate the true model and then bound the approximation error using concentration inequalities to build confidence sets for E[Y|X]. Our approach is valid under less stringent regularity assumptions than conventional nonparametric methods, such as kernel regression and the sieve method. In particular, our framework allows for incomplete identification of the regression function, and inference takes the form of sets in a partial identification framework. We show that approximation bounds can be built using only moments of observables, and discuss how shape restrictions (e.g. smoothness) can be exploited to improve such bounds. We study the optimality and uniformity of the proposed bounds using the criteria of sharpness and honesty. Inference requires only the estimation of a simple parametric model and moments of observables, together with results from the theory of extremum estimation; it is therefore easy to implement and enjoys favorable finite-sample properties. Our Monte Carlo simulation studies compare our method with alternatives (Nadaraya-Watson, local linear, the sieve method, random forest, LASSO, and neural networks) in terms of the average widths and coverage probabilities of the associated confidence sets and the mean squared error of point estimates. The results show that the proposed method delivers valid confidence sets in cases where the other methods fail or cannot provide confidence sets at all. As an empirical application, we apply our method to inference for automobile miles-per-gallon based on car attributes, using a dataset available from the UCI Machine Learning Repository. Our method yields the confidence sets with the shortest width while maintaining size, and generates the best out-of-sample point predictions, supporting our theoretical results on finite-sample properties. In another application, we demonstrate how the bound approach provides economically significant information about the shape of regression curves, using household consumption data.
Title: Estimation and Inference in Dyadic Network Formation Models with Nontransferable Utilities
Abstract: This paper studies estimation and inference in a dyadic network formation model with unobserved individual fixed effects and nontransferable utilities. We propose a one-step debiased maximum likelihood estimator using the “bagging” technique for the homophily parameters, which achieves the efficiency bound asymptotically and is easy to compute. We also provide an ℓ∞-consistent estimator for the fixed effects and prove asymptotic normality of the unconditional average partial effects. Simulation studies show that our method works well in finite samples, and an empirical application using risk-sharing data from Nyakatoke highlights the importance of employing appropriate statistical inference procedures.
Title: Extreme Value Theory with Heterogeneous Agents
Abstract: Extreme value processes are widespread in economics. Typically, agents receive a number of draws from some distribution and we examine the behavior of the maximum in the limit as the number of draws becomes large. This paper asks: Do the outcomes of such processes change when different agents may receive a different number of draws? To answer this, we allow the number of draws an agent receives from the underlying distribution (e.g. of productivities, ideas, or utility shocks) to be given by a search technology, which can be interpreted as representing either search frictions or heterogeneity across different types of agents. We derive a general expression for the extreme value distribution and show that it need not assume any of the three standard types. We generalize a result from Gabaix et al. (2016) regarding extreme value outcomes and consider some applications to aggregate productivity, markups, and social networks.
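The question is easy to simulate: give each agent a random (here geometric, purely hypothetical) number of standard exponential draws and compare the maxima with a fixed-n benchmark. Mixing over n changes the shape of the extreme value distribution, not just its location and scale, consistent with the point that the limit need not be one of the three standard types.

```python
import numpy as np

rng = np.random.default_rng(0)
agents, mean_draws = 100_000, 50

# heterogeneous search: a geometric number of draws per agent
n = rng.geometric(1 / mean_draws, agents)
# max of n iid Exp(1) draws via its inverse cdf: (1 - e^{-x})^n
maxima = -np.log(1 - rng.random(agents) ** (1.0 / n))

# benchmark: every agent receives exactly mean_draws draws
bench = -np.log(1 - rng.random(agents) ** (1.0 / mean_draws))

print(maxima.var(), bench.var())   # mixing over n adds dispersion
```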
Title: On the Modelling and Prediction of High-Dimensional Functional Time Series
Abstract: We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series p is large in relation to the length of the time series n. The first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation of the original high-dimensional functional time series; the transformed curve series can be segmented into several groups such that any two subseries from different groups are uncorrelated both contemporaneously and serially. Consequently, in the second step those groups are handled separately without loss of information on the overall linear dynamic structure. The second step establishes a finite-dimensional dynamic structure for the transformed functional time series within each group, represented by that of a vector time series. Modelling and forecasting the original high-dimensional functional time series are then realized via those vector time series in all the groups. We investigate the theoretical properties of the proposed methods and illustrate the finite-sample performance through extensive simulation and two real datasets.
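The functional segmentation step is too involved for a short sketch, but the flavour of the second step (an eigenanalysis that reduces a group of curves to a low-dimensional vector time series of scores) can be illustrated on hypothetical discretized curves:

```python
import numpy as np

rng = np.random.default_rng(0)
T, grid = 300, np.linspace(0, 1, 51)

# curve series driven by two AR(1) scores: a 2-dim dynamic structure
a, b = np.zeros(T), np.zeros(T)
for s in range(1, T):
    a[s] = 0.8 * a[s-1] + rng.standard_normal()
    b[s] = -0.5 * b[s-1] + rng.standard_normal()
curves = (np.outer(a, np.sin(np.pi * grid))
          + np.outer(b, np.sin(2 * np.pi * grid))
          + 0.1 * rng.standard_normal((T, grid.size)))

C = np.cov(curves.T)                 # discretized covariance operator
vals, vecs = np.linalg.eigh(C)
phi = vecs[:, ::-1][:, :2]           # two leading eigenfunctions
scores = curves @ phi                # vector time series of scores
print(np.round(vals[::-1][:4], 2))   # sharp drop after 2 components
# a VAR fitted to the scores then drives prediction of the curves
```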
Title: Minimax regret treatment rules with finite samples when a quantile is the object of interest
Abstract: We study minimax regret treatment rules in finite samples under matched treatment assignment, in a setup where a policymaker, informed by a sample, needs to decide between T ≥ 2 different treatments. Randomized rules are allowed for. We show that the generalization of the minimax regret rule derived in Stoye (2009) for T = 2 remains minimax regret for general finite T > 2. We also show by example that, in the case of random assignment, the generalization of the minimax rule in Stoye (2009) to T > 2 is not necessarily minimax regret, and we derive minimax regret rules for a few small-sample cases, e.g. for N = 2 when T = 3. We also discuss numerical approaches for approximating minimax regret rules for unbalanced panels. We then study minimax regret treatment rules in finite samples when a specific quantile (rather than the expected outcome) is the object of interest. We establish that all treatment rules are minimax regret under the "matched" and "random sampling" schemes, while under "testing an innovation" no-data rules are shown to be minimax regret.
Title: Testing general linear hypotheses under a high-dimensional multivariate regression model with spiked noise covariance
Abstract: We consider the problem of testing linear hypotheses under a multivariate regression model with a high-dimensional response and spiked noise covariance. The proposed family of tests consists of statistics based on a weighted sum of projections of the data onto the estimated latent factor directions, with the weights acting as regularization parameters. We establish asymptotic normality of the test statistics under the null hypothesis. We also establish the power characteristics of the tests and propose a data-driven choice of the regularization parameters under a family of local alternatives. The performance of the proposed tests is evaluated through a simulation study. Finally, the tests are applied to Human Connectome Project data to test for the presence of associations between volumetric measurements of the human brain and behavioral variables.
Title: Common Trends and Long-Run Multipliers in Nonlinear Structural VARs
Abstract: While it is widely recognised that linear (structural) VARs may omit important features of macroeconomic time series, the use of nonlinear SVARs has to date been almost entirely confined to the modelling of stationary time series, because of a lack of understanding as to how common stochastic trends may be accommodated within nonlinear models. This has unfortunately circumscribed the range of macroeconomic series to which such models can be applied, and/or required that these series first be transformed to stationarity (a potential source of misspecification), and prevented the use of long-run identifying restrictions in these models. To address these problems, we develop a flexible class of nonlinear SVARs, which subsumes models with threshold-type endogenous regime switching, of both the piecewise linear and smooth transition varieties. We extend the Granger-Johansen representation theorem to this class of models, obtaining conditions that specialise exactly to the usual ones when the model is linear. These models are shown to be capable of supporting both linear and nonlinear forms of cointegration, where the latter is understood in the profound sense of series having common nonlinear stochastic trends, with possibly nonlinear cointegrating relations between those trends. We further show that, as a consequence, these models support the same kinds of long-run identifying restrictions as are available in linearly cointegrated SVARs.
Title: The econometrics of financial duration models: likelihood-based estimation and asymptotics
Abstract: Financial duration models are widely used in finance to model the time between events such as trades, stock price movements, or other financial events. A workhorse in the literature is the ACD(1,1) model of Engle and Russell (Econometrica, 1998). We show that, although the likelihood of the model resembles the Gaussian (G)ARCH likelihood, the asymptotic theory for the ACD model is non-standard and differs from GARCH asymptotics. In particular, the behavior of likelihood estimators in ACD models is highly sensitive to the tail behavior of the financial durations, with asymptotic normality breaking down when the tail indices of the durations are smaller than unity. Our results exploit the fact that, for duration data, the number of observations within any given time span is random.
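For concreteness, the ACD(1,1) recursion of Engle and Russell, simulated with exponential errors and hypothetical parameter values, along with the random-sample-size feature the asymptotic theory exploits:

```python
import numpy as np

rng = np.random.default_rng(0)
n, omega, alpha, beta = 5000, 0.1, 0.1, 0.8   # hypothetical values

x = np.empty(n)                        # durations between events
psi = omega / (1 - alpha - beta)       # start at the stationary mean
for i in range(n):
    x[i] = psi * rng.exponential()     # x_i = psi_i * eps_i
    psi = omega + alpha * x[i] + beta * psi

# the number of events in a fixed time span is random: it is the
# largest N with x_1 + ... + x_N <= span
span = 0.5 * x.sum()
print(np.searchsorted(x.cumsum(), span))
```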
Title: The econometrics of financial duration models: bootstrap-based testing and inference
Abstract: ACD models are a workhorse in the financial duration modeling literature. Because of their similarity with GARCH models, estimation is usually carried out by quasi maximum likelihood. However, the asymptotic properties of the MLE are non-standard, with finite-sample distributions that differ from their asymptotic approximation. We discuss the theory for the bootstrap in this framework. Because for duration data the number of observations within any given time span is random, bootstrap implementations are non-standard; extant bootstrap algorithms, such as those discussed in the literature on the general class of so-called multiplicative error models, are not valid in the ACD setting. We propose a novel bootstrap which works irrespective of the tail index of the durations. The implications of our results for different point process and renewal models are also discussed.
Title: Local asymptotic minimax inference for set-identified impulse responses
Abstract: This paper considers local asymptotic minimax inference for set-identified impulse responses. Structural impulse responses, the key causal parameters in applied macroeconomic models, are only set-identified without stringent restrictions. Under set-identification, the bounds of the identified set are typically characterized as the maximum or minimum of some parametric functions, which may be non-differentiable, e.g. when multiple structural models are consistent with the boundary parameter value. In this case, the standard "plug-in" approach leads to invalid inference. In addition, the robust Bayesian credible sets of Giacomini and Kitagawa (2021) are no longer valid confidence regions from the frequentist perspective. We demonstrate that our local asymptotic minimax framework provides adequate tools for defining estimator optimality and conducting inference even when these existing methods do not apply. The proposed confidence regions achieve the desired asymptotic sizes and are optimal in terms of average volume. As an empirical illustration, we study inference for impulse responses to the monetary policy shock using the dataset of Jarocinski and Karadi (2020).
Title: A Unidimensional Representation of Multidimensional Inequality
Abstract: This paper introduces a novel approach for analyzing multidimensional inequality. We use a non-parametric copula-based approach and a transformed survival function to capture the impact of inequality at the individual level. We then define indices of multidimensional inequality as aggregations of these individual impacts and introduce two new graphical tools, which leverage the unidimensional representation to provide an intuitive picture of multidimensional inequality. Through this representation we derive dominance conditions that allow for the identification of robust rankings of multidimensional inequality. The paper also pushes the boundaries of stochastic dominance estimation and inference, presenting new non-parametric statistical estimators related to the new graphical tools and dominance conditions and establishing the asymptotic properties of those estimators. As a practical demonstration, we apply our method to data from the Harmonized Household Income and Expenditure Surveys for three Arab countries and the Arab Barometer survey for ten Arab countries.
Title: Dynamic spatial interaction models for a leader's resource allocation and followers' multiple activities
Abstract: This paper introduces a new spatial interaction model designed to explore the decision-making processes of two types of agents: a leader and followers. In our empirical study, these agents symbolize the central and local governments. The model's objective is to account for three pivotal features: (i) resource allocations from the leader to the followers, along with the resulting strategic interactions, (ii) followers' multiple activities, and (iii) interactions among followers. We develop a network game to delve into the micro-foundations of these processes. Within this game, the followers undertake multiple activities while the leader distributes resources among them. The game's Nash equilibrium (NE) subsequently informs our econometric framework. By producing equilibrium measures, this NE facilitates understanding the short-run impacts of changes in followers' characteristics and their long-term consequences. For estimating agent payoff parameters, we adopt the quasi-maximum likelihood (QML) method and study the asymptotic properties of the QML estimator, ensuring robust statistical inferences. Empirically, we explore interactions among U.S. states in public welfare and housing development, examining how federal grants influence these expenditures. Our findings indicate positive spillovers in states’ welfare spending and suggest that welfare and housing expenditures act as substitutes. Additionally, we observe significant positive effects of federal grants on both types of expenditures.