• William Li, Shanghai Advanced Institute of Finance

    • Title: On optimal designs in information-based optimal subdata - A systematic view of a data reduction strategy with application to second-order model

    • Abstract: With the urgent need to analyze extraordinary amounts of data, the information-based optimal subdata selection (IBOSS) approach has gained considerable attention in the recent literature because it retains much of the information in the full dataset with a limited subdata size. However, the framework has not yet been explored systematically, especially the characterization of the optimal subset, which is the key step in developing the associated algorithm. Motivated by a real finance case study concerning the impact of corporate attributes on firm value, we systematically explore the framework, laying out the exact steps one can follow when employing the idea of IBOSS for data reduction. Considering the second-order model that contains main effects, quadratic effects, and interaction effects, we develop a novel algorithm for selecting informative subdata. Empirical studies, including a real example, demonstrate that the new algorithm adequately addresses the trade-off between computational complexity and statistical efficiency, one of six core research directions for theoretical data science research proposed by the US National Science Foundation.