Torsten Reuter, Otto von Guericke University Magdeburg

  • Abstract: Data reduction is a fundamental challenge of modern technology: data sets have become so large that classical statistical methods are no longer applicable because of computational limitations. We consider a general linear model with an extraordinarily large number of observations but only a few covariates. Subsampling aims to select a given percentage of the original data. Under distributional assumptions on the covariates, we derive subsampling designs for various settings of the linear model, based on the design criterion of D-optimality, and study their theoretical properties. We make use of fundamental concepts of optimal design theory and an equivalence theorem from convex optimization. The resulting subsampling designs provide simple rules for accepting or rejecting a data point and therefore allow for an easy algorithmic implementation.
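
    To make the accept/reject idea concrete, here is a minimal sketch and not the speaker's implementation. It assumes the simplest setting, a straight-line model y = b0 + b1*x + error with a single standard normal covariate, and it assumes a threshold rule that keeps the fraction alpha of points with the largest |x|. The abstract does not state the rule itself; the function names `tail_subsample` and `information_determinant` are hypothetical, as are the sample size and the value of alpha. The sketch compares the determinant of the information matrix X^T X, the quantity that D-optimality maximizes, for the thresholded subsample and for a uniform random subsample of the same size.

```python
import random


def tail_subsample(xs, alpha):
    """Accept a point if |x| reaches the threshold that keeps a fraction alpha.

    Assumed illustrative rule, not taken from the abstract.
    """
    k = max(1, round(alpha * len(xs)))
    # The k-th largest absolute value serves as the acceptance threshold.
    threshold = sorted((abs(x) for x in xs), reverse=True)[k - 1]
    return [x for x in xs if abs(x) >= threshold]


def information_determinant(xs):
    """Determinant of X^T X for the model y = b0 + b1*x + error."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # X^T X = [[n, sx], [sx, sxx]], so its determinant is n*sxx - sx^2.
    return n * sxx - sx * sx


rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # simulated covariate values
alpha = 0.1  # keep 10 percent of the data (hypothetical choice)

selected = tail_subsample(xs, alpha)
uniform = rng.sample(xs, len(selected))  # baseline: uniform random subsample

print("thresholded subsample:", information_determinant(selected))
print("uniform subsample:    ", information_determinant(uniform))
```

    In this sketch the rule needs only one threshold per data set, so each point is accepted or rejected by a single comparison. Settings with several covariates or other models would call for a different acceptance region.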