On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data (abstract, PDF)

In this paper, we introduce a methodology for variable-star classification, drawing on modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selecting and computing real-numbered metrics (features), detail methods to robustly estimate periodic light-curve features, introduce tree ensemble methods for accurate variable-star classification, and show how to rigorously evaluate the classification results using cross-validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. The data set used for this paper, including all features, light curves, and class labels, is available here. The algorithms used in the paper are available in the R Statistical Computing Environment as contributed packages (rpart, randomForest, adabag, kernlab) or as part of the Clus software download.

We present a novel method for the optimal selection of quasars using time-series observations in a single photometric bandpass. Utilizing the damped random walk model of Kelly et al. (2009), we parameterize the ensemble quasar structure function in Sloan Stripe 82 as a function of observed brightness. We find the typical false-alarm rate due to known variable stars to be below 3% (high purity). Applying the classification, we increase the sample of potential quasars relative to those known in Stripe 82 by as much as 29%, and by nearly a factor of two in the redshift range 2.5 < z < 3, where selection by color is extremely inefficient. Python code for estimating the RW QSO features from a light curve is available here.
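The variable-star paper above mentions robust estimation of periodic light-curve features. As a hedged illustration only, the sketch below implements the classical Lomb-Scargle periodogram (Scargle 1982), one standard way to search for a period in unevenly sampled data. It is not claimed to be the paper's own periodic-feature procedure; the function name `lomb_scargle`, the trial-frequency grid, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Classical (Scargle 1982) normalized periodogram for unevenly sampled data.

    Hypothetical helper for illustration.
    t, y  : observation times and magnitudes (1-D arrays of equal length)
    freqs : trial frequencies in cycles per unit time (all must be > 0)
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                                              # remove the mean magnitude
    omega = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]  # shape (nf, 1)
    # Per-frequency time offset tau that makes the sine and cosine terms orthogonal.
    two_wt = 2.0 * omega * t[None, :]
    tau = np.arctan2(np.sin(two_wt).sum(axis=1),
                     np.cos(two_wt).sum(axis=1)) / (2.0 * omega[:, 0])
    arg = omega * (t[None, :] - tau[:, None])                     # shape (nf, n)
    c, s = np.cos(arg), np.sin(arg)
    power = (y @ c.T) ** 2 / (c**2).sum(axis=1) + (y @ s.T) ** 2 / (s**2).sum(axis=1)
    return power / (2.0 * y.var())                                # normalize by the sample variance


# Usage on a synthetic, irregularly sampled sinusoid (not data from the paper).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))
true_period = 7.3
y = np.sin(2.0 * np.pi * t / true_period) + rng.normal(0.0, 0.1, t.size)
freqs = np.linspace(0.01, 1.0, 5000)
power = lomb_scargle(t, y, freqs)
best_period = 1.0 / freqs[np.argmax(power)]
print(f"best period: {best_period:.2f}")
```

In practice, features such as the best period, its power, and harmonic amplitudes would be derived from a periodogram like this one before being passed to a classifier; a floating-mean or multi-harmonic variant is often preferred for real light curves.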
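The quasar-selection abstract rests on the damped random walk (DRW) model. A minimal sketch of the idea, assuming a DRW whose structure function is SF(Δt) = SF_∞ (1 − exp(−|Δt|/τ))^{1/2}, which corresponds to a process variance of SF_∞²/2 and covariance (SF_∞²/2) exp(−|Δt|/τ): a light curve is scored by its χ² per degree of freedom under the DRW covariance and under a non-variable (errors-only) model. The helper names (`drw_covariance`, `reduced_chi2`, `simulate_drw`) and the parameter values are hypothetical; they are not the paper's fitted Stripe 82 ensemble parameters, nor the linked Python code.

```python
import numpy as np


def drw_covariance(t, sigma_err, sf_inf, tau):
    """Covariance of a DRW (Ornstein-Uhlenbeck process) sampled at times t,
    plus independent Gaussian measurement errors sigma_err (scalar or array).

    Assumed parameterization: variance sf_inf**2 / 2, so that
    SF(dt) = sf_inf * sqrt(1 - exp(-|dt| / tau)).
    """
    t = np.asarray(t, dtype=float)
    lag = np.abs(t[:, None] - t[None, :])
    cov = 0.5 * sf_inf**2 * np.exp(-lag / tau)
    err = np.broadcast_to(np.asarray(sigma_err, dtype=float), t.shape)
    return cov + np.diag(err**2)


def reduced_chi2(mag, cov):
    """Generalized least-squares chi^2 per degree of freedom after fitting a
    constant mean magnitude to a light curve with covariance `cov`."""
    mag = np.asarray(mag, dtype=float)
    ones = np.ones_like(mag)
    mean = (ones @ np.linalg.solve(cov, mag)) / (ones @ np.linalg.solve(cov, ones))
    resid = mag - mean
    return (resid @ np.linalg.solve(cov, resid)) / (mag.size - 1)


def simulate_drw(t, sf_inf, tau, rng):
    """Draw a zero-mean DRW realization at sorted times t (exact OU update)."""
    t = np.asarray(t, dtype=float)
    var = 0.5 * sf_inf**2
    x = np.empty(t.size)
    x[0] = rng.normal(0.0, np.sqrt(var))
    for i in range(1, t.size):
        decay = np.exp(-(t[i] - t[i - 1]) / tau)
        x[i] = decay * x[i - 1] + rng.normal(0.0, np.sqrt(var * (1.0 - decay**2)))
    return x


# Usage on a synthetic quasar-like light curve (illustrative values only).
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 3000.0, 60))   # irregular epochs in days
err = np.full(t.size, 0.02)                  # 0.02 mag photometric errors
sf_inf, tau = 0.2, 300.0
mag = 19.0 + simulate_drw(t, sf_inf, tau, rng) + rng.normal(0.0, err)

chi2_qso = reduced_chi2(mag, drw_covariance(t, err, sf_inf, tau))
chi2_const = reduced_chi2(mag, np.diag(err**2))
print(f"chi2/nu, DRW model: {chi2_qso:.2f}; constant-source model: {chi2_const:.2f}")
```

For a source that really follows the assumed DRW, the first statistic should sit near 1 while the errors-only statistic is much larger; a non-variable star would instead be consistent with the errors-only model. Making SF_∞ and τ depend on observed brightness, as the abstract describes, would only change the arguments passed to `drw_covariance`.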
