On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data (abstract, PDF)
In this paper, we introduce methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics (features), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable-star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data.
The data set used for this paper, including all features, light curves, and class labels, is found here.
The algorithms used in the paper are available in the R Statistical Computing Environment as contributed packages (rpart, randomForest, adabag, kernlab) or as part of the Clus software download.
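As a rough illustration of how a cross-validated classification error like the one quoted above can be estimated, here is a minimal Python sketch using only the standard library. It is not the paper's code: the paper's classifiers are the R packages listed above, and a simple nearest-centroid classifier is swapped in here purely to keep the example short. The function names, the two toy features, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, spreading every class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))


def cv_error(X, y, k=5, seed=0):
    """Overall misclassification rate, with every sample predicted while held out."""
    wrong = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return wrong / len(y)


# Synthetic two-class, two-feature data (hypothetical; not the paper's data set).
rng = random.Random(1)
X = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(20)]
X += [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(20)]
y = ["class_a"] * 20 + ["class_b"] * 20

err = cv_error(X, y, k=5)
print(f"cross-validated error: {err:.3f}")
```

Because each star is classified only by a model that never saw it during training, the resulting error rate is a fairer estimate of performance on new data than the training error would be.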
We present a novel method for the optimal selection of quasars using time-series observations in a single photometric bandpass. Utilizing the damped random walk model of Kelly et al. (2009), we parameterize the ensemble quasar structure function in Sloan Stripe 82 as a function of observed brightness.
We establish the typical rate of false alarms due to known variable stars as <3% (high purity). Applying the classification, we increase the sample of potential quasars relative to those known in Stripe 82 by as much as 29%, and by nearly a factor of two in the redshift range 2.5<z<3, where selection by color is extremely inefficient.
Python code for estimating the RW QSO features from a light curve is here.
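For readers unfamiliar with the structure function mentioned in the abstract above, the following is a minimal sketch, separate from the linked code. It assumes the standard damped random walk form, SF(Δt) = SF_inf · sqrt(1 − exp(−|Δt|/τ)), and compares it with a simple empirical estimate built from pairs of observations. The function names, the parameter names `sf_inf` and `tau`, and the numerical values are illustrative assumptions, not the paper's fitted values or its brightness-dependent parameterization.

```python
import math


def drw_structure_function(dt, sf_inf, tau):
    """Expected RMS magnitude difference at time lag dt for a damped random walk.

    sf_inf is the long-lag asymptote (mag) and tau the damping timescale,
    in the same units as dt.
    """
    return sf_inf * math.sqrt(1.0 - math.exp(-abs(dt) / tau))


def empirical_structure_function(times, mags, bin_edges):
    """RMS magnitude difference over all observation pairs, grouped by time lag.

    Photometric noise is not subtracted here; a real analysis would correct for it.
    Returns one value per lag bin, or None for bins containing no pairs.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            lag = abs(times[j] - times[i])
            for b in range(n_bins):
                if bin_edges[b] <= lag < bin_edges[b + 1]:
                    sums[b] += (mags[j] - mags[i]) ** 2
                    counts[b] += 1
                    break
    return [math.sqrt(s / c) if c else None for s, c in zip(sums, counts)]


# Illustrative values only (hypothetical, not taken from the paper).
sf_inf, tau = 0.2, 200.0  # mag, days
for lag in (10.0, 100.0, 1000.0):
    print(lag, round(drw_structure_function(lag, sf_inf, tau), 4))
```

The model rises as roughly the square root of the lag for lags much shorter than `tau` and flattens toward `sf_inf` at long lags, which is the behavior that distinguishes quasar-like variability from most periodic variable stars.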