Photometric Redshifts for Quasars in the SDSS DR5 (2007)

In a similar manner to the work for classification, the D2K framework was used to assign photometric redshifts to quasars in the SDSS DR5 using k nearest neighbour instance-based learning (kNN). Earlier results, using decision trees, were presented on a poster at the 208th meeting of the American Astronomical Society, and the final results are in Ball et al. (2007), ApJ 663 774 (astro-ph/0612471).

Our two main findings were that quasar photometric redshifts can be assigned without the regions of 'catastrophic failure' (large z_photo - z_spectro) previously seen in the literature, and that adding the UV bands provided by a cross-match to GALEX data gives a dramatic improvement.

Figure 1 shows the main result, the photometric versus spectroscopic redshift for a blind test of 20% of the sample of 55,000 quasars. The variance between the two measures over the whole redshift range is 0.123 +/- 0.002. We obtained the error bar on the variance using the standard deviation of values from the ten-fold cross-validation.
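The quoted statistic and its error bar can be illustrated with a minimal sketch. It assumes that the variance is the variance of the residuals z_photo - z_spectro on a blind test set, and that the error bar is the standard deviation of that value across ten splits. The function names and the synthetic data are hypothetical, used only for illustration; they are not taken from the D2K workflow or from the SDSS sample.

```python
import numpy as np

def redshift_variance(z_photo, z_spec):
    """Variance of the residuals dz = z_photo - z_spec (assumed definition)."""
    dz = np.asarray(z_photo, dtype=float) - np.asarray(z_spec, dtype=float)
    return float(np.var(dz))

def variance_with_error(splits):
    """Mean variance over all splits, and its standard deviation across splits.

    `splits` is an iterable of (z_photo, z_spec) pairs, one pair per
    training/blind-test split.
    """
    values = np.array([redshift_variance(zp, zs) for zp, zs in splits])
    return float(values.mean()), float(values.std(ddof=1))

# Synthetic stand-in for ten blind-test splits (NOT real SDSS data):
# Gaussian scatter of width 0.35 around the true redshift.
rng = np.random.default_rng(0)
splits = []
for _ in range(10):
    z_spec = rng.uniform(0.0, 5.0, size=2000)
    z_photo = z_spec + rng.normal(0.0, 0.35, size=2000)
    splits.append((z_photo, z_spec))

mean_var, err = variance_with_error(splits)
print(f"variance = {mean_var:.3f} +/- {err:.3f}")
```

A single number of this kind summarises the overall spread but does not show where outliers fall, which is why the contour plot in Figure 1 is needed to check for catastrophic failures.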

Figure 1: Contour plot of quasar photometric redshifts assigned by the instance-based learner versus spectroscopic redshifts for the SDSS DR5 blind testing sample. Compared to previous work, there are no regions of 'catastrophic' failure, in which objects are assigned a very different redshift to the true value, just a smoothly declining spread of outliers. From Ball et al. 2007 (ApJ 663 774).

Rather than using just the single nearest neighbour in the training set to assign the redshift, we used the k nearest neighbours and weighted them according to their Euclidean distances. A high value of the distance weighting therefore gives more emphasis to closer objects. As with the decision trees, there is a broad region of approximately optimal results, although it is well away from a single neighbour.
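A distance-weighted kNN estimate of this kind can be sketched as follows. This is an illustration only, not the D2K implementation: it assumes inverse-distance weights of the form 1/d^power, and the function name knn_photoz, the parameter name power, and the toy data are all hypothetical. The feature vectors could be, for instance, photometric colours.

```python
import numpy as np

def knn_photoz(train_features, train_z, query_features, k=10, power=2.0):
    """Distance-weighted mean redshift of the k nearest training objects.

    Weights are 1 / d**power (an assumed form): a larger `power` puts
    more emphasis on the closest neighbours, and power=0 gives a plain mean.
    """
    train_features = np.asarray(train_features, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    query_features = np.atleast_2d(np.asarray(query_features, dtype=float))

    # Pairwise Euclidean distances, shape (n_query, n_train).
    # Brute force: fine for a small demo, too memory-hungry for large samples.
    diff = query_features[:, None, :] - train_features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Indices of the k nearest training objects for each query object.
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    d_k = np.take_along_axis(dist, nearest, axis=1)
    z_k = train_z[nearest]

    # Inverse-distance weights; the floor guards against division by zero.
    weights = 1.0 / np.maximum(d_k, 1e-12) ** power
    return (weights * z_k).sum(axis=1) / weights.sum(axis=1)

# Toy data (NOT real quasar photometry): four features per object.
rng = np.random.default_rng(1)
train_features = rng.normal(size=(500, 4))
train_z = 2.0 + 0.5 * train_features[:, 0]
z_est = knn_photoz(train_features, train_z, train_features[:5], k=10, power=2.0)
```

Setting k=1 recovers the single-nearest-neighbour case, so scanning over k and power on a held-out set is one way to reproduce the kind of parameter grid shown in Figure 2.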

Figure 2: Effect of varying the number of nearest neighbours and the distance weighting of the instance-based learner for the blind test, showing the mean over ten different training/blind-test splits of the data, each with a different random seed. The best model is marked with 1 sigma error bars. From Ball et al. (2007).

We compared our kNN method to the previously used color-redshift relation. That relation gives results quite similar to those from a single nearest neighbour, with a variance of 0.265 +/- 0.006.

Figure 3: As Figure 1, but showing the results for the photo-zs using an empirical color-redshift relation. From Ball et al. (2007).

Adding the GALEX near- and far-UV bands dramatically reduces the variance, to 0.054 +/- 0.005, albeit at the expense of a smaller sample and fewer objects above z = 2.2, where the Lyman break is shifted out of the UV.

Figure 4: As Figure 1, but showing the results for 1528 of 7642 quasars present in the SDSS DR5 cross-matched to the GALEX GR2. From Ball et al. (2007).