Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX (2008)

We extended earlier work on photometric redshifts of quasars to assign probability density functions (PDFs) in redshift to galaxies and quasars in SDSS DR5 and GALEX GR3. By selecting quasars whose PDF has a single peak, we reduced the RMS dispersion between photometric and spectroscopic redshift from 0.343 +/- 0.005 to 0.117 +/- 0.010 (error bars from tenfold cross-validation), and increased the percentage of quasars within 0.3 of the spectroscopic redshift (a useful criterion for 'catastrophic failure') from 79.8 +/- 0.3% to 99.3 +/- 0.1%.
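The single-peak selection and the two quality metrics quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline from Ball et al. (2008). Each PDF is assumed to be stored as probabilities on a fixed grid of redshift bins. A "peak" is taken to be any local maximum above zero, because the paper's actual peak criterion is not given here. The point photometric redshift is assumed to be the centre of the most probable bin. The helper names (`count_peaks`, `pdf_mode`, `rms_dispersion`, `fraction_within`) and the toy data are hypothetical.

```python
import math


def count_peaks(pdf, min_height=0.0):
    """Count local maxima in a binned PDF.

    Hypothetical peak definition: a bin strictly higher than its left
    neighbour, at least as high as its right neighbour, and above
    min_height. A flat-topped plateau therefore counts once.
    """
    n = len(pdf)
    peaks = 0
    for i in range(n):
        left = pdf[i - 1] if i > 0 else -math.inf
        right = pdf[i + 1] if i < n - 1 else -math.inf
        if pdf[i] > left and pdf[i] >= right and pdf[i] > min_height:
            peaks += 1
    return peaks


def pdf_mode(pdf, bin_centers):
    """Assumed point estimate: centre of the most probable bin."""
    best = max(range(len(pdf)), key=lambda k: pdf[k])
    return bin_centers[best]


def rms_dispersion(z_phot, z_spec):
    """Root-mean-square of (z_phot - z_spec)."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(z_phot, z_spec)) / len(z_spec))


def fraction_within(z_phot, z_spec, tol=0.3):
    """Fraction of objects with |z_phot - z_spec| <= tol."""
    hits = sum(1 for p, s in zip(z_phot, z_spec) if abs(p - s) <= tol)
    return hits / len(z_spec)


# Invented toy data: two objects on a five-bin redshift grid.
bin_centers = [0.5, 1.0, 1.5, 2.0, 2.5]
pdfs = [
    [0.0, 0.2, 0.6, 0.2, 0.0],  # single peak near z = 1.5
    [0.4, 0.1, 0.0, 0.1, 0.4],  # two peaks: a degenerate solution
]
z_spec = [1.45, 2.4]
z_phot = [pdf_mode(p, bin_centers) for p in pdfs]

# Keep only objects whose PDF has exactly one peak.
single = [i for i, p in enumerate(pdfs) if count_peaks(p) == 1]
zp_1 = [z_phot[i] for i in single]
zs_1 = [z_spec[i] for i in single]

print("all:      RMS=%.3f  within 0.3=%.2f"
      % (rms_dispersion(z_phot, z_spec), fraction_within(z_phot, z_spec)))
print("one peak: RMS=%.3f  within 0.3=%.2f"
      % (rms_dispersion(zp_1, zs_1), fraction_within(zp_1, zs_1)))
```

In the toy data the double-peaked object picks the wrong peak, so discarding it lowers the RMS and raises the within-0.3 fraction, which mirrors the direction of the effect reported above.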

Preliminary results were presented in a poster at the Cosmic Cartography conference in Chicago on December 6th, 2007. The final results are published in Ball et al. (2008), ApJ 683, 12 (arXiv/0804.3413).

Figure 1 below demonstrates the improvement in photo-z:

Figure 1: Improvement in quasar redshifts enabled by our data mining techniques, shown as spectroscopic versus photometric redshift for the SDSS. Left-hand panel: a reproduction, using our framework, of a typical result prior to our work (e.g., Weinstein et al. 2004). Right-hand panel: the result of using machine learning to assign probability density functions then taking the subset with a single peak in probability. Contours indicate the areal density of individual quasars (points) on the plot.

We also assigned PDFs to SDSS Main Sample galaxies, Luminous Red Galaxies (LRGs), and their cross-matches in GALEX, finding RMS spreads similar to those of previous studies. The example in Figure 2 shows the LRGs. The 'avoiding the ends of the scale' bias, seen in earlier results for several galaxy datasets, appears again here. Other groups (e.g. Carlos Cunha and Christian Wolf in talks at the December 2008 PHAT conference) have since confirmed that it results from a non-uniform prior. A given set of input observables, such as magnitudes, corresponds to a spread of spectroscopic redshifts, but the photometric redshift compresses that spread into a single point. Because there are generally more spectroscopic galaxies away from the ends of the redshift range, the photometric redshifts are pulled towards the regions containing more galaxies. A similar effect appeared in my eClass results, where flattening the n(z) of the training set removed the bias.
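The prior effect, and the n(z)-flattening remedy mentioned for eClass, can be illustrated with a toy calculation. This sketch is an assumption-laden illustration, not the method of Ball et al. (2008) or of eClass. It treats the photometric redshift for one set of observables as the (weighted) mean spectroscopic redshift of training objects sharing those observables. It flattens n(z) by giving each training object a weight inversely proportional to the occupancy of its redshift bin. The function names, the binning scheme, and the data are hypothetical.

```python
from collections import Counter


def flat_nz_weights(z_train, z_min, z_max, n_bins):
    """Inverse-occupancy weights that flatten the training-set n(z).

    Each occupied redshift bin receives the same total weight; weights are
    normalised so that they sum to the number of training objects.
    """
    width = (z_max - z_min) / n_bins

    def bin_index(z):
        i = int((z - z_min) / width)
        return min(max(i, 0), n_bins - 1)  # clamp to the valid bin range

    idx = [bin_index(z) for z in z_train]
    counts = Counter(idx)
    n, occupied = len(z_train), len(counts)
    return [n / (occupied * counts[i]) for i in idx]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented example: four training objects with identical observables,
# but more of them lie at z = 0.3 than at z = 0.1 (a non-uniform prior).
z_same_colours = [0.1, 0.3, 0.3, 0.3]
w = flat_nz_weights(z_same_colours, z_min=0.0, z_max=0.4, n_bins=2)

unweighted = weighted_mean(z_same_colours, [1.0] * len(z_same_colours))
flattened = weighted_mean(z_same_colours, w)
print("unweighted estimate:", unweighted)  # pulled towards the crowded bin
print("flattened-n(z) estimate:", flattened)
```

In practice such weights could be passed to a learner that accepts per-sample weights, or used to resample the training set; either choice is an assumption here rather than what eClass did.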

Figure 2: SDSS Luminous Red Galaxy photo-z. From Ball et al. 2008 (ApJ 683 12).

The inevitable downside of selecting objects with a single PDF peak is that it alters the selection function in redshift. Ideally, the red line in Figure 3 would be flat, but whether the alteration matters depends on the application.

Figure 3: Alteration in the selection function for the subsample of SDSS DR5 quasars with one peak compared to the full sample. The horizontal dashed line shows the overall fraction of quasars with single-peaked PDFs, to which the red line would correspond if there were no alteration. From Ball et al. (2008).
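The quantity plotted in Figure 3 can be computed as the fraction of single-peaked objects in each spectroscopic-redshift bin, compared with the overall fraction (the dashed line). A minimal sketch, assuming per-object boolean flags marking a single-peaked PDF and user-chosen bin edges; the function name and toy values are hypothetical.

```python
import math


def selection_function(z_spec, single_peak, edges):
    """Return (overall fraction, per-bin fractions) of single-peaked objects.

    Bins are half-open [lo, hi); empty bins are reported as NaN.
    """
    overall = sum(single_peak) / len(single_peak)
    per_bin = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        flags = [s for z, s in zip(z_spec, single_peak) if lo <= z < hi]
        per_bin.append(sum(flags) / len(flags) if flags else math.nan)
    return overall, per_bin


# Invented toy sample.
z_spec = [0.5, 0.8, 1.2, 1.4, 2.1, 2.6]
single_peak = [True, True, False, True, False, False]
overall, per_bin = selection_function(z_spec, single_peak, edges=[0.0, 1.0, 2.0, 3.0])
print("overall:", overall)
print("per bin:", per_bin)
```

If the single-peak cut did not alter the selection function, every entry of `per_bin` would equal `overall`.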

Finally, we investigated the causes of poor quasar photometric redshifts and found them to be due to degeneracy, reddening, and emission lines. We uncovered strong circumstantial evidence that emission lines can either be lost when they fall between filters or mimic other lines. Figure 4 overlays the redshifts at which the five brightest emission lines cross the edges of the SDSS filters. Almost every feature in the plot corresponds to one of these lines, appearing either as a sudden change in spread or as a larger spread within a given filter.
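The redshifts overlaid in Figure 4 follow from the fact that a line with rest wavelength λ_rest is observed at λ_rest(1 + z), so it reaches a filter edge at λ_edge when z = λ_edge/λ_rest - 1. The sketch below applies this relation. The rest wavelengths are rounded standard values for strong quasar lines; which five lines the paper actually used is an assumption. The filter-edge wavelengths in the example are placeholders, not SDSS response-curve values.

```python
# Approximate rest-frame wavelengths in Angstroms (rounded). Assumed line
# list: the source does not say which five lines were used.
REST_WAVELENGTHS_A = {
    "Ly-alpha": 1216.0,
    "C IV": 1549.0,
    "C III]": 1909.0,
    "Mg II": 2798.0,
    "H-beta": 4861.0,
}


def crossing_redshifts(rest_wavelength, filter_edges, z_max=6.0):
    """Redshifts at which a line of given rest wavelength hits each filter edge.

    Uses lambda_obs = lambda_rest * (1 + z), so z = edge / rest - 1.
    Only crossings with 0 <= z <= z_max are kept, in ascending order.
    """
    zs = []
    for edge in filter_edges:
        z = edge / rest_wavelength - 1.0
        if 0.0 <= z <= z_max:
            zs.append(z)
    return sorted(zs)


# Placeholder edges (Angstroms); substitute the real filter boundaries.
example_edges = [3000.0, 5500.0]
for name, lam in REST_WAVELENGTHS_A.items():
    print(name, [round(z, 3) for z in crossing_redshifts(lam, example_edges)])
```

Crossings at negative redshift (an edge bluer than the rest wavelength) are dropped, since the line can never reach that edge.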

Figure 4: Redshifted filters overplotted on z_phot versus z_spec for SDSS DR5 quasars for the five brightest emission lines. The bottom right-hand panel shows all five lines superimposed. From Ball et al. (2008).