Gaia will produce a low-resolution spectrum of each star using the Blue and Red Photometers (BP and RP, covering 330-680 nm and 640-1000 nm, respectively; Figure 1).
[ Figure 1. Gaia BP/RP spectra. Credit: Gaia ESA]
Alongside the Discrete Source Classifier (DSC), which is based on a Support Vector Machine (SVM) and implemented in the Apsis chain (Bailer-Jones+ 2013), I developed an independent source classifier using a Random Forest (RF). The training set consists of simulated Gaia data as follows:
Using the training set, I trained an RF model and tuned its hyperparameters with a grid search. Tables 1 and 2 show the classification quality for the mixed-libraries and library-by-library cases. The mixed-libraries case merges all the simulated Gaia libraries listed above into five subgroups: STAR, WD, PHYBIN, QSO, and GALAXY. We confirmed that the classification quality is similar to that of DSC.
[ Table 1. Classification quality for the mixed libraries ]
[ Table 2. Classification quality for the library-by-library case ]
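The grid-search training step can be illustrated with a minimal sketch, assuming scikit-learn. The toy spectra, the pixel count, and the parameter grid below are hypothetical stand-ins: the simulated Gaia libraries and the hyperparameters actually searched are not reproduced here. Only the five class labels come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Class labels from the mixed-libraries case; everything else is a toy assumption.
CLASSES = ["STAR", "WD", "PHYBIN", "QSO", "GALAXY"]
N_PIXELS = 120  # assumed number of BP+RP samples per source


def make_toy_spectra(n_per_class=60, seed=0):
    """Generate noisy toy 'spectra', one smooth template per class."""
    rng = np.random.default_rng(seed)
    pixels = np.linspace(0.0, 1.0, N_PIXELS)
    X, y = [], []
    for k, name in enumerate(CLASSES):
        template = np.sin((k + 1) * np.pi * pixels) + 0.5 * k * pixels
        X.append(template + rng.normal(0.0, 0.3, size=(n_per_class, N_PIXELS)))
        y.extend([name] * n_per_class)
    return np.vstack(X), np.array(y)


X, y = make_toy_spectra()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: every combination is scored by 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_features": ["sqrt", 0.3]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# The best model is refit on the full training set and scored on held-out data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Holding out a test split keeps the reported accuracy independent of the data used to select the hyperparameters.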
One advantage of RF is that it can classify samples even when some of their data are missing. Tables 3 and 4 show how missing data affect classification quality. The quality degrades only slightly, which is a very promising result, since Gaia BP/RP spectra can occasionally have missing or bad pixels due to processing errors, photometric uncertainties, etc.
[ Table 3. Classification quality with missing data for the mixed libraries ]
[ Table 4. Classification quality with missing data for the library-by-library case ]
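A missing-data test of this kind can be sketched as follows, assuming scikit-learn and toy spectra. How missing pixels were actually handled in this work is not stated above, so the per-pixel median fill used here is an assumption made for illustration, as are the helper names `mask_pixels` and `fill_missing` and the missing fractions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_CLASSES, N_PIXELS = 5, 120  # assumed toy dimensions, not Gaia's real ones


def toy_spectra(n_per_class, seed):
    """Generate noisy toy 'spectra', one smooth template per class."""
    rng = np.random.default_rng(seed)
    pixels = np.linspace(0.0, 1.0, N_PIXELS)
    X, y = [], []
    for k in range(N_CLASSES):
        template = np.sin((k + 1) * np.pi * pixels) + 0.5 * k * pixels
        X.append(template + rng.normal(0.0, 0.3, size=(n_per_class, N_PIXELS)))
        y.extend([k] * n_per_class)
    return np.vstack(X), np.array(y)


def mask_pixels(X, fraction, rng):
    """Return a copy of X with roughly `fraction` of its pixels set to NaN."""
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < fraction] = np.nan
    return X_missing


def fill_missing(X, fill_values):
    """Replace each NaN with the fill value of its pixel column (assumed strategy)."""
    return np.where(np.isnan(X), fill_values, X)


X_train, y_train = toy_spectra(60, seed=0)
X_test, y_test = toy_spectra(20, seed=1)

# Train on complete spectra, then degrade only the test spectra.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pixel_medians = np.median(X_train, axis=0)

rng = np.random.default_rng(42)
accuracy = {}
for fraction in (0.0, 0.1, 0.2):
    X_degraded = fill_missing(mask_pixels(X_test, fraction, rng), pixel_medians)
    accuracy[fraction] = model.score(X_degraded, y_test)
    print(f"missing fraction {fraction:.1f}: accuracy {accuracy[fraction]:.3f}")
```

Comparing the accuracy across missing fractions gives a rough, toy-scale analogue of the comparison summarized in Tables 3 and 4.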
For details about this work, see GAIA-C8-TN-MPIA-DWK-003.