Available Neural Network Galaxy Types (2004)

As detailed in Ball et al. (2004) and Ball (2004), artificial neural networks are able to assign morphological types to galaxies with extinction-corrected apparent Petrosian magnitude r < 15.9. Here 29:8:1 and 10:8:1 networks were run on several SDSS datasets, and the results are available as ASCII and FITS files, each containing ra, dec, redshift (if applicable) and type. These can then be matched to desired galaxy properties from the SDSS databases.

The types are continuous real numbers on the T system used in Nakamura et al. (2003), in which the eyeball types were assigned to the nearest 0.5.

Two sets of input parameters were used to train the networks: the full set of 29 in Ball et al., and the subset of 10 purely morphological parameters, for which the types show less dependence on redshift. The types are the median from ten runs. The networks were trained by matching the JPG EDR eyeball catalogue with DR3 photometry using the crossID tool in the SkyServer.

The datasets are available as gzipped ASCII or FITS files. The sorting is simply the order in which the galaxies were returned by the query. The FITS files were verified using fverify v3.1.6 (CFITSIO v2.410) and contain no warnings, errors or nulls.
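As a minimal sketch of reading one of the gzipped ASCII files, the following assumes whitespace-separated numeric columns in the order ra, dec, z, type (or ra, dec, type for the imaging sample), with no header line. That layout is an assumption and should be checked against the actual files; the function name load_types is hypothetical.

```python
import gzip

def load_types(path, has_redshift=True):
    """Read rows of ra, dec, [z,] type from a gzipped ASCII file.

    Assumes whitespace-separated numeric columns (an assumption about
    the file layout). Blank lines and lines starting with '#' are skipped.
    """
    names = ["ra", "dec", "z", "type"] if has_redshift else ["ra", "dec", "type"]
    rows = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            values = [float(v) for v in line.split()]
            if len(values) != len(names):
                raise ValueError(
                    f"expected {len(names)} columns, got {len(values)}"
                )
            rows.append(dict(zip(names, values)))
    return rows
```

Each returned row is a dictionary keyed by column name, so the type of the first galaxy would be rows[0]["type"].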

The datasets available are:

Main DR3 galaxy sample with spectra to r < 15.9: the DR3 BESTDR3 database Galaxy and SpecObj views were queried with the cuts g.specObjID = s.specObjID, g.petroMag_r - g.extinction_r < 15.9 and s.zConf > 0.8, where g is the Galaxy view and s the SpecObj view. The columns are ra, dec, z, T type.

Main DR3 galaxy sample from imaging to r < 15.9: similarly, but using the clauses from PhotoObjAll g with (index=0) where g.type = 3 and g.mode = 1 and (g.primTarget & 448 > 0) and g.petroMag_r - g.extinction_r < 15.9, which avoids the SQL bookmark lookup bug (see e.g. the SQL intro page on the SDSS SkyServer). The columns are ra, dec, T type.
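The imaging selection can be restated as a small predicate, which may help when re-applying the same cuts to a local catalogue. This is a sketch rather than the query that was actually run: the function and argument names are hypothetical, and the decomposition of 448 into the bits 64, 128 and 256 is simple arithmetic on the mask quoted above.

```python
GALAXY_TARGET_MASK = 448  # 64 + 128 + 256, the primTarget mask quoted in the text
MAG_LIMIT = 15.9          # extinction-corrected Petrosian r limit

def passes_imaging_cuts(obj_type, mode, prim_target, petro_mag_r, extinction_r):
    """Return True if an object satisfies the imaging-sample cuts.

    Mirrors: type = 3, mode = 1, (primTarget & 448) > 0 and
    petroMag_r - extinction_r < 15.9.
    """
    has_type_3 = obj_type == 3
    has_mode_1 = mode == 1
    # True if any of the three bits in the mask is set.
    is_targeted = (prim_target & GALAXY_TARGET_MASK) > 0
    is_bright = (petro_mag_r - extinction_r) < MAG_LIMIT
    return has_type_3 and has_mode_1 and is_targeted and is_bright
```

The parentheses around the bitwise test are written out explicitly because the relative precedence of & and > differs between languages.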

The accuracy of the types is similar to that of types assigned by humans, and is approximately +/- 0.5 T types. In addition, one can plot each galaxy +/- n times the standard deviation of its assigned type over the ten runs, and the equivalent residual plot (i.e. network type minus target type). Here n is 1. The residuals show that the types are biased to some extent. This is due to the network 'avoiding the ends of the scale', as noted in the paper and as seen before, e.g. in Naim et al. (1995). It may be possible to solve this by boosting underpopulated bins such as types 5 and above, although type 0 is not underpopulated. However, because the types are continuous, it is easy to bin them into broader bins, e.g. early vs late, or whole T types, if this is preferred. See also other results pages which use the ANN types.
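The binning of continuous types mentioned above can be sketched as follows. The function names are hypothetical, and the early/late boundary of 1.5 is purely an illustrative assumption (no boundary is specified here), so it is exposed as a parameter.

```python
import math

def to_whole_t_type(t):
    """Round a continuous type to the nearest whole T type.

    Exact halves round up (2.5 -> 3); math.floor is used instead of the
    built-in round(), which rounds halves to the nearest even integer.
    """
    return math.floor(t + 0.5)

def to_early_late(t, boundary=1.5):
    """Label a type 'early' or 'late'.

    The default boundary is an assumption for illustration only.
    """
    return "early" if t < boundary else "late"
```

For example, a network type of 2.49 would fall in whole T type 2, and with the assumed boundary it would be labelled late.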

Median network type from ten runs versus eyeball morphological type for the eyeball test sample (643 galaxies), using the 29:8:1 network architecture. The 10:8:1 plot looks very similar. The central diagonal line indicates the ideal result, i.e. assigned type equal to the known type; the diagonal lines above and below are the overall RMS deviation of the network types from the targets; the green dots are +/- one standard deviation over the 10 runs, and show that the variation between networks is less than the intrinsic spread in the galaxy properties used here. A +/- 0.5 random offset is added to the target type, which is discrete to the nearest 0.5.
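The quantities in this plot (median type over the runs, standard deviation over the runs, and the overall RMS deviation of network type from target type) can be computed along the following lines. This is a sketch with hypothetical function names; in particular, it is an assumption that the population standard deviation is the appropriate form, and statistics.stdev would give the sample form instead.

```python
import math
import statistics

def summarise_runs(run_types):
    """Median and standard deviation of one galaxy's type over the runs.

    Uses the population standard deviation (an assumption).
    """
    return statistics.median(run_types), statistics.pstdev(run_types)

def rms_deviation(network_types, target_types):
    """Overall RMS of the residuals (network type minus target type)."""
    residuals = [n - t for n, t in zip(network_types, target_types)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

Here run_types holds the types from the individual runs for a single galaxy, while network_types and target_types are parallel lists over the test sample.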

For the datasets on which the networks were simulated, there are no target types, so one should verify that the assigned types are reasonable. This can be done by simulating the network on the training and test sets as well, and plotting network types against galaxy properties, or histograms of the numbers of each type. Galaxy properties are shown below against network morphological type for the network simulated on its own test sample and on the Main galaxy sample with spectra (the others look similar). Each plot has the same set of axes, so a small number of points are not shown. The first gives a better idea of the relative density of points than the second.
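One of the checks mentioned above, a histogram of the numbers of each type, can be sketched as below. The bin width of 0.5 is chosen to match the spacing of the eyeball types; the function name is hypothetical.

```python
from collections import Counter
import math

def type_histogram(types, bin_width=0.5):
    """Count types per bin, keyed by the nearest multiple of bin_width.

    Values exactly halfway between two bin centres go to the upper bin.
    """
    counts = Counter(
        math.floor(t / bin_width + 0.5) * bin_width for t in types
    )
    return dict(sorted(counts.items()))
```

Comparing the histogram for a new dataset with that of the training or test set gives a quick indication of whether the distribution of assigned types is plausible.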

Training set properties versus type for the ANN test set

Training set properties versus type for the Main sample with spectra set

Further Possibilities

The networks can also assign spectral types and redshifts, the latter with an RMS of 0.02, to r < 17.77, the extent of the main galaxy sample. However, the galaxies in this sample will all eventually get spectra, so a higher z training set is required for the method to give new information, although one could assign a spectral type for all targets which don't yet have spectra. This higher z could be achieved using SDSS southern, or by degrading images at lower z in the same way that they degrade in the survey, perhaps by rebinning into fewer pixels (an example of the idea is in Kelly & McKay 2004). From the DR1 counts it is clear that a substantial number of galaxies are gained by going slightly deeper, e.g. to r = 19:

Also, the counts are 19311 to r < 15.9, 143661 to r < 17.6 and 176241 to r < 17.77.
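The idea of degrading a low-z image by rebinning it into fewer pixels, mentioned above, can be illustrated with a simple block sum. This is only a sketch of the rebinning step under stated assumptions: it is not the procedure of Kelly & McKay (2004), it ignores the point spread function, noise and surface brightness dimming, and the function name is hypothetical.

```python
def rebin(image, factor):
    """Sum factor x factor blocks of a 2D list of pixel values.

    Rows and columns that do not fill a complete block are dropped.
    """
    n_rows = len(image) // factor
    n_cols = len(image[0]) // factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]
```

Summing (rather than averaging) each block conserves the total flux of the pixels that are kept, which is one reasonable choice for this kind of illustration.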

The types were used as one of the galaxy properties in a study of bivariate galaxy luminosity functions in the SDSS main galaxy sample and in a study of the morphology-density relation.