Morphology

Parametric and non-parametric estimates of morphology

The excellent image quality delivered by LSST will allow us to obtain morphological information for all extended objects with sufficient signal-to-noise ratio, using both parametric model fitting and non-parametric estimation of various morphology indices. Provided the PSF is properly accounted for, the parametric models yield measurements of each galaxy's axial ratio, position angle and size. Possible models include single-component Sersic profiles and the more classical bulge + disk decompositions (Fig 1).

Fig 1: Example of a bulge and disk decomposition of a galaxy. The top panels show the residuals, i.e. the difference between the original image and each of several models for the galaxy. In this case, a bulge + disk model is the best fit because it leaves the smallest residuals.
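The model comparison described above can be sketched in one dimension. The following is a minimal illustration, not the fitting code used to produce Fig 1: it assumes a Sersic index of n = 4 for the bulge and n = 1 for the disk, uses a standard series approximation for the Sersic coefficient b_n, ignores PSF convolution and noise, and does not optimise any parameters. All function names and parameter values are hypothetical.

```python
import math


def sersic_b(n):
    """Approximate b_n (leading terms of the usual asymptotic series),
    chosen so that r_e encloses half of the total light."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n)


def sersic(r, i_e, r_e, n):
    """Sersic surface brightness at radius r; equals i_e at r = r_e."""
    return i_e * math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))


def bulge_plus_disk(r, bulge, disk):
    """Sum of an n = 4 bulge and an n = 1 (exponential) disk.
    bulge and disk are (i_e, r_e) tuples."""
    return sersic(r, *bulge, 4.0) + sersic(r, *disk, 1.0)


def residual_sum_of_squares(data, model):
    """Sum of squared residuals between a profile and a model."""
    return sum((d - m) ** 2 for d, m in zip(data, model))


# Hypothetical, noise-free "observed" profile in arbitrary units.
radii = [0.25 * k for k in range(1, 41)]
bulge, disk = (1.0, 1.0), (0.3, 4.0)
data = [bulge_plus_disk(r, bulge, disk) for r in radii]

# Candidate models with fixed (not fitted) parameters.
candidates = {
    "disk only": [sersic(r, *disk, 1.0) for r in radii],
    "bulge only": [sersic(r, *bulge, 4.0) for r in radii],
    "bulge + disk": [bulge_plus_disk(r, bulge, disk) for r in radii],
}
scores = {name: residual_sum_of_squares(data, model)
          for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Because the mock profile is built from the two-component model itself, the bulge + disk residuals here are exactly zero; with real images the models would be convolved with the PSF and their parameters fitted before the residuals are compared.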

The median LSST seeing requirement of 0.7" corresponds to ∼ 4 kpc at z = 0.5, which is smaller than a typical L∗ galaxy scale-length. Parametric models will therefore be able to discriminate between bulge-dominated and disk-dominated galaxies up to z ∼ 0.5–0.6, and to measure sizes for the brightest of them.

Non-parametric morphology indicators include concentration, asymmetry and clumpiness (CAS; Conselice 2003), as well as the Gini coefficient, which measures how unequally flux is distributed among a galaxy's pixels, and moments of the galaxy image (M20; Lotz et al. 2004).
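Two of the quantities above are easy to check numerically. The sketch below converts an angular size to a proper physical size and computes a Gini coefficient from a list of pixel fluxes. The cosmology (flat ΛCDM with H0 = 70 km/s/Mpc and Ωm = 0.3) is our assumption, since the text does not specify one, and the Gini estimator is the commonly used form based on sorted absolute pixel values. Function names are hypothetical.

```python
import math

C_KM_S = 299792.458                       # speed of light in km/s
ARCSEC_IN_RAD = math.pi / (180.0 * 3600.0)


def kpc_per_arcsec(z, h0=70.0, omega_m=0.3, steps=1000):
    """Proper transverse scale at redshift z in a flat LambdaCDM
    cosmology. h0 and omega_m are assumed values, not from the text."""
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    # Trapezoidal integration of 1/E(z) from 0 to z.
    dz = z / steps
    integral = sum(0.5 * (inv_e(i * dz) + inv_e((i + 1) * dz)) * dz
                   for i in range(steps))
    comoving_mpc = (C_KM_S / h0) * integral
    angular_diameter_mpc = comoving_mpc / (1.0 + z)
    return angular_diameter_mpc * 1000.0 * ARCSEC_IN_RAD


def gini(pixel_fluxes):
    """Gini coefficient of a set of pixel fluxes: 0 if the flux is
    shared equally, 1 if a single pixel holds all of it.
    Requires at least two pixels and a non-zero mean flux."""
    x = sorted(abs(v) for v in pixel_fluxes)
    n = len(x)
    mean = sum(x) / n
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (mean * n * (n - 1))


seeing_kpc = 0.7 * kpc_per_arcsec(0.5)
print(f'0.7" at z = 0.5 is about {seeing_kpc:.1f} kpc')
print(gini([1.0, 1.0, 1.0, 1.0]), gini([0.0, 0.0, 0.0, 1.0]))
```

Under the assumed cosmology the seeing disk comes out slightly above 4 kpc, consistent with the figure quoted above. In practice the Gini coefficient is computed only over the pixels assigned to the galaxy by a segmentation map, which this sketch leaves to the caller.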

Visual classifications via citizen-science platforms: Galaxy Zoo

Although parametric and non-parametric methods are ideally suited to analysing large survey datasets like LSST, they are typically benchmarked against direct visual classification, which arguably provides the most reliable measurements of galaxy morphology. Visual classification can be prohibitively time-consuming for individuals or even small groups of researchers, but the advent of the Galaxy Zoo platform (Lintott et al. 2011) has enabled the community to measure visual morphologies for large datasets. By enlisting almost a million members of the general public ('citizen scientists'), Galaxy Zoo has classified galaxies in many large contemporary surveys, such as the SDSS and the legacy surveys from the Hubble Space Telescope. Platforms such as Galaxy Zoo will continue to be important for morphological studies in the LSST era. However, as described below (see also the section on AI and Machine Learning on this website), the sheer volume of data will require the community to use visual classification in conjunction with machine-learning algorithms that are designed to analyse large datasets autonomously.

Fig 2: An example classification from Galaxy Zoo 2 of the galaxy NGC 2771, which has a classification of (R’)SB(r)ab in the RC3 (de Vaucouleurs et al. 1991). The numbers in the orange bar at the top give the number of users who responded positively to the question indicated, after applying a weighting scheme that removes users who always provide highly inconsistent answers. Credit: Karen Masters.
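The idea of weighting votes before counting them can be illustrated with a short sketch. This is a generic example and not the actual Galaxy Zoo weighting scheme: the user identifiers, the answers and the weights (1 for a trusted user, 0 for a user whose votes are discarded) are all hypothetical.

```python
def weighted_vote_fractions(votes, user_weights):
    """Combine (user_id, answer) votes into weighted vote fractions.

    user_weights maps user_id to a weight between 0 and 1; users that
    are missing from the mapping are given full weight."""
    totals = {}
    for user, answer in votes:
        weight = user_weights.get(user, 1.0)
        totals[answer] = totals.get(answer, 0.0) + weight
    norm = sum(totals.values())
    if norm == 0.0:
        return {}  # every vote was removed by the weighting
    return {answer: total / norm for answer, total in totals.items()}


# Hypothetical votes on the question "Is there a bar?".
votes = [("u1", "bar"), ("u2", "bar"), ("u3", "no bar"), ("u4", "no bar")]
# Hypothetical weights: user u4 has been flagged as inconsistent.
weights = {"u1": 1.0, "u2": 1.0, "u3": 1.0, "u4": 0.0}

fractions = weighted_vote_fractions(votes, weights)
print(fractions)
```

With the inconsistent user removed, the raw 2:2 split becomes a weighted 2:1 majority in favour of a bar.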

Machine learning techniques for morphological classification of galaxies

Big Data surveys like LSST, while revolutionary in their scope, will require new analysis techniques. In particular, the morphological analysis of such unprecedented data volumes will require a high degree of automation. While this can be achieved using the parametric and non-parametric methods described above, the convergence of astrophysics and computer science offers powerful new opportunities to harness the rich detail expected from LSST images.

A useful route to achieving the automation required for the morphological analysis of surveys like LSST is to employ unsupervised machine learning (UML), which does not require training sets. Successful UML algorithms can reduce the dimensionality of the problem by autonomously grouping similar objects together into a small number of 'morphological clusters'. If the purity of these clusters is high, then the clusters (rather than individual galaxies) can be benchmarked using visual inspection. If the number of clusters is small (e.g. a few hundred), then this exercise becomes tractable even for individual researchers.
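One simple way to quantify the purity of a cluster is the fraction of its members that share the most common visual class. The sketch below computes this for a small visually inspected subsample; the definition of purity, the cluster identifiers and the labels are our assumptions for illustration.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, visual_labels):
    """For each cluster, return (dominant visual label, purity), where
    purity is the fraction of members carrying the dominant label."""
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, visual_labels):
        members[cluster_id].append(label)

    purity = {}
    for cluster_id, labels in members.items():
        dominant, count = Counter(labels).most_common(1)[0]
        purity[cluster_id] = (dominant, count / len(labels))
    return purity


# Hypothetical visually inspected subsample: two clusters, eight galaxies.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
visual_labels = ["spiral"] * 4 + ["elliptical"] * 3 + ["spiral"]

purity = cluster_purity(cluster_ids, visual_labels)
print(purity)
```

A cluster with purity close to 1 can be assigned its dominant label as a whole, whereas a low-purity cluster signals that its members need closer inspection.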

Fig 3 shows an example of such a UML algorithm (Martin et al. 2020). The algorithm extracts image patches from multi-band data and transforms each one into a rotationally-invariant representation of a small region of the survey data, efficiently encoding colour, intensity and spatial frequency information. Using growing neural gas and hierarchical clustering algorithms, it then groups the patches into a library of patch types, based on their similarity. Next, it assembles a 'feature vector' for each object, which describes the frequency of each patch type within that object. Finally, a k-means algorithm is employed to separate objects into 160 morphological clusters, based on the similarity of their feature vectors. The figure also shows examples of galaxies in morphological clusters that correspond to broad, well-known morphological classes (spirals, S0/Sa and ellipticals). Algorithms such as this will be invaluable for the analysis of images in surveys like LSST.
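The last two steps of this pipeline, building feature vectors from patch types and clustering them with k-means, can be sketched as follows. This is a toy illustration under strong assumptions and not the implementation of Martin et al. (2020): it takes the patch types as already assigned (skipping the growing neural gas and hierarchical clustering stages), uses a plain k-means written with the standard library, and separates six hypothetical objects into 2 clusters rather than 160.

```python
import random


def feature_vector(patch_types, n_types):
    """Normalised histogram of how often each patch type occurs in an object."""
    counts = [0] * n_types
    for patch_type in patch_types:
        counts[patch_type] += 1
    return [count / len(patch_types) for count in counts]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain k-means: returns one cluster label per input vector."""
    rng = random.Random(seed)
    centres = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every vector.
        labels = [min(range(k), key=lambda j: squared_distance(v, centres[j]))
                  for v in vectors]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical objects, each listed as the patch types (0, 1 or 2) found in it.
objects = [
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0],   # dominated by patch type 0
    [2, 2, 2, 1], [2, 2, 1, 1], [2, 2, 2, 2],   # dominated by patch type 2
]
vectors = [feature_vector(obj, n_types=3) for obj in objects]
labels = kmeans(vectors, k=2)
print(labels)
```

The cluster labels themselves are arbitrary; what matters is that objects with similar patch-type histograms end up in the same cluster, which can then be inspected visually as a group.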

Fig 3: Top: A schematic view of the morphological classification process. Patches are extracted around detected pixels in survey images and clustering methods are used to group these patches into a library of patch types. Galaxy feature vectors can then be constructed by creating a histogram for each object which describes the frequency of each patch ‘type’ (Martin et al. 2020).

Right: g-r-i false colour images showing a random selection of galaxies from each major morphological group. The samples are further split into bins of redshift, indicated by the label in the top right of each coloured box. Panel (a) shows objects classified as spirals, panel (b) shows objects classified as S0/Sa and panel (c) shows objects classified as ellipticals (Martin et al. 2020).