Probabilistic Clustering Analysis

In our Universe, galaxies are the most common large-scale structures. A typical galaxy consists of ~10^11 stars and has a mass of ~10^11 times that of our Sun. Galaxies interact with each other through gravity: the closer they are, the stronger the gravitational attraction between them. As a result, the way galaxies cluster reflects the gravitational interactions in our Universe, and identifying clustered galaxies is therefore an important astrophysical problem. Detecting clustered galaxies is not an easy task, however, because of the following complications:

Not all galaxies are clustered; only ~10% of them are.
We cannot precisely measure the distances of galaxies from us, so although we want to detect clusters in 3D space, we have precise information in only 2D (the galaxies' positions on the sky).

Conventional statistical clustering methods generally break down when applied to the clustering problem described above. We developed the GMBCG algorithm to detect galaxy clusters and used it to create what was, as of 2010, the largest catalog of galaxy clusters, based on data from the Sloan Digital Sky Survey (SDSS). The key part of the algorithm is a probabilistic model of galaxy clusters: the observed data are matched against this model to assign likelihoods. For more details, see the following paper:

Hao, J., McKay, T. A., Koester, B. P., Rykoff, E. S., Rozo, E., Annis, J., Wechsler, R. H., Evrard, A., Siegel, S. R., Becker, M., Busha, M., Gerdes, D., Johnston, D. E., & Sheldon, E. (2010). A GMBCG Galaxy Cluster Catalog of 55,437 Rich Clusters from SDSS DR7. The Astrophysical Journal Supplement Series, 191, 254-274 (journal impact factor: 15.206).
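The central idea described above, modelling the data probabilistically and then scoring each galaxy against the model, can be sketched with a two-component one-dimensional Gaussian mixture fitted by expectation-maximization (EM). This is a minimal illustration under stated assumptions, not the GMBCG code: it assumes a single galaxy color as the only feature, a narrow "cluster" component plus a broad background component, and synthetic toy data. The function names and starting values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(colors, init_means, n_iter=100):
    """Fit a two-component 1D Gaussian mixture to `colors` by EM (illustrative sketch).

    Component 0 plays the role of a hypothetical tight "cluster" population and
    component 1 a broad background. Returns (weights, means, variances, memberships),
    where memberships[i] is the posterior probability from the last E-step that
    colors[i] was drawn from component 0.
    """
    n = len(colors)
    mean_all = sum(colors) / n
    var_all = sum((x - mean_all) ** 2 for x in colors) / n
    weights = [0.5, 0.5]
    means = list(init_means)
    variances = [var_all / 4.0, var_all]  # start the "cluster" component narrower

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in colors:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total if total > 0 else 0.5 for pk in p])

        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, colors)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, colors)) / nk,
                1e-6,  # variance floor keeps a component from collapsing onto one point
            )

    memberships = [r[0] for r in resp]
    return weights, means, variances, memberships


# Synthetic toy data, for illustration only: a broad background plus a narrow "cluster" in color.
random.seed(0)
background = [random.gauss(1.0, 0.4) for _ in range(900)]
cluster = [random.gauss(1.8, 0.05) for _ in range(100)]
weights, means, variances, p_member = fit_two_component_gmm(background + cluster, init_means=(1.8, 1.0))
print(f"cluster component: mean={means[0]:.2f}, sd={math.sqrt(variances[0]):.3f}, weight={weights[0]:.2f}")
```

EM only finds a local optimum, so the initial guesses matter; starting the cluster component narrow and near the expected color is a design choice that steers it toward the compact population.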

The cluster catalog is available at http://home.fnal.gov/~jghao/gmbcg_sdss_catalog.html. It has been used as a major source of galaxy clusters by a number of large astronomical collaborations, such as Planck. I later made further improvements to the probabilistic model, described in:

Hao, J. (2011). Spatially Weighted and Measurement Error Corrected Gaussian Mixture Model for Galaxy Clustering Analysis Based on Photometric Data. Conference proceedings of Statistical Challenges in Modern Astronomy IV, College Park, PA.
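The title of this proceedings paper names two extensions of a plain Gaussian mixture. The sketch below shows one standard way to add both, offered as an assumption rather than as the method in the paper. First, each galaxy gets a spatial weight (here a hypothetical Gaussian taper in projected distance from the candidate cluster center) that multiplies its responsibilities in the M-step. Second, each mixture component is convolved with the galaxy's own color error, so galaxy i is scored against variance s_k^2 + e_i^2, and the M-step uses extreme-deconvolution-style updates; the fitted variances then estimate the intrinsic, error-free color spread. All function names, the taper scale, and the toy data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def spatial_weight(r, scale):
    """Hypothetical spatial weight: Gaussian taper in projected radius r."""
    return math.exp(-0.5 * (r / scale) ** 2)


def fit_weighted_xd_gmm(colors, errors, radii, init_means, init_vars, scale=0.5, n_iter=100):
    """Spatially weighted, error-deconvolved 1D Gaussian mixture fitted by EM.

    Component k models the intrinsic color distribution N(means[k], variances[k]);
    an observed color x_i with error sd e_i is scored against
    N(means[k], variances[k] + e_i**2).
    """
    w = [spatial_weight(r, scale) for r in radii]
    w_total = sum(w)
    K = len(init_means)
    weights = [1.0 / K] * K
    means = list(init_means)
    variances = list(init_vars)

    for _ in range(n_iter):
        # E-step: responsibility r, plus posterior mean b and variance B of the
        # galaxy's intrinsic color under each component.
        stats = []
        for x, e in zip(colors, errors):
            tot = [variances[k] + e * e for k in range(K)]
            p = [weights[k] * normal_pdf(x, means[k], tot[k]) for k in range(K)]
            total = sum(p)
            row = []
            for k in range(K):
                r = p[k] / total if total > 0 else 1.0 / K
                b = means[k] + variances[k] / tot[k] * (x - means[k])
                B = variances[k] - variances[k] ** 2 / tot[k]
                row.append((r, b, B))
            stats.append(row)

        # M-step: every responsibility is multiplied by the galaxy's spatial weight.
        for k in range(K):
            wr = [wi * row[k][0] for wi, row in zip(w, stats)]
            nk = max(sum(wr), 1e-12)
            weights[k] = nk / w_total
            means[k] = sum(a * row[k][1] for a, row in zip(wr, stats)) / nk
            variances[k] = max(
                sum(a * ((row[k][1] - means[k]) ** 2 + row[k][2]) for a, row in zip(wr, stats)) / nk,
                1e-6,  # floor keeps the intrinsic variance positive
            )

    return weights, means, variances


# Synthetic toy data: a compact cluster (intrinsic sd 0.03) near the center and a
# spread-out background, all observed with a color error of 0.1.
random.seed(1)
n_bg, n_cl = 900, 100
colors = [random.gauss(1.0, 0.4) for _ in range(n_bg)] + [random.gauss(1.8, 0.03) for _ in range(n_cl)]
colors = [c + random.gauss(0.0, 0.1) for c in colors]
errors = [0.1] * (n_bg + n_cl)
radii = [random.uniform(0.0, 3.0) for _ in range(n_bg)] + [abs(random.gauss(0.0, 0.3)) for _ in range(n_cl)]

wts, mu, var_xd = fit_weighted_xd_gmm(colors, errors, radii, (1.8, 1.0), (0.01, 0.16))
# Same fit with errors set to zero, i.e. no deconvolution, for comparison.
_, _, var_naive = fit_weighted_xd_gmm(colors, [0.0] * len(colors), radii, (1.8, 1.0), (0.01, 0.16))
print(f"cluster mean {mu[0]:.3f}, intrinsic sd {math.sqrt(var_xd[0]):.3f}, "
      f"sd ignoring errors {math.sqrt(var_naive[0]):.3f}")
```

With all errors set to zero and all weights equal, the updates reduce to the plain EM for a Gaussian mixture, which is why the comparison run isolates the effect of the error correction.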

Although the techniques described above were developed mainly for galaxy clustering analysis, they are directly applicable to other situations where the distance matrix is difficult to construct.
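One reason a mixture-model approach transfers to such settings is that it never needs the n-by-n matrix of pairwise distances that methods such as hierarchical clustering typically build. Once the K components are fitted, each object is scored only against the component parameters, so assignment costs O(nK) time and O(K) memory per object, and the data can be streamed. The sketch below assumes the parameters have already been estimated (the values shown are hypothetical) and uses diagonal-covariance Gaussians over two illustrative features.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def log_diag_gaussian(x: Point, mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def stream_memberships(
    points: Iterable[Point],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Iterator[List[float]]:
    """Yield posterior component probabilities for each point, one point at a time.

    Only the K component parameters are held in memory; no pairwise
    distance matrix is ever built.
    """
    log_w = [math.log(w) for w in weights]
    for x in points:
        logs = [lw + log_diag_gaussian(x, m, v) for lw, m, v in zip(log_w, means, variances)]
        top = max(logs)  # log-sum-exp shift for numerical stability
        probs = [math.exp(lg - top) for lg in logs]
        total = sum(probs)
        yield [p / total for p in probs]


# Hypothetical fitted parameters for K = 2 components over two features.
weights = [0.1, 0.9]
means = [(1.8, 0.0), (1.0, 0.0)]
variances = [(0.0025, 0.04), (0.16, 1.0)]

points = [(1.8, 0.05), (0.6, -1.2), (1.75, 0.1)]
results = list(stream_memberships(points, weights, means, variances))
for x, p in zip(points, results):
    print(x, [round(pi, 3) for pi in p])
```

Working in log space with the log-sum-exp shift avoids underflow when an object lies far from every component, which is common when one component is very narrow.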
