In our Universe, galaxies are the most common large-scale structures. A galaxy typically consists of ~10^11 stars, with a typical mass of ~10^11 times that of our Sun. Galaxies interact with each other through gravity: the closer they are, the stronger the gravitational attraction between them. As a result, the clustering of galaxies reflects the gravitational interaction in our Universe. Identifying clustered galaxies is therefore an important astrophysical topic. Detecting clustered galaxies is not an easy task; the following factors complicate it:
Conventional statistical clustering methods generally break down when applied to the clustering analysis described above. We developed the GMBCG algorithm to detect galaxy clusters and created the largest catalog of clustered galaxies (as of 2010) based on data from the Sloan Digital Sky Survey. The key part of the algorithm is to build a probabilistic model for galaxy clusters and then match the actual data to this model to assign a likelihood. For more detail, see the following paper.
The cluster catalog can be accessed from: http://home.fnal.gov/~jghao/gmbcg_sdss_catalog.html. The catalog has been used as a major source of galaxy clusters by a number of large astronomical collaborations, such as PLANCK. I have since made further improvements to the probabilistic model, and here is a description of the new work
Though the techniques described above are mainly used for galaxy clustering analysis, they are directly applicable to other situations where the distance matrix is difficult to form.
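To make the idea of matching data to a probabilistic model concrete, here is a minimal sketch in Python. It is not the GMBCG implementation. It assumes, purely for illustration, that each object is described by a single hypothetical feature and that the data come from two populations, each modeled as a one-dimensional Gaussian. The function names (`gaussian_pdf`, `fit_two_gaussians`), the synthetic data, and all parameter values are assumptions made for this example. The point is that each object receives a membership probability from the fitted model, and no pairwise distance matrix is ever formed.

```python
import math
import random


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, sigmas, resp), where resp[i][k] is the
    probability, from the final E-step, that point i belongs to
    component k.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    # Simple deterministic starting guess (an assumption of this sketch).
    weights = [0.5, 0.5]
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    sigmas = [0.25 * spread, 0.25 * spread]
    resp = []
    for _ in range(n_iter):
        # E-step: membership probability of every point in each component.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], sigmas[k])
                 for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and width of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3 * spread)
    return weights, means, sigmas, resp


# Synthetic data (hypothetical): two well-separated populations.
rng = random.Random(0)
values = ([rng.gauss(0.0, 0.3) for _ in range(200)]
          + [rng.gauss(3.0, 0.3) for _ in range(200)])

weights, means, sigmas, resp = fit_two_gaussians(values)

# Probability that each object belongs to the second population.
membership = [r[1] for r in resp]
print("fitted means:", means)
print("fitted weights:", weights)
```

The cost of each EM iteration grows linearly with the number of objects, which is one reason a model-based assignment of this kind can remain practical when a full distance matrix cannot be built.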