GALPHAT

GALPHAT (GALaxy PHotometric ATtributes) is a newly developed image decomposition package based on Bayesian Markov Chain Monte Carlo (MCMC). The current version of GALPHAT models a galaxy using multiple 2D Sersic profiles and provides a robust posterior distribution of the model parameters. GALPHAT is not "just" another image decomposition code, but the first software package that focuses on scientific production and carries out a robust and sophisticated statistical analysis of galaxy images using the full posterior of the model parameters. On this page, I briefly describe why we need GALPHAT and how it works, show some performance test results, and outline its potential application to testing more sophisticated hypotheses of galaxy formation and evolution.

Guidelines for using GALPHAT

1. Motivation

Galaxy morphology is one of the fundamental properties of galaxies and has been studied extensively with image decomposition techniques developed from the mid-1990s to the present. All of them focus on the best-fit parameters, obtained by finding the global minimum of the likelihood function (in practice, of chi-square or the negative log-likelihood). Various minimization techniques have been adopted to make the best-fit parameter estimates more robust. They fall into two broad categories: gradient search methods and the Metropolis-Hastings algorithm. Gradient search methods are fast and are used in many image decomposition codes, but they always carry a potential pitfall: the search may get stuck in a local minimum. The Metropolis-Hastings algorithm is more robust at finding the global minimum, but it requires a lot of CPU time, which makes it impractical to apply to large numbers of galaxies.

In general, galaxy image modeling is not an easy problem because of the strong intrinsic correlations and degeneracies between model parameters. Even a simple profile typically has about 7-8 free parameters, which makes the likelihood surface complicated. In addition, observational conditions (e.g. S/N and the size of the galaxy relative to the PSF) affect the fitting results in an uncontrolled way. Therefore, in many cases the systematics of a code have been investigated with Monte Carlo simulations, whose results are themselves affected by the systematics of the decomposition algorithm.

Even if we find best-fit parameters that truly sit at the global minimum, it is still important to know how this global minimum compares with the other minima of the likelihood. This information is crucial for setting the confidence intervals, in other words the uncertainties, of the model parameters. Astronomers frequently correlate one parameter with another to look for a relation between them. Unfortunately, in many cases these analyses are based on the distribution of best-fit parameters without reliable confidence intervals, which can be misleading.

We need a better approach for putting robust statistical confidence limits on theories based on observations. Bayesian MCMC is a powerful solution to this problem. Bayes' theorem provides the full posterior distribution, including the "maximum a posteriori" (MAP) estimate, and Markov Chain Monte Carlo can sample an arbitrary function in a high-dimensional space, which makes the Bayesian approach a powerful tool in modern statistical analysis. GALPHAT is motivated by all of these issues, which were previously hard to resolve.

2. GALPHAT algorithm

Since Bayesian MCMC is computationally very expensive, GALPHAT needs to be clever enough to generate many MCMC samples within a reasonably short time. To generate the MCMC chains, GALPHAT uses BIE, a state-of-the-art statistical software package and a research product of the UMASS Astronomy and Computer Science departments. In every MCMC sampling step, GALPHAT generates a galaxy model from a given set of parameters, so this model generation step must be both accurate and fast. We adopt a novel technique for generating the galaxy model. 1) Instead of assigning the flux of a pixel from the profile value at the pixel center, we pre-calculate a table of numerically integrated, cumulative flux for many different Sersic indices n, then read the table and interpolate in X,Y space and in n-space at the arbitrary coordinates of the pixel corners to obtain pixel values. 2) Since a galaxy has a position angle, we scale the interpolated image by the effective radius (Re) and rotate the scaled image using three sequential shear operations in the X, Y, X directions, which are carried out in Fourier space using the Fourier shift theorem.
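To illustrate step 1), here is a minimal sketch of the cumulative-table idea for a single Sersic index. It is not the GALPHAT implementation: the function names, the grid sizes, the bilinear interpolation and the approximation b_n = 2n - 1/3 are my assumptions for illustration, and the interpolation in n-space is left out. Coordinates are in units of Re, so scaling by Re corresponds to choosing pix_scale (pixel size divided by Re).

```python
import numpy as np

def sersic(r, n, r_e=1.0):
    """Sersic profile normalised to 1 at r_e; b_n = 2n - 1/3 is a common approximation."""
    b_n = 2.0 * n - 1.0 / 3.0
    return np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

def cumulative_table(n, half_width=8.0, cells=800):
    """table[i, j] = flux integrated over y < edges[i], x < edges[j] (midpoint rule)."""
    edges = np.linspace(-half_width, half_width, cells + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    h = edges[1] - edges[0]
    xm, ym = np.meshgrid(mid, mid)              # axis 0 is y, axis 1 is x
    cell_flux = sersic(np.hypot(xm, ym), n) * h * h
    table = np.zeros((cells + 1, cells + 1))
    table[1:, 1:] = cell_flux.cumsum(axis=0).cumsum(axis=1)
    return edges, table

def interp_table(edges, table, x, y):
    """Bilinear interpolation of the cumulative table at arbitrary (x, y)."""
    h = edges[1] - edges[0]
    fx = (np.asarray(x) - edges[0]) / h
    fy = (np.asarray(y) - edges[0]) / h
    ix = np.clip(np.floor(fx).astype(int), 0, edges.size - 2)
    iy = np.clip(np.floor(fy).astype(int), 0, edges.size - 2)
    tx, ty = fx - ix, fy - iy
    return ((1 - ty) * (1 - tx) * table[iy, ix] + (1 - ty) * tx * table[iy, ix + 1]
            + ty * (1 - tx) * table[iy + 1, ix] + ty * tx * table[iy + 1, ix + 1])

def pixel_integrated_image(edges, table, npix, pix_scale):
    """Pixel flux from the cumulative table evaluated at the four pixel corners."""
    corners = (np.arange(npix + 1) - npix / 2.0) * pix_scale
    cx, cy = np.meshgrid(corners, corners)
    c = interp_table(edges, table, cx, cy)
    return c[1:, 1:] - c[:-1, 1:] - c[1:, :-1] + c[:-1, :-1]

# Example: exponential profile (n = 1) on a 21 x 21 image with 0.25 Re pixels.
edges, table = cumulative_table(n=1.0)
image = pixel_integrated_image(edges, table, npix=21, pix_scale=0.25)
```

The table is built once, so each model evaluation costs only an interpolation and three subtractions per pixel. Far from the center the result is close to the value sampled at the pixel center, while near the center of a steep profile the two differ strongly, which is where the integration matters.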

Fig1. Image rotation using three sequential shear operations, following Larkin et al. 1997.

Fig2. GALPHAT rotation of a Sersic profile. Left: numerically integrated with PA=30 deg. Right: rotated by the shear operation algorithm following Larkin et al. 1997.
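The three-shear rotation can be sketched in a few lines. This is only an illustration under my own assumptions, not the GALPHAT code: the function names are hypothetical, the angle is measured in array (x = column, y = row) coordinates rather than as an astronomical position angle, and the image is treated as periodic, so the object must sit well inside the frame. It uses the identity R(theta) = Sx(-tan(theta/2)) Sy(sin theta) Sx(-tan(theta/2)) and applies each shear as a row-by-row (or column-by-column) shift with the Fourier shift theorem.

```python
import numpy as np

def shear_x(image, a):
    """Shift each row y by a * (y - y_center) pixels along x (Fourier shift theorem)."""
    ny, nx = image.shape
    y = np.arange(ny) - (ny - 1) / 2.0          # row offsets from the image center
    kx = np.fft.fftfreq(nx)                     # frequencies in cycles per pixel
    phase = np.exp(-2j * np.pi * np.outer(a * y, kx))
    return np.fft.ifft(np.fft.fft(image, axis=1) * phase, axis=1).real

def shear_y(image, b):
    """Same operation along y, done by transposing."""
    return shear_x(image.T, b).T

def rotate_three_shear(image, theta):
    """Rotate the image content by theta (radians) about the image center."""
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_x(shear_y(shear_x(image, a), b), a)

def gaussian_blob(shape, sigma_x, sigma_y, theta=0.0):
    """Elliptical Gaussian turned by theta; used here only as a test image."""
    ny, nx = shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    x = xx - (nx - 1) / 2.0
    y = yy - (ny - 1) / 2.0
    u = np.cos(theta) * x + np.sin(theta) * y
    v = -np.sin(theta) * x + np.cos(theta) * y
    return np.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2))

# Compare the shear-rotated image with one generated directly at the same angle.
theta = np.deg2rad(30.0)
unrotated = gaussian_blob((64, 64), 6.0, 3.0)
rotated = rotate_three_shear(unrotated, theta)
direct = gaussian_blob((64, 64), 6.0, 3.0, theta)
max_diff = np.abs(rotated - direct).max()
```

Because every shear is a pure sub-pixel shift, the total flux of the image is preserved, and for a well-sampled object the rotated image agrees closely with the directly generated one.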

The PSF is then convolved with the image as usual. The resulting model image is used to evaluate the GALPHAT likelihood, which is Poisson. BIE combines the likelihood and the prior to evaluate the posterior. Iterating this step generates the MCMC chain, and the GALPHAT posterior is based on these MCMC samples. The detailed algorithm will be published in a journal. Meanwhile, brief and essential information can be found in the poster that I presented at the meeting in Austin, TX.
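As a toy illustration of how a Poisson likelihood and a prior combine into a posterior that MCMC can sample, here is a minimal sketch with a single free parameter (the source amplitude). Everything in it is an assumption for illustration: the function names, the flat prior, the fixed sky level, the made-up counts, and the plain random-walk Metropolis sampler, which is not necessarily what BIE uses.

```python
import math
import random

def poisson_log_likelihood(counts, model):
    """ln L = sum_i [ d_i ln(m_i) - m_i - ln(d_i!) ] for counts d and model m."""
    total = 0.0
    for d, m in zip(counts, model):
        if m <= 0.0:
            return -math.inf
        total += d * math.log(m) - m - math.lgamma(d + 1.0)
    return total

def log_posterior(amplitude, counts, profile, sky, amp_max=1.0e4):
    """Flat prior on 0 < amplitude < amp_max (assumed) plus the Poisson term."""
    if not 0.0 < amplitude < amp_max:
        return -math.inf
    model = [amplitude * p + sky for p in profile]
    return poisson_log_likelihood(counts, model)

def metropolis(log_post, start, step, n_steps, rng):
    """Random-walk Metropolis chain for a single parameter."""
    chain = [start]
    current = log_post(start)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            chain.append(proposal)
            current = candidate
        else:
            chain.append(chain[-1])
    return chain

# Toy data: 50 pixels, flat unit profile, known sky of 20 counts per pixel.
counts = [100 + (i % 5) - 2 for i in range(50)]
profile = [1.0] * 50
sky = 20.0
chain = metropolis(lambda a: log_posterior(a, counts, profile, sky),
                   start=75.0, step=1.5, n_steps=5000, rng=random.Random(42))
kept = chain[1000:]                      # drop the burn-in part of the chain
posterior_mean = sum(kept) / len(kept)
```

In this toy case the mean count per pixel is 100, so the posterior for the amplitude concentrates near 80. The chain gives the full distribution around that value rather than a single best-fit number.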

3. Examples of GALPHAT performance

We have tested GALPHAT on various types of synthetic galaxy images and characterized its performance by comparing the input model parameters with the posterior distributions of the parameters returned by GALPHAT. In these tests GALPHAT performs robustly and reliably. Currently GALPHAT adopts multiple Sersic models, including a sky background, to model the 2D surface brightness distribution of a galaxy.

1) Low signal-to-noise (S/N) ratio: S/N is defined simply as the signal from the galaxy within the half-light radius divided by the Poisson noise from the galaxy and the sky background within the same radius. In each S/N bin, I simulated 100 Sersic-profile galaxies with random Poisson noise. The Sersic index (n), half-light radius (Re), axis ratio (q) and position angle (PA) were randomly sampled from uniform distributions over reasonable ranges. By adding up the GALPHAT posterior MCMC samples of every galaxy in an S/N bin, I constructed the ensemble distribution of each model parameter. The diagonal panels show the marginal posteriors of the magnitude (MAG) residual, Re scaled by its input value, the Sersic index (n) residual, q scaled by its input value, the PA residual, and the fractional error of the sky background in percent. The off-diagonal panels show the associated joint posteriors, with green marking the 68.3% confidence level.
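The S/N definition above can be written out as a short sketch. The function name and arguments are hypothetical, and I assume that counts are in photon (or electron) units, that half of the total galaxy counts fall inside the half-light aperture, and that the aperture is an ellipse of area pi * q * Re^2.

```python
import math

def snr_within_half_light(total_counts, sky_per_pixel, r_e_pix, q=1.0):
    """S/N = S / sqrt(S + B) inside the half-light aperture."""
    signal = 0.5 * total_counts                    # galaxy counts inside Re
    area = math.pi * q * r_e_pix ** 2              # aperture area in pixels
    sky = sky_per_pixel * area                     # sky counts in the aperture
    return signal / math.sqrt(signal + sky)

# Same galaxy with and without sky background.
snr_no_sky = snr_within_half_light(20000.0, 0.0, r_e_pix=10.0)
snr_with_sky = snr_within_half_light(20000.0, 100.0, r_e_pix=10.0)
```

With no sky the noise is purely the Poisson noise of the galaxy itself, so 10,000 counts inside Re give S/N = 100. Adding sky counts to the same aperture lowers the S/N.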

2) High S/N: the shape of the parameter posterior is dominated by the parameter covariance of the Sersic model.

3) Sersic index n: The parameter covariance also depends on n. Blue is for galaxies with n <= 2.0 and red is for galaxies with n > 2.0.

4) Galaxy size relative to image size: A robust estimate of the sky background is important in galaxy surface brightness profile modeling, since the sky background is strongly degenerate with the profile shape of the Sersic model. I tested how robust GALPHAT is when different fractions of the image are blank sky available for estimating the background, using different bins of galaxy size relative to image size. The following example is the case where the image encloses the galaxy out to 2Re.

Apart from the parameter covariance, we observed a very small (<0.1%) bias in the sky background, owing to insufficient information about the sky. However, even this tiny bias leads to an offset of -0.3 in MAG. This strongly motivates including the sky background in the parameter inference.

5) Statistical inference from an ensemble of galaxies: I generated 325 Sersic-type galaxies sampled from a Schechter function, with Sersic indices drawn from a normal distribution. For each galaxy, 25,000 converged MCMC samples are used to build the posterior, and the ensemble of all samples (25,000 times 325) is used to construct the distribution of the Sersic index. The red curve is the true distribution used to generate the simulated galaxies. The grey line is the median of 1000 distributions obtained by bootstrap sampling of the MCMC samples, and the grey band covers the 10% to 90% quantiles.
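A bootstrap band of this kind can be sketched as follows. The exact resampling scheme used for the figure is not described here, so this sketch assumes the simplest one: resample the pooled samples with replacement, histogram each resample, and take per-bin quantiles. The function name, the binning and the synthetic input are placeholders.

```python
import numpy as np

def bootstrap_histogram_band(samples, bins, n_boot=1000, seed=0):
    """Per-bin 10%, 50% and 90% quantiles of bootstrap-resampled histograms."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    hists = np.empty((n_boot, len(bins) - 1))
    for b in range(n_boot):
        resampled = rng.choice(samples, size=samples.size, replace=True)
        hists[b], _ = np.histogram(resampled, bins=bins, density=True)
    low, median, high = np.percentile(hists, [10, 50, 90], axis=0)
    return low, median, high

# Synthetic stand-in for pooled posterior samples of the Sersic index.
pooled = np.random.default_rng(1).normal(3.0, 1.0, size=5000)
bins = np.linspace(0.0, 6.0, 13)
low, median, high = bootstrap_histogram_band(pooled, bins, n_boot=200)
```

The median curve plays the role of the grey line and the low/high curves bound the grey band. Note that MCMC samples within a chain are correlated, so a band computed this way treats them as independent draws.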

6) Bayesian evidence for model selection: GALPHAT can calculate the Bayesian evidence from the posterior MCMC samples using several algorithms, including the Laplace approximation and new algorithms proposed by my advisor, Prof. Martin Weinberg.

6.1) Bayes factor (H1/H0) for truly two-component galaxies: two hypotheses (H1: bulge+disk, H0: Sersic) are tested with statistical odds. For the majority of the sample (92%), the null hypothesis (H0) is rejected by Bayes factor model selection.

6.2) Bayes factor (H1/H0) for truly one-component galaxies: two hypotheses (H1: bulge+disk, H0: Sersic) are tested with statistical odds. For the majority of the sample (87%), the null hypothesis (H0) is accepted by Bayes factor model selection.
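For readers unfamiliar with these quantities, here is a minimal sketch of a Laplace-type evidence estimate from posterior samples and of how two evidences turn into odds. It is not the BIE implementation: the function names are hypothetical, the Hessian at the mode is approximated by the inverse sample covariance, and the mode by the best sample. The estimate is ln Z ~ ln P_max + (d/2) ln(2 pi) + (1/2) ln|Sigma|, where P is likelihood times prior, d is the number of parameters and Sigma is the sample covariance.

```python
import math
import numpy as np

def laplace_log_evidence(samples, log_post):
    """samples: (n_samples, n_params); log_post: ln(likelihood * prior) per sample."""
    samples = np.asarray(samples, dtype=float)
    n_params = samples.shape[1]
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return float(np.max(log_post)
                 + 0.5 * n_params * math.log(2.0 * math.pi) + 0.5 * logdet)

def posterior_prob_h1(log_z1, log_z0, prior_odds=1.0):
    """P(H1 | data) from two log-evidences; the Bayes factor is exp(log_z1 - log_z0)."""
    log_odds = (log_z1 - log_z0) + math.log(prior_odds)
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Check on a toy 2D Gaussian "posterior" whose evidence is known exactly.
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
draws = rng.multivariate_normal(np.zeros(2), true_cov, size=20000)
log_post = -0.5 * np.einsum("ij,jk,ik->i", draws, np.linalg.inv(true_cov), draws)
log_z_est = laplace_log_evidence(draws, log_post)
log_z_exact = math.log(2.0 * math.pi) + 0.5 * math.log(np.linalg.det(true_cov))
```

For a Gaussian posterior the Laplace estimate is exact up to sampling noise, which makes this a convenient sanity check. For the strongly non-Gaussian posteriors of low S/N galaxies the approximation can be poor, which is one reason to have other evidence algorithms available. Turning the odds into "accepted" or "rejected" also requires a threshold, which is a separate choice not shown here.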

7) Real examples from 2MASS Ks-band selected galaxies: the following figure shows the size-magnitude relation for ~500 Ks-band selected 2MASS galaxies modeled with a Sersic profile. I used the ensemble of posteriors from these galaxies to make the contours.

4. GALPHAT potential application

In principle, GALPHAT can revise or confirm any science result previously obtained from galaxy morphological analysis, with much stronger statistical power. Another interesting kind of science that is only achievable with GALPHAT is hypothesis testing, sometimes called model comparison. The Bayesian formalism provides a natural framework for model comparison using the Bayes factor or the Bayesian evidence.

Why do people use the Sersic profile to model galaxies? Simply because the profile describes the data reasonably well over a range of galaxy types. That does not mean that the Sersic profile is the true description of galaxy morphology. The natural question is then: what is the best model to describe the data for a given observational condition? For example, is a one-component Sersic model better than a two-component Sersic model? Or does the answer depend on the environment?

Many of these interesting questions can be addressed with GALPHAT in the future.

5. GALPHAT release

As an application module in BIE, GALPHAT runs in a parallel computing environment. This is a great advantage, but it is also a major obstacle for people who do not want to be bothered with all the detailed configuration needed to run GALPHAT. I am therefore planning to provide a GALPHAT user guide, either as a web page or as a separate document. GALPHAT will be released to the public as soon as I have finished all the necessary steps. Meanwhile, any interest in collaboration is most welcome. Feel free to contact Ilsang Yoon via galphat (at) gmail.com or iyoon (at) astro.umass.edu if you have any questions or comments, or would like to collaborate.

6. Acknowledgement

This work was mostly done under the guidance of Prof. Martin Weinberg (my PhD thesis advisor) and Prof. Neal Katz. In the early stages of this work, I was indebted to Dr. Dan McIntosh and Mr. Yicheng Guo for useful discussions and for providing sample images. This material is based in part upon work supported by the National Science Foundation under Grant Number IIS 0611948 and by the NASA AISR program under Grant Number NNG06GF25G.

GALPHAT poster (Statistical Challenges in Modern Astronomy, 2011)