The Gaussian Mixture Model (GMM) has wide applications in statistical analysis. However, the traditional GMM considers only the case in which each data point is free of measurement error. In many applications, such as in astrophysics, the data points of interest normally have non-negligible measurement errors. In these situations, we need to build the measurement errors into the likelihood function of the GMM. This leads to a generalized GMM with measurement error corrections. We call it the Error Corrected Gaussian Mixture Model (ECGMM). Its details and application to galaxy cluster analysis can be found in Hao et al, ApJ, 2009. (arXiv: 0907.4383)
The ECGMM is fitted with the EM algorithm, implemented in C++ and wrapped into a python package using SWIG. The method has been used to measure the properties of the galaxy cluster red sequence and for optical galaxy cluster detection (part of my PhD thesis). It can also be used to estimate the density distribution when measurement errors are present.
In the following, I will show how to get the code working on your computer. Please note that I am currently not able to provide any support for Windows and Mac OS. I can only support linux. The code has been successfully tested on ubuntu linux (32 and 64 bit) and redhat linux (Scientific linux).
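As a rough illustration of what "modeling the measurement errors into the likelihood" means, the following is a sketch under the assumption that each data point y_j has a Gaussian measurement error with known standard deviation \delta_j. The symbols (w_k, \mu_k, \sigma_k, \delta_j, K) are notation chosen here for illustration; see the paper for the authoritative form.

```latex
p(y_j \mid \theta) = \sum_{k=1}^{K} \frac{w_k}{\sqrt{2\pi\,(\sigma_k^2 + \delta_j^2)}}
  \exp\!\left[-\frac{(y_j - \mu_k)^2}{2\,(\sigma_k^2 + \delta_j^2)}\right],
\qquad \sum_{k=1}^{K} w_k = 1
```

Under this assumption, each component is broadened point by point by the measurement error, and setting every \delta_j = 0 recovers the traditional GMM likelihood.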
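To give a feel for how an EM fit with measurement errors can work, here is a minimal pure-Python sketch for one-dimensional data. It is not the package's C++ implementation: it uses the textbook EM updates that treat the error-free value of each point as a latent variable, and the package's own update rules may differ. All function names (normal_pdf, log_likelihood, em_step, fit_ecgmm) are hypothetical and exist only in this sketch.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(x, xerr, weights, means, sigmas):
    """Log-likelihood where point i sees component k with variance sigma_k^2 + xerr_i^2."""
    total = 0.0
    for xi, ei in zip(x, xerr):
        p = sum(w * normal_pdf(xi, m, s * s + ei * ei)
                for w, m, s in zip(weights, means, sigmas))
        total += math.log(p)
    return total


def em_step(x, xerr, weights, means, sigmas):
    """One EM iteration; the error-free value of each point is treated as latent."""
    n, k = len(x), len(weights)
    resp_sum = [0.0] * k  # sum of responsibilities per component
    b_sum = [0.0] * k     # sum of responsibility * expected error-free value
    rows = []
    for xi, ei in zip(x, xerr):
        dens = [w * normal_pdf(xi, m, s * s + ei * ei)
                for w, m, s in zip(weights, means, sigmas)]
        norm = sum(dens)
        row = []
        for j in range(k):
            r = dens[j] / norm                    # responsibility of component j
            var_j = sigmas[j] ** 2
            total_var = var_j + ei * ei
            b = means[j] + var_j / total_var * (xi - means[j])  # posterior mean
            v = var_j * ei * ei / total_var                     # posterior variance
            resp_sum[j] += r
            b_sum[j] += r * b
            row.append((r, b, v))
        rows.append(row)
    new_means = [b_sum[j] / resp_sum[j] for j in range(k)]
    new_sigmas = []
    for j in range(k):
        var = sum(row[j][0] * ((row[j][1] - new_means[j]) ** 2 + row[j][2])
                  for row in rows) / resp_sum[j]
        new_sigmas.append(math.sqrt(var))
    new_weights = [resp_sum[j] / n for j in range(k)]
    return new_weights, new_means, new_sigmas


def fit_ecgmm(x, xerr, weights, means, sigmas, n_iter=200, tol=1e-8):
    """Iterate em_step until the log-likelihood gain drops below tol."""
    ll = log_likelihood(x, xerr, weights, means, sigmas)
    for _ in range(n_iter):
        weights, means, sigmas = em_step(x, xerr, weights, means, sigmas)
        new_ll = log_likelihood(x, xerr, weights, means, sigmas)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return weights, means, sigmas, ll
```

The caller supplies initial guesses for the weights, means and widths. The sketch assumes every measurement error is strictly positive and does no protection against numerical underflow, so it is meant for reading rather than production use.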
Part 0. Before anything
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
Acknowledgements: Please cite Hao et al, ApJ, 2009 if you use this package.
Please report any bugs, comments and suggestions to:
Jiangang Hao
Center for Particle Astrophysics, MS127
Fermi National Accelerator Laboratory
P.O. Box 500
Batavia, IL 60510
Email: jghao@fnal.gov or jianganghao@gmail.com
Part 1. Get the package and compile it
1. First, go to https://github.com/jgbrainstorm/ecgmm
2. Check out a current version using git
$ git clone https://github.com/jgbrainstorm/ecgmm.git
3. Enter the ecgmm directory and list the files/directories by:
$ cd ecgmm
$ ls
you will find there are 3 sub-directories
bin, src, py
and 2 files
Makefile, README
4. Now, you need to compile the C++ package so that you can call it from python. It is very straightforward and simple if you have the right dependency packages installed. The required packages for compiling are:
SWIG: Simplified wrapper and interface generator, http://www.swig.org/
g++ : GNU C++ compiler
Python2.6: make sure it is installed and that its headers are at /usr/include/python2.6; otherwise, you need to modify the Makefile accordingly. If you have another version of python (say, v2.4), then change /usr/include/python2.6 to /usr/include/python2.4 in the Makefile.
GSL : GNU Scientific Library, http://www.gnu.org/software/gsl/ [Only required if you want to compile the C++ example. Not needed for generating the Python module]
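Before running make, it can help to confirm that the build tools are on your PATH and to find out where your interpreter keeps its header files, since that is the directory the Makefile has to point at. The snippet below is a small sketch for that check. It assumes a Python 3 interpreter (shutil.which is not available in Python 2.6), and the helper names check_build_dependencies and python_include_dir are hypothetical, not part of the package.

```python
import shutil
import sysconfig


def check_build_dependencies(tools=("swig", "g++")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def python_include_dir():
    """Directory that holds Python.h for the interpreter running this script."""
    return sysconfig.get_paths()["include"]


for tool, path in check_build_dependencies().items():
    print("%-5s %s" % (tool, path if path else "NOT FOUND"))
print("Python headers: %s" % python_include_dir())
```

If the reported header directory differs from /usr/include/python2.6, that is the path to substitute in the Makefile.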
To compile, go to the main directory (ecgmm) and type:
$ make py
to build the python version. This creates the shared library as well as a python wrapper for ecgmm.h.
$ make cpp
to build the C++ example. This produces an executable file, ecGMMexample, under bin/.
$ make clean
to remove all the compiled files.
Yes, it is that simple. After this, you are ready to go.
Part 2. C++ part of the package
1. The package consists of two parts, a C++ part and a python part. The C++ part is the core and provides the actual computation. Why use C++? Mainly for computational efficiency. Also, to keep it flexible, I provide a C++ class and two C++ convenience functions defined in another header file. They are all in the src subdirectory:
src/ecGMMClass.h: C++ class. Please refer to its documentation part for usage.
src/ecgmm.h : C++ functions based on the class. Please refer to its documentation part for usage.
2. There is another C++ file:
src/ecGMMexample.cpp: it shows how to use ecGMMClass.h in great detail.
Part 3. Python part of the package
1. Python is a widely used programming language, and it is free. By wrapping the C++ functions using SWIG, I provide a python package for the ECGMM. The python package is stored in the subdirectory py/. There are 4 relevant files:
py/_ecgmm.so : the shared library built from the C++ file ecgmm.h
py/ecgmm.py : the python file created by SWIG when wrapping the ecgmm.h file
py/ecgmmPy.py : the python file that wraps ecgmm.py and interfaces numpy arrays with C++ vectors. It is the main python module you will use. Refer to its documentation part for details.
py/example.py: an example python code to show you how to use the package. It also serves as a test program for successful compilation.
2. Now, you need to test whether the compilation was successful. To do this, enter the py/ directory, then issue:
$./example.py
If the compilation was successful, you will see two plots.
The two plots are the results from example.py. In this example, we generate simulated data from two Gaussian distributions. We create some mock data with the following parameters:
means are: 0.0 and 0.5.
sigma are: 0.3 and 0.04.
The two plots show the estimates of these parameters from ECGMM and from the traditional GMM. Clearly, ECGMM does a better job, while GMM overestimates the width. For more tests, please refer to section 2 of Hao et al, ApJ, 2009
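Why does a fit that ignores measurement errors overestimate the width? The scatter of the observed values is the intrinsic scatter and the measurement scatter added in quadrature. The sketch below illustrates this for the narrow component only (mean 0.5, sigma 0.04), using the standard library. The measurement error of 0.05 and the sample size are assumptions made for this illustration and are not taken from example.py. The correction shown is a simple subtraction in quadrature that works for a single Gaussian with one common error; it is not the ECGMM fit itself.

```python
import math
import random
import statistics


def simulate_observed(n, mean, sigma, err, seed=0):
    """Draw n true values from N(mean, sigma^2) and add Gaussian noise of width err."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) + rng.gauss(0.0, err) for _ in range(n)]


def naive_and_corrected_width(observed, err):
    """Return the raw sample width and the width after removing err in quadrature."""
    naive = statistics.pstdev(observed)
    corrected_var = max(naive ** 2 - err ** 2, 0.0)
    return naive, math.sqrt(corrected_var)


observed = simulate_observed(20000, mean=0.5, sigma=0.04, err=0.05)
naive, corrected = naive_and_corrected_width(observed, err=0.05)
print("naive width: %.3f, corrected width: %.3f" % (naive, corrected))
```

The naive width comes out close to sqrt(0.04^2 + 0.05^2), noticeably larger than the true 0.04, while the corrected width lands near the true value.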
At this step, you have successfully installed the python version of the ECGMM. What you need to do is to set the PYTHONPATH to the corresponding path of py/. Then, you are ready to play with it.
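If you prefer not to change PYTHONPATH in your shell, the same effect can be obtained from inside a script by extending sys.path before importing the module. The following is a sketch for a Python 3 interpreter; the clone location ~/ecgmm/py is a hypothetical path that you should replace with your own, and the helper names add_to_path and module_available are made up for this example.

```python
import importlib.util
import os
import sys


def add_to_path(py_dir):
    """Put py_dir at the front of the module search path and return its absolute form."""
    py_dir = os.path.abspath(os.path.expanduser(py_dir))
    if py_dir not in sys.path:
        sys.path.insert(0, py_dir)
    return py_dir


def module_available(name):
    """True if a top-level module with this name can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


add_to_path("~/ecgmm/py")  # hypothetical clone location, adjust to your own
print("ecgmmPy importable:", module_available("ecgmmPy"))
```

This only checks that the module can be located; an actual import still needs the compiled _ecgmm.so to match the interpreter you are running.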
Part 4. SWIG interface file and related files
1. You do not need to know this part to use the package, but I want to make sure that people who are as curious as I am are happy with it. There are two files related to SWIG:
src/ecgmm.i : SWIG interface file
src/ecgmm_wrap.cxx : cpp file generated by SWIG
For more details about SWIG, please visit http://www.swig.org
Part 5. Future plans
Extending to at least two dimensions.
Wrapping it to R
......
Part 6. Application to galaxy cluster red sequence measurements.
The code was developed during 2007 and we have applied it to red sequence galaxy measurements and cluster detection. See this abstract for AAS in 2008. The following poster is a brief outline of the paper Hao et al, ApJ, 2009.