Game design coursework (Game Design 2010), based on TheFly2, a 3D game engine
Pattern recognition coursework (Pattern Recognition 2009), based on OpenCV and MATLAB (Source Download)
In this report, we implemented three pattern recognition methods, Eigenface, Fisherface, and FTC, with OpenCV and MATLAB. We analyzed and optimized their parameters and explained the test results. In addition, we tried an image pre-processing method, Histogram Equalization, which improved the hit rate with minor computing overhead. In the first section, we explain our considerations for implementation and optimization. In the second section, we show our test results and discuss each methodology respectively. In the last section, we conclude with the advantages and disadvantages of each methodology and propose future work.
According to our observation, the hit rate of recognition does not increase in proportion to the number of eigenvalues. Although the original data in a higher-dimensional projection contain more information, the projection can still lose key information and introduce bias. Projections of face images in a higher-dimensional space can lie closer to one another than in a lower-dimensional space, so the overlap between subjects can be larger. This suggests choosing a suitable projection degree for the application instead of simply the highest one.
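As an illustration only, a minimal MATLAB sketch of this idea: compute the eigenfaces once, then project onto the top k of them for several candidate values of k and compare hit rates. Variable names such as trainImgs and the list of k values are our own assumptions, not the original program.

    % Minimal PCA-projection sketch (illustrative; not the original program).
    % trainImgs: each column is one vectorized face image.
    meanFace = mean(trainImgs, 2);
    A = bsxfun(@minus, trainImgs, meanFace);        % centered data
    [U, S, V] = svd(A, 'econ');                     % columns of U are eigenfaces

    for k = [10 50 100 200 400]                     % candidate projection degrees
        Uk   = U(:, 1:k);                           % keep the top-k eigenfaces
        proj = Uk' * A;                             % k-dimensional coefficients
        % ... run registration/recognition with proj and record the hit rate ...
    end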
We also considered using a different threshold for each subject ID. After reviewing the test results, we found that the threshold cannot be fully relied on even when it is set individually per subject ID, because the projections of face images from different subject IDs can overlap. Since some information is lost after projection, one practical approach is to maximize the hit rate while keeping the false alarm rate low. This amounts to choosing a threshold that balances the hit rate against the false alarm rate, and the ROC curve helps to make that decision from a probabilistic point of view. Besides, the Mahalanobis distance also helps to improve the classification according to the test results.
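A rough sketch of how the Mahalanobis distance could be used for the decision, assuming per-class gallery projections are available (the variable names probeProj, galleryProj, and threshold are ours; note that MATLAB's mahal needs more gallery rows than projection dimensions per class):

    % Illustrative sketch: score a probe projection against each gallery class
    % using the Mahalanobis distance (mahal returns squared distances).
    % galleryProj{c}: rows are projected gallery images of class c.
    scores = zeros(1, numClasses);
    for c = 1:numClasses
        scores(c) = mahal(probeProj, galleryProj{c});   % probeProj is a row vector
    end
    [bestDist, bestID] = min(scores);
    accepted = bestDist < threshold;                    % threshold chosen from the ROC curve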
The most time-consuming part of PCA is solving the eigenvalue problem in the training phase. On a coarse scale, the training time for 10, 200, or 1000 degrees is almost the same, while the registration and recognition phases take comparatively little time. This suggests computing a full set of eigenvalues and eigenvectors once and reusing them in the registration and recognition phases for the remaining experiments, which saves experiment time significantly.
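A minimal sketch of the reuse idea in MATLAB, assuming the model is cached to a file between runs (the file and variable names are ours, not from the original code):

    % Train once, cache the eigen-model, and reuse it in later experiments.
    if exist('eigenmodel.mat', 'file')
        load('eigenmodel.mat', 'U', 'lambda', 'meanFace');   % reuse the cached model
    else
        meanFace = mean(trainImgs, 2);
        A = bsxfun(@minus, trainImgs, meanFace);
        [U, S, V] = svd(A, 'econ');
        lambda = diag(S).^2;                                  % unnormalized eigenvalues
        save('eigenmodel.mat', 'U', 'lambda', 'meanFace');    % cache for later phases
    end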
To save development time, our code is based on the OpenCV library. OpenCV handles much of the matrix calculation and provides many useful functions; for example, the contrast of an image can be enhanced with a single OpenCV call. The computing time is not short, but it could be reduced by parallel computing on a multi-CPU system.
I found that if I used only images from the same image set as the test set (probe, imposter), rather than all images, as the training and gallery images, the accuracy was much better. In other words, the smaller the variation between the training data and the test data, the better the recognition performance. This is contrary to the common intuition that performance improves with a larger training set.
The performance is not linearly proportional to the number of fisherfaces. When more similar fisherfaces are used, the result gets worse instead. When I choose the fisherfaces with high eigenvalues, the performance is better.
fisherfaces
projected test images
In the 'FERET' training image data set, the number of images per subject (person) is smaller than in the other image data sets, which makes identification on the FERET set much more difficult (while verification is much easier). I think the reason is that, because the FERET training data are fewer, the trained fisherfaces cannot represent their characteristics well, so the projections of the gallery data and the test data are not clustered well. To address this, I tried another strategy: taking the mean of all points in each gallery class, i.e. using one point in fisherface space to represent each subject (person), as sketched below. I thought this would be fairer to subjects with few gallery images, although the result was still not improved much.
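A minimal MATLAB sketch of the class-mean strategy, with variable names of our own choosing (not the original code):

    % galleryProj: rows are projected gallery images; galleryID: subject ID per row.
    ids = unique(galleryID);
    classMean = zeros(numel(ids), size(galleryProj, 2));
    for i = 1:numel(ids)
        classMean(i, :) = mean(galleryProj(galleryID == ids(i), :), 1);  % one point per subject
    end

    % Identification: assign a probe to the subject with the nearest class mean.
    d = sum(bsxfun(@minus, classMean, probeProj).^2, 2);   % squared Euclidean distances
    [dmin, idx] = min(d);
    predictedID = ids(idx);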
In my fisherface program, using 'histeq' (in MATLAB) gives a better result; a minimal sketch of this pre-processing step is given after the figures below. Here are the trained fisherfaces and the projected test images:
fisherfaces
projected test images
If I did not use the illumination pre-processing, they were:
fisherfaces
projected test images
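The pre-processing mentioned above is simply histogram equalization applied to each face image before training and testing. A rough MATLAB illustration, with an assumed file name rather than the actual data files:

    % Illustrative histogram-equalization pre-processing (assumed file name).
    img = imread('face_001.pgm');          % grayscale face image
    img = histeq(img);                     % MATLAB Image Processing Toolbox
    x   = double(img(:));                  % vectorize for the fisherface pipeline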
The FTC algorithm still needs PCA and LDA to reduce the dimension of the image patches. I use the princomp function in MATLAB to reduce the dimension to 20, and then the LDA code from Roger Jang to reduce the dimension to 6.
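A minimal sketch of this two-stage reduction; the report used Roger Jang's LDA code, but here the LDA step is written out explicitly so the sketch stays self-contained (X, y, and all other names are our own assumptions):

    % Two-stage dimension reduction sketch (not the original code).
    % X: one patch feature per row; y: class label per row.
    [coeff, score] = princomp(X);        % PCA via the Statistics Toolbox
    Xp = score(:, 1:20);                 % keep the first 20 principal components

    % Fisher LDA written out explicitly.
    classes = unique(y);
    mu = mean(Xp, 1);
    Sw = zeros(20); Sb = zeros(20);
    for c = classes'
        Xc  = Xp(y == c, :);
        muC = mean(Xc, 1);
        D0  = Xc - repmat(muC, size(Xc, 1), 1);
        Sw  = Sw + D0' * D0;                          % within-class scatter
        Sb  = Sb + size(Xc, 1) * (muC - mu)' * (muC - mu);   % between-class scatter
    end
    Sw = Sw + 1e-6 * eye(20);            % small regularization for stability
    [V, D] = eig(Sb, Sw);                % generalized eigenvalue problem
    [srt, order] = sort(diag(D), 'descend');
    W  = V(:, order(1:6));               % 6 discriminant directions
    Xl = Xp * W;                         % final 6-dimensional features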
In the clustering stage, I use the method from "Unsupervised Learning of Finite Mixture Models". The source code from the authors is easy to use. The problem I encountered is that the patterns I found were not very good: the data points were still scattered in the lower-dimensional space and were hard to cluster, even when I tried different parameters (including the number of dimensions and the maximum number of classes). I think this is the main reason that the performance of my FTC program is not very good.
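As a rough stand-in only (the report actually used Figueiredo and Jain's published code, whose MML criterion is not reproduced here), a similar experiment could be approximated in plain MATLAB by fitting Gaussian mixtures with different numbers of components and keeping the best one by BIC:

    % Stand-in clustering sketch; Xl holds the 6-dimensional patch features.
    maxK = 10;  bestBIC = inf;
    for k = 1:maxK
        try
            gm = gmdistribution.fit(Xl, k, 'Replicates', 3, 'Regularize', 1e-5);
            if gm.BIC < bestBIC
                bestBIC = gm.BIC;  bestGM = gm;
            end
        catch
            % some k may fail to converge on scattered data; skip it
        end
    end
    patternID = cluster(bestGM, Xl);     % pattern label for every patch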
In the FTC procedure, we need to classify the patches to find the best-fit pattern. I tried to use the LIBSVM software to learn a classifier for the patterns of each patch. I downloaded LIBSVM from Chih-Jen Lin's SVM pages; the supported version is 2.89 and the language is MATLAB. I call the function svmtrain to learn the model, but when I call svmpredict to classify the patches, the result seems bad: the predicted label is almost always the same for every image, and only the images used for training get the correct predicted label. After studying the LIBSVM guide, I still could not solve the problem. So, in the experiments, I use a simpler method to classify the patches: I save the mean, the covariance, and the estimated prior probability of each pattern, and at the classification stage I use this information to compute the multivariate Gaussian density and pick the best-fit pattern.
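A minimal sketch of that simpler Gaussian classifier, with variable names of our own (mu, sigma, and prior stand for the saved per-pattern statistics):

    % Pick the best-fit pattern for a patch feature x (row vector).
    numPatterns = numel(prior);
    p = zeros(1, numPatterns);
    for j = 1:numPatterns
        p(j) = prior(j) * mvnpdf(x, mu(j, :), sigma(:, :, j));   % prior times class-conditional density
    end
    [pmax, bestPattern] = max(p);     % index of the best-fit pattern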
The hit rate increases sharply from Eigen number 10 to Eigen number 200; after 200, the hit rate decreases. So we can choose Eigen number 200 as an optimized point.
Picture 1. The relationship between the hit rate and the degree of projection
According to the ROC curve, the position where the slope is 45 degrees is around (24, 67). We can choose the closest threshold, 0.1, to balance the hit rate and the false alarm rate.
Picture 2. ROC curve with images processed by the Equalize Histograms method
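A minimal sketch of how such a curve can be produced: sweep the distance threshold, record the hit rate on the probe set and the false alarm rate on the imposter set, and then read off the threshold nearest to the 45-degree-slope point. The threshold grid and the variable names are assumptions, not the original program.

    % ROC sweep sketch (illustrative only).
    % probeDist(i):  distance of genuine probe i to its matched subject
    % impostDist(i): distance of imposter i to its nearest subject
    thresholds = 0:0.01:1;
    hitRate = zeros(size(thresholds));  faRate = zeros(size(thresholds));
    for t = 1:numel(thresholds)
        hitRate(t) = mean(probeDist  < thresholds(t)) * 100;   % accepted genuine (%)
        faRate(t)  = mean(impostDist < thresholds(t)) * 100;   % accepted imposters (%)
    end
    plot(faRate, hitRate);  xlabel('false alarm rate (%)');  ylabel('hit rate (%)');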
According to the following test results, the Equalize Histograms method improves the hit rate by about 8%, and the computing-time overhead is very small.
Table 1. Hit rate comparison with and without image pre-processing
When considering an imposter scenario, a threshold on the projection distance was set, and the hit rate was about 17% lower than the original result. The computing-time overhead is very small and can be ignored.
Table 2. Hit rate comparison with threshold setting
The training phase occupies most of the execution time, while the recognition time is very short in contrast. Most of the time is spent in the iterative eigenvalue calculation of the training phase; the projection calculations in the registration and recognition phases take little time, since they need no iteration.
Table 3. Time consumption comparison for each phase
I drew two ROC curves, one with illumination pre-processing and one without. We can see that the system with illumination pre-processing is better.
The ROC curve of the system with illumination pre-processing
The ROC curve of the system without illumination pre-processing
I list the identification accuracy for thresholds from 0 to 10:
We can see that once the threshold grows beyond 7, the accuracy does not increase any more.
The hit rate and the false alarm rate are calculated under different numbers of error tolerance.
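As a sketch of what that calculation looks like, under the assumption that a face is accepted when the number of mismatched traits (out of 63) is within the tolerance; the matching rule and variable names here are ours, not necessarily the exact rule of the program:

    % Error-tolerance sweep sketch (assumed matching rule, illustrative only).
    % probeErrs(i): number of mismatched traits for genuine probe i
    % impErrs(i):   number of mismatched traits for imposter image i
    for tol = 0:63
        hitRate(tol + 1) = mean(probeErrs <= tol) * 100;   % genuine accepted (%)
        faRate(tol + 1)  = mean(impErrs   <= tol) * 100;   % imposters accepted (%)
    end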
It takes little time to classify one image: the FTC program can register 1027 images (the Gallery set) and classify 2321 images (the Probe set plus the Imposter set) in about 41 seconds.
I use Histogram Equalization for image pre-processing. Histogram Equalization is a well-known and simple algorithm for dealing with the lighting issue, and implementations are available in both OpenCV and MATLAB. Below I compare the performance (the success reject rate for the imposter set and the success identification rate for the probe set) with and without Histogram Equalization. We can easily see that the performance is much better when Histogram Equalization is used.
The success reject rate is larger without image pre-processing because, when the lighting effect is not removed, the classified patterns tend to differ more, so faces in the imposter set are encoded more variably and are more likely to be rejected. The reject performance therefore looks better, but this is not a good way to reject a face.
Below is a figure that shows the relation between the identification rate and the number of traits used. Here the identification finds the subject ID that is the closest.
I list some of the extracted patterns below. I use 63 traits for face identification and face verification. Below are the traits from index 1 to index 5.
In PCA, to optimize the hit rate we can choose a proper degree of projection, and that number can be found through extensive experiments. To balance the hit rate and the false alarm rate, we can use the ROC curve to find the point whose slope is 45 degrees. Image pre-processing and the Mahalanobis distance both help to improve the hit rate, and their overhead is very small. With the above optimizations, it is possible to reach an 84% hit rate without considering any false alarm. Matrix calculation takes most of the computing time; it may be possible to speed up the OpenCV library on a multi-core system, and when that is done the performance of all related programs can improve.
We thought face recognition was already a fairly mature technology, but from our implementation experience we feel that recognition performance is still strongly affected by many factors, such as the range of the training data, the size of the training data, the quality of the images, and so on. We also had to tune many parameters, such as the distance threshold in fisherface space and the number of fisherfaces. Perhaps a method robust enough to avoid these problems can be found; we think that would be a more practical method.
[1] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, 1997.
[2] “The magnificent ROC”, http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
[3] Robin Hewitt, "Implementing Eigenface", http://www.cognotics.com/opencv/servo_2007_series/part_5/
[4] M. Figueiredo and A. Jain, "Unsupervised Learning of Finite Mixture Models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 381-396, 2002.
[5] Ping-Han Lee, Gee-Sern Hsu, Tsuhan Chen and Yi-Ping Hung. “Facial Trait Code and Its Application to Face Recognition”
[6] C.-W. Hsu, C.-C. Chang, C.-J. Lin. “A practical guide to support vector classification”. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
[7] Jyh-Shing Roger Jang, Data Clustering and Pattern Recognition, http://mirlab.org/jang/books/dcpr/ (recommended reading)