Prakash Chockalingam




Adaptive Fragment-Based Tracking

Dynamic Layer Tracking

Video Stabilization and Mosaicking

Stroke classification in Tennis

Sports classification using cross-ratios

Fingerprint Recognition

Palmprint Recognition

Flame Recognition

Iris Recognition



Adaptive Fragment-Based Tracking

Advisor: Dr. Stan Birchfield

This project is part of my Master's thesis and was accepted at ICCV '09. During the course of this work, I developed a visual tracking approach based on dividing the target into multiple fragments. The target is represented by a Gaussian mixture model in a joint feature-spatial space, with each ellipsoid corresponding to a different fragment. The fragments are automatically adapted to the image data, being selected by an efficient region-growing procedure and updated according to a weighted average of the past and present image statistics. Modeling of the target and background is performed in a Chan-Vese manner, using the framework of level sets to preserve accurate boundaries of the target. The extracted target boundaries are used to learn the dynamic shape of the target over time, enabling tracking to continue under partial and total occlusion.
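The fragment update step can be sketched as a running weighted average of each fragment's Gaussian statistics in the joint feature-spatial space; the blending factor `alpha` below is a hypothetical choice, as this summary does not specify one:

```python
import numpy as np

def update_fragment_stats(past_mean, past_cov, new_mean, new_cov, alpha=0.9):
    """Blend past and present Gaussian statistics for one fragment.

    alpha is a hypothetical forgetting factor; larger values weight the
    accumulated model more heavily than the current frame's statistics.
    """
    mean = alpha * past_mean + (1.0 - alpha) * new_mean
    cov = alpha * past_cov + (1.0 - alpha) * new_cov
    return mean, cov

# Example: one fragment's mean in a joint (x, y, R, G, B) feature-spatial space.
past = np.array([10.0, 20.0, 100.0, 90.0, 80.0])
new = np.array([12.0, 22.0, 110.0, 95.0, 85.0])
mean, _ = update_fragment_stats(past, np.eye(5), new, np.eye(5), alpha=0.9)
```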

The tracker's performance under different scenarios is given below. More results and the original sequences are also available.

Scenario Results
Non-rigid deformations:
Tracking non-rigid objects is a challenging problem. In this sequence, an Elmo doll is tracked accurately with contours even after considerable deformations and scale changes as it sits, stands, and falls.

Rapid Motion:
A more complicated case where the monkey moves rapidly in the scene with extreme deformations as it swings around the tree.
Full Occlusion:
A sequence in which a person walks behind a tree in a complex scene with many colors. The approach predicts both the shape and the location of the object and displays the contour accordingly during the complete occlusion.

Full Occlusion:
A more complex scenario where a girl, moving quickly in a circular path (a complete revolution occurs in just 35 frames), is occluded frequently by a boy. The approach is able to handle this difficult scenario.

Partial Occlusion:
A sequence in which a person is partially occluded by a car. Although partial occlusion is not handled explicitly, the tracker adjusts the contour to only the visible portions of the moving person. Note that although the contours are extracted accurately for the body of the person, there are some errors in extracting the contours for the face region due to the complexity of skin color.

Multiple Objects:
The fish are multicolored and swim in front of a complex, textured, multicolored background. Note that the fish are tracked successfully despite their changing shape.


Dynamic Layer Tracking for Aerial Video Surveillance

I implemented a dynamic layer tracking algorithm which can detect image areas in the vicinity of targets undergoing change using the Laplacian of the current frame and that of a stabilized previous frame. The background model captures statistical intensity variations at each pixel of a frame. Using this model, statistically significant intensity variations are recorded as change pixels. A connected component module is then used to extract the final target region.
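A minimal sketch of this pipeline, assuming a per-pixel Gaussian background model of the Laplacian difference and a hypothetical significance threshold `k` (SciPy's connected-component labeling stands in for the connected component module):

```python
import numpy as np
from scipy import ndimage

def detect_changes(curr, prev_stabilized, bg_mean, bg_var, k=3.0):
    """Flag pixels whose Laplacian difference is statistically significant.

    bg_mean/bg_var form a hypothetical per-pixel background model of the
    Laplacian difference; k is a significance threshold in std deviations.
    """
    lap_curr = ndimage.laplace(curr.astype(float))
    lap_prev = ndimage.laplace(prev_stabilized.astype(float))
    diff = np.abs(lap_curr - lap_prev)
    change = diff > bg_mean + k * np.sqrt(bg_var)
    # Connected components extract the final target regions.
    labels, n = ndimage.label(change)
    return labels, n

# Synthetic example: a bright square appears in an otherwise static scene.
prev = np.zeros((20, 20))
curr = np.zeros((20, 20)); curr[5:10, 5:10] = 255.0
labels, n = detect_changes(curr, prev, np.zeros((20, 20)), np.full((20, 20), 0.01))
```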

A sample aerial video showing the performance of the tracker can be viewed here


Development of a Video Stabilization and Mosaicking System

In this project, I was one of the team members who implemented a global intensity-based stabilization algorithm. During the course of the project, we also discovered the lack of any industry standard for evaluating stabilization systems. Hence, we proposed several metrics and evaluation methodologies that could serve as a benchmark for stabilization algorithms. This work was published in the Workshop on Applications of Computer Vision (WACV) '08. I am extending the current metrics to handle more complex aspects of stabilization systems and am currently evaluating them on feature-based algorithms; this extended work will be submitted to a journal shortly.

(Each result has three videos shown side by side: the input video, the stabilized video, and the mosaicked video.)

Scenario Results
Translation: The camera motion is simulated as 2-D translation in the X and Y directions.
Amplitude Jitter: The camera motion is simulated as a sine wave whose amplitude is a function of time, X_Disp = a(t) sin(2*pi*f*t), where f is the frequency and t is the frame number.
Affine: The camera motion is simulated as random affine motion, with the random motion parameters generated within 5% of the identity transformation.
Photometric difference: Adjustment of the camera gain between images. Here, the sequence is modified so that each image's offset from its DC value is scaled by (1 - t/T), where t is the frame number and T is the total number of frames in the sequence; thus the features of the image at time t are attenuated by t/T. The system works well even under these conditions.
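The amplitude-jitter and photometric scenarios can be sketched numerically. The decaying amplitude envelope a(t) below is a hypothetical choice, since this page does not specify one:

```python
import math

def jitter_displacement(t, f, a0=8.0, decay=0.02):
    """Horizontal displacement X_Disp = a(t) * sin(2*pi*f*t).

    a(t) = a0 * exp(-decay * t) is a hypothetical time-varying amplitude;
    f is the jitter frequency and t the frame number.
    """
    a_t = a0 * math.exp(-decay * t)
    return a_t * math.sin(2.0 * math.pi * f * t)

def photometric_scale(pixel, dc, t, T):
    """Scale a pixel's offset from the image's DC value by (1 - t/T)."""
    return dc + (pixel - dc) * (1.0 - t / T)
```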


Stroke Classification in Tennis

Classifying different tennis strokes is the primary goal of this project. We developed an algorithm that uses the gradient information of the player's skeleton. The player is modeled using a color histogram and tracked across the video using histogram back-projection. An oriented histogram of the skeleton obtained in each frame forms the feature vector, which is then sent to a trained SVM classifier to classify the stroke. The SVM classifier was trained for three classes: forehand, backhand, and no shot. This algorithm was published at the International Conference on Image Analysis and Recognition (ICIAR) '07.
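The oriented-histogram feature can be sketched as follows. The bin count and the representation of the skeleton as a polyline of coordinates are illustrative assumptions; the resulting normalized vector would then be fed to the SVM:

```python
import numpy as np

def oriented_histogram(points, bins=18):
    """Histogram of segment orientations along a skeleton polyline.

    points is an (N, 2) array of consecutive skeleton coordinates;
    the bin count of 18 (10-degree bins over 0..180) is a hypothetical
    choice, not taken from the paper.
    """
    d = np.diff(points, axis=0)                           # segment vectors
    angles = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180.0
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, 180.0))
    return hist / max(hist.sum(), 1)                      # scale-invariant

# A perfectly horizontal skeleton segment puts all mass in the first bin.
h = oriented_histogram(np.array([[0, 0], [1, 0], [2, 0]]))
```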

Sample forehand, backhand, and no-shot strokes, along with their skeletons and histograms, are shown below:

Stroke Skeleton Oriented Histogram


Sports Classification Using Geometric Properties

This project is about classifying sports images using the invariance of cross-ratios under projective transformations. The algorithm works as follows: a histogram of cross-ratios, computed from the intersections of lines detected with a Hough transform, forms the feature vector of the image. A modified one-against-all multi-class SVM classifier is used to classify the feature vector. Images of tennis, football, basketball, and badminton were used for classification. This algorithm was published at the Asian Conference on Computer Vision (ACCV) '07.
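The cross-ratio of four collinear points, and its projective invariance, can be checked numerically; the Möbius map below is an arbitrary example of a projective transformation of the line, not one from the paper:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (AC * BD) / (BC * AD) of four collinear points,
    given by their scalar positions along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def mobius(x):
    # An arbitrary projective map of the line: x -> (2x + 1) / (x + 2).
    return (2.0 * x + 1.0) / (x + 2.0)

# The cross-ratio is unchanged when all four points are transformed.
before = cross_ratio(0.0, 1.0, 2.0, 3.0)
after = cross_ratio(mobius(0.0), mobius(1.0), mobius(2.0), mobius(3.0))
```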

Two different views of the same sport under different photometric conditions, along with their cross-ratio histograms, are shown below:

Sport View 1 View 2 Cross-Ratio Histogram






Fingerprint Recognition along with Facial Watermarking

This project involves two-level biometric authentication: fingerprint and face recognition. First, fingerprint matching is performed; then the facial information watermarked on the fingerprint is used for facial verification. This ensures that even if a person's fingerprint is stolen, unauthorized entry is not permitted, since a dual match is carried out. The project was presented at APOGEE 2003 (the technical festival of BITS Pilani) in the Computer Science Association, and won first prize at CYBERFIESTA (Intel National Tech Festival 2K3). The dual-key security mechanism was then enhanced and presented in the non-refereed conference proceedings at APOGEE, BITS Pilani, 2004.

As part of the fingerprint matching module, we developed a new algorithm, based on a level binary tree representation, that matches fingerprints in linear time; it was accepted at the International Conference on Image Processing (ICIP) '06. I am currently working on an improved version of the algorithm to be submitted to IEEE Transactions on Image Processing.


Nearest Neighbor Vector-Based Palmprint Recognition

In this project, we proposed a new approach to palmprint verification based on the construction of a Nearest Neighbor Vector. A new two-level matching method, incorporating both local and global features of the palmprint, was designed for verification. The algorithm was accepted at the International Joint Conference on Neural Networks (IJCNN). It has since been refined for better accuracy than existing palmprint matching systems, and this work is under review at an international conference.


Flame Recognition

I explored different fire detection algorithms and implemented one based on temporal variations. The algorithm uses color and motion information computed from video sequences to locate fire. A Gaussian-smoothed color histogram detects fire-colored pixels, and the temporal variation of those pixels determines which of them are actually fire. Spurious fire pixels are then removed with an erosion operation, and missing fire pixels are recovered with a region-growing method. I found this method applicable to more settings because of its insensitivity to camera motion.
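A minimal sketch of this detection chain, assuming a precomputed fire-color mask, a hypothetical variance threshold, and dilation standing in for the region-growing step:

```python
import numpy as np
from scipy import ndimage

def fire_pixels(frames, color_mask, var_thresh=50.0):
    """Keep color-matched pixels whose intensity varies strongly over time.

    frames: (T, H, W) grayscale stack; color_mask: (H, W) boolean mask of
    fire-colored pixels from a smoothed color histogram.  var_thresh is a
    hypothetical threshold on the per-pixel temporal variance.
    """
    temporal_var = frames.astype(float).var(axis=0)
    mask = color_mask & (temporal_var > var_thresh)
    mask = ndimage.binary_erosion(mask)                   # drop spurious pixels
    return ndimage.binary_dilation(mask, iterations=2)    # grow the region back

# Synthetic example: a flickering block of fire-colored pixels.
frames = np.zeros((4, 10, 10)); frames[1::2, 2:8, 2:8] = 200.0
color_mask = np.zeros((10, 10), bool); color_mask[2:8, 2:8] = True
mask = fire_pixels(frames, color_mask)
```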

The performance of the flame recognition system under different scenarios is given below:

Scenario Results
A sample test sequence in which a wooden building is on fire. As the fire grows and shrinks, the algorithm captures all the fire pixels properly.
A more challenging scenario in which objects cross the fire regions. The fire pixels and object pixels are separated accurately.
A sunset video in which the sky is similar in color to fire. This sequence checks for false positives; the algorithm reports none.


Recognition of Iris Patterns

In this project, different iris patterns were taken and a unique feature vector was constructed for each. The circular iris patterns are transformed into rectangular patches, and textural analysis with Gabor filters is applied to these patches to construct the feature vectors. These unique feature vectors were then used in the iris matching module.
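The circular-to-rectangular transformation can be sketched as polar unwrapping with nearest-neighbour sampling; the center, radii, and sampling densities below are illustrative parameters, not values from the project:

```python
import numpy as np

def unwrap_iris(image, center, r_inner, r_outer, n_radii=16, n_angles=64):
    """Map the circular iris band to a (radius x angle) rectangle by
    sampling along rays from the pupil center."""
    cy, cx = center
    radii = np.linspace(r_inner, r_outer, n_radii)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rect = np.empty((n_radii, n_angles), dtype=image.dtype)
    for i, r in enumerate(radii):
        # Nearest-neighbour lookup of the pixel under each sample point.
        ys = np.round(cy + r * np.sin(thetas)).astype(int)
        xs = np.round(cx + r * np.cos(thetas)).astype(int)
        rect[i] = image[np.clip(ys, 0, image.shape[0] - 1),
                        np.clip(xs, 0, image.shape[1] - 1)]
    return rect

# A uniform image unwraps to a uniform rectangle of shape (radii, angles).
rect = unwrap_iris(np.full((32, 32), 7), (16, 16), 4.0, 12.0)
```

Gabor filters would then be applied to `rect` to build the texture feature vector.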


MicroType

MicroType is an all-in-one alphabetic, numeric, and skill-building keyboarding program that teaches students typing. The client is Thomson South-Western, which has a major market in American schools. The product's graphics and front end were built using Macromedia Flash and Director, and the core API was developed in C++. The product also includes a word processor developed using MFC. The API was designed to be reusable for other similar products the client has; hence, the client can dynamically plug in features using a configuration tool and customize the product. A Flash tour and a demo version of the product can be downloaded here.

