Tracking of Fiducial Markers for Bat Flight Motion Capture Experiments

Matt Bender, Mincan Cao, Yu Wei, Shaoxu Xing

Fall 2015 ECE 5554/4984 Computer Vision: Class Project
Virginia Tech


We develop a method for temporal tracking of fiducial markers on a bat in order to study its flight motion. The method identifies and tracks marker locations in three steps. First, the bat is segmented from the image using a frame-differencing approach which produces a bounding box containing the bat. Next, filter bank responses within this bounding box are computed to determine the location of features. Finally, a Kalman filter tracking algorithm is implemented to track the points. Points on the base body have been tracked successfully; tracking of points on the wings still needs improvement.



Bats exhibit some of the most impressive flight mechanics in the animal kingdom, which makes them excellent inspiration for flapping-wing Micro-Air Vehicles (MAVs). To study flapping flight in bats, motion capture experiments are conducted using GoPro Hero 3+ Black cameras. The GoPro cameras are cheaper than standard motion capture systems; however, they do not automatically track points. We wish to develop a tracking algorithm which performs the temporal correspondence of the markers within each camera view. The complexity of the bat's motion makes features difficult to track, owing to articulation of the skeleton and deformation of the wing membrane.


This project is performed in three independent steps to yield the final tracking results. 
1. Image Segmentation
Image segmentation is performed on frame t by computing the backward difference of grayscale image intensities, that is, the difference between frame t and frame t-1. A threshold is then applied to the difference image to emphasize large changes in pixel intensity. Because the bat is moving against a static background, the bat region can be detected by finding the pixels whose difference value is greater than the threshold. This approach is summarized in the equation below,
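
A thresholded backward difference consistent with this description can be written as follows. The notation is assumed here rather than taken from the original: I_t(x, y) is the grayscale intensity of frame t at pixel (x, y), tau is the threshold, and B_t(x, y) is the resulting binary mask. Taking the absolute value of the difference is also an assumption.

```latex
\[
B_t(x, y) =
\begin{cases}
1, & \left| I_t(x, y) - I_{t-1}(x, y) \right| > \tau \\
0, & \text{otherwise}
\end{cases}
\]
```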

For a particular video stream, the threshold value used in this implementation is 17.5. The center location of the bounding box is computed as the average location of every pixel with a value higher than the threshold. An initial box is placed at this center location and all pixels above the threshold inside it are counted. The box dimensions are then increased, holding the center location constant, and a new count of pixels is taken. If the new count is larger than the previous count, the box size is increased again and the process is repeated. If the counts are the same, the smaller box is taken as the appropriate bounding box. Results of segmentation are shown below in Figure 1.
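
The segmentation and box-growing procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: frames are plain nested lists of grayscale values, the box is assumed to be square and to grow by one pixel per side at each step, and the function names (motion_mask, count_in_box, grow_bounding_box) are hypothetical.

```python
def motion_mask(prev_frame, curr_frame, threshold=17.5):
    """Mark pixels whose absolute backward difference exceeds the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def count_in_box(mask, cx, cy, half):
    """Count marked pixels in the square of half-width `half` centred at (cx, cy)."""
    rows, cols = len(mask), len(mask[0])
    y0, y1 = max(0, cy - half), min(rows - 1, cy + half)
    x0, x1 = max(0, cx - half), min(cols - 1, cx + half)
    return sum(mask[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))


def grow_bounding_box(mask, start_half=1, step=1):
    """Return (cx, cy, half) for the box at which the pixel count stops growing."""
    points = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not points:
        return None  # nothing moved between the two frames
    # Box centre: average location of all above-threshold pixels.
    cx = round(sum(x for x, _ in points) / len(points))
    cy = round(sum(y for _, y in points) / len(points))
    half = start_half
    count = count_in_box(mask, cx, cy, half)
    while True:
        new_count = count_in_box(mask, cx, cy, half + step)
        if new_count <= count:  # no new pixels captured: keep the smaller box
            return cx, cy, half
        half, count = half + step, new_count
```

Because growth stops at the first enlargement that adds no pixels, a gap in the mask wider than one step would end the search early; a larger step is one way to trade tightness for robustness.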

Figure 1: Image Segmentation

2. Marker Identification

After determining the size and location of the bounding box, feature detection using a filter bank is performed. First, the region within the bounding box is thresholded to produce a binary image. This result is then filtered with a bank of filters created by manually selecting a representative set of markers from the image. An example of the filter bank is shown below (Figure 2b). The output of the filtering is a set of small blob-like regions which represent the detected marker locations. The centers of these blobs are located using the imfindcircles function in MATLAB, which finds the centers of circles using a circular Hough transform. Different filters in the filter bank may produce slightly different centers for the same marker. To resolve these duplicates and obtain accurate results, we use the mean-shift algorithm to cluster the outputs of all the filters in the filter bank, which reliably identifies the centers for a majority of the markers.
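
To illustrate how mean shift can merge the slightly different centers reported by different filters, here is a minimal flat-kernel sketch in plain Python. The bandwidth of 5 pixels, the merge radius of half the bandwidth, and the function name mean_shift_centers are assumptions made for illustration; the project's actual parameters are not stated.

```python
import math


def mean_shift_centers(points, bandwidth=5.0, max_iter=50, tol=1e-3):
    """Cluster 2-D points with a flat-kernel mean shift; return one center per cluster."""
    modes = []
    for px, py in points:
        mx, my = px, py
        for _ in range(max_iter):
            # Average all detections within `bandwidth` of the current estimate.
            near = [(x, y) for x, y in points
                    if math.hypot(x - mx, y - my) <= bandwidth]
            nx = sum(x for x, _ in near) / len(near)
            ny = sum(y for _, y in near) / len(near)
            shift = math.hypot(nx - mx, ny - my)
            mx, my = nx, ny
            if shift < tol:
                break
        modes.append((mx, my))
    # Merge modes that converged to (nearly) the same location.
    centers = []
    for mx, my in modes:
        if all(math.hypot(mx - cx, my - cy) > bandwidth / 2 for cx, cy in centers):
            centers.append((mx, my))
    return centers
```

For example, three detections near (10, 10) and two near (50, 50) collapse to two centers, one per marker.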

Original Bat Image | Binary Image | Filtered Result | Located Centers | GIF Demonstration
(a) Marker Identification Process

(b) Filter Bank Examples

(c) Marker Identification Final Results

Figure 2: Marker Identification.

3. Unscented Kalman Filter Marker Tracking

To track markers in the image, an unscented Kalman filter was used. This filter works by predicting a new marker location from past information and correcting this estimate using new sensor information. The input to this portion of the project is a set of uncorresponded “bags-of-marker-centers”, one for each frame of a video. Initialization is performed manually for the first two time steps, after which the filter is able to track features in the image. Some assumptions regarding the motion of features and the sensor measurements must be made to implement the Kalman filter.

Any Kalman filter requires a motion model and a sensor model to predict and correct the states of a system. The motion model is used to predict the new state from previous state information. For this project, we have chosen a second-order Markov model for state prediction in order to account for the velocity of each point. The motion model is as follows,
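
A constant-velocity form consistent with this description can be written as below. It is a reconstruction from the surrounding text, and the frame period \Delta t is notation assumed here.

```latex
\[
x_{k+1} = x_k + \dot{x}_k \,\Delta t,
\qquad
\dot{x}_k \approx \frac{x_k - x_{k-1}}{\Delta t}
\quad\Longrightarrow\quad
x_{k+1} = 2\,x_k - x_{k-1}
\]
```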

Here, x_k is the pixel coordinate of a marker center at time step k. Note that the derivative term is computed discretely from the two past states, hence the second-order Markov model. This motion model can be applied to the x and y coordinates of each point individually, or an average displacement in x and an average displacement in y can be computed over all points. The latter works better for points which all lie on a single rigid link. Even though this prediction equation incorporates the velocity of the point at the previous time step, it is still prone to large prediction errors. Therefore, a sensor model is used to relate this prediction to our measurements.
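
The two ways of applying the motion model can be sketched as follows. This is an illustrative sketch only: marker positions are assumed to be lists of (x, y) tuples in matching order for the two most recent frames, and the function names are hypothetical.

```python
def predict_pointwise(prev, curr):
    """Predict each marker independently: x_{k+1} = x_k + (x_k - x_{k-1})."""
    return [(2 * cx - px, 2 * cy - py) for (px, py), (cx, cy) in zip(prev, curr)]


def predict_averaged(prev, curr):
    """Shift every marker by the mean displacement of the whole group."""
    n = len(curr)
    dx = sum(cx - px for (px, _), (cx, _) in zip(prev, curr)) / n
    dy = sum(cy - py for (_, py), (_, cy) in zip(prev, curr)) / n
    return [(cx + dx, cy + dy) for cx, cy in curr]
```

Averaging suppresses per-point noise in the displacement estimate, at the cost of assuming that all points in the group move together, which is reasonable only for points on one rigid link.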

The sensor model for this Kalman filter implementation is a proximity function which searches the bag-of-centers for the identified marker closest to the prediction. This is done by computing the Euclidean distance from each predicted state to each marker center in the list. If there are no marker centers within a fixed pixel radius (10 pixels for this implementation), the sensor model assumes the desired marker was not observed. The Kalman filter handles these occlusions automatically by rescaling the observation vector so that occluded points are not updated.
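
A minimal sketch of this proximity search is shown below. The function name associate and the use of None to flag an unobserved marker are assumptions made for illustration; the 10-pixel gate comes from the text above.

```python
import math


def associate(predictions, centers, gate=10.0):
    """For each predicted position, return the nearest detected center or None."""
    matches = []
    for px, py in predictions:
        best, best_dist = None, gate
        for cx, cy in centers:
            dist = math.hypot(cx - px, cy - py)
            if dist <= best_dist:  # closer than anything seen so far, and inside the gate
                best, best_dist = (cx, cy), dist
        matches.append(best)  # None means the marker is treated as occluded
    return matches
```

Note that this simple search does not prevent two predictions from claiming the same center, which can matter when markers pass close to one another.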

Note that, given the simplicity of the motion and sensor models, an unscented Kalman filter is not required. The unscented Kalman filter was programmed at the beginning of this project because the state estimates were originally intended to be the generalized joint coordinates of the bat skeleton, with a sensor model mapping the pose of the bat into the camera frame; the unscented filter was expected to perform well despite the nonlinear nature of those measurement equations. Due to difficulty processing the extrinsic camera calibration data, this portion of the project was not completed. With the linear motion and sensor models used here, the unscented Kalman filter still performs the required updates adequately.
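
To show why a linear filter would suffice for these models, here is a sketch of one predict/correct cycle of a standard linear Kalman filter for a single image coordinate. It is not the unscented filter used in this project: the state is assumed to be (position, velocity) with one frame per step, only position is measured, and the noise values q and r are placeholders. An occluded marker is represented by z = None, in which case the correction is skipped.

```python
def kalman_cv_step(state, cov, z, q=0.5, r=4.0):
    """One linear Kalman cycle for state (position, velocity); z is None if occluded."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    # Predict with F = [[1, 1], [0, 1]] and process noise Q = q * I.
    pos_pred = pos + vel
    pp00 = p00 + p01 + p10 + p11 + q
    pp01 = p01 + p11
    pp10 = p10 + p11
    pp11 = p11 + q
    if z is None:
        # Occluded: keep the prediction and its larger uncertainty.
        return (pos_pred, vel), ((pp00, pp01), (pp10, pp11))
    # Correct with H = [1, 0]: only the position is measured.
    s = pp00 + r                   # innovation variance
    k0, k1 = pp00 / s, pp10 / s    # Kalman gain for position and velocity
    resid = z - pos_pred
    new_state = (pos_pred + k0 * resid, vel + k1 * resid)
    new_cov = (
        ((1 - k0) * pp00, (1 - k0) * pp01),
        (pp10 - k1 * pp00, pp11 - k1 * pp01),
    )
    return new_state, new_cov
```

The same function would be run separately on the x and y coordinates of each marker, with z taken from the proximity search.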

Experiments and results

The data for this project is motion capture video footage which was collected at Shandong University in Jinan, China. The footage was taken in the camera array shown in Figure 3a below. The cameras used are GoPro Hero 3+ Black cameras which are configured to record 720p video at 120 frames per second. While the large number of cameras helps reduce the number of occluded points, it generates a considerable amount of video data in which fiducial markers must be tracked. Four frames of data from a single camera are stitched together and shown below in Figure 3b. As seen in this figure, there are many points on the bat which must be tracked and forty camera views in which to track them. Thus, we aim to automate the temporal tracking of the fiducial markers for this project.

a) Flight Tunnel - Shandong University
b) Sample of Images from the Tunnel (4 Frames Stitched Together)

Figure 3: Motion Capture Experiments 

The results from the image segmentation and marker identification are shown below. As seen in the video, the bounding box tightly conforms to the bat and the majority of the markers are identified properly.


Qualitative results

The main results for this project are qualitative evaluations of feature tracking in the image space. As stated in the marker tracking section above, the motion model was applied in two ways: by computing the differential motion for each point individually, and by computing the average motion of a group of points. Results for tracking the base body points using the average velocity are shown below.


Video 1: Averaged Velocity Model - Base Body Point Tracking

As seen in the video, the body points are successfully tracked for 30 frames of video but become obscured in the last 10 frames. The method is robust to occasional point occlusion due to the motion assumptions made.  This average velocity method was also applied to a group of points spread over the wing and body.  The results for this experiment are shown in the video below.


Video 2: Averaged Velocity Model - Points Distributed Across the Wings

The averaged velocity method does not work for the points located on the wing due to the nonlinear nature of their motion. Body points are still adequately tracked. Finally, the motion model which computes individual velocities of points was used to track points on the body. Results are shown in the video below.


Video 3: Point-Wise Velocity Model - Base Body Point Tracking
The noise present in the individual velocity computations causes most of the initialized points to diverge over time.  Thus, the best result is base body tracking using the averaged velocity model.  

© Matt Bender, Mincan Cao, Yu Wei, Shaoxu Xing
