Object Tracking Problem

Last update: Jul 2, 2015

For Arabic speakers, these videos should be helpful.

What is Video tracking?

  • Given a video, identify the moving object(s).
  • Video = a lot of data => time-consuming processing
  • May depend on other challenging problems (e.g. object recognition)
    • For example, detect a bounding box per object in an image, then track the objects.
  • Applications:
    • Surveillance, traffic control, player tracking...

Tracking Challenges & Concerns

  • Is the camera stationary (not moving) or not?
    • A stationary camera is an easier case for many algorithms
      • E.g. motion-based algorithms can detect the background more easily
    • Real-life applications more often involve a moving camera
      • This is challenging.
      • Typically the tracker is also given a shape (e.g. bounding box, circle) to track.
  • Occlusions? Disappearing for a while?
    • If we are tracking an object, what should we do if it disappears for a few seconds?
  • Colliding objects
    • Suppose you are tracking 2 people and they cross paths in the video: we may switch the track IDs
      • Or a person is walking, but their body visually merges with a car
    • That is, if object 1 is A and object 2 is B, we may swap the labels
  • Any moving objects? Or a specific category (e.g. persons)?
    • Identifying the type of moving object requires more effort, and more artifacts make it more challenging
  • Object speed relative to frame rate
  • Fixed object orientation?
    • An object with a fixed orientation is easier to track than one whose orientation changes
  • And many others. When developing an algorithm, consider these issues.

Datasets Challenges

  • Annotating a single video is a time-consuming process.
    • A 120-second video at 20 fps = 2400 images!
  • Annotations are scarce, and things are a bit unorganized.
  • Good collection link (77 from the literature).
  • Good collection link (50 from the literature), used in a benchmark. Some of them overlap with the previous link.

Visual features tracking Approach

  • Background
    • A visual feature: a special patch in an image (see SIFT)
      • Patch: (e.g. pixel (20, 34), radius = 1.5, angle 60)
        • Special? E.g. robust to changes in scale, rotation, etc.
      • Feature detector: detects special features.
        • SIFT detection:
        • See 1, See 2, See 3. Each circle/rectangle represents a special position in the image
      • Feature descriptor: describes a feature (e.g. as a vector of 128 real numbers).
  • Scenarios
    • Either we are given a bounding box to track.
    • Or a classifier does the initial object detection to track. You are then subject to the classifier's mistakes.
    • Or neither: use features to guess object locations/sizes. This is hard and error-prone.
  • Approach
    • Detect “good” features to track in this box
    • Track them from one frame to the next using optical flow
      • Optical flow allows us to track a pixel from one frame to the next
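The optical-flow step above can be illustrated with a minimal sketch of a single Lucas-Kanade update on a synthetic image. Everything here is an assumption made for illustration: the Gaussian test blob, the helper names make_frame and lucas_kanade_step, and the window radius. A real tracker would normally use a pyramidal implementation from a library instead of this single-window version.

```python
import math


def make_frame(width, height, cx, cy, sigma=3.0):
    """Synthetic grayscale frame: one Gaussian blob centred at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def lucas_kanade_step(prev, curr, px, py, radius=6):
    """Estimate the (u, v) motion of the window centred at pixel (px, py).

    Builds the 2x2 least-squares system from the brightness-constancy
    equation Ix*u + Iy*v + It = 0 over all pixels in the window.
    Returns None when the window has too little texture to be tracked.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(py - radius, py + radius + 1):
        for x in range(px - radius, px + radius + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    # Solve [[sxx, sxy], [sxy, syy]] * [u, v]^T = -[sxt, syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# The blob moves by (+0.6, -0.4) pixels between the two frames.
frame1 = make_frame(25, 25, 12.0, 12.0)
frame2 = make_frame(25, 25, 12.6, 11.6)
print("estimated flow:", lucas_kanade_step(frame1, frame2, 12, 12))
```

The estimate is only approximate because the method linearizes the image around each pixel, so it is reliable for small motions. A flat, textureless window gives a singular system, which is why "good" features are detected first.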

Blob Tracking Approach: Terminologies

  • Background Subtraction = Foreground Detection
    • Challenging in single images, easier in videos (stationary camera)
    • Video intuition: background pixels stay the same in consecutive frames
    • See 1. See 2
      • Notice that it is not perfect, which affects tracker quality.
  • Track
    • Each detected object may appear in several frames
    • These locations represent its track
    • Useful statistics to calculate:
      • Age: how many frames since this object first appeared
      • Total visibility: in how many frames did we detect it?
      • Consecutive disappearance: if it is not detected right now, for how many frames have we failed to detect it?
      • And more, based on need
  • Observations
    • Typically some algorithm will guess/observe some objects in the new frame.
      • E.g. run background subtraction: now you have a binary mask
      • Identify connected components and discard very small ones
      • Now each component (its bounding box) is a newly observed object.
    • Each track holds its location from the last frame, ready for the observation-to-track assignment
  • Observation-to-track assignment (data association)
    • We have many observed locations, but we don't know which tracks to assign them to
      • E.g. for each track, which observation (location) is the next one for this track?
    • There are different ways to handle that. One way is to formulate it as an assignment problem
    • Construct a bipartite graph (tracks vs. observed locations)
    • Define a cost function (e.g. Euclidean distance between centroids, or a more complicated one)
    • Solve the assignment problem (e.g. using Munkres):
      • Let left be the centroids of the current objects
      • Let right be the centroids of the guessed new objects
      • Define the cost function and build the cost matrix
      • Solve the assignment problem.
    • Concerns: if we have an O (observations) x T (tracks) matrix
      • O = T
        • Good, every track has an assignment
      • O < T
        • Some tracks won’t be matched
        • Concern: when should we remove them?
        • Concern: may the object appear again in the future?
      • O > T
        • Some new locations won’t be matched
          • They are purely new objects
          • Create new tracks for them (initial position is the observation location)
    • Concern: what if an object actually disappeared and a new one appeared, but we treated this as a 1-1 match?
      • We may limit assignments based on distance, i.e. put penalties on the distance
      • In other words, never consider far-away objects for assignment!
      • That is, we may have O = T, but some tracks actually don't have assignments
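The data-association steps above can be sketched in plain Python. This is an illustrative sketch under assumptions, not a reference implementation: the names solve_assignment, associate, BIG and max_distance are hypothetical, rows are tracks and columns are observations, the cost is the Euclidean distance between centroids, and the matrix is padded to a square so that the O < T and O > T cases go through the same solver. Pairs farther apart than max_distance get the cost BIG and are dropped after solving, which is one simple way to realize the distance limit.

```python
import math

BIG = 1e6  # assumed cost for pairs that are too far apart (gated out)


def solve_assignment(cost):
    """Hungarian (Munkres) algorithm on a square cost matrix.

    Returns a list where entry r is the column assigned to row r.
    """
    n = len(cost)
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[c] = row matched to column c (1-based, 0 = none)
    way = [0] * (n + 1)          # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def associate(tracks, observations, max_distance):
    """Match track centroids to observation centroids.

    Returns (matches, unmatched_tracks, unmatched_observations), where
    matches is a list of (track_index, observation_index) pairs.
    """
    n = max(len(tracks), len(observations))
    cost = [[BIG] * n for _ in range(n)]       # padding rows/columns stay at BIG
    for t, (tx, ty) in enumerate(tracks):
        for o, (ox, oy) in enumerate(observations):
            d = math.hypot(tx - ox, ty - oy)
            if d <= max_distance:              # gating: far pairs keep cost BIG
                cost[t][o] = d
    matches = [(t, o) for t, o in enumerate(solve_assignment(cost))
               if cost[t][o] < BIG]            # keep only real, in-gate pairs
    matched_t = {t for t, _ in matches}
    matched_o = {o for _, o in matches}
    unmatched_tracks = [t for t in range(len(tracks)) if t not in matched_t]
    unmatched_obs = [o for o in range(len(observations)) if o not in matched_o]
    return matches, unmatched_tracks, unmatched_obs


tracks = [(0, 0), (10, 10), (50, 50)]
observations = [(11, 9), (1, 1), (200, 200)]
print(associate(tracks, observations, max_distance=20))
```

Unmatched tracks are candidates for later removal, and unmatched observations are candidates for new tracks. With this gating the solver first keeps as many in-gate pairs as possible and only then minimizes the total distance; other policies (for example an explicit cost of non-assignment) are equally valid.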

Blob Tracking Approach

  • Do background subtraction
  • Using the binary image
    • Find all connected components
    • Compute the bounding box and centroid of each
  • Filter out any tiny boxes
  • Assign the tracks to boxes
  • And so on
    • What is this box with the word "predicted"? This is Kalman's role, see next
  • Challenges
    • How to handle collision/intersection of objects? Tags may switch.
    • How to handle the weak results of the background subtractor?
    • How to handle an object that disappears for a while? We may use a Kalman filter
        • Think about a car that is moving, enters a tunnel for 3 seconds and then comes out.
          • The blob tracker will see 2 cars: one before entering the tunnel and one coming out
        • We need some algorithm to “predict” the object location while it is invisible.
          • It may actually disappear forever
          • Or it may come back, and we had better recognize that
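The first steps of the pipeline above (background subtraction, connected components, bounding box and centroid, filtering tiny boxes) can be sketched as follows. This is a simplified sketch under assumptions: frames are small grayscale grids stored as lists of lists, the background is a single known reference frame, and the names foreground_mask, find_blobs, threshold and min_area are hypothetical. Real background subtractors maintain a statistical model of the background rather than one fixed frame.

```python
from collections import deque


def foreground_mask(background, frame, threshold=30):
    """Binary mask: True where the frame differs enough from the background."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def find_blobs(mask, min_area=4):
    """Find 4-connected components and describe each one as a blob.

    Components with fewer than min_area pixels are treated as noise.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) < min_area:
                continue                      # filter tiny components
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            blobs.append({
                "bbox": (min(xs), min(ys), max(xs), max(ys)),  # x0, y0, x1, y1
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "area": len(pixels),
            })
    return blobs


# Empty 10x8 background, one 3x3 object and one isolated noisy pixel.
background = [[0] * 10 for _ in range(8)]
frame = [row[:] for row in background]
for yy in range(3, 6):
    for xx in range(2, 5):
        frame[yy][xx] = 200
frame[1][8] = 200
print(find_blobs(foreground_mask(background, frame)))
```

The centroids returned here are what the assignment step would consume as observations.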

Kalman Filter

  • Good Explanation
  • "Theoretically, a Kalman filter is an estimator for what is called the linear quadratic Gaussian (LQG) problem, which is the problem of estimating the instantaneous “state” of a linear dynamic system perturbed by Gaussian white noise, by using measurements linearly related to the state, but corrupted by Gaussian white noise. The resulting estimator is statistically optimal with respect to any quadratic function of estimation error. R. E. Kalman introduced the “filter” in 1960 (Kalman 1960)." Src.
  • It tries to model the moving object (linearly)
  • The object has a state (e.g. position and velocity for a car)
    • Covariance matrix (between state elements)
    • Controls that affect the object (e.g. braking force)
    • State transition matrix, control matrix, noise
  • We have some measurement source to correct the Kalman model.
  • Example
    • We have a moving car
      • 3 satellites together try to guess its location: this is our measurement source (but noisy)
      • We need a predictor to estimate better
        • Also, to predict when the signal is lost
      • Algorithm
        • Predict step: using the linear model, estimate the state
        • Correct step: use the measurement to correct the estimate
  • Implementation
    • The algorithm is a set of simple linear steps
    • Matlab / OpenCV provide this framework
    • The real challenge:
      • Define the matrices correctly (understanding the motion equations helps)
    • In practice
      • We may make simplifying assumptions (e.g. no relation between state elements, covariance = identity) for an easy implementation.
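The predict and correct steps can be written out for the simplest useful case. The sketch below assumes a constant-velocity model for a single coordinate, with state [position, velocity] and only the position measured. The class name Kalman1D and the values of initial_var, process_var and measurement_var are assumptions chosen for illustration, and the 2x2 matrix products are expanded by hand so the example needs no libraries. No control input is modeled.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate.

    State is [position, velocity]; only the position is measured.
    """

    def __init__(self, position=0.0, velocity=0.0, dt=1.0,
                 initial_var=100.0, process_var=0.01, measurement_var=1.0):
        self.p, self.v, self.dt = position, velocity, dt
        # Covariance starts as a scaled identity: no relation between state elements.
        self.P = [[initial_var, 0.0], [0.0, initial_var]]
        self.q, self.r = process_var, measurement_var

    def predict(self):
        """Predict step: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        dt = self.dt
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def correct(self, z):
        """Correct step with a position measurement z, i.e. H = [1, 0]."""
        (a, b), (c, d) = self.P
        s = a + self.r               # innovation covariance
        k0, k1 = a / s, c / s        # Kalman gain
        y = z - self.p               # innovation (measurement residual)
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.p


kf = Kalman1D()
for t in range(20):
    kf.predict()
    kf.correct(2.0 * t)      # measurements of a car moving 2 units per frame
for _ in range(3):           # signal lost: rely on the prediction only
    kf.predict()
print("position:", round(kf.p, 2), "velocity:", round(kf.v, 2))
```

For a 2D tracker the same filter can be run once per coordinate, or the state can be extended to [x, y, vx, vy] with the corresponding 4x4 matrices.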

Blob + Kalman For prediction

  • We use Kalman as a helper when the object doesn’t appear in this frame (left/disappeared/noise)
    • Use blobs to find observations, then do the assignments
    • If a track was matched, use the observation location and inform the Kalman model (correct step)
    • If a track was not matched, use the Kalman prediction
      • If it has gone unmatched for more than a threshold number of frames, remove the object
  • Blob Tracking + Kalman Source Code
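The matched / not matched logic above, together with the track statistics (age, total visibility, consecutive disappearance), can be sketched as below. This is not the linked source code. To stay short it swaps in a fixed-gain (alpha-beta) predictor as a stand-in for a full Kalman model, and the names Track, update_tracks, alpha, beta and max_invisible are hypothetical.

```python
class Track:
    """One tracked object plus the bookkeeping used to decide on removal."""

    def __init__(self, track_id, position):
        self.id = track_id
        self.position = position          # (x, y) centroid
        self.velocity = (0.0, 0.0)
        self.age = 1                      # frames since the track was created
        self.total_visible = 1            # frames in which it was detected
        self.consecutive_invisible = 0    # length of the current run of misses

    def update(self, observation, alpha=0.8, beta=0.5):
        """Advance one frame; observation is an (x, y) centroid or None."""
        # Predict step (constant velocity).
        px = self.position[0] + self.velocity[0]
        py = self.position[1] + self.velocity[1]
        self.age += 1
        if observation is None:
            # Not matched: fall back on the predicted location.
            self.position = (px, py)
            self.consecutive_invisible += 1
            return
        # Matched: correct the prediction with the observation.
        # Fixed gains alpha/beta stand in for the Kalman gain here.
        rx, ry = observation[0] - px, observation[1] - py
        self.position = (px + alpha * rx, py + alpha * ry)
        self.velocity = (self.velocity[0] + beta * rx,
                         self.velocity[1] + beta * ry)
        self.total_visible += 1
        self.consecutive_invisible = 0


def update_tracks(tracks, matches, max_invisible=3):
    """Update every track and drop those unmatched for too many frames.

    matches maps a track id to its observed centroid; unmatched ids are absent.
    """
    survivors = []
    for track in tracks:
        track.update(matches.get(track.id))
        if track.consecutive_invisible <= max_invisible:
            survivors.append(track)
    return survivors


tracks = [Track(0, (0.0, 0.0))]
for x in (2.0, 4.0, 6.0, 8.0):
    tracks = update_tracks(tracks, {0: (x, 0.0)})
tracks = update_tracks(tracks, {})        # the car enters the tunnel
print(tracks[0].position, tracks[0].age, tracks[0].consecutive_invisible)
```

While the object is invisible, its track keeps moving along the predicted path, so a reappearing blob near that path can be assigned to the old track instead of starting a new one.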

Further

  • Particle filter tracking: a hypothesis-based tracker that approximates the filtered posterior distribution by a set of weighted particles.
    • It weighs particles based on a likelihood score and then propagates these particles according to a motion model.
    • Handles highly nonlinear and non-Gaussian models in Bayesian filtering
    • Computationally slow
  • Old but useful: Object Tracking: A Survey
  • If you need to detect objects (e.g. with a moving camera), you may use a slow but state-of-the-art detector: Fast R-CNN
  • Sometimes the video is constrained in nature. Exploit that for better tracking
    • E.g. in volleyball matches, the court has 2 major colors. Removing them and the court lines will reveal the players' blobs.
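Returning to the particle filter at the top of this list, the propagate / weigh / resample cycle can be sketched for a 1D position. The setup is assumed for illustration only: a known constant velocity as the motion model, a Gaussian likelihood for the measurement, and the hypothetical names particle_filter_step, motion_std and measurement_std.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         velocity=1.0, motion_std=1.0, measurement_std=1.0):
    """One propagate / weigh / resample cycle for a 1-D position.

    Returns (estimate, new_particles).
    """
    # 1. Propagate every particle with the motion model plus random noise.
    moved = [x + velocity + rng.gauss(0.0, motion_std) for x in particles]
    # 2. Weigh each particle by the Gaussian likelihood of the measurement.
    weights = [math.exp(-0.5 * ((measurement - x) / measurement_std) ** 2)
               for x in moved]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the measurement: fall back to equal weights.
        weights = [1.0] * len(moved)
        total = float(len(moved))
    weights = [w / total for w in weights]
    # 3. The state estimate is the weighted mean of the particles.
    estimate = sum(w * x for w, x in zip(weights, moved))
    # 4. Resample in proportion to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return estimate, resampled


rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]  # unknown start position
true_position = 10.0
for _ in range(30):
    true_position += 1.0
    measurement = true_position + rng.gauss(0.0, 1.0)      # noisy observation
    estimate, particles = particle_filter_step(particles, measurement, rng)
print("true:", true_position, "estimate:", round(estimate, 2))
```

Nothing in the cycle requires a linear model or Gaussian noise, which is why the approach handles nonlinear and non-Gaussian cases. The price is the per-frame work over all particles.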

State of the Art

  • VOT2014 Benchmark. Dlib library implements the winner (moving camera).
  • Visual Tracker Benchmark.
    • It seems to me that the Struck paper is the best.
  • Visual Object Tracking Repository
  • CCV library has an implementation of one of the state-of-the-art algorithms
    • TLD (a.k.a. the “Predator” algorithm), in C.
    • Given a window => track it with a moving camera.