Part I: Features for large-scale visual recognition

In this first part, we will provide a brief overview of "standard" computer vision features for large-scale visual recognition. Learned features (including deep-learning features) will be discussed separately in Part VI. We will first mention global descriptors such as GIST [OT01], which has proven useful for many large-scale problems, including scene completion [HE07] and geo-localization from a single image [HE08]. We will then move to patch-based approaches, focusing on techniques that aggregate local patch descriptors into an image-level representation. These include the popular bag-of-visual-words (BOV) [SZ03, CD04] and its many extensions. In particular, we will review recent BOV extensions that encode higher-order statistics, namely VLAD [JDS10b], the Fisher vector [PD07, PSM10] and the Super Vector [ZYZ10]. These representations carry rich information, yet are cheap to compute and therefore amenable to large-scale processing.
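To make the difference between the BOV and a higher-order extension concrete, here is a minimal sketch in plain Python. It contrasts a BOV histogram, which keeps only zero-order statistics (how many patch descriptors fall on each visual word), with a VLAD-style encoding, which keeps first-order statistics (the summed residuals between descriptors and their nearest visual word). Assumptions: the codebook is given (in practice it would be learned offline, e.g. by k-means), assignment is hard nearest-neighbour in Euclidean distance, the BOV uses plain L1 normalization without tf-idf weighting, and VLAD uses only a final L2 normalization. The function names `nearest_word`, `bov_histogram` and `vlad` are hypothetical and not taken from the slides or any library. The Fisher vector and Super Vector are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word (centroid) closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])),
    )


def bov_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Bag-of-visual-words: L1-normalized counts of hard assignments (K values)."""
    counts = [0.0] * len(codebook)
    for x in descriptors:
        counts[nearest_word(x, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def vlad(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """VLAD-style encoding: for each visual word, sum the residuals (x - c_k) of the
    descriptors assigned to it, concatenate the K sums and L2-normalize (K * D values)."""
    num_words, dim = len(codebook), len(codebook[0])
    sums = [[0.0] * dim for _ in range(num_words)]
    for x in descriptors:
        k = nearest_word(x, codebook)
        for d in range(dim):
            sums[k][d] += x[d] - codebook[k][d]
    flat = [value for row in sums for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


if __name__ == "__main__":
    # Toy 2-D "patch descriptors" and a hand-picked 2-word codebook (illustrative only).
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
    print(bov_histogram(descriptors, codebook))
    print(vlad(descriptors, codebook))
```

On this toy input, the BOV only records that two descriptors fall on the first word and one on the second, whereas the VLAD vector also records where those descriptors lie relative to each centroid. This extra information comes at the cost of a K x D-dimensional representation instead of a K-dimensional one, while the encoding still requires only one nearest-word search per descriptor.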

The slides are available below.

Florent Perronnin,
Jun 23, 2013, 6:04 PM