Unsupervised Visual Representation Learning by Graph-based Consistent Constraints
ECCV 2016


Learning rich visual representations often requires training on datasets of millions of manually annotated examples. This substantially limits the scalability of learning effective representations as labeled data are expensive or scarce. In this paper, we address the problem of unsupervised visual representation learning from a large, unlabeled image collection. By representing each image as a node and each nearest neighbor matching pair as an edge, our key idea is to leverage graph-based analysis to discover positive and negative image pairs (e.g., pairs belonging to the same and different visual categories). Specifically, we propose to use a cycle consistency criterion for mining positive pairs and geodesic distance on the graph for hard negative mining. We show that the mined positive and negative image pairs can provide accurate supervisory signals for learning effective representations using Convolutional Neural Networks (CNNs). We demonstrate the effectiveness of the proposed unsupervised constraint mining method in two settings: (1) unsupervised feature learning and (2) semi-supervised learning. For unsupervised feature learning, we obtain competitive performance with several state-of-the-art approaches on the PASCAL VOC 2007 dataset. For semi-supervised learning, we show improved performance by incorporating the mined constraints on three image classification datasets.
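The graph-based mining idea described in the abstract can be illustrated with a small sketch. This is not the released implementation: the function names (`knn_graph`, `hop_distances`, `mine_pairs`), the parameters (`k`, `max_cycle_len`, `min_negative_hops`), the use of Euclidean distance on raw feature vectors, and the use of hop count as a stand-in for geodesic distance are all assumptions made for illustration. The sketch builds a directed k-nearest-neighbor graph, keeps an edge as a positive pair when it lies on a short directed cycle, and marks pairs that are far apart (or unreachable) on the graph as negatives.

```python
from collections import deque

import numpy as np


def knn_graph(features, k):
    """Directed k-NN graph: edge i -> j if j is among the k nearest neighbors of i."""
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    order = np.argsort(dist, axis=1)[:, :k]
    return [[int(j) for j in row] for row in order]


def hop_distances(neighbors, source):
    """Breadth-first hop counts from `source` along the given adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mine_pairs(features, k=2, max_cycle_len=3, min_negative_hops=3):
    """Return (positives, negatives) as sets of index pairs (i, j) with i < j.

    Assumed criteria, for illustration only:
      * positive: edge i -> j that lies on a directed cycle of length
        at most `max_cycle_len` (cycle consistency);
      * negative: pair whose hop distance on the undirected graph is at
        least `min_negative_hops`; unreachable pairs count as infinitely far.
    """
    directed = knn_graph(features, k)
    n = len(directed)

    # Undirected view of the same graph, used for hop (geodesic) distances.
    undirected = [set() for _ in range(n)]
    for i, nbrs in enumerate(directed):
        for j in nbrs:
            undirected[i].add(j)
            undirected[j].add(i)

    positives, negatives = set(), set()
    for i in range(n):
        for j in directed[i]:
            # Length of the shortest directed cycle through edge i -> j.
            back = hop_distances(directed, j)
            if i in back and back[i] + 1 <= max_cycle_len:
                positives.add((min(i, j), max(i, j)))

        geodesic = hop_distances(undirected, i)
        for j in range(i + 1, n):
            if geodesic.get(j, np.inf) >= min_negative_hops:
                negatives.add((i, j))
    return positives, negatives
```

Calling `mine_pairs(features, k=2)` on an `(n, d)` NumPy array returns two sets of index pairs that could then serve as similar and dissimilar pairs for training a pairwise loss. The quadratic distance matrix in `knn_graph` is only practical for small collections; a large image collection would need an approximate nearest-neighbor search instead.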


Code and models

Source code: [GitHub page].

Mined constraints and pre-trained models: [ImageNet_unsup.zip].


    author = {Li, Dong and Hung, Wei-Chih and Huang, Jia-Bin and Wang, Shengjin and Ahuja, Narendra and Yang, Ming-Hsuan},
    title = {Unsupervised Visual Representation Learning by Graph-based Consistent Constraints},
    booktitle = {European Conference on Computer Vision},
    year = {2016},
    volume = {},
    number = {},
    pages = {}