On Label Granularity and Object Localization


Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper, we study the role of label granularity in WSOL.

In domains where a label hierarchy is available, we can choose which labels to train on.

What labels should we choose to maximize localization performance?

By training on coarser labels, we can get better box predictions using less data.
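Choosing a training granularity amounts to replacing each image's fine-grained label with one of its ancestors in the label hierarchy before training. The snippet below is a minimal sketch of that relabeling step, assuming a toy three-level hierarchy built from the owl example above. The names `HIERARCHY`, `coarsen`, and `relabel`, and the extra categories, are hypothetical illustrations and are not taken from the paper or from iNatLoc500.

```python
# Hypothetical toy hierarchy: each fine-grained label maps to its path of
# ancestors, ordered from coarsest to finest. Illustrative values only.
HIERARCHY = {
    "great horned owl": ["animal", "bird", "great horned owl"],
    "barn owl": ["animal", "bird", "barn owl"],
    "red fox": ["animal", "mammal", "red fox"],
}


def coarsen(label, depth, hierarchy=HIERARCHY):
    """Return the ancestor of `label` at `depth` (0 = coarsest level)."""
    path = hierarchy[label]
    # Clamp the depth so that requests past the leaf return the leaf itself.
    return path[min(depth, len(path) - 1)]


def relabel(labels, depth, hierarchy=HIERARCHY):
    """Map fine-grained labels to integer class ids at the chosen depth.

    Returns (ids, classes), where classes[i] is the name of class id i.
    """
    coarse = [coarsen(label, depth, hierarchy) for label in labels]
    classes = sorted(set(coarse))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in coarse], classes


if __name__ == "__main__":
    fine = ["great horned owl", "barn owl", "red fox"]
    for depth in range(3):
        ids, classes = relabel(fine, depth)
        print(depth, classes, ids)
```

In this sketch, the same images yield one, two, or three training classes depending on `depth`; the image-level classifier used by a WSOL method would then be trained on the resulting class ids instead of the original fine-grained ones.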


  • Controlling the label granularity boosts performance across 5 WSOL algorithms.

    • The gains are large! The choice of label granularity can be just as important as the choice of WSOL algorithm.

  • Controlling the label granularity improves data efficiency.

    • For example, training at a coarser level achieves the same performance with 15x fewer labels for CAM.

  • iNatLoc500 is a new large-scale fine-grained dataset for WSOL.

    • 500 fine-grained categories.

    • 138k images for weakly supervised training.

    • 25k images with manually verified bounding boxes for validation and testing.