1st OmniLabel workshop at CVPR 2023
Welcome to the 1st OmniLabel workshop on infinite label spaces for semantic understanding via natural language at CVPR 2023 in Vancouver.
When: Sunday June 18th, 2023, 8:00 AM (local time)
Where: Vancouver Convention Center, Room West 207
Zoom: See CVPR's virtual site
Announcements
[06/27/23] THANK YOU! Thanks to all attendees of the workshop (in-person and virtual), the participants of the challenge, our fantastic invited speakers, and the award sponsors. The leaderboard of the OmniLabel challenge is now public and everyone can participate!
[06/14/23] Program update: We will have Aljosa Osep present his work on "Learning To Understand The World From Video"!
[06/06/23] The challenge has ended! We thank all the participants for their efforts in pushing the state of the art in language-based detection. Please find the results here!
[05/04/23] The test set of the OmniLabel challenge is now online. See the download instructions here. The challenge will close on 5/26!
[04/28/23] Our paper describing the benchmark is now available on arXiv.
[04/05/23] IMPORTANT UPDATE: We changed the track definitions to better match training dataset settings from existing works like GLIP or MDETR.
[03/29/23] The evaluation servers are online, and an UPDATED validation set and more details on the challenge are available now! Please download the latest validation set and update the evaluation toolkit. The validation phase has started: upload your validation set results to the evaluation servers. The test set phase will start in about a month ...
[02/07/23] The OmniLabel benchmark is online! At this point, we are releasing the validation set and a Python toolkit for evaluation (the test set for benchmark participation is coming soon). Please visit our website https://www.omnilabel.org for all the details.
[12/15/22] Workshop accepted at CVPR 2023! See you soon in Vancouver ... stay tuned for more details on our new benchmark (planned release: end of January).
Goals & motivation
The goal of this workshop is to foster research on the next generation of visual perception systems that reason over label spaces going beyond a list of simple category names. Modern applications of computer vision require systems that understand a full spectrum of labels: from plain category names (“person” or “cat”), through descriptions that add attributes, actions, functions, or relations (“woman with yellow handbag”, “parked cars”, or “edible item”), to specific referring expressions (“the man in the white hat walking next to the fire hydrant”). Natural language is a promising direction not only for enabling such complex label spaces, but also for training such models from multiple datasets with different, and potentially conflicting, label spaces.

As one concrete example, we have prepared a benchmark for object detection with a novel evaluation dataset that goes beyond generic object detection, open-vocabulary detection, or referring expression datasets. We have put together an experienced and motivated organizing team from both industry and academia, as well as a list of five confirmed invited speakers.