Introduction

We introduce Bilingual, Open World Video Text (BOVText), the first large-scale, bilingual benchmark dataset for video text spotting across a variety of scenarios. All data are collected from KuaiShou and YouTube.

BOVText has three main features:

  • Large-Scale: We provide 2,000+ videos with more than 1,750,000 frames, four times larger than the largest existing dataset for text in videos.

  • Open Scenario: BOVText covers 30+ open categories spanning a wide range of scenarios, e.g., life vlogs, sports news, autonomous driving, and cartoons. In addition, caption text and scene text are tagged separately because they carry different meanings in a video: the former conveys more thematic information, while the latter conveys scene information.

  • Bilingual: BOVText provides bilingual text annotations to promote cross-cultural life and communication.


Each video text instance is annotated with four kinds of descriptive information.


Questions?

Contact weijiawu@zju.edu.cn