Video-Language
VideoLanguage is an open-source codebase and dataset collection for video-and-language tasks from our team.
Supported Datasets and Algorithms
The project includes the following algorithms:
TransDETR: End-to-end Video Text Spotting with Transformer
Contrastive Learning of Semantic and Visual Representations for Text Tracking
The project includes the following benchmarks:
BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting
ViTVR: A Large-Scale Video Retrieval Benchmark with Vision and Text Aggregation
Datasets
BOVText: a new large-scale benchmark dataset named Bilingual Open World Video Text (BOVText), the first large-scale, multilingual benchmark for video text spotting across a variety of scenarios. All data are collected from KuaiShou and YouTube.
BOVText has three main features:
Large-Scale: we provide 2,000+ videos with more than 1,750,000 frames, four times larger than the previously largest dataset for text in videos.
Open Scenario: BOVText covers 30+ open categories spanning a wide range of scenarios, e.g., life vlogs, sports news, autonomous driving, cartoons, etc. In addition, caption text and scene text are tagged separately, since they carry different representational meanings in a video.
Bilingual: BOVText provides bilingual text annotations to support communication across multiple languages and cultures.
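Since caption text and scene text are tagged separately in BOVText, a common first step is to split the two categories when loading annotations. The sketch below assumes a hypothetical JSON layout (the field names `frames`, `instances`, `bbox`, `text`, and `category` are illustrative, not the dataset's actual schema):

```python
import json

# Hypothetical BOVText-style annotation for one video; the real file layout
# may differ -- this only illustrates the caption/scene split.
SAMPLE_ANNOTATION = json.dumps({
    "video_id": "demo_0001",
    "frames": [
        {
            "frame_id": 0,
            "instances": [
                {"bbox": [10, 20, 120, 48], "text": "Breaking News",
                 "category": "caption"},
                {"bbox": [200, 300, 260, 330], "text": "EXIT",
                 "category": "scene"},
            ],
        },
    ],
})

def split_by_category(annotation_json: str):
    """Group text transcriptions into caption text vs. scene text."""
    ann = json.loads(annotation_json)
    captions, scenes = [], []
    for frame in ann["frames"]:
        for inst in frame["instances"]:
            target = captions if inst["category"] == "caption" else scenes
            target.append(inst["text"])
    return captions, scenes

captions, scenes = split_by_category(SAMPLE_ANNOTATION)
```

Keeping the two streams separate lets downstream tasks (e.g., retrieval vs. spotting) treat overlay captions and in-scene text differently.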
Questions?
Contact [email] for more information about the project.