VideoLanguage - Downloads

Downloads

V0.1

Train Annotation (Google Drive)

Test Annotation (Google Drive)

Readme:

The BOVText dataset is available for non-commercial research purposes only.
Please download the agreement and read it carefully.
Ask your supervisor or advisor to sign the agreement, then send a scanned copy (example) to Weijia Wu (weijiawu@zju.edu.cn).
After verifying your request, we will contact you with the dataset download link.

Ground Truth (GT) Format

We store the ground truth of each video in a single JSON file. Inside the file, each frame's annotations are stored under a key that follows the naming convention gt_[frame_id], where frame_id is the index of the frame within the video.

In the JSON file, each gt_[frame_id] key maps to a list. Each entry in the list corresponds to one word in the frame and gives its bounding-box coordinates (four corner points), transcription, text type (title, caption, or scene text), language, tracking ID, and the complete transcription of its whole trajectory, in the following format:

{
  "frame_1": [
    {
      "points": [x1, y1, x2, y2, x3, y3, x4, y4],
      "tracking ID": "1",
      "transcription": "###",
      "category": "title/caption/scene text",
      "language": "Chinese/English",
      "ID_transcription": "complete words for the whole trajectory"
    },
    …
    {
      "points": [x1, y1, x2, y2, x3, y3, x4, y4],
      "tracking ID": "#",
      "transcription": "###",
      "category": "title/caption/scene text",
      "language": "Chinese/English",
      "ID_transcription": "complete words for the whole trajectory"
    }
  ],
  "frame_2": [
    {
      "points": [x1, y1, x2, y2, x3, y3, x4, y4],
      "tracking ID": "1",
      "transcription": "###",
      "category": "title/caption/scene text",
      "language": "Chinese/English",
      "ID_transcription": "complete words for the whole trajectory"
    },
    …
    {
      "points": [x1, y1, x2, y2, x3, y3, x4, y4],
      "tracking ID": "#",
      "transcription": "###",
      "category": "title/caption/scene text",
      "language": "Chinese/English",
      "ID_transcription": "complete words for the whole trajectory"
    }
  ],
  ……
}
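The Python sketch below shows one possible way to read a ground-truth file in this format and regroup its words into per-track trajectories. It assumes each file is a single JSON object that maps frame keys ending in "_<index>" (such as "frame_1" or "gt_1") to lists of word entries using exactly the field names listed above. The names load_gt, group_by_track, frame_index_from_key, and TextInstance are hypothetical helpers for illustration, not part of any official BOVText toolkit.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TextInstance:
    """One annotated word in one frame (hypothetical container mirroring the GT fields)."""
    frame_index: int
    points: List[Tuple[float, float]]  # four (x, y) corners of the quadrilateral
    tracking_id: str
    transcription: str
    category: str
    language: str
    id_transcription: str


def frame_index_from_key(key: str) -> int:
    # Assumption: frame keys end in "_<index>", e.g. "frame_1" or "gt_1".
    return int(key.rsplit("_", 1)[1])


def load_gt(path: str) -> List[TextInstance]:
    """Parse one per-video GT JSON file into a flat list of word instances."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    instances = []
    for key, words in data.items():
        idx = frame_index_from_key(key)
        for w in words:
            flat = w["points"]
            # Pair the flat [x1, y1, ..., x4, y4] list into (x, y) corners.
            corners = list(zip(flat[0::2], flat[1::2]))
            instances.append(TextInstance(
                frame_index=idx,
                points=corners,
                tracking_id=str(w["tracking ID"]),
                transcription=w["transcription"],
                category=w.get("category", ""),
                language=w.get("language", ""),
                id_transcription=w.get("ID_transcription", ""),
            ))
    return instances


def group_by_track(instances: List[TextInstance]) -> Dict[str, List[TextInstance]]:
    """Collect each tracking ID's instances, sorted by frame index (one trajectory per ID)."""
    tracks: Dict[str, List[TextInstance]] = defaultdict(list)
    for inst in instances:
        tracks[inst.tracking_id].append(inst)
    for seq in tracks.values():
        seq.sort(key=lambda i: i.frame_index)
    return dict(tracks)
```

For example, group_by_track(load_gt(path))["1"] would return every annotated occurrence of track 1 in frame order. Grouping by tracking ID rather than by frame makes it straightforward to check that the ID_transcription field stays consistent along a trajectory.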