TransDETR: End-to-end Video Text Spotting with Transformer
Contrastive Learning of Semantic and Visual Representations for Text Tracking