Keywords: Video Search, BERT, Siamese Network, Text Classification
[1] Visual Search - Multi-Lingual Video Search: Train Siamese/Two-Tower networks to align users' video search queries (query text) with the indexed videos (title text + video frames). Key challenges: noisy training data; the conflict between model effectiveness and the low storage and latency budget of a large-scale real-time service; and support for multiple languages.
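To illustrate the alignment objective described in [1], here is a minimal sketch of an in-batch contrastive loss over query and video embeddings. It is not the project's actual training code: the symmetric-free softmax loss, the `temperature` value, and the function names are all assumptions made for illustration, and the embeddings are taken as given (in a two-tower setup the video embeddings would typically be precomputed offline, which is what keeps serving latency low).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(query_embs, video_embs, temperature=0.07):
    """Mean cross-entropy in which query i should match video i.

    All other videos in the batch act as negatives (an assumed setup;
    the temperature of 0.07 is a hypothetical default).
    """
    queries = [l2_normalize(q) for q in query_embs]
    videos = [l2_normalize(v) for v in video_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, v) / temperature for v in videos]
        # Numerically stable log-sum-exp over the batch of videos.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned_videos = [[0.9, 0.1], [0.1, 0.9]]
    print(in_batch_contrastive_loss(queries, aligned_videos))
```

The loss is small when each query embedding is closest to its own video and grows when the pairing is wrong, which is the behaviour a two-tower model is trained to achieve.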
[2] Video Captioning for video keyword extraction - about 6 months
In this project, we explored a CoCa-like contrastive visual captioner to generate more useful keywords for the indexed videos in the text-based video search product channel. The exploration focused mainly on English keyword generation. In the end, more than 70% of the generated keywords were relevant to the video (which is not high); after applying thresholds to the generated keywords, each video received 1-2 new keywords, of which more than 90% were relevant.
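The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: each generated keyword is assumed to carry a relevance score in [0, 1] (the source does not say how the score was produced), and the function name, the `threshold` of 0.8, and the cap of two new keywords per video are hypothetical.

```python
def select_keywords(candidates, existing, threshold=0.8, max_new=2):
    """Keep the highest-scoring generated keywords that pass the threshold.

    candidates: list of (keyword, score) pairs from the captioner (assumed format).
    existing:   keywords the video already has; duplicates are skipped.
    """
    seen = {keyword.lower() for keyword in existing}
    kept = []
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for keyword, score in ranked:
        if score < threshold:
            break  # everything after this point scores lower still
        key = keyword.lower()
        if key in seen:
            continue  # already indexed or already selected
        seen.add(key)
        kept.append(keyword)
        if len(kept) == max_new:
            break
    return kept


if __name__ == "__main__":
    generated = [("cooking", 0.95), ("pasta", 0.85), ("kitchen", 0.60)]
    print(select_keywords(generated, existing=["recipe"]))
```

Raising the threshold trades coverage for precision: fewer keywords survive per video, but a larger share of the survivors is relevant.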
Interestingly, during the exploration we arrived at a forward-looking idea, generative visual search: a visual search service that offers summaries and visualizes the semantic structure of the query and the retrieved contents. The underlying problem was that some retrieved contents are too long for users to grasp quickly and decide whether to click in or skip.
In brainstorming, we proposed summarizing each lengthy retrieved document into a concise sentence and/or photo that highlights its theme and its relevance to the query. For visualization, we proposed organizing the retrieved results in a hierarchical tree structure, or visualizing the semantic relationships between the query and the results in a 2D/3D graph, to help users understand the relevance and uniqueness of each retrieved result and take further interactive actions.
Unfortunately, the client did not recognize the importance of the idea at the time. Later we found that the high-profile search engine product offered by Perplexity.ai provides summaries of search results via LLMs. Generative search has since become a major trend among search engines, e.g. Google, Bing, and the newcomer SearchGPT from OpenAI.
Moreover, I admired the courage, calmness, and professionalism of the colleague who worked on this project. Her hometown and parents were in a war zone at the time. Nevertheless, she managed the mental pressure successfully, and the work was delivered on time and with good quality.