I am currently leading an exploratory research project on advancing generative models for CCTV video footage. The project is fully funded by Airbus Endeavr Wales Ltd, Airbus Defence and Space, United Kingdom, from May 2023 to April 2024 (approx. £160K), with a possible 24-month extension from May 2024 onwards (approx. £320K).
Challenges:
-prove the concept of generating and detecting end-user-definable elements in video camera streams.
-develop leading algorithms for complex open-set video generation.
-explore contextual relationships between objects, together with temporal consistency, for high-quality, high-resolution video generation.
Findings:
We enabled local region editing that follows user-provided prompts, i.e., text-driven video editing of real-world footage.
We integrated editable objects with background scenes via blended latent diffusion.
We enhanced temporal consistency across frames by converting the self-attention blocks of the U-Net into spatial-temporal blocks.
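
As a rough illustration of the last two findings, the NumPy sketch below shows the two core operations in their simplest form. It is not the project's implementation: the function names, tensor shapes and projection matrices are assumptions made for this example. The first function is the mask-based blend used in blended latent diffusion, where the edited latent is kept inside the user-selected region and the background latent outside it. The second is one common way to make self-attention spatial-temporal, letting the queries of each frame attend to the tokens of every frame; other variants (for example, attending only to the first and previous frames) are equally plausible.

```python
import numpy as np


def blend_latents(edited, background, mask):
    """Keep the edited latent where mask == 1 and the background latent elsewhere."""
    return mask * edited + (1.0 - mask) * background


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_attention(x, w_q, w_k, w_v):
    """Attention where every frame attends to the tokens of all frames.

    x has shape (frames, tokens, channels); w_q, w_k, w_v have shape
    (channels, dim). These shapes are assumptions for this sketch.
    """
    f, n, _ = x.shape
    q = x @ w_q                              # (f, n, dim): per-frame queries
    k = (x @ w_k).reshape(1, f * n, -1)      # (1, f*n, dim): keys shared by all frames
    v = (x @ w_v).reshape(1, f * n, -1)      # (1, f*n, dim): values shared by all frames
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (f, n, f*n)
    return softmax(scores) @ v               # (f, n, dim)


# Toy usage with hypothetical sizes: 4 frames, 16 latent tokens, 8 channels.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 16, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
attended = spatio_temporal_attention(frames, w_q, w_k, w_v)

# Toy blend: edit only the first 4 tokens of a single frame.
mask = np.zeros((16, 1))
mask[:4] = 1.0
blended = blend_latents(np.ones((16, 8)), np.zeros((16, 8)), mask)
```

In a diffusion pipeline the blend would typically be applied at every denoising step, with the background latent noised to the current step, and the attention change would replace the per-frame self-attention inside each U-Net block. Both details are outside the scope of this sketch.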