Investigating novel-view generation of an object from a single image using a pretrained Stable Diffusion model.
Detecting action segments in videos using both temporal information (optical flow) and spatial information (RGB data).
A study on image reconstruction using text-embedding optimization for real-world image editing.
This research proposes PIE, a text-to-image model that overcomes the text-length limitations of traditional models by compressing the meaning of long texts into a single padding embedding.