Weekly Gathering for
Vision and Graphics Researchers and Enthusiasts
12:00 - 1:00 pm (ET), Star Room (32-D463) or Zoom†
Check out the MIT Visual Computing Seminar mailing list († Zoom link and sign-up spreadsheet) and YouTube Channel for further information.
Coming Next:
Title: Geometric Regularizations for 3D Shape Generation
Speaker: Qixing Huang, UT Austin
Time: Nov 19th, 12-1pm ET, 2024
Abstract: Generative models, which map a latent parameter space to instances in an ambient space, enjoy various applications in 3D vision and related domains. The standard scheme for these models is probabilistic: it aligns the ambient distribution induced by pushing a prior distribution on the latent space through the generative model with the empirical distribution of the training instances. While this paradigm has proven quite successful on images, its current applications in 3D generation face fundamental challenges from limited training data and poor generalization behavior. The key difference between image generation and shape generation is that 3D shapes possess various priors in geometry, topology, and physical properties. Existing probabilistic 3D generative approaches do not preserve these desired properties, resulting in synthesized shapes with various types of distortions. In this talk, I will discuss recent work that seeks to establish a novel geometric framework for learning shape generators. The key idea is to model various geometric, physical, and topological priors of 3D shapes as suitable regularization losses, developed using computational tools from differential geometry and computational topology. We will discuss applications in deformable shape generation, latent space design, joint shape matching, and 3D man-made shape generation. This research is supported by NSF IIS 2413161.
Speaker bio: Qixing Huang is a tenured associate professor in the Department of Computer Science at the University of Texas at Austin. His research sits at the intersection of graphics, geometry, optimization, vision, and machine learning. He has published more than 100 papers at leading venues across these areas. His research has received several awards, including multiple best paper awards, the best dataset award at the Symposium on Geometry Processing 2018, an IJCAI 2019 early career spotlight, multiple industrial and NSF awards, and a 2021 NSF CAREER award. He has also served as an area chair for CVPR, ECCV, and ICCV, on the technical papers committees of SIGGRAPH and SIGGRAPH Asia, and as co-chair of the Symposium on Geometry Processing 2020.
Seminar Organizers:
Past Talks:
Title: Sora: Video Generation Models as World Simulators
Speaker: Tim Brooks, OpenAI
Time: Apr 23rd, 12-1pm ET, 2024
Abstract: We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.
Speaker bio: Tim Brooks is a research scientist at OpenAI, where he co-leads Sora, their video generation model. His research investigates large-scale generative models that simulate the physical world. Tim received his PhD from Berkeley AI Research, where he was advised by Alyosha Efros and invented InstructPix2Pix. He previously worked at Google on the AI that powers the Pixel phone's camera and at NVIDIA on video generation models.