Weekly Gathering for
Vision and Graphics Researchers and Enthusiasts
12:00 - 1:00 pm (ET), Star Room (32-D463) or Zoom†
See the MIT Visual Computing Seminar mailing list († for the Zoom link and sign-up spreadsheet) and YouTube channel for further information.
Most Recent Talks:
Title: Geometric Fields: Animation Beyond Meshes
Speaker: Ana Dodik
Time: Apr 16th, 12-1pm ET, 2024
Abstract: Despite a flurry of published papers on 3D data processing, modern tools such as Blender and Maya rely primarily on methods from over a decade ago. These tools use the finite-element method (FEM) to optimize various smoothness objectives. FEM-based algorithms, while providing strong guarantees necessary for disciplines like civil engineering, beget opaque and brittle pipelines in computer graphics. They put the burden on the end user to ensure that 3D meshes are "well-behaved", e.g., by removing self-intersections or satisfying obscure mathematical properties such as manifoldness, water-tightness, or suitable interior-angle bounds. My research uses techniques inspired by modern machine learning tools, not for data-driven learning, but as a computational domain for problems in geometry processing that is agnostic to the shape representation and its quality yet aware of its geometry. These new mesh-free representations, called geometric fields, allow our algorithms to focus on robustness, user control, and interactivity. This talk focuses on applications of geometric fields to problems in shape deformation and animation.
Speaker bio: Ana Dodik is a PhD student at MIT CSAIL working on neural representations for geometry processing. Prior to joining MIT, she spent two years developing next-generation virtual presence at Meta. She graduated with a Master’s degree from ETH Zurich, where she spent a year collaborating with Disney Research Studios on problems at the intersection of machine learning and offline rendering.
Coming Next:
Title: Sora: Video Generation Models as World Simulators
Speaker: Tim Brooks, OpenAI
Time: Apr 23rd, 12-1pm ET, 2024
Abstract: We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path toward building general-purpose simulators of the physical world.
Speaker bio: Tim Brooks is a research scientist at OpenAI, where he co-leads Sora, their video generation model. His research investigates large-scale generative models that simulate the physical world. Tim received his PhD from Berkeley AI Research, advised by Alyosha Efros, where he invented InstructPix2Pix. He previously worked on the AI that powers the Pixel phone's camera at Google and on video generation models at NVIDIA.
Seminar Organizers: