A New Era of AI-Generated Videos

Science

By Malakhi Beyah, 2025

Published 2/29/2024

OpenAI’s Sora pushes the limits of artificial intelligence with its hyper-realistic videos. Photo courtesy of OpenAI.

“Several giant wooly mammoths approach, treading through a snowy meadow.” “A close-up shot of a Victoria crowned pigeon.” “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage.” These vastly different prompts were among the many that Sora, OpenAI’s new video generation software, was able to analyze and transform into lifelike short videos. This innovation is leaps and bounds ahead of what artificial intelligence was previously capable of. Until now, AI was generally reliable only for generating text and images; Sora, however, can create the depth, shadows, and motion that one would expect from professional videos.

Sora’s seamless videos come from a machine learning architecture known as the “diffusion transformer,” an integration of two widely used AI techniques. Diffusion models, often used in AI programs like Midjourney, are trained by adding random noise to real media (like images or videos) until the original is indistinguishable from static; the AI program learns to reverse that process, so it can start from pure noise and gradually remove it until the desired image or video is left. Transformers, the type of model behind text generators like ChatGPT, complement diffusion by breaking each video into small patches and learning how those patches relate to one another across space and time; in a diffusion transformer, the transformer is the part that predicts which noise to remove at each step. Combining diffusion and transformers has given AI the ability to learn from substantially more media than it could in the past, introducing the possibility of making stunning, convincing videos.
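For readers who want to see the idea in miniature, the noising and denoising steps can be sketched in a few lines of Python. This is a toy illustration, not Sora's code: the function names, the noise schedule, and the three-number "image" are all assumptions made for the example, and the `predict_noise` argument stands in for the trained transformer (here it is simply handed the true noise, which a real model would have to learn to guess).

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.2):
    """Fraction of the original signal that survives after each noising step.

    Hypothetical linear schedule chosen for this example; index 0 means "clean".
    """
    alpha_bars = [1.0]
    for t in range(1, num_steps + 1):
        beta = beta_start + (beta_end - beta_start) * (t - 1) / max(num_steps - 1, 1)
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars


def add_noise(clean, noise, alpha_bar):
    """Forward process: blend the clean values with random noise."""
    return [
        math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * n
        for c, n in zip(clean, noise)
    ]


def denoise(noisy, predict_noise, alpha_bars):
    """Reverse process: strip the noise away one step at a time.

    Uses a deterministic (DDIM-style) update. `predict_noise(x, t)` plays the
    role of the trained network and is an assumption of this sketch.
    """
    x = list(noisy)
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = predict_noise(x, t)
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        # Estimate the clean values implied by the predicted noise.
        x0_est = [
            (xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
            for xi, e in zip(x, eps)
        ]
        # Step to the slightly less noisy version at time t - 1.
        x = [
            math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
            for x0, e in zip(x0_est, eps)
        ]
    return x


if __name__ == "__main__":
    random.seed(0)
    clean = [0.0, 0.5, 1.0]                    # stand-in for pixel values
    noise = [random.gauss(0.0, 1.0) for _ in clean]
    bars = alpha_bar_schedule(50)
    noisy = add_noise(clean, noise, bars[-1])  # almost pure static
    restored = denoise(noisy, lambda x, t: noise, bars)
    print("noisy:   ", noisy)
    print("restored:", restored)
```

Because the stand-in predictor is given the exact noise, the restored values match the originals; a real system has to learn that prediction from enormous amounts of training video, which is where the transformer comes in.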

That being said, Sora’s creators admit that the tool is not perfect. While the videos might seem incredibly realistic at first glance, a closer look at each of them makes their small mistakes obvious. On Sora’s homepage, OpenAI states that the video generator “may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect.” For example, only fifteen seconds into the Tokyo woman’s video, her legs seem to revolve under her dress and switch positions with one another while she is walking. In another video, a few pages in a man’s book flip from bottom to top instead of from side to side. Perhaps most infamously, one clip features wolf pups that spontaneously appear out of one another.

Before criticizing Sora, though, people must keep in mind how new the program is. AI-generated videos have progressed from being blatantly fake and nonsensical to possibly passing for actual videos among the general public, and they did so within the span of one short year. The rapid rate at which AI continues to advance leaves us to wonder how much more it will grow and influence our lives in the near future.