OpenAI’s reveal of its Sora video generation tool in February awed the AI community with stunningly realistic and fluid video outputs. However, the carefully curated examples left out critical details about Sora’s current capabilities and limitations in real-world filmmaking scenarios. Those gaps are now being filled by an early production team granted access to create a short film using Sora.
Toronto-based digital filmmakers Shy Kids gave visual effects outlet fxguide an inside look at the painstaking process of “actually using Sora.” Their insights reveal that while immensely powerful, Sora still faces major hurdles around precise control, consistency, and scaling video generation for narrative storytelling.
Contrary to OpenAI’s polished samples, which implied an end-to-end generative video workflow, the Shy Kids team had to rely heavily on traditional filmmaking techniques: storyboarding, editing, visual effects, and extensive post-production. That included manually rotoscoping out unwanted elements that Sora would randomly introduce across different “generations,” i.e., separate renderings of the same prompt.
Attaining even basic shot-to-shot consistency for elements like character wardrobes or background objects required “hyper-descriptive” text prompts, according to Shy Kids’ Patrick Cederberg. Yet prompting alone proved insufficient for fully controlling precise character movements, camera angles, or the timing of dramatic moments.
“There’s a little bit of temporal control…but it’s kind of a shot in the dark,” Cederberg said of choreographing specific actions within Sora’s generated videos. The team resorted to workarounds such as rendering portrait-orientation clips and then cropping them over time to emulate traditional camera pans and tracking shots.
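To make that cropping workaround concrete, here is a minimal sketch of the general idea, assuming Python with the moviepy library; the filename, dimensions, and pan direction are illustrative assumptions, not details of Shy Kids’ actual pipeline. A fixed landscape window is swept down an oversized portrait frame over the clip’s duration, producing a pan-like camera move entirely in post rather than in generation.

```python
# Sketch: emulate a camera pan by sweeping a crop window across an
# oversized render. Assumes moviepy is installed; "portrait_render.mp4"
# is a hypothetical placeholder for a tall generative-video output.
from moviepy.editor import VideoFileClip

clip = VideoFileClip("portrait_render.mp4")  # e.g. a 1080x1920 portrait render
out_w, out_h = 1080, 608                     # target ~16:9 landscape window
travel = clip.h - out_h                      # total vertical distance to sweep

def pan_frame(get_frame, t):
    """Return the frame at time t, cropped to a window that drifts downward."""
    frame = get_frame(t)
    # Move the crop window from the top to the bottom over the full duration.
    y = int(travel * (t / clip.duration))
    return frame[y:y + out_h, 0:out_w]

panned = clip.fl(pan_frame)                  # apply the time-varying crop
panned.write_videofile("emulated_pan.mp4")
```

The same idea extends to horizontal moves or tracking shots: generate more frame than you need, then animate the crop window over it, trading generation-time control the model lacks for editorial control in post.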
In the end, Shy Kids estimated generating upwards of 300 separate 10-to-20-second Sora clips, somewhere between 50 and 100 minutes of raw material, to produce their roughly three-minute final short; only a handful of generations made the cut, an acceptance rate far below even live-action shooting norms.
The filmmakers also revealed intriguing details about Sora’s copyright filters, which blocked attempts to generate videos emulating popular franchises, mimicking distinctive filmmaking styles, or referencing directors like Hitchcock and Aronofsky, prompting questions about what copyrighted works may have been used to train the model.
While undoubtedly a major creative milestone, Shy Kids’ candid Sora experience exposes the considerable limitations and human effort still required to mold the AI’s raw generative video outputs into coherent, high-quality narrative productions. As Sora and similar tools evolve, squaring their awe-inspiring potential with practical production realities will prove crucial to realizing the promise of AI-native filmmaking.