OpenAI has announced Sora, a groundbreaking AI model that can generate lifelike and imaginative videos from simple text prompts. Sora can produce videos up to a minute long while adhering to the user’s description and maintaining visual quality throughout. OpenAI’s post on X is embedded right below.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
The company says the new AI model excels at depicting complex scenes with multiple characters, nuanced emotions, and intricate environmental details. Beyond interpreting user prompts accurately, Sora can also infer the underlying physics and spatial dynamics implied by a description.
This makes the model better at grasping the physical relationships between objects and characters, which helps keep the generated videos coherent.
Despite its remarkable capabilities, OpenAI acknowledges that Sora has some limitations. Based on the generated videos and the company’s notes, the AI model may sometimes struggle with intricate physics simulations and may fail to understand cause-and-effect scenarios. It might also misinterpret spatial details or encounter challenges with precise, time-based descriptions.
Moreover, OpenAI is prioritizing safety measures before Sora is made available to everyone. The company is collaborating with experts in misinformation and bias to rigorously test Sora for vulnerabilities. OpenAI plans to implement tools to detect misleading content and put safeguards in place to prevent Sora from generating violent, hateful, or sexual content, or celebrity likenesses.
As of now, Sora is available to “red teamers” tasked with assessing the model for harms and risks. OpenAI states that it is also granting visual artists, designers, and filmmakers access to Sora to gather feedback.
Here are a few demonstrations that OpenAI shared on X.
Prompt: “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” pic.twitter.com/gzEE8SwP81
— OpenAI (@OpenAI) February 15, 2024
here is a better one: https://t.co/WJQCMEH9QG pic.twitter.com/oymtmHVmZN
— Sam Altman (@sama) February 15, 2024
Since news of OpenAI’s impressive new text-to-video tech first broke, the company has given us more sneak peeks of what Sora is capable of.
Everything in these clips, from a museum’s architectural details to the camera pans, is jaw-droppingly impressive. But if that wasn’t enough, the lead researcher on Sora, Bill Peebles, revealed what a prompt like “An alien blending in naturally with New York City, paranoia thriller style, 35mm film” would look like:
Prompt: an alien blending in naturally with new york city, paranoia thriller style, 35mm film
Video generated by Sora pic.twitter.com/eUmoaFcSab
— Sora Ai Clips (@SoraOpenAiClip) March 3, 2024
No sooner was this clip released than content creator Blaine Brown took the original, added rap music generated with Suno AI, and used Pika Labs’ AI to make the alien lip-sync the song. Just perfect!
This Sora clip is when the alien guy busts out in a lip-synced rap about how tough it is being different than everyone else. Workflow in the thread.@suno_ai_ @pika_labs (lip sync)
Alienate Yourself pic.twitter.com/kc5FI83q5R
— Blaine Brown (@blizaine) March 3, 2024
Not only does this reflect how powerful the upcoming technology could be, but also how easily its results can be modified. For content creators (and many others), this is a dream tech in the making.
The pace at which text-to-video AI tech is evolving has us drooling. What about you?