Google debuts Gemini Omni, a model that creates video from any input

What happened

Google unveiled Gemini Omni at I/O 2026 on May 19. Google describes it as a model that can create anything from any input, starting with video. Omni accepts images, audio, video, and text as input and produces high-quality video as output.

Google says Omni mixes Gemini's knowledge of history, science, and culture with a stronger understanding of physics. That includes forces like gravity, kinetic energy, and fluid dynamics, which helps generated video look more real.

Why it matters

Video generation has become one of the most closely watched areas of AI. Weak physics understanding is a key gap in many tools, where water, cloth, and motion often look wrong. A model that handles these better is closer to being usable for professional work.

The move also puts Omni in direct competition with tools like Runway and Kling, as well as Google's own Veo line, at a time when OpenAI has stepped back from its Sora app.

MintedBrain take

For creators, the input flexibility matters most. Mixing image, audio, video, and text in one prompt opens new ways to edit and remix. The physics focus is the right bet, since realism is where most AI video still breaks down.

References

This article was originally published at Google. For the full piece, read the original article.
