Google debuts Gemini Omni, a model that creates video from any input

May 19, 2026

What happened

Google unveiled Gemini Omni at I/O 2026 on May 19. Google describes it as a model that can create anything from any input, starting with video. Omni accepts images, audio, video, and text as input and produces high-quality video as output.

Google says Omni combines Gemini's knowledge of history, science, and culture with a stronger understanding of physics. That includes phenomena such as gravity, kinetic energy, and fluid dynamics, which Google says helps generated video look more realistic.

Why it matters

Video generation has become one of the most closely watched areas of AI. Physics understanding is a key gap in many tools, where water, cloth, and motion often look wrong. A model that handles these better is closer to being usable in professional work.

The move also puts Google directly against tools like Runway, Kling, and its own Veo line, at a time when OpenAI has stepped back from its Sora app.

MintedBrain take

For creators, the input flexibility matters most. Mixing image, audio, video, and text in one prompt opens new ways to edit and remix. The physics focus is the right bet, since realism is where most AI video still breaks down.

Related articles

UN opens its first Global Dialogue on AI Governance in Geneva

The United Nations convened its first Global Dialogue on AI Governance in Geneva on July 6, a two-day session established by the UN General Assembly as the first intergovernmental platform dedicated to AI. The UN said it brings together all 193 member states alongside private-sector and civil-society participants. The UN's Independent International Scientific Panel on AI presented a preliminary report to governments.

UN science panel warns AI is outpacing safeguards as governance summit nears

In a July 5 feature previewing its Geneva meetings, UN News published interviews with the co-chairs of the new Global Dialogue on AI Governance and the UN's Independent International Scientific Panel on AI. Panel co-chair Yoshua Bengio said AI capabilities are outpacing scientific understanding and that science currently cannot guarantee advanced AI will not cause catastrophic harm. Co-chair Maria Ressa described AI-amplified disinformation as an 'information Armageddon.'

xAI makes Grok Speech-to-Text and Text-to-Speech APIs generally available

xAI moved its Grok Speech-to-Text and Text-to-Speech APIs to general availability, giving developers audio transcription across 25 languages with batch and streaming modes plus natural-sounding speech generation. The move targets enterprise voice-agent developers building on the Grok platform. It is part of xAI's broader July 2026 developer-API expansion.

Anthropic moves to close loopholes Chinese firms use to access Claude

The Financial Times reported Anthropic has stepped up efforts to detect and shut down unauthorized Claude access by Chinese companies, identifying workarounds such as routing employee accounts through overseas subsidiaries and reimbursing engineers for personal subscriptions accessed via VPNs. Anthropic's detection now monitors indicators like user time zones and targets relay services. The company frames the activity as distillation attacks meant to extract Claude's capabilities.

References

This article was originally published at Google. For the full piece, read the original article.
