Veo-2 vs Sora: What Google Has (That OpenAI Doesn’t)

Summary-at-a-Glance

Google’s use of physically realistic object models in its Veo-2 video generation system takes us a step closer to artificial general intelligence (AGI).

No, it is NOT – in and of itself – an AGI. It is still a generative capability, specifically good at text-to-video generation.

However, it differs sharply from simpler, “pure-play” generative video AIs such as OpenAI’s Sora. The reason is that Veo-2 is not just generative: it includes physical object models, and thus incorporates at least TWO of the THREE NECESSARY AGI elements (see the toy sketch after this list):

  • Multiple representation levels,
  • Feedback loops, and
  • (POSSIBLY, though we don’t yet know) a control system.
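
To make these elements concrete, here is a deliberately toy Python sketch of how a generator, a physics checker, and a controller might interact. To be clear: every function name, threshold, and mechanism below is our own invention for illustration. Google has not published Veo-2’s internals, and this is NOT a claim about its actual architecture.

    import random

    def generate_frame(prompt, seed):
        # Stand-in for a video generator: returns a fake "frame" whose
        # physical-plausibility score is just a seeded random number.
        rng = random.Random(seed)
        return {"prompt": prompt, "plausibility": rng.random()}

    def physics_score(frame):
        # Stand-in for an object-level physical model that scores a frame
        # for consistency (object permanence, continuity, and so on).
        return frame["plausibility"]

    def control_loop(prompt, threshold=0.8, max_tries=10):
        # The "control system": keeps requesting candidates until the
        # physics model's feedback says one is acceptable.
        for seed in range(max_tries):
            frame = generate_frame(prompt, seed)
            if physics_score(frame) >= threshold:
                return frame, seed
        return None, None

    frame, seed = control_loop("a glass rolling off a table")
    if frame is not None:
        print(f"seed {seed} passed the physics check: {frame}")
    else:
        print("no candidate met the physics threshold")

The point is structural, not algorithmic: the physics model supplies a second representation level, its score is a feedback signal, and the retry loop plays the role of a (very crude) control system.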

In this blogpost, we discuss the rudiments of Veo-2’s capabilities, and also touch on how Google’s deep research roots may very well make it the winner in the emerging AGI wars.

Here’s a short collection of side-by-side comparisons of Veo-2 with Sora by Ibaba (2024): “Google Really Destroyed OpenAI and Sora Without Even Trying.” (This Medium.com article requires membership to read.)


Key Veo-2 Innovation: Realistic Physical Models

Google’s Veo-2, released in mid-December 2024, is a groundbreaking step in video generation. As described on VeoInsights.com, “Veo 2 focuses on precision. It creates videos that not only look natural but also align closely with the input instructions. From specific camera angles to artistic lighting styles, this tool handles complex prompts without missing a beat.”

One of the most important differentiators for Google’s Veo-2, as compared with OpenAI’s Sora, is that Veo-2 integrates physically realistic object models. Yes, it also offers much finer resolution, but we’re focusing today on the physically realistic object models as the key differentiating feature.

As noted in Clark’s 2022 article, “DeepMind researcher Luis Piloto and his colleagues developed an AI, dubbed PLATO, which adopts the thesis that objects play a central role in the representation and prediction of the physical world around us.” (See Piloto et al., 2022.)

It’s important for us to note: Google DeepMind’s investigations into, and development of, physically realistic object models have a deep timeline; their early work goes back to 2018 (and possibly earlier). (See Piloto et al., 2018.)
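
To give a flavor of what “objects play a central role in the representation and prediction of the physical world” can mean in code, here is a minimal Python sketch in the spirit of PLATO’s violation-of-expectation measurements. It is our own simplification, not DeepMind’s code: we represent one object explicitly, predict its next position with a naive constant-velocity model, and treat large prediction error as “surprise.”

    def predict_next(obj):
        # Constant-velocity prediction for one object: position + velocity.
        return [p + v for p, v in zip(obj["pos"], obj["vel"])]

    def surprise(predicted, observed):
        # Squared prediction error, summed over coordinates.
        return sum((p - o) ** 2 for p, o in zip(predicted, observed))

    ball = {"pos": [0.0, 1.0], "vel": [1.0, 0.0]}

    plausible_next = [1.0, 1.0]    # the ball keeps rolling
    impossible_next = [1.0, 5.0]   # the ball "teleports" upward

    for label, observed in [("plausible", plausible_next),
                            ("impossible", impossible_next)]:
        err = surprise(predict_next(ball), observed)
        print(f"{label} event: surprise = {err:.2f}")

The physically impossible event produces a far larger surprise signal, which is the same qualitative signature Piloto et al. reported when PLATO watched physics-violating videos.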


OpenAI’s Visual Models Research

OpenAI has also devoted time and energy to visual models. For the past few years, they have focused on connecting images with natural-language text, as described in their 2021 paper “Learning Transferable Visual Models From Natural Language Supervision” (the CLIP paper), for which Radford was the lead author and Ilya Sutskever one of the twelve (in total) authors. (See Radford et al., 2021.)
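
For readers who want the mechanics: the core of that 2021 work is a contrastive objective that pulls matched image/text embedding pairs together and pushes mismatched pairs apart. Below is a NumPy sketch of that symmetric cross-entropy loss; the random vectors stand in for real encoder outputs, and the batch size and temperature here are illustrative values only.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 4, 8                    # a batch of 4 pairs, 8-dim embeddings

    img = rng.normal(size=(n, d))  # stand-in image-encoder outputs
    txt = rng.normal(size=(n, d))  # stand-in text-encoder outputs

    # L2-normalize so dot products become cosine similarities.
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)

    temperature = 0.07
    logits = img @ txt.T / temperature   # (n, n) similarity matrix

    def cross_entropy(logits, labels):
        # Mean negative log-probability of the correct class per row.
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    labels = np.arange(n)                # pair i matches text i
    loss = (cross_entropy(logits, labels)
            + cross_entropy(logits.T, labels)) / 2
    print(f"symmetric contrastive loss: {loss:.3f}")

Training both encoders to minimize this loss is what lets the model match images and text it has never seen paired together before.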


References and Resources

  • Clark, Lindsey. 2022. “DeepMind AI Reacts to the Physically Impossible like a Human Infant.” The Register (11 Jul 2022). (Accessed Jan. 7, 2025; available at https://www.theregister.com/2022/07/11/deepmind_ai_baby_brain/.)
  • Ibaba, Tari. 2024. “Google Really Destroyed OpenAI and Sora Without Even Trying.” Medium.com (22 Dec 2024). (Accessed Jan. 6, 2025; available at https://medium.com/coding-beauty/new-google-veo-2-cb7339625bb5.)
  • Piloto, Luis, Ari Weinstein, Dhruva T.B., Arun Ahuja, Mehdi Mirza, Greg Wayne, David Amos, Chia-chun Hung, and Matthew Botvinick. 2018. “Probing Physics Knowledge Using Tools from Developmental Psychology.” arXiv:1804.01128v1 [cs.AI] (3 Apr 2018). (Accessed Jan. 7, 2025; available at arXiv.)
  • Piloto, Luis, Ari Weinstein, Peter Battaglia, and Matthew Botvinick. 2022. “Intuitive Physics Learning in a Deep-Learning Model Inspired by Developmental Psychology.” Google DeepMind Blogpost Series (11 Jul 2022). (Accessed Jan. 7, 2025; available online at Google DeepMind.)
  • Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” arXiv:2103.00020v1 [cs.CV] (26 Feb 2021). (Accessed Jan. 7, 2025; available at https://arxiv.org/pdf/2103.00020 and at https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf.)
  • VeoInsights Editorial Team. 2024. “Google Veo 2 – Revolutionizing AI and Defining the Future.” VeoInsights.com (19 Dec 2024). (Accessed Jan. 7, 2025; available online at VeoInsights.com.)
