Veo-2 vs Sora: What Google Has (That OpenAI Doesn’t)

Summary-at-a-Glance

Google’s use of physically-realistic object models in its Veo-2 video generation system takes us a step closer to artificial general intelligence (AGI).

No, it is NOT – in and of itself – an AGI. It is still a generative capability, specifically good at text-to-video generation.

However, it differs sharply from simpler, “pure-play” video generative AIs, such as OpenAI’s Sora. The reason is that Veo-2 is not just generative; it includes physical object models, and thus incorporates at least TWO of the THREE NECESSARY AGI elements (a minimal conceptual sketch follows this list):

  • Multiple representation levels,
  • Feedback loops, and
  • (POSSIBLY – we don’t know at this moment) – a control system.
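To make these three elements a bit more concrete, here is a minimal, purely conceptual sketch in Python. It is emphatically NOT Veo-2’s architecture – Google has not published those internals – and the object state, physics step, noise model, and tolerance threshold below are all illustrative assumptions. The sketch only shows how a pixel-level generator, an object-level physical model, a feedback signal between them, and a small control policy could fit together.

    # Conceptual sketch ONLY -- not Veo-2's actual (unpublished) architecture.
    # Two representation levels: the generator's proposed frame (abstracted here
    # as the object state it implies) and an explicit object-level physics model.
    # The feedback loop is the prediction error between them; the control policy
    # decides whether to keep the proposed frame or fall back to the physics.
    import random
    from dataclasses import dataclass

    @dataclass
    class ObjectState:
        """Object-level representation: 1-D position and velocity of one object."""
        x: float
        v: float

    def physics_step(state: ObjectState, dt: float = 0.04, g: float = -9.8) -> ObjectState:
        """Predict the next object state from simple, assumed dynamics (free fall)."""
        return ObjectState(x=state.x + state.v * dt, v=state.v + g * dt)

    def generate_frame(prompt: str, prev_state: ObjectState) -> ObjectState:
        """Stand-in for the generative model: returns the object state implied by
        the frame it would render. Noise added to a physics step simulates an
        imperfect generator."""
        ideal = physics_step(prev_state)
        return ObjectState(x=ideal.x + random.gauss(0.0, 1.0), v=ideal.v)

    def control_loop(prompt: str, state: ObjectState, n_frames: int, tol: float = 0.5):
        """Generate n_frames, accepting each proposed frame only if it stays close
        to the object-level physical prediction (the feedback signal)."""
        frames = []
        for _ in range(n_frames):
            predicted = physics_step(state)            # object-level expectation
            proposed = generate_frame(prompt, state)   # generator's proposal
            error = abs(proposed.x - predicted.x)      # feedback signal
            state = proposed if error < tol else predicted
            frames.append(state)
        return frames

    print(control_loop("a ball dropping onto a table", ObjectState(x=10.0, v=0.0), n_frames=5))

The point of the sketch is the shape of the loop, not the particulars: the generator proposes, the object model predicts, and the controller arbitrates between the two.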

In this blogpost, we discuss the rudiments of Veo-2’s capabilities, and we also touch on how Google’s deep research roots in physically-realistic modeling may very well make it the winner in the emerging AGI wars.

Here’s a short collection of side-by-side comparisons of Veo-2 with Sora by Ibaba (2024): “Google Really Destroyed OpenAI and Sora Without Even Trying.” (This Medium.com article requires membership to read.)


The Related YouTube Video

The related YouTube vid for this blogpost is HERE.


Key Veo-2 Innovation: Realistic Physical Models

Google’s Veo-2, released in mid-December 2024, is a groundbreaking step in video generation. As described on VeoInsights.com, “Veo 2 focuses on precision. It creates videos that not only look natural but also align closely with the input instructions. From specific camera angles to artistic lighting styles, this tool handles complex prompts without missing a beat.”

One of the most important differentiators for Google’s Veo-2 as compared with OpenAI’s Sora is that Veo-2 integrates physically realistic object models. Yes, it also has much finer resolution – but we’re focusing today on the physically-realistic object models as a key and differentiating feature.

As noted in Clark’s 2022 article, “DeepMind researcher Luis Piloto and his colleagues developed an AI, dubbed PLATO, which adopts the thesis that objects play a central role in the representation and prediction of the physical world around us.” (See Piloto et al., 2022.)

It’s important for us to note: Google DeepMind’s investigations into and development of physically-realistic object models have a deep timeline; their early work goes back to 2018 (and possibly earlier). (See Piloto et al., 2018.)
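For readers who want a feel for what this kind of “intuitive physics” testing looks like, here is a toy illustration of the violation-of-expectation idea that PLATO builds on. To be clear, this is NOT Piloto et al.’s code or model – PLATO learns object-slot representations and a neural predictor from video – and the constant-velocity predictor and hand-coded trajectories below are our own stand-ins.

    # Toy violation-of-expectation sketch: keep per-object representations,
    # predict each object's next position, and register "surprise" (prediction
    # error) when the observed continuation is physically implausible.
    import numpy as np

    def predict_next(positions: np.ndarray) -> np.ndarray:
        """Constant-velocity prediction from the last two observed positions.
        positions: array of shape (timesteps, n_objects, 2)."""
        velocity = positions[-1] - positions[-2]
        return positions[-1] + velocity

    def surprise(observed_next: np.ndarray, predicted_next: np.ndarray) -> float:
        """Per-frame surprise = mean squared prediction error over objects."""
        return float(np.mean((observed_next - predicted_next) ** 2))

    # One object moving smoothly to the right ...
    history = np.array([[[0.0, 0.0]],
                        [[1.0, 0.0]]])
    plausible_next  = np.array([[2.0, 0.0]])   # continues its trajectory
    impossible_next = np.array([[9.0, 5.0]])   # "teleports" -- violates continuity

    prediction = predict_next(history)
    print("surprise (plausible):  ", surprise(plausible_next, prediction))   # ~0
    print("surprise (impossible): ", surprise(impossible_next, prediction))  # large

High surprise on the “impossible” continuation is exactly the signal that an object-centric, physically-grounded model provides – and that a pure pixel-level pattern generator does not automatically have.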


OpenAI’s Visual Models Research

OpenAI has also devoted time and energy to visual models. For the past few years, they have focused on linking text with images (the groundwork for generating visuals from text), as described in a 2021 work for which Radford was the lead author, and for which Ilya Sutskever was one of the twelve (in total) authors (Radford et al., 2021).
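That 2021 work is built around a contrastive objective that ties image embeddings to matching text embeddings in a shared space. The sketch below shows only the heart of that objective; the image and text encoders are omitted, and the random embeddings, batch size, and temperature value are placeholder assumptions rather than anything from OpenAI’s released code.

    # Sketch of a CLIP-style symmetric contrastive loss (after Radford et al., 2021).
    # Matching (image, text) pairs are pushed together; mismatched pairs are pushed apart.
    import numpy as np

    def clip_style_loss(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
        """Symmetric cross-entropy over the image-text similarity matrix.
        img_emb, txt_emb: (batch, dim) embeddings; rows are paired by index."""
        img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
        txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
        logits = img_emb @ txt_emb.T / temperature     # (batch, batch) similarities
        labels = np.arange(len(logits))                # i-th image matches i-th caption

        def cross_entropy(l: np.ndarray) -> float:
            l = l - l.max(axis=1, keepdims=True)       # numerical stability
            log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
            return float(-log_probs[labels, labels].mean())

        return (cross_entropy(logits) + cross_entropy(logits.T)) / 2.0

    # Example call with random "embeddings", just to show the shapes involved.
    rng = np.random.default_rng(0)
    print(clip_style_loss(rng.normal(size=(4, 64)), rng.normal(size=(4, 64))))

Training the two encoders against this loss is what makes the visual representations “transferable”: the text side supplies the supervision, and text-to-image generation systems can then build on the shared embedding space.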


Related Vids & Blogposts

There are several related YouTube vids. The most crucial are identified below.

This vid (produced in Oct. 2024) is an update to a prior vid in which we described the “AGI Wars.” The big change since that earlier vid was the introduction of Renormalising Generative Models (RGMs) by Friston et al. (2024).

Here’s the accompanying blogpost:

  • Maren, Alianna J. 2024. “AGI Wars: Emerging Landscape.” Themesis, Inc. Blogpost Series (Oct. 20, 2024). (Accessed Jan. 20, 2025; available online at AGI Wars: Emerging Landscape.)

The predecessor YouTube on “AGI Wars” is HERE:

Here’s the accompanying blogpost:

  • Maren, Alianna J. 2024. “Emerging AGIs: Early 2024 Playing Field.” Themesis, Inc. Blogpost Series (May 13, 2024). (Accessed Jan. 20, 2025; available online at Emerging AGIs.)

In this vid, we compare and contrast Friston’s Renormalising Generative Models (RGMs) with LeCun’s Joint Embedding Predictive Architecture (JEPA).

  • Maren, Alianna J. 2024. “Comparing Three Leading AGI Contenders (Part 1 of Many).” Themesis, Inc. Blogpost Series (June 26, 2024). (Accessed Jan. 20, 2025; available online at Comparing Three Leading AGI Contenders.)

This is the vid where we introduce the AGI architecture that we discuss.

We don’t have a specific blogpost associated with this YouTube, but we have several “generative AI self-study” blogposts published around that same time. One of them is:

  • Maren, Alianna J. 2024. “AGI: Generative AI, AGI, the Future of AI, and You.” Themesis, Inc. Blogpost Series (June 26, 2024). (Accessed March 24, 2025; available online at AGI: Generative AI, AGI, Future … .)

CNN History

CNNs (convolutional neural networks) were invented in 1989 by LeCun et al. (See references at the end of this blogpost.)

We published a YouTube #short on CNNs.


References and Resources

Veo-2 and Physically-Realistic Object Models

  • Clark, Lindsey. 2022. “DeepMind AI Reacts to the Physically Impossible like a Human Infant.” The Register (11 Jul 2022). (Accessed Jan. 7, 2025; available at https://www.theregister.com/2022/07/11/deepmind_ai_baby_brain/.)
  • Ibaba, Tari. 2024. “Google Really Destroyed OpenAI and Sora Without Even Trying.” Medium.com (Dec. 22, 2024). (Accessed Jan. 6, 2025; available online at https://medium.com/coding-beauty/new-google-veo-2-cb7339625bb5 .) 
  • Piloto, Luis, Ari Weinstein, Dhruva T.B., Arun Ahuja, Mehdi Mirza, Greg Wayne, David Amos, Chia-chun Hung, Matthew Botvinick. 2018. “Probing Physics Knowledge Using Tools from Developmental Psychology.” arXiv:1804.01128v1 [cs.AI] (3 Apr 2018). (Accessed Jan. 7, 2025; available at arXiv.)
  • Piloto, Luis, Ari Weinstein, Peter Battaglia, and Matt Botvinick. 2022. “Intuitive Physics Learning in a Deep-Learning Model Inspired by Developmental Psychology.” Google DeepMind Blogpost Series (11 Jul 2022). (Accessed Jan. 7, 2025; available online at Google DeepMind Blogpost Series.)
  • Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” arXiv:2103.00020v1 [cs.CV] (26 Feb 2021). (Accessed Jan. 7, 2025; available at https://arxiv.org/pdf/2103.00020 and also at https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf.)
  • VeoInsights Editorial Team. 2024. “Google Veo 2 – Revolutionizing AI and Defining the Future.” VeoInsights.com (19 Dec. 2024). (Accessed Jan. 7, 2025; available online at VeoInsights.com)

Convolutional Neural Networks

  • LeCun, Yann, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Handwritten Digit Recognition with a Back-Propagation Network.” Proc. NIPS 1989. (Available online at NIPS Proceedings 1989.)
  • Krizhevsky, Alex, Ilya Sutskever, Geoffrey Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” Proc. NIPS 2012. (Available online at NIPS Proceedings 2012.)

David Marr & the 2 1/2D Object Model

  • Marr, David. 1980. “Visual Information Processing: The Structure and Creation of Visual Representations.” Philos Trans R Soc Lond B Biol Sci 290(1038) (Jul 8, 1980): 199-218. doi: 10.1098/rstb.1980.0091. (Available online at Philos Trans R.)

Comments from Readers (Suggested References)

The following references were suggested by a reader (@hjups), who commented on the YouTube. (AJM’s note: will be reading/responding tonight; thanks, @hjups!)

  • Motamed, Saman, Laura Culp, Kevin Swersky, Priyank Jaini, and Robert Geirhos. 2025. “Do Generative Video Models Learn Physical Principles from Watching Videos?” arXiv:2501.09038v1 [cs.CV] (14 Jan 2025). (Accessed Jan. 20, 2025; available online at arXiv.)
  • Kang, B., et al. 2024. “How Far Is Video Generation from World Model: A Physical Law Perspective.” arXiv:2411.02385 (2024). (Accessed Jan. 20, 2025; available online at arXiv.)
  • Eslami, S. M. Ali, et al. 2018. “Neural Scene Representation and Rendering.” Science 360(6394). (Accessed Jan. 20, 2025; available at https://www.science.org/doi/10.1126/science.aar6170. Reading the full text requires a free Science account; the abstract is available at the same link without one.)

Comments

  1. The potential influence of this feature on collaborative AI systems is one area that is very interesting. Veo-2 may open the door to improved collaboration in fields like robotics or large-scale simulations by allowing numerous AIs to share a consistent understanding of physical settings. Could this be a step toward a fundamental AGI aspect of cross-AI collaboration?

  2. Hey Dr. A.J.,

    Do you go deeper into these type of topics in your other AI elective course?

    I am curious if this new VEO2 model is so much better than Sora, why isn’t Gemini also a great deal better than ChatGPT? I tried Gemini again last month and I feel like ChatGPT is better, given I only compared the free models. I am not even sure if Gemini has a paid model.
