World Models: Five Competing Approaches

As of late 2025, there are five different approaches to creating world models. Here, we do a contrast-and-compare, identifying the specific areas in which each is (or can be) useful, the companies and “Chief Magicians” (leading scientists) behind each, and – most importantly – the money.

This blogpost is still under development; we’ll seal it up and put out pointers to it when it’s complete (or nearly so). Until then, we’re deliberately publishing early, as many of you will want to use the References and Resources list to get started.

Figure 1. Five competing approaches now exist for world models, potentially supporting AGI.

This overview focuses on systems (and/or approaches) that can model the spatial world, including object modeling. Thus, even though we can think of LLMs + RLHF as forming a “linguistic” world model, and even though many of these systems are now multimodal (and can generate images), we’re focusing here on systems devoted to spatial world modeling.

There are five general world model approaches, each of which can potentially contribute to AGI:

  • Google’s DeepMind offers three approaches to working with 3D worlds: Genie 3, SIMA, and Nano Banana. Genie 3, our focus here, is dedicated to 3D world modeling. In contrast, SIMA (Scalable, Instructable Multiworld Agent) is a generalist AI agent trained to interact with a wide variety of 3D virtual environments and video games using natural language instructions, and Nano Banana, built on Gemini 3 reasoning, lets you visualize, design, and generate context-rich visuals. Google notes that “Genie 3 is our first world model to allow interaction in real-time, while also improving consistency and realism compared to Genie 2.”
  • Fei Fei Li’s company World Labs, whose Marble is another gen-AI (generative AI) approach – more an attempt to build an internal world model than the more straightforward gen-AI approaches used by others (e.g., Google’s Veo 3.1 and Genie 3, OpenAI’s Sora, etc.),
  • Yann LeCun has recently announced the formation of his own company, Advanced Machine Intelligence (AMI); we expect it will be built around “LeJEPA” (LeCun’s invention: the Joint Embedding Predictive Architecture),
  • Karl Friston and members of Verses.ai (under CEO Gabriel René) have developed AXIOM, which “represents scenes as compositions of objects, whose dynamics are modeled as piecewise linear trajectories that capture sparse object-object interactions.” Object properties (including dynamic properties) are modeled using active inference, so AXIOM blends a semi-symbolic (or at least physical-object-modeling) approach with active inference, and
  • Neuro-symbolic world models, which are still the least developed and emphasized within the world model community.

We’ll be doing a contrast-and-compare among these different approaches throughout this post.

More specifically, we focus on three of these world models – Marble, LeJEPA/VL-JEPA, and AXIOM – leaving the discussion of neuro-symbolic computing to another post, and giving only sparse attention to DeepMind’s Genie 3, as the latter is fairly well understood, given its transformer-architecture baseline.


Motivation and Starting Thoughts

This review was inspired by an excellent EntropyTown (2025) article, which compared Google’s DeepMind approaches (e.g., SIMA 2 and Genie 3) with Fei Fei Li’s [World Labs] Marble and the likely advocacy of LeCun’s LeJEPA.

Notably absent from these conversations is any mention of Friston’s active inference, even though Verses.ai has released AXIOM, a system in which users can experiment with “slotted” objects that have properties associated with them, and which uses active inference to postulate future actions.

And, of course, neuro-symbolic computing – even though it’s been brought up over this past year (and certain companies and organizations are hiring for leadership roles in this area) – is still not mentioned as a viable contender in the world models arena.

In order to build AGI, we first need robust world models – models that inherently capture the symbolic nature of objects (and later, entities and concepts) that are represented. This does not dispute the role of transformer-based methods (e.g., LLMs, and also diffusion models, which are different). However, our core world models must allow for object representation that is NOT simply a function of token generation.

Fortunately, within the last few months, several new world models have emerged:

  • Genie 3 (along with SIMA and Nano Banana) from Google,
  • Marble, developed by World Labs, founded by Fei Fei Li with Justin Johnson, Christoph Lassner, and Ben Mildenhall,
  • VL-JEPA, developed by Yann LeCun, and now being introduced via LeCun’s new company, AMI Labs (Advanced Machine Intelligence Labs), and
  • AXIOM, developed by Verses.ai, based on the active inference concepts developed by Karl Friston, who is now Chief Scientist with Verses.ai.

The dominant use case suggested for these models, across all the companies, is a combination of robotics and AI training, as well as prototyping.


A Quick Look at Funding

Money isn’t everything, but it can give us a sense of how mature a product can be – based on how much money was available for product development.

Here’s a quick look at the investment in these four world models.

Figure 2. Investment into the four leading world model contenders: Google’s (DeepMind’s) Genie 3, AMI Labs’ (LeCun’s) VL-JEPA, World Labs’ (Fei Fei Li’s) Marble, and Verses.ai’s (René – CEO and Friston – Chief Scientist) AXIOM.

{* To be completed – AJM, Tuesday, Jan. 7th, 4:20 AM Hawai’i Time *}


Google’s Genie 3

Genie 3 is a pure-play generative approach to world modeling, based on the transformer architecture.

DeepMind researchers Jack Parker-Holder and Shlomi Fruchter state that Genie 3 worlds are “created frame by frame based on the world description and actions by the user.” Genie 3, as with other Genie models, relies on the transformer architecture – making it conceptually similar to LLMs such as Gemini 3, OpenAI’s GPT-5, and Anthropic’s Claude 4.5.

As Dave Goyal, in the Think AI tech blogpost notes, “Genie 3 outputs a navigable 3D world that runs at 24 frames per second in 720p resolution. Players can use a keyboard or controller to move through these worlds, which remain consistent for several minutes at a time. Most importantly, Genie 3 retains a form of memory …”

Goyal also notes that “Genie 3 also introduces promptable world events, which are text commands that can alter the environment in real time.”
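The frame-by-frame rollout Parker-Holder and Fruchter describe can be sketched as a simple autoregressive loop. This is a toy illustration, not Genie 3’s actual architecture or API: `step_world` below is a hypothetical stand-in (returning random pixels) for the transformer that would predict each new frame from the prompt, the frame history, and the user’s latest action.

```python
import numpy as np

def step_world(prompt, frames, action, rng):
    """Hypothetical stand-in for a transformer world model: predicts the
    next frame from the text prompt, all prior frames, and the user's
    action. Here it just returns random pixels shaped like a 720p frame."""
    return rng.random((720, 1280, 3))

def generate_world(prompt, actions, seed=0):
    """Autoregressive rollout: each new frame is conditioned on the full
    history, which is what lets a real model keep the world consistent."""
    rng = np.random.default_rng(seed)
    frames = []
    for action in actions:
        frames.append(step_world(prompt, frames, action, rng))
    return frames

frames = generate_world("a sunlit forest path", ["forward", "left", "forward"])
```

The key design point is that conditioning on the accumulated history (rather than only the latest frame) is what gives the model the “form of memory” Goyal mentions.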

SIMA 2, also developed by DeepMind, is an agent that can be inserted into the Genie 3 world. As SIMA developers note, “SIMA is evolving from an instruction-follower into an interactive gaming companion. Not only can SIMA 2 follow human-language instructions in virtual worlds, it can now also think about its goals, converse with users, and improve itself over time.”


World Labs’ Marble

World Labs’ Marble is a generative AI system that creates 3D worlds. We share information from the World Labs website, where they present a Marble Case Study.

Dr. Fei Fei Li, initiating Co-Founder and CEO of World Labs, expresses their fundamental belief: “Spatial Intelligence is the scaffolding upon which our cognition is built.” (See the Fei Fei Li “manifesto,” published on Substack, in the References and Resources at the end.)

But first, we highlight an interview with World Labs Co-Founder and CEO, Dr. Fei Fei Li.


Interview with Fei Fei Li & Justin Johnson

Dr. Fei Fei Li has given numerous interviews recently. The one that I like the most – where she seems most relaxed and animated – is the interview she gives, cited just below, along with World Labs Co-Founder Justin Johnson. (Note: There are two other World Labs Co-Founders as well: Christoph Lassner and Ben Mildenhall.)

Latent Space. 2025. “After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs.” Latent Space: The AI Engineer Podcast (Nov. 12, 2025). (Interview of Fei Fei Li and Justin Johnson by Latent Space Founders swyx (Founder of smol.ai) and Alessio Fanelli (Partner and CTO at Decibel Partners).)

This is Fei Fei Li at her most relaxed, exuberant, and charming – and Co-Founder Justin Johnson is equally delightful!

Of the many public appearances that Fei Fei Li has given recently (both interviews and a TED talk), this is the one that held my interest the most – it was well-worth the hour of watch-time, and I’ll be looking at it again!


Marble – Generating 3D Worlds

As described in a World Labs case study:
“For decades, robotics simulation has relied on manually curated environments — warehouses, kitchens, offices — each painstakingly modeled, textured, and tested. These environments are vital for robot learning but impossible to scale at the pace AI models demand.

“Marble changes that. By generating 3D worlds directly from text or image prompts, it allows researchers to create thousands of photorealistic scenes quickly. Each scene includes depth, lighting, and geometry data, along with an exportable collider mesh for accurate physical interaction.

“This means researchers can test perception, planning, and control algorithms across unlimited visual and structural variations … “

The Marble case study makes it clear which elements were native to Marble, and which were imported using other (3D dynamic model) capabilities:

“Marble generated the static kitchen environment — including the architectural layout of the Gaussian splat. Separately, 3D models such as the robot, pan, boxes, microwave, and trash can were imported from external tools such as Infinigen and RoboSuite.

“A robotic manipulator was remotely controlled to interact with these objects — traversing the kitchen environment and completing structured tasks.” 

Figure 3. “Scaling Robotic Simulation with Marble,” screenshot taken from the World Labs Case Study. (For full citation, see References and Resources.)

Briefly summarizing some key points about Marble:

  • 3D scene building using Gaussian splats to create the visual properties of specific objects and the background; it’s possible to change the camera position – not only rotating the camera through 360 degrees of visual field, but panning up and down as well,
  • Very high-resolution visual modeling, and
  • Intrinsically static, with no embedded knowledge of “object physics” – e.g. (as Co-Founder Justin Johnson expressed during the interview he gave with Co-Founder Fei Fei Li), a 3D representation of a Roman arch does not carry with it the knowledge that if one brick is removed, the others might fall out. [Paraphrase on my part.]
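The Gaussian-splat representation behind scenes like these can be sketched as a data structure. This is a generic illustration of 3D Gaussian splatting, not Marble’s actual export format – all field names here are assumptions: each splat is an oriented 3D Gaussian with appearance attributes, whose covariance is built from a per-axis scale and a rotation quaternion.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """One primitive in a Gaussian-splat scene (illustrative fields only)."""
    mean: np.ndarray      # (3,) center position in world space
    scale: np.ndarray     # (3,) per-axis extent of the Gaussian
    rotation: np.ndarray  # (4,) unit quaternion (w, x, y, z)
    opacity: float        # blending weight in [0, 1]
    color: np.ndarray     # (3,) RGB (real systems use spherical harmonics)

    def covariance(self):
        """Covariance = R diag(scale^2) R^T, as in 3D Gaussian splatting."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        return R @ np.diag(self.scale ** 2) @ R.T

splat = GaussianSplat(
    mean=np.zeros(3),
    scale=np.array([1.0, 2.0, 0.5]),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
    opacity=0.9,
    color=np.array([0.8, 0.2, 0.1]),
)
```

A scene is then just a large collection of such splats, which is why camera motion (rotation, panning) is cheap – rendering re-projects the same static primitives – while object physics requires information the representation simply doesn’t carry.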

What I find particularly attractive about Marble is that – independent of dynamic elements added manually by the researchers – it provides an access route to semantic (symbolic) object modeling in 3D space.

The World Labs blogpost notes that “The results point to a broader opportunity: controlled, semantic world generation.”

Specifically, Marble researcher Hang Yin states that “Being able to specify object-level semantics and interactivity would transform how we generate training data for embodied AI. It would let us move from manual scene design to infinite, purposeful variation.”


Yann LeCun’s JEPA, LeJEPA and VL-JEPA

Yann LeCun introduced JEPA (Joint Embedding Predictive Architecture) in his 2022 paper, where he suggested JEPA as a potential AGI method.

Since then, there have been multiple JEPA advances, most recently VL-JEPA and LeJEPA.

VL-JEPA, or “vision-language JEPA,” is a … {* to be completed; work in progress *}

Ron Schmelzer, writing for Forbes (2025), notes that “Nabla, a clinical AI company, formed an early partnership with AMI Labs. The goal is agents that don’t just summarize a doctor’s visit, but keep context throughout, and can juggle images and text to leave a trail anyone can follow.”

Dave Smith, writing for Fortune, notes that LeCun is targeting a $3.5B valuation, and has already raised an initial funding round of $587M.

Both LeJEPA and VL-JEPA are evolutions of the basic JEPA (Joint Embedding Predictive Architecture) approach, developed by LeCun (2022).
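The core JEPA idea – predict the representation of a hidden target in embedding space, rather than reconstructing it in pixel space – can be sketched with toy linear encoders. This is a minimal numerical sketch, not LeCun’s architecture: in real JEPA variants the encoders are deep networks, and the target encoder is typically an exponential-moving-average copy that receives no gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_emb = 16, 4

# Context encoder, target encoder, and predictor as toy linear maps.
# The target encoder is a (stop-gradient) copy of the context encoder.
W_ctx = rng.normal(size=(D_emb, D_in))
W_tgt = W_ctx.copy()
W_pred = rng.normal(size=(D_emb, D_emb))

def jepa_loss(x_context, x_target):
    """Prediction error measured in embedding space, not pixel space:
    the model never has to reconstruct x_target itself."""
    s_ctx = W_ctx @ x_context   # embed the visible context
    s_tgt = W_tgt @ x_target    # embed the hidden target
    s_hat = W_pred @ s_ctx      # predict the target's embedding
    return float(np.mean((s_hat - s_tgt) ** 2))

x = rng.normal(size=D_in)
loss = jepa_loss(x, x + 0.01 * rng.normal(size=D_in))
```

The design choice this illustrates is the one the whole family shares: because the loss lives in the embedding, the model can ignore unpredictable pixel-level detail and spend capacity on the abstract structure of the scene.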


VL-JEPA in Simple Terms

There are several good tutorials about VL-JEPA, both as YouTube videos and as blogposts.

Of the various YouTube tutorials, this one is very useful and informative – I think watching it more than once would be helpful.

AI Revolution. 2025. “They Just Built a New Form of AI, and It’s Better Than LLMs.” AI Revolution YouTube Channel (Dec., 2025). (Accessed Jan. 7, 2026; available at AI Revolution.)

Ben Dickson, a frequent Substack writer, used NotebookLM to put together this infographic on how VL-JEPA learns (in the two-stage training process).

Figure 4. Ben Dickson created this depiction of the two-stage training process for VL-JEPA using NotebookLM. See also his VL-JEPA tutorial on Substack (link in References and Resources).

Friston, Active Inference, and AXIOM

Active inference, invented by Karl Friston, is a generative modeling method that is – in the simplest possible terms – variational inference in which the model can be “actively” updated over time.

Figure X.1. Active inference is an “active” form of variational inference. Variational inference is a well-known generative machine learning method, which stands in contrast to neural network-based generative AI methods. There are now several evolutions that have spun out of active inference, most notably Renormalising Generative Models (RGMs) and Action Perception Divergence (APD).
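In its simplest discrete form, the “active” step means scoring candidate actions by how closely their predicted observations match the agent’s preferred observations – the risk term of the expected free energy (the ambiguity term is omitted for brevity). The following is a toy sketch with made-up numbers, not Verses.ai’s AXIOM or Genius code; the matrices A and B and preference vector C are purely illustrative.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Toy generative model: two hidden states, two actions, two observations.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])              # p(observation | state), columns = states
B = {"stay": np.eye(2),                 # state-transition model per action
     "switch": np.array([[0.0, 1.0],
                         [1.0, 0.0]])}
C = np.array([0.95, 0.05])              # preferred observation distribution

def expected_free_energy(belief, action):
    """Risk term: divergence between the observations an action is
    predicted to produce and the observations the agent prefers."""
    next_belief = B[action] @ belief    # predicted next-state beliefs
    predicted_obs = A @ next_belief     # predicted observations
    return kl(predicted_obs, C)

belief = np.array([0.2, 0.8])           # agent believes it's mostly in state 1
best = min(B, key=lambda a: expected_free_energy(belief, a))
```

Here the agent prefers observation 0, believes it is in the “wrong” state, and so selects the action that moves it toward outcomes matching its preferences – action selection and inference run through the same variational machinery.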

Until about two years ago, interest in Friston’s active inference was largely academic. Friston’s 2013 and 2015 papers laid the foundation; what followed were deeper conceptual build-outs and a few small-scale studies.

Figure X.2. Friston formed his notions of active inference in his 2013 and 2015 papers (see References and Resources for full citations), with a predecessor conceptual work in 2010.

All this changed in 2024, when Gabriel René (founder and CEO of Verses.ai) persuaded Karl Friston to join his company as Chief Scientist. Since then, it’s been a slow and steady build.

In 2025, Verses.ai released the first-ever systems built on active inference:

  • AXIOM – a pure Python package, appropriate for research and education, and
  • Genius – a fairly easy-to-use monthly-fee capability with three different price points. There are a number of papers/tutorials/demos for AXIOM. (See References and Resources.)

{* More to come. AJM, Tuesday, Jan. 7, 2026. *}


References and Resources

Overviews and Cross-Comparisons

This article does a cross-compare between Google’s DeepMind approaches (SIMA 2 and Genie 3), Fei Fei Li’s (World Labs’) Marble system, and Yann LeCun’s JEPA (likely now LeJEPA).

  • EntropyTown (on Twitter (X)). 2025. “Why Fei-Fei Li, Yann LeCun and DeepMind Are All Betting on ‘World Models’ — and How Their Bets Differ.” entropytown @2025/Twitter (X) (Nov. 13, 2025). (Accessed Nov. 24, 2025; available at Google, Marble, LeJEPA.)

Genie 3

  • Parker-Holder, Jack and Shlomi Fruchter. 2025. “Genie 3: A New Frontier for World Models.” DeepMind (August 5, 2025). (Accessed Jan. 7, 2026; available at DeepMind.)
  • Goyal, Dave. 2025. “DeepMind Genie 3: A New Frontier for Simulation, Training, and Prototyping.” Think AI Corp. (Sept. 30, 2025.) (Accessed Jan. 7, 2026; available at Think AI.)

Marble

This is the best Fei Fei Li interview, along with World Labs Co-Founder Justin Johnson.

  • Latent Space. 2025. “After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs.” Latent Space: The AI Engineer Podcast (Nov. 12, 2025). (Accessed Jan. 7, 2026; available online at Latent Space.) (Interview of Fei Fei Li and Justin Johnson by Latent Space Founders swyx (Founder of smol.ai) and Alessio Fanelli (Partner and CTO at Decibel Partners).)

This is the Marble Case Study presented by World Labs.

  • World Labs. 2025. “Marble: A Multimodal World Model.” World Labs Case Study. (Accessed Jan. 7, 2026; available at World Labs.)

This is Fei Fei Li’s manifesto – or vision for world modeling – as presented on Substack.

  • Li, Fei Fei. 2025. “From Words to Worlds: Spatial Intelligence is AI’s Next Frontier.” DrFeiFei on Substack (Nov. 10, 2025). (Accessed Jan. 7, 2026; available on Substack.)

LeCun’s JEPA (and Variants)

There are three things to consider:

  • The basic JEPA (Joint Embedding Predictive Architecture) concept, originally published in LeCun’s 2022 paper,
  • VL-JEPA (Vision-Language JEPA), and
  • Balestriero and LeCun’s effort to provide a theoretical backbone for JEPA.

JEPA (The Basic Concept)

First introduction of JEPA.

  • LeCun, Yann. 2022. “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27.” Open Review (OpenReview.net). (Accessed May 6, 2024; available online at OpenReview.net.)

VL-JEPA (Vision-Language JEPA)

The following tech blogpost is a fast and easy way in which to get a VL-JEPA overview:

  • Dickson, Ben. 2026. “VL-JEPA is a lean, fast vision-language model that rivals the giants.” TechTalks (Jan. 3, 2026). (Accessed Jan. 5, 2026, available at TechTalks.)

LeJEPA – the Theoretical Backbone

This is a tech overview; solid – and much easier to read than the actual LeJEPA paper!

  • Vert, Ayona and Ksenia Se. 2025. “AI 101: What is LeJEPA? The Theory Upgrade JEPA Has Been Missing.” Turing Post (Nov. 19, 2025). (Accessed Jan. 7, 2026; available online at Turing Post.)

This is the theoretical justification for JEPA, now renamed LeJEPA.

  • Balestriero, Randall, and Yann LeCun. 2025. “LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics.” arXiv:2511.08544v3 [cs.LG] (14 Nov 2025). (Accessed Nov. 24, 2025; available at Abs, PDF.)

LeCun’s Latest – or the Starting of AMI Labs

This Forbes article describes Yann LeCun’s new start-up, Advanced Machine Intelligence Labs, or AMI Labs.

  • Schmelzer, Ron. 2025. “Yann LeCun’s New Startup AMI Labs: Can World Models Move Beyond Hype?” Forbes (Dec. 22, 2025). (Accessed Jan. 7, 2026; available at Forbes.)

This Fortune article reports on LeCun’s funding ambitions.

  • Smith, Dave. 2025. “AI Whiz Yann LeCun is Already Targeting a $3.5 Billion Valuation for His New Startup—and It Hasn’t Even Launched Yet.” Fortune (Dec. 19, 2025). (Accessed Jan. 7, 2026; available at Fortune.)

Friston’s Active Inference and Verses.ai’s AXIOM

Active Inference – Selected Resources

{* To be included shortly – AJM, Thursday, Jan. 8th, 8:30AM Hawai’i Time *}


AXIOM

  • Heins, Conor, et al. 2025. “AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models.” arXiv:2505.24784 [cs.AI] (May 30, 2025). doi:10.48550/arXiv.2505.24784. (Accessed Jan. 7, 2026; available at arXiv.)
