World Models: JEPA and VL-JEPA

This blogpost provides a more extensive summary of the JEPA (Joint Embedding Predictive Architecture) and VL-JEPA (Vision-Language JEPA) architectures developed by Yann LeCun. An earlier world models overview blogpost gave an initial summary and preliminary resource set.

This blogpost provides:

  • Links to other YouTube vids that describe VL-JEPA,
  • Links to prior Themesis YouTubes and blogposts describing JEPA (typically as a contrast-and-compare with Friston’s active inference), and
  • A duplicate of the JEPA/VL-JEPA resource list from the earlier blogpost (scientific/technical links only, not corporate).

VL-JEPA: Some Introductory YouTubes

This YouTube by AI Revolution provides a very easy-to-understand contrast-and-compare between LLMs (large language models) and VL-JEPA. It overviews LLM problems to set the context for VL-JEPA.

AI Revolution. 2026. “They Just Built a New Form of AI, and It’s Better Than LLMs.” AI Revolution YouTube Channel (Jan., 2026). (Accessed Jan. 21, 2026; available at AI Revolution.)

The value of this YouTube is not just that it positions VL-JEPA in the context of LLMs, but that it gives us a sense of how VL-JEPA is an entirely different approach to working with the vision + language combination.

This vid provides a high-level, easily-understood description of the VL-JEPA architecture.

Near the end (10:21 min), the authors describe VL-JEPA’s key element as “reasoning in latent space.” This is a hugely important concept.

“It suggests that directly predicting latent space semantics can be more effective than narrating the world in words and reasoning over those words afterward” (10:21 min) … “It shifts the center of gravity from language to meaning” (11:48 min). (AI Revolution vid, see above)

The notion of latent space is perhaps the single most defining concept underlying all AI methods and models today. Eric Drexler has written an excellent tutorial on latent space representation (Drexler, 2025); well worth the read!

Underlying all of this is the notion of switching from tokens, or even latent space representations based on tokens (as used in LLMs), to vector embeddings for the concepts embodied in the visual + language caption representations used for training.
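To make the tokens-versus-embeddings distinction concrete, here is a minimal sketch of JEPA-style prediction in latent space. Everything in it (the encoder sizes, the predictor, the toy feature vectors) is invented for illustration; it is not the actual VL-JEPA architecture or training recipe, only the core idea that the prediction target is an embedding rather than a token or a pixel.

```python
import torch
import torch.nn as nn

# Toy sketch: predict the *embedding* of a target from the embedding of its
# context, and compute the loss entirely in latent space.
embed_dim = 128

# Hypothetical encoders mapping pre-extracted 512-d features into latent space.
context_encoder = nn.Linear(512, embed_dim)
target_encoder = nn.Linear(512, embed_dim)

# The predictor never sees raw pixels or tokens -- only latent vectors.
predictor = nn.Sequential(
    nn.Linear(embed_dim, embed_dim),
    nn.GELU(),
    nn.Linear(embed_dim, embed_dim),
)

context_features = torch.randn(8, 512)  # batch of context feature vectors
target_features = torch.randn(8, 512)   # corresponding target feature vectors

z_context = context_encoder(context_features)
with torch.no_grad():                            # target branch held fixed here,
    z_target = target_encoder(target_features)  # roughly in the JEPA spirit

z_predicted = predictor(z_context)

# "Reasoning in latent space": the error is measured embedding-to-embedding.
loss = nn.functional.mse_loss(z_predicted, z_target)
loss.backward()
print(loss.item())
```

Contrast this with an LLM-style objective, where the model must decode back into a vocabulary of tokens and the loss is computed over words rather than over meanings.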

We have an earlier video on word embeddings, as a contrast-and-compare with the earlier (simpler) TF*IDF (term frequency*inverse document frequency) approach; this may be useful.

Maren, Alianna J. 2020. “NLP: Tf-Idf vs Doc2Vec – Contrast and Compare.” Alianna J. Maren YouTube Channel (2020). (Accessed Jan. 21, 2026; available at AJM YouTube Channel.)
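As a quick illustration of the difference that video walks through, here is a minimal sketch contrasting a sparse TF*IDF representation with a dense learned document embedding. The toy corpus is invented, and scikit-learn and gensim are used here simply because they are convenient; the video itself may use different tooling.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "a feline rested on a rug",
    "stock prices fell sharply today",
]

# TF-IDF: each document becomes a sparse vector of weighted word counts.
# "cat" and "feline" occupy unrelated dimensions, so the first two documents
# look dissimilar even though they mean roughly the same thing.
tfidf_matrix = TfidfVectorizer().fit_transform(corpus)
print(tfidf_matrix.shape)  # (3, vocabulary_size)

# Doc2Vec: each document becomes a dense learned embedding, so documents with
# related meanings can land near each other even with no shared vocabulary.
tagged = [TaggedDocument(doc.split(), [i]) for i, doc in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)
print(d2v.infer_vector("the cat sat on the mat".split()).shape)  # (50,)
```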

To learn more about the evolution of latent space thinking in NLP, this YouTube vid provides good insight and an overview.

Maren, Alianna J. 2023. “Evolution of AGI and NLP Algorithms Using Latent Variables: Future of AI (Part 3 of 3).” Themesis YouTube Channel (May 31, 2023). (Accessed Jan. 21, 2026; available at Themesis YouTube Channel.)

JEPA (Joint Embedding Predictive Architecture) Basics

Yann LeCun developed the JEPA (Joint Embedding Predictive Architecture), presenting it in the context of an AGI architecture (LeCun, 2022).

Maren, Alianna J. 2024. “Joint Embedding Predictive Architecture (JEPA)” (adapted from LeCun’s 2022 paper). In “AGI Basics: Five Key Reads.” Themesis Inc. Blogpost Series (May 20, 2024). (Accessed Jan. 21, 2026; available at Themesis blogposts.)

We’ve developed several YouTubes explaining JEPA via a contrast-and-compare with Friston’s active inference. This is important, because we need to understand the key differentiating concepts.

This video introduces the contrast-and-compare between JEPA (developed by LeCun, then at Meta) and active inference (developed by Friston, and being developed into products at Verses.ai).

Maren, Alianna J. 2024. “Five Key Papers (and Two Viewpoints) for AGI.” Themesis YouTube Channel (May 28, 2024). (Accessed Jan. 21, 2026; available at Themesis YouTube Channel.)

LeCun’s Early Work: CNNs

One of the most useful things that we can do, with any key invention, is to trace its provenance, or intellectual evolution. In this section, we trace LeCun’s primary contributions – first from convolutional neural networks (CNNs), then his positioning of CNNs and other methods vs. generative methods, and then his introduction of JEPA in 2022.

One of the most important things to know is that CNNs and JEPA are both NOT generative methods. CNNs are trained using backpropagation.

LeCun is notably famous for inventing (with others) the convolutional neural network (CNN), which was based on Fukushima’s earlier efforts to create a multi-layer system that could recognize Japanese kanji characters. The crucial innovation that made CNNs possible was the backpropagation algorithm for neural networks, developed by Paul Werbos and presented initially in his 1974 Harvard Ph.D. dissertation.

Backpropagation was popularized by Rumelhart et al. in 1986. LeCun and colleagues presented the first version of CNNs in 1989.

LeCun & Bengio introduced the next version of CNNs in 1995.


References and Resources

LeCun’s JEPA (and Variants)

There are three things to consider:

  • The basic JEPA (Joint Embedding Predictive Architecture) concept, originally published in LeCun’s 2022 paper,
  • VL-JEPA (Vision-Language JEPA), and
  • Balestriero and LeCun’s effort to provide a theoretical backbone for JEPA.

JEPA (The Basic Concept)

First introduction of JEPA.

  • LeCun, Yann. 2022. “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27.” Open Review (OpenReview.net). (Accessed May 6, 2024; available online at OpenReview.net.)

VL-JEPA (Vision-Language JEPA)

The following tech blogpost is a fast and easy way in which to get a VL-JEPA overview:

  • Dickson, Ben. 2026. “VL-JEPA is a lean, fast vision-language model that rivals the giants.” TechTalks (Jan. 3, 2026). (Accessed Jan. 5, 2026, available at TechTalks.)

LeJEPA – the Theoretical Backbone

This is a tech overview; solid – and much easier to read than the actual LeJEPA paper!

  • Vert, Ayona and Ksenia Se. 2025. “AI 101: What is LeJEPA? The Theory Upgrade JEPA Has Been Missing.” Turing Post (Nov. 19, 2025). (Accessed Jan. 7, 2026; available online at Turing Post.)

This is the theoretical justification for JEPA, now renamed LeJEPA.

  • Balestriero, Randall, and Yann LeCun. 2025. “LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics.” arXiv:2511.08544v3 [cs.LG] (14 Nov 2025). (Accessed Nov. 24, 2025; available at Abs, PDF.)

JEPA Contrast-and-Compare with Active Inference (Blogpost Links)

This blogpost has not only a bit on both JEPA and active inference, but also some VERY GOOD LINKS about Yann LeCun and JEPA. (Well worth going down the rabbit hole!)

  • Maren, Alianna J. 2024. “AGI Basics: Five Key Reads.” Themesis Inc. Blogpost Series (May 20, 2024). (Accessed Jan. 21, 2026, accessible at Themesis blogposts.)

Latent Space Representation

This is a highly readable and very useful tutorial on the overall concept of latent spaces.

  • Drexler, Eric. 2025. “LLMs and Beyond: All Roads Lead to Latent Space.” AIProspects Substack (Apr. 14, 2025). (Accessed Jan. 21, 2026; available at AIProspects Substack.)

This Themesis blogpost includes a link to LeCun’s 2022 ICLR (Int’l. Conf. on Learning Representations) keynote talk. This is a very useful talk – most of us will be able to follow the first several minutes. It gets a bit more complex when he gets into variational inference. Then, he progresses to self-supervised learning – the arguments there are based on generative methods and (specifically) variational inference.

  • Maren, Alianna J. 2023. “Latent Variables in Neural Networks and Machine Learning.” Themesis, Inc. Blogpost Series (July 10, 2020). (Accessed Jan. 21, 2026; available at Themesis Blogposts.)

LeCun’s CNN Invention and Other Historicals

Most of these references are pulled from a former Themesis blogpost, “Evolution of NLP Algorithms through Latent Variables: Future of AI (Part 3 of 3).”

This is where Rumelhart et al. presented backpropagation in 1986.

  • Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-propagating Errors.” Nature 323: 533-536 (9 October 1986). (Accessed Jan. 21, 2026; available at PDF.)

AJM’s Note: This is the original Fukushima paper, in which he proposed a multi-layered system designed to recognize handwritten Japanese kanji characters.

  • Fukushima, Kunihiko. 1980. “Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biol. Cybernetics 36: 193-202 (1980). (Accessed May 22, 2023; available online at https://www.rctn.org/bruno/public/papers/Fukushima1980.pdf.)

AJM’s Note: This is the original Yann LeCun et al. paper, presenting the first known instance of a convolutional neural network.

AJM’s Note: A follow-on work by LeCun and Bengio beautifully summarizes CNN development.

AJM’s Note: This is a key CNN breakthrough paper, presenting the AlexNet.

  • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2017. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60 (6) (May 24, 2017): 84-90. doi:10.1145/3065386. (Accessed May 28, 2023; available online at https://dl.acm.org/doi/10.1145/3065386.)

AJM’s Note: Very good interview where Ilya Sutskever explains how he developed the first major convolutional neural network – the AlexNet, in concert with Alex Krizhevsky and Geoffrey Hinton.

  • Fridman, Lex with Ilya Sutskever. 2020. “Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94.” Lex Fridman YouTube podcast series. (Accessed May 28, 2023; available online at https://www.youtube.com/watch?v=13CZPWmke6A.)

AJM’s Note: This paper by LeCun and colleagues summarizes the evolution of CNNs up through 2006, which is before Sutskever and Krizhevsky invented AlexNet in 2012.

  • LeCun, Yann, Koray Kavukcuoglu and Clement Farabet. 2008. “Convolutional Networks and Applications in Vision.”

I’m also liking this SUMMARY of one of LeCun’s talks.

This set of LeCun’s slides is possibly the same as the above.

AJM’s Note: The following are two very nice overviews of CNN evolution over the past several years. The second is comparable to the first, with a slightly different selection of CNNs. Both are good reads, and very well-illustrated.

AJM’s Note: This is an excellent Medium.com article tracing the evolution of CNNs; beautifully illustrated (with a link to the author’s YouTube on CNNs as well) – excellent historical tutorial!

  • Biswas, Avishek. 2024. “The History of Convolutional Neural Networks for Image Classification (1989 – Today).” Medium.com (June 28, 2024). (Accessed Jan. 27, 2026; available at Medium.com.)


LeCun on Energy-Based Neural Networks

AJM’s Note: This paper by LeCun and colleagues gives us a sense of how LeCun was thinking about energy-based neural networks, which led (over time) to his view that we needed to move away from contrastive divergence (in generative AI) toward a different approach. This work marks a point on the “JEPA evolution path.”

  • LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., and Huang, F. (2006). “A tutorial on energy-based learning.” In Bakir, G., Hofman, T., Schölkopf, B., Smola, A., and Taskar, B., editors, Predicting Structured Data. MIT Press. (Accessed Jan. 21, 2026; available at PDF.)
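For readers who want the gist of the energy-based view without the full tutorial, here is a minimal toy sketch (names and numbers invented): an energy function E(x, y) scores how compatible an output y is with an input x, and inference simply searches for the lowest-energy y, with no normalization constant or probability distribution required.

```python
import numpy as np

def energy(x: np.ndarray, y: int, prototypes: np.ndarray) -> float:
    """Toy energy: squared distance between input x and the prototype for class y."""
    return float(np.sum((x - prototypes[y]) ** 2))

# Two invented class prototypes in a 2-d feature space.
prototypes = np.array([
    [0.0, 0.0],   # prototype for class 0
    [1.0, 1.0],   # prototype for class 1
])

x = np.array([0.9, 1.1])  # an observed input

# Inference: scan the candidate outputs and keep the one with the lowest energy.
energies = [energy(x, y, prototypes) for y in range(len(prototypes))]
y_star = int(np.argmin(energies))
print(energies, "-> predicted class:", y_star)  # class 1 has the lower energy
```

Training, in this framing, amounts to shaping E so that observed (x, y) pairs get low energy and incompatible pairs get higher energy; the different strategies for doing that shaping are the thread that the “JEPA evolution path” comment above refers to.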

AJM’s Note: This is an excellent deep learning review article by AI leaders Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. It is extremely readable – at the tutorial level, almost. It covers convolutional neural networks (CNNs) as well as deep learning architectures for a variety of tasks. The discussion of various issues, such as learning representations in neural networks, is excellent. This is an important “must-read” paper in anyone’s AI education.

AJM’s Note: Andrej Karpathy has a good and useful blogpost, in which he reproduced the original 1989 LeCun et al. findings.

AJM’s Note: Deep Boltzmann machines (DBMs) are also useful for some forms of image recognition. This is an example of using a DBM for analyzing a class of medical images. They contrast and compare with similar work using a CNN.

  • Jeyaraj, Pandia Rajan, Edward Rajan, and Samuel Nadar. 2019. “Deep Boltzmann Machine Algorithm for Accurate Medical Image Analysis for Classification of Cancerous Region.” Cognitive Computation and Systems 1 (3): 85-90 (24 September 2019). doi:10.1049/ccs.2019.0004. (Accessed March 28, 2023; available online at https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ccs.2019.0004.)

The ImageNet database, created and nurtured by Fei-Fei Li.

