AGI Basics: Five Key Reads

We’re building our AGI discussion around a few core papers. The ones identified here are either central to that discussion (they will be featured across multiple YouTubes and blogposts, and sit front-row in our AGI 101 Themesis course) or they strongly support our studies.

This blogpost specifically correlates with THIS YOUTUBE:

Maren, Alianna J. 2024. “Five Key Papers (and Two Viewpoints) for AGI.” Themesis YouTube Channel (May 28, 2024). (Accessed May 28, 2024; available HERE.)

Here are the five essential papers.


Friston and LeCun – Actually Very Cordial

This is the YouTube of Karl Friston and Yann LeCun at Davos, 2023. Very pleasant interchange; very mutually respectful!

Casper Labs. 2024. “Day 3 Panel: Beyond the Hype Cycle: What AI Is Today, and What It Can Become. [Conversation between Karl Friston and Yann LeCun, moderated by Olivier Oullier. Davos, 2023.]” Casper Labs YouTube Channel (2024). (Accessed May 28, 2024; available HERE.)


Five Key AGI Papers

These five papers are at the top of our “reading mountain.” If we can make sense of ANY portion of ANY paper, we’re doing fabulous.

Figure 1. Five key papers represent the “top of the stack” for two different approaches to AGI. The first three either introduce (Friston, 2013; Friston et al., 2015) or extend (Hafner et al., 2022) Friston’s notion of “active inference.” The last two advance LeCun’s architecture, specifically his Joint Embedding Predictive Architecture (JEPA) approach.

#1. Friston, 2013, “Life as We Know It”

AJM’s Note: Friston’s 2013 paper is the central point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated from its external environment by a Markov blanket, and builds from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015). (My one-line gloss of the free energy bound follows the reference below.)

  • Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface 12: 20130475. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)
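
For orientation, here’s my own one-line gloss (mine, not a quote from the paper) of the quantity Friston minimizes: the variational free energy is an upper bound on surprise (negative log evidence), and the gap between them is a KL divergence.

```latex
% Variational free energy as an upper bound on surprise (my gloss on Friston, 2013).
% s = sensory states; \psi = external states; \lambda = internal states;
% q(\psi \mid \lambda) is the recognition density encoded by the internal states.
F(s, \lambda)
  = \underbrace{-\ln p(s)}_{\text{surprise}}
  + \underbrace{D_{\mathrm{KL}}\!\left[\, q(\psi \mid \lambda) \;\|\; p(\psi \mid s) \right]}_{\geq\, 0}
  \;\geq\; -\ln p(s)
```

Minimizing F with respect to the internal states pulls q(ψ|λ) toward the true posterior p(ψ|s); that single move is most of “Friston-ese” in a nutshell.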

#2. Friston et al., 2015, “Knowing One’s Place”

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)

#3. Hafner et al., 2022, “Action and Perception as Divergence Minimization”

This paper, led by Danijar Hafner, takes Friston’s earlier notion of active inference one step further. (My shorthand for their unifying objective follows the reference below.)

AJM’s Note: Mention of this paper received multiple “thumbs up” in the comments for the YouTube associated with this post! Clearly an important paper!

  • Hafner, Danijar, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, and Nicolas Heess. 2022. “Action and Perception as Divergence Minimization.” arXiv:2009.01791v3 [cs.AI] (13 Feb 2022). (Accessed May 20, 2024; available online at arXiv.)
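
As I read it, the paper’s unifying idea fits on one line: perception and action are both ways of shrinking a single joint KL divergence. The notation below is my shorthand, not the paper’s.

```latex
% Perception adjusts beliefs over latents z; action adjusts the inputs x the
% agent receives. Both minimize one divergence between the "actual" joint
% distribution the agent induces and a fixed "target" distribution that
% encodes its preferences (my paraphrase of Hafner et al., 2022).
\min_{\text{perception, action}}
  D_{\mathrm{KL}}\!\left[\, p_{\text{actual}}(x, z) \;\|\; p_{\text{target}}(x, z) \right]
```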

#4. LeCun, 2022, “A Path Towards Autonomous Machine Intelligence”

In the other corner, we have Yann LeCun. In this paper, he sets out his basic concepts for an AGI architecture.

  • LeCun, Yann. 2022. “A Path Towards Autonomous Machine Intelligence.” Version 0.9.2 (2022-06-27). OpenReview (June 27, 2022). (Accessed May 20, 2024. Available online at OpenReview.)

#5. Garrido et al., 2024: Taking LeCun (2022) One Step Further

  • Garrido, Quentin, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman and Yann LeCun. 2024. “Learning and Leveraging World Models in Visual Representation Learning.” arXiv: 2403.00504v1 [cs.CV] 1 Mar 2024. (Accessed May 27, 2024; available online at arXiv.)

My Own Tutorials

I wrote these tutorials to teach myself – and if they’re useful to you, yay!

  • The Reverse Kullback-Leibler Divergence – figuring this out was essential; and
  • Variational inference – the “Rosetta Stone” paper that I wrote to translate from Beal (2003) and Friston (2013, 2015) to Blei et al. and back again. (Note: Needs another update.)

The Kullback-Leibler Divergence – and the REVERSE Kullback-Leibler

I’ll admit – I got tremendously squirreled-up when I was teaching myself variational inference, because I just didn’t get that notion of the “reverse Kullback-Leibler.”

It was staring me in the face, of course. It was right there – and I just didn’t see it.

Of course, others have referred to this as the “reverse KL divergence” all along.

Figure 2. Hafner et al. (2022) note the “reverse KL [divergence].”

Now, others may or may not have the same tripping point. But in the course of rectifying my earlier misunderstandings, I wrote – then REWROTE – a little tutorial on the K-L divergence, and the REVERSE K-L divergence. (A small numerical sketch of the forward/reverse asymmetry follows below.)
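
If you want to feel the difference rather than just read about it, here’s a minimal numerical sketch (mine, not from the tutorial). It pits a two-mode p against two candidate single-Gaussian q’s and scores them under both divergences.

```python
# A minimal numerical sketch of the asymmetry between the forward KL,
# D_KL(p || q), and the reverse KL, D_KL(q || p). Forward KL punishes q
# for missing mass where p has it (mode-covering); reverse KL punishes q
# for putting mass where p has none (mode-seeking).
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def normalize(f):
    return f / (f.sum() * dx)

def kl(f, g):
    # D_KL(f || g) = integral of f(x) * log(f(x)/g(x)) dx, on a grid.
    mask = f > 1e-12
    return np.sum(f[mask] * np.log(f[mask] / g[mask])) * dx

p = normalize(0.5 * gaussian(x, -3, 1) + 0.5 * gaussian(x, 3, 1))  # two modes
q_cover = normalize(gaussian(x, 0, 3))   # broad q straddling both modes
q_seek  = normalize(gaussian(x, 3, 1))   # narrow q locked onto one mode

print(f"forward KL(p||q): broad q = {kl(p, q_cover):.3f}, one-mode q = {kl(p, q_seek):.3f}")
print(f"reverse KL(q||p): broad q = {kl(q_cover, p):.3f}, one-mode q = {kl(q_seek, p):.3f}")
```

The broad q wins under the forward KL (mode-covering), while the one-mode q wins under the reverse KL (mode-seeking) – which is exactly why variational inference, built on the reverse KL, tends to lock onto a single mode of the posterior.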

Here’s the paper – still “privately published,” awaiting review – offered as a sneak peek:

  • Maren, Alianna J. 2024. “Minding Your P’s and Q’s: Notational Variations Expressing the Kullback-Leibler Divergence.” Themesis, Inc. Technical Note THM TN2024-001 (ajm). (PDF last accessed Feb. 02, 2024.)

And here’s a blogpost by Dibya Ghosh that really helped me understand the difference between the straightforward KL vs. the reverse KL.

  • Ghosh, Dibya. 2019 (best guess). “KL Divergence for Machine Learning.” The RL Probabilist. (Accessed May 28, 2024; available HERE.)

My Variational Inference Tutorial

AJM’s Note: This tutorial is pedantic, weighty, and over-worked. That said … if it were me, trying to study the combined works of Friston (2013, 2015), Beal (2003), and Blei et al. all over again … I’d make this my go-to resource. (And I STILL DO make this my resource!) (NOTE: it needs a minor rewording around the “P’s and Q’s” – will get to that soon.)

This is the “Rosetta Stone” tutorial on variational Bayes; it traces Friston’s notation back into Matthew Beal’s dissertation (his primary notation source), and also cross-correlates with David Blei et al. (The core identity that the whole derivation turns on is shown after the reference below.)

  • Maren, Alianna J. 2019. (Revised 2022.) “Derivation of Variational Bayes Equations.” Themesis Technical Report TR-2019-01v5 (ajm). arXiv:1906.08804v5 [cs.NE]. doi:10.48550/arXiv.1906.08804. (Accessed June 8, 2023; available online at Deriv Var. Bayes (v5).)
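
For the impatient: everything in that tutorial ultimately rests on one standard identity (written here in generic Beal/Blei-style notation, with observed data x and latent variables z):

```latex
% The log evidence splits exactly into the ELBO plus a (reverse) KL term:
\ln p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[ \ln p(x, z) - \ln q(z) \right]}_{\text{ELBO},\ \mathcal{L}(q)}
  + \underbrace{D_{\mathrm{KL}}\!\left[\, q(z) \;\|\; p(z \mid x) \right]}_{\geq\, 0}
% Since ln p(x) is fixed, maximizing the ELBO over q is the same as
% minimizing the reverse KL from q to the true posterior; the negative
% ELBO is (up to notation) Friston's variational free energy.
```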

Very Useful for Understanding Friston

Friston is, God help us all, astonishingly prodigious.

We’re only capturing the most essential papers and the ones that we REALLY NEED in order to understand basic Friston-ese.


That Beal 2003 Dissertation

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities as well as Markov blankets.

In sum: reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style.

Matthew Beal presented his dissertation at UCL (University College London) in 2003. (I said 2004 in the YouTube; I misspoke.)

I looked again at his dissertation acknowledgements … no mention of Friston. BUT … it’s the same college where Friston has worked for a very long time. I’m trusting that there were community conversations … however it happened, Friston references Beal’s dissertation and uses Beal’s notation – in fact, he plunks down his starting equations directly from Beal.

So to understand Friston, we need to start here.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)

Friston & Beal on Markov Blankets (YouTube and Blogposts)

One of the most confoundingly weird things about Friston’s (and Beal’s) notation is that they make the external world (“Psi”) conditionally dependent on the representation of that world.

Very mind-bendy.
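
In symbols (my paraphrase, not a quote from either source), the Markov blanket setup and the mind-bendy move look like this:

```latex
% Markov blanket condition: external states \psi and internal states \lambda
% are conditionally independent, given the blanket states b (sensory + active):
p(\psi, \lambda \mid b) = p(\psi \mid b)\, p(\lambda \mid b)
% The "mind-bendy" part: the recognition density is then written as
% q(\psi \mid \lambda) -- a density over the external world conditioned on
% the internal states that represent it.
```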

I tried to work through it in THIS YouTube.


Blei et al. – Variational Inference Tutorial

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation. (A toy sketch of their core coordinate-ascent algorithm follows the reference below.)

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)
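
The workhorse algorithm in Blei et al. is coordinate-ascent variational inference (CAVI). Here’s a minimal, self-contained sketch for the textbook Normal-Gamma model (this particular example follows Bishop’s classic treatment rather than Blei et al.’s mixture example; the coordinate-ascent pattern is the same). All names and prior settings here are my own choices for illustration.

```python
# CAVI for unknown mean mu and precision tau, with the mean-field
# factorization q(mu, tau) = q(mu) q(tau). Each pass updates one factor
# while holding the other fixed; the ELBO never decreases.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # synthetic data
N, xbar = len(x), x.mean()

# Priors: tau ~ Gamma(a0, b0); mu | tau ~ Normal(mu0, 1/(lam0 * tau)).
a0, b0, mu0, lam0 = 1.0, 1.0, 0.0, 1.0

# Variational factors: q(mu) = Normal(m, 1/lam), q(tau) = Gamma(a, b).
m, lam = 0.0, 1.0
a = a0 + (N + 1) / 2.0        # the Gamma shape is fixed; only b updates
b = 1.0

for _ in range(50):           # coordinate ascent to (effective) convergence
    E_tau = a / b
    # Update q(mu), given the current q(tau):
    m = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam = (lam0 + N) * E_tau
    # Update q(tau), given the current q(mu); E[mu] = m, E[mu^2] = m^2 + 1/lam:
    E_mu, E_mu2 = m, m**2 + 1.0 / lam
    b = b0 + 0.5 * (lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2)
                    + np.sum(x**2) - 2 * E_mu * np.sum(x) + N * E_mu2)

print(f"q(mu):  mean {m:.3f}  (sample mean {xbar:.3f})")
print(f"q(tau): mean {a / b:.3f}  (true precision {1 / 1.5**2:.3f})")
```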

But Wait! There’s MORE! (Some Excellent Active Inference Tutorials)

I ALWAYS like to lead with this very accessible presentation by Noor Sajid and colleagues:

  • Sajid, Noor, Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] (30 Oct 2020). (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863.)

Thomas Parr’s Book on Active Inference

I haven’t read this book – it’s on my “near-term” list. That said, I’ve looked at it a bit online … and it’s lead-authored by Thomas Parr (not Friston writing solo), so there’s a chance that this is understandable.

  • Parr, Thomas. (More details forthcoming.)

A Monstrously Complex Paper that Purports to be “Simpler”

Maybe it’s just me. And maybe I was a bit too tired when I started reading this. But … even with my background in statistical mechanics, I’ve not found this easy … it’s on my “long-term” reading list.

For the sake of completeness, we include it here:

  • Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios A. Pavliotis, and Thomas Parr. 2023. “The Free Energy Principle Made Simpler but Not Too Simple.” Physics Reports 1024 (19 June 2023): 1-29. (Accessed May 20, 2024; available online at Physics Reports.)

And a Bit about LeCun

This is the “fluff” piece that I mentioned in the YouTube, regarding LeCun’s 2022 paper. Calling it “fluff” is a bit demeaning … but I’ll stay with that. The real value is that the authors got input from several other voices regarding LeCun’s work – sort of a validation cross-check.

It’s a pay-to-read piece; if you can’t get it, that’s not the worst thing, but if you CAN get it, it’s a nice read.

  • Heikkilä, Melissa and Will Douglas Heaven. 2022. “Yann LeCun Has a Bold New Vision for the Future of AI.” MIT Technology Review. (Available to those who pay HERE.)

And there’s a brief summary of this already brief article that IS available to the many-MANY of us who’ve subscribed to Medium.com. Useful if you want this in a very small nutshell.

  • Phair, Meryl. 2022. “MIT Technology Review Features Yann LeCun’s Vision for the Future of Machine Learning.” Medium.com. (Available HERE.)

Also, THIS PRIOR BLOGPOST presented Yann LeCun’s 2020 Plenary at the ICRA conference.

  • Maren, Alianna J. 2023. “Latent Variables.” Themesis Blogpost Series (Dec., 2023). (Accessed May 28, 2024; available HERE.)

LeCun’s very similar Keynote address at the 2020 ICLR conference is HERE.


And a Very Interesting YouTube on Control Theory-Based Analysis of LLM Tokens

“Language models THINK in this high-resolution token space …” – so damn interesting! You’ve got to work with a smaller model (GPT-2) to get into this readily, but the results are VERY interesting and likely transfer to bigger LLMs.