Big AGI Breakthrough: Leveling the Playing Field

Three weeks ago, the AGI world tilted on its axis.

More specifically, Friston et al. (2024) introduced an evolutionary advance in active inference, which they call renormalising generative models (RGM).

Figure 1. Friston et al. 2024. “From Pixels to Planning: Scale-Free Active Inference.”

This blogpost addresses three key questions:

  • How important are RGMs?
  • What sort of context do we have for assessing RGMs?
  • What is the potential timeframe for RGM influence?

Here’s the YouTube that accompanies this blogpost:

Maren, Alianna J. 2024. “Big AGI Breakthrough: From Active Inference to Renormalising Generative Models.” Themesis, Inc. YouTube Channel (Aug. 21, 2024). (Accessed Aug. 30, 2024; available online at Themesis Inc. YouTube Channel.)


Friston’s evolution of active inference is one of the key methods that will likely prove essential in artificial general intelligence, or AGI. However, despite being intellectually very well-grounded, it has been limited in scope and application. As Friston and colleagues note in their “Pixels to Planning” paper:

“However, applications of active inference have been largely limited to small-scale problems.”

Friston et al. 2024. “From Pixels to Planning.”

As examples, Cullen et al. (2018) applied active inference to the game of Doom, and Sajid et al. (2020) used a “modified FrozenLake OpenAI baseline” to compare active inference with reinforcement learning. While these applications certainly illustrated the method, they were small in scale.

Key Comparison: Active Inference with Generative AI

As a crucial point of reference, we suggest thinking of active inference, from its inception (approximately 2010) through its early work (2013 – 2015), as comparable to Hinton’s invention of the (restricted) Boltzmann machine (Ackley, Hinton, and Sejnowski, 1985).

Figure 2. We can envision the start of generative AI with the 1983 – 1985 invention of Boltzmann machines by Ackley, Hinton, and Sejnowski.

Similarly, we can identify the genesis of active inference with Friston’s work in 2010.

Figure 3. Friston’s introduction of the “Free Energy Principle” in 2010 set the course for active inference.

In comparing the RGM step with the evolution from Boltzmann machines to deep learning, we can identify key moments in each.

Figure 4. Key moments in the evolution of both generative AI (beginning with Boltzmann machines to transformers) and active inference (including the “Free Energy Principle”).

Key comparative points:

  • Roughly twenty-five-year delta in initiation times: Generative AI began in 1983-1985, with the invention of Boltzmann machines; active inference began in 2010, so its initiation point came roughly twenty-five years AFTER the initiation of generative AI.
  • Key breakthroughs: It took Hinton approximately twenty years from the initial Boltzmann machine invention to develop the contrastive divergence algorithm; it has taken Friston and colleagues fourteen years (with Friston joining VERSES AI in 2022 and finding new colleagues) to develop renormalising generative models (RGMs). The developmental timescale is compressed.
  • First large-scale applications: Hinton and colleagues published a “perspective” on deep learning from four research groups in 2012, at about the same time that Salakhutdinov and Hinton published a major follow-up to their 2006 paper. Friston and colleagues put forth their first efforts at larger-scale applications in their 2024 paper. The timescale in going from theory to applications is greatly compressed.

Summary So Far: If we carry through the analogy between Friston’s active inference and Hinton’s Boltzmann machines and deep learning, then we can see that Friston’s work began much more recently, and has moved more rapidly, both theoretically and in applications.

Renormalization Group Theory: A Common Thread between RGM (Active Inference) and Deep Learning

What took (restricted) Boltzmann machines from an interesting (and beautiful) theoretical development to something with widespread practical applications was the development of contrastive divergence (Hinton, 2002), which enabled deep learning (Hinton and Salakhutdinov, 2006; Salakhutdinov and Hinton, 2012; Hinton et al., 2012).

Deep learning (DL) can be viewed as applying renormalization group (RG) theory to successive layers of latent nodes in a neural network (Bény, 2013). (In early reviews of this paper, Yann LeCun noted that this relation between deep learning and renormalization group theory was useful; see OpenReview.) Ro Jefferson, in an excellent blogpost on this topic (2019), identified several authors who had noted the connection between RG and DL.

The essence of renormalization group theory is one of scaling-similarities: systems exhibit similar structure across multiple scales of physical size and (as Friston et al., 2024, particularly note) of time.
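To make the coarse-graining idea concrete, here is a minimal sketch of a Kadanoff-style block-spin step, the textbook illustration of renormalization. It is not the RGM algorithm from Friston et al. (2024), nor the deep learning mapping discussed by Bény (2013); it only shows how a fine-scale description can be summarised repeatedly at coarser scales. The function names (block_spin, renormalise), the majority-vote rule, and the example chain are all assumptions chosen for illustration.

```python
from typing import List


def block_spin(spins: List[int], block: int = 3) -> List[int]:
    """Coarse-grain a 1D chain of +1/-1 spins by majority vote per block.

    Illustrative helper (hypothetical name). Trailing spins that do not
    fill a whole block are dropped.
    """
    if block % 2 == 0:
        raise ValueError("block must be odd so the majority vote cannot tie")
    usable = len(spins) - len(spins) % block
    coarse = []
    for start in range(0, usable, block):
        total = sum(spins[start:start + block])
        # An odd number of +1/-1 values never sums to zero.
        coarse.append(1 if total > 0 else -1)
    return coarse


def renormalise(spins: List[int], block: int = 3) -> List[List[int]]:
    """Apply block_spin repeatedly, returning the chain at every scale."""
    scales = [list(spins)]
    while len(scales[-1]) >= block:
        scales.append(block_spin(scales[-1], block))
    return scales


if __name__ == "__main__":
    chain = [1, 1, -1, -1, -1, 1, 1, 1, 1]  # made-up example data
    for level, state in enumerate(renormalise(chain)):
        print(level, state)
```

Each pass replaces three neighbouring spins with one summary spin, so the nine-spin chain is described at three scales: nine spins, then three, then one. The same operation is applied at every level, which is the sense in which the description is "scale-free."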

RGM: Pivot to a New Direction

In an earlier YouTube, we identified key players in the coming “AGI Wars.”

Maren, Alianna J. 2024. “AGI: The Coming AGI Wars: Players and Positioning.” Themesis, Inc. YouTube Channel (May 15, 2024). (Accessed Aug. 18, 2024; available at Themesis YouTube Channel.)

In a follow-up YouTube, we suggested that the key AGI contenders were VERSES AI (leading with active inference, espoused by Friston et al.) and Meta (leading with JEPA, espoused by LeCun et al.).

Maren, Alianna J. 2024. “Five Key Papers (and Two Viewpoints) for AGI.” Themesis, Inc. YouTube Channel (May 15, 2024). (Accessed Aug. 18, 2024; available at Themesis YouTube Channel.)

In these earlier YouTubes (and their associated blogposts), we suggested that the primary active inference method would be Action Perception Divergence, as offered by Hafner et al. (2020, rev. 2022).

As a side note, that work represented a brief rapprochement between Friston and others at University College London, Danijar Hafner and others at Google’s DeepMind (with Hafner formerly at Google Brain), and Jimmy Ba at the University of Toronto. That work was limited to a conceptual evolution, with no practical applications.

Figure 5. Hafner et al. 2020, rev. 2022. “Action and Perception as Divergence Minimization.”

We still believe that Action Perception Divergence (APD) is important, and will have a role to play in AGI. (In fact, we hope to illustrate an important APD use in the near term.)

HOWEVER, that suggestion – that APD would be the primary means for carrying active inference forward – was premature.

The recent work by Friston et al. (2024), coming out of a fully-fledged team, changes the playing field considerably.

