Three weeks ago, the AGI world tilted on its axis.
More specifically, Friston et al. (2024) introduced an evolutionary advance in active inference, which they call renormalising generative models (RGM).
This blogpost addresses three key questions:
- How important are RGMs?
- What sort of context do we have for assessing RGMs?
- What is the potential timeframe for RGM influence?
Here’s the YouTube that accompanies this blogpost:
Background
Friston’s active inference is one of the key methods that will likely prove essential to artificial general intelligence (AGI). However, despite being intellectually very well grounded, it has so far been limited in scope and application. As Friston and colleagues note in their “Pixels to Planning” paper:
“However, applications of active inference have been largely limited to small-scale problems.”
Friston et al. 2024. “From Pixels to Planning.”
As examples: Cullen et al. (2018) applied active inference to the game of Doom (via OpenAI Gym), and Sajid et al. (2020) used a “modified FrozenLake OpenAI baseline” to compare active inference with reinforcement learning. These applications certainly illustrated the method, but they were small in scale.
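To make “small scale” concrete, here is a minimal sketch of the kind of discrete-state setting these studies work in: an agent on a tiny FrozenLake-style grid that updates a categorical belief over states and picks whichever action minimizes expected free energy. This is an illustrative toy of our own (the 2x2 grid, the action set, and all variable names are assumptions), not code from Cullen et al. (2018) or Sajid et al. (2020).

```python
# Toy active inference on a 2x2 FrozenLake-style grid (illustrative only; not
# the setup of Cullen et al. 2018 or Sajid et al. 2020).
import numpy as np

N_STATES = 4                          # flattened 2x2 grid; state 3 is the goal
ACTIONS = {"right": 1, "down": 2}     # toy moves on the flattened grid

# Likelihood A: observations map one-to-one onto states (no observation noise).
A = np.eye(N_STATES)

# Transitions B[a][s_next, s_prev]: deterministic moves, clipped at the last state.
B = {}
for name, step in ACTIONS.items():
    B[name] = np.zeros((N_STATES, N_STATES))
    for s in range(N_STATES):
        B[name][min(s + step, N_STATES - 1), s] = 1.0

# Preferences C: the agent "expects" (prefers) to observe the goal state.
C = np.exp(np.array([-4.0, -4.0, -4.0, 0.0]))
C /= C.sum()

def expected_free_energy(q_s, b):
    """Risk term of expected free energy: KL[predicted observations || preferences].
    Ambiguity is zero here because A is the identity."""
    q_next = b @ q_s                   # predicted state (= observation) distribution
    return float(np.sum(q_next * (np.log(q_next + 1e-16) - np.log(C))))

q_s = np.array([1.0, 0.0, 0.0, 0.0])   # belief over states; start at state 0
state = 0
while state != 3:
    # Perception: Bayesian update of the belief given the current observation.
    q_s = A[state] * q_s + 1e-16
    q_s /= q_s.sum()
    # Action: pick the move with the lowest expected free energy.
    scores = {a: expected_free_energy(q_s, B[a]) for a in ACTIONS}
    action = min(scores, key=scores.get)
    state = int(np.argmax(B[action][:, state]))
    q_s = B[action] @ q_s              # prior belief for the next step
    print(f"chose {action!r}, now in state {state}")
```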
Key Comparison: Active Inference with Generative AI
We suggest, as a crucial point of reference, that we think of active inference, from its inception (approximately 2010) through its early foundational work (2013-2015), as comparable to Hinton’s invention of the Boltzmann machine (Ackley, Hinton, and Sejnowski, 1985).
More precisely, we identify the genesis of active inference with Friston’s free-energy principle paper (Friston, 2010).
In comparing the RGM advance with the evolution from Boltzmann machines to deep learning, we can identify several key moments.
Key comparative points:
- Roughly twenty-five-year delta in initiation times: Generative AI began in 1983-1985, with the invention of Boltzmann machines; active inference began in 2010, so its initiation point came roughly twenty-five years AFTER that of generative AI.
- Key breakthroughs: It took Hinton approximately seventeen years to go from the initial Boltzmann machine invention (1985) to the contrastive divergence algorithm (2002); it has taken Friston and colleagues fourteen years (with Friston joining Verses.ai in 2022 and finding new colleagues there) to develop renormalising generative models (RGMs). The developmental timescale is compressed.
- First large-scale applications: Hinton and colleagues published a study providing a “perspective” from four research groups on deep learning in 2012; at about the same time, Salakhutdinov and Hinton published a major follow-up to their 2006 paper. Friston and colleagues put forth their first efforts at larger-scale applications in their 2024 paper. The timescale from theory to applications is greatly compressed.
Summary So Far: If we carry through the analogy between Friston’s active inference and Hinton’s Boltzmann machines (and the deep learning that grew from them), we see that Friston’s work began much more recently and has moved more rapidly, both theoretically and in applications.
Renormalization Group Theory: A Common Thread between RGM (Active Inference) and Deep Learning
The key step that took (restricted) Boltzmann machines from an interesting (and beautiful) theoretical development to something with widespread practical applications was contrastive divergence (Hinton, 2002), which enabled deep learning (Hinton and Salakhutdinov, 2006; Hinton et al., 2012; Salakhutdinov and Hinton, 2012).
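Since contrastive divergence is the hinge of this part of the story, a brief sketch may help. The NumPy snippet below runs CD-1 on a tiny restricted Boltzmann machine: clamp the data, sample the hidden units, reconstruct the visibles, and nudge the weights by the difference between data-phase and reconstruction-phase correlations. It is a minimal illustration of the idea in Hinton (2002), with made-up dimensions, data, and learning rate, not the original implementation.

```python
# Contrastive divergence (CD-1) for a tiny restricted Boltzmann machine.
# A minimal NumPy sketch of the idea in Hinton (2002); dimensions, data, and
# hyperparameters are illustrative, not from the original paper.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1

# Toy binary data: two repeating patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    # Positive phase: hidden probabilities with the data clamped on the visibles.
    h_prob = sigmoid(data @ W + b_h)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: one step of Gibbs sampling (the "1" in CD-1).
    v_recon = sigmoid(h_sample @ W.T + b_v)
    h_recon = sigmoid(v_recon @ W + b_h)

    # Approximate gradient: data correlations minus reconstruction correlations.
    n = data.shape[0]
    W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
    b_v += lr * (data - v_recon).mean(axis=0)
    b_h += lr * (h_prob - h_recon).mean(axis=0)

print("reconstruction error:", np.mean((data - v_recon) ** 2))
```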
Deep learning (DL) can be viewed as applying renormalization group (RG) theory to successive layers of latent nodes in a neural network (Bény, 2013). (In early reviews of that paper, Yann LeCun noted that this relation between deep learning and renormalization group theory was useful; see OpenReview.) Ro Jefferson (2019), in an excellent blogpost on this topic, identified several authors who had noted the connection between RG and DL.
The essence of renormalization group theory is scaling similarity: systems show similar structure across multiple physical scales and, as Friston et al. (2024) particularly note, across multiple timescales.
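To make the scaling idea concrete, consider the textbook block-spin move from statistical physics: replace each 2x2 block of a lattice with a single coarse value, and note that the very same rule can be applied again at the next scale. The sketch below is our own toy illustration (not code from Bény, 2013 or Friston et al., 2024); the analogy to deep learning is that each latent layer re-describes its inputs at a coarser granularity.

```python
# Block-spin coarse-graining of a binary lattice: the textbook renormalization-
# group move (illustrative sketch; not code from any of the cited papers).
import numpy as np

rng = np.random.default_rng(1)

def coarse_grain(lattice):
    """Replace each 2x2 block by its majority value (ties broken toward 1)."""
    h, w = lattice.shape
    blocks = lattice.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return (blocks >= 2).astype(int)

# A 16x16 lattice of 0/1 "spins"; the same rule is applied at every scale,
# which is the scaling similarity the renormalization group formalizes.
lattice = (rng.random((16, 16)) < 0.5).astype(int)
for level in range(3):
    print(f"level {level}: shape {lattice.shape}, mean spin {lattice.mean():.2f}")
    lattice = coarse_grain(lattice)
print(f"level 3: shape {lattice.shape}, mean spin {lattice.mean():.2f}")
```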
RGM: Pivot to a New Direction
In an earlier YouTube, we identified key players in the coming “AGI Wars.”
In a follow-up YouTube, we suggested that the key AGI contenders were Verses.ai (leading with active inference, espoused by Friston et al.) and Meta (leading with JEPA, espoused by LeCun et al.).
In these earlier YouTubes (and their associated blogposts), we suggested that the primary active inference method would be Action Perception Divergence, as offered by Hafner et al. (2020, rev. 2022).
As a side note, that work represented a brief rapprochement between Friston and others at University College London, Danijar Hafner and others at Google’s DeepMind (with Hafner formerly at Google Brain), and Jimmy Ba at the University of Toronto. That work was limited to a conceptual evolution, with no practical applications.
We still believe that Action Perception Divergence (APD) is important, and will have a role to play in AGI. (In fact, we hope to illustrate an important APD use in the near term.)
HOWEVER, that suggestion – that APD would be the primary means for carrying active inference forward – was premature.
The recent work by Friston et al. (2024), coming out of a fully-fledged Verses.ai team, changes the playing field considerably.
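As a rough intuition for what “renormalising” buys a generative model, here is a toy sketch of our own, not the structure-learning procedure of Friston et al. (2024): carve an image into 2x2 blocks, give each distinct block pattern a discrete code, and then carve and code the resulting grid of codes again. Each pass yields a coarser discrete description of the one below, which is the scale-free flavor that RGMs formalize properly, over time as well as space.

```python
# Toy illustration of recursive spatial coarse-graining into discrete codes.
# This is our own sketch of the *flavor* of renormalising generative models,
# not the structure-learning procedure of Friston et al. (2024).
import numpy as np

rng = np.random.default_rng(2)

def encode_blocks(grid):
    """Tile the grid into 2x2 blocks and map each distinct block to an integer code."""
    h, w = grid.shape
    blocks = grid.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    uniques, codes = np.unique(blocks, axis=0, return_inverse=True)
    return codes.reshape(h // 2, w // 2), uniques

# A small binary "image"; re-encoding it twice gives two latent levels, each a
# coarser discrete description of the level below.
image = (rng.random((8, 8)) < 0.5).astype(int)
level1, dict1 = encode_blocks(image)      # 4x4 grid of block codes
level2, dict2 = encode_blocks(level1)     # 2x2 grid of codes-of-codes
print("image:", image.shape, "level 1:", level1.shape, "level 2:", level2.shape)
print("dictionary sizes:", len(dict1), len(dict2))
```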
{* Blogpost in progress, please check back later. AJM, Sunday, Aug. 18, 2024; 10PM Hawai’i time. *}
Resources and References
- Ackley, David H., Geoffrey E. Hinton, and Terrence J. Sejnowski. 1985. “A Learning Algorithm for Boltzmann Machines.” Cognitive Science 9(1) (January-March, 1985): 147-169. https://doi.org/10.1016/S0364-0213(85)80012-4.
- Bény, Cédric. 2013. “Deep Learning and the Renormalization Group.” arXiv:1301.3124v4 [quant-ph] 13 Mar 2013. (Accessed Aug. 18, 2024; available online at https://arxiv.org/pdf/1301.3124.)
- Cullen, M., B. Davey, K.J. Friston, and R.J. Moran. 2018. “Active Inference in OpenAI Gym: A Paradigm for Computational Investigations into Psychiatric Illness.” Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 3(9) (September, 2018): 809-818. doi:10.1016/j.bpsc.2018.06.010.
- Friston, K. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nat. Rev. Neurosci. 11:127–138. doi:10.1038/nrn2787.
- Friston, K. 2013. “Life as We Know It.” Journal of The Royal Society Interface 10(86).
- Friston, K., M. Levin, B. Sengupta, and G. Pezzulo. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface, 12:20141383. doi:10.1098/rsif.2014.1383. (Available online at: http://dx.doi.org/10.1098/rsif.2014.1383.)
- Friston, Karl, Conor Heins, Tim Verbelen, Lancelot Da Costa, Tommaso Salvatori, Dimitrije Markovic, Alexander Tschantz, Magnus Koudahl, Christopher Buckley and Thomas Parr. 2024. “From Pixels to Planning: Scale-Free Active Inference.” arXiv:2407.20292v1 [cs.LG] 27 Jul 2024. doi:10.48550/arXiv.2407.20292. (Accessed Aug. 10, 2024; available online at https://arxiv.org/pdf/2407.20292.)
- Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhoffer, Grigorios A. Pavliotis and Thomas Parr. 2023. “The Free Energy Principle Made Simpler but Not Too Simple.” arXiv:2201.06387v3 [cond-mat.stat-mech]. doi:10.48550/arXiv.2201.06387. (Accessed Aug. 10, 2024; available online at https://arxiv.org/pdf/2201.06387.)
- Goldenfeld, Nigel. 1992. Lectures on Phase Transitions and the Renormalization Group (Frontiers in Physics). (Boston, MA: Addison-Wesley.)
- Hafner, Danijar, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston and Nicolas Heess. 2020, rev. 2022. “Action and Perception as Divergence Minimization.” arXiv:2009.01791v3 [cs.AI] (13 Feb 2022). doi:10.48550/arXiv.2009.01791. (Accessed Aug. 10, 2024; available online at https://arxiv.org/pdf/2009.01791.)
- Hinton, Geoffrey E. 2002. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14(8) (August, 2002): 1771-1800. doi:10.1162/089976602760128018. (Accessed April 3, 2022; available at https://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf.)
- Hinton, Geoffrey, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: Four Research Groups Share Their Views.” IEEE Signal Processing Magazine 29(6) (Nov. 2012): 82-97. doi:10.1109/MSP.2012.2205597. (Accessed Jan. 31, 2023; available at https://www.cs.toronto.edu/~hinton/absps/DNN-2012-proof.pdf.)
- Hinton, G.E., and R. R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313(5786) (July 28, 2006): 504-507. doi:10.1126/science.1127647. (Accessed April 5, 2022; available at https://www.cs.toronto.edu/~hinton/science.pdf.)
- Jefferson, Ro. 2019. “Deep Learning and the Renormalization Group.” Ro’s Blog (Aug. 4, 2019). (Accessed Aug. 18, 2024; available online at Ro’s Blog.)
- Parr, Thomas, Giovanni Pezzulo, and Karl J. Friston. 2022. “Active Inference: The Free Energy Principle in Mind, Brain, and Behavior.” (Cambridge, MA: The MIT Press). doi:10.7551/mitpress/12441.001.0001
- Sajid, Noor, Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed Aug. 10, 2024; available online at https://arxiv.org/pdf/1909.10863.)
- Salakhutdinov, Ruslan, and Geoffrey Hinton. 2012. “An Efficient Learning Procedure for Deep Boltzmann Machines.” Neural Computation 24(8) (August, 2012): 1967–2006. doi: 10.1162/NECO_a_00311. (Accessed April 3, 2022; available at https://www.cs.cmu.edu/~rsalakhu/papers/neco_DBM.pdf.)