AGI Notation: Friston’s Use of “Psi”

We want to create an AGI (artificial general intelligence). If you’re reading this post, we trust that is your intention as well.

YouTube: Maren, Alianna J. 2023. “AGI Notation: Friston’s Use of ‘Psi’.” Themesis, Inc. YouTube Channel (Dec. 14, 2023). (Accessed Dec. 14, 2023; available at YouTube Link.)

We already know that AGI won’t come out of transformers. In essence, they are content-addressable memories. That’s what they can do; that’s ALL that they can do.

Our core equation comes from Friston’s work on active inference.

Figure 1. Karl Friston’s primary equation for active inference, from “Life as We Know It” (2013). (See full reference plus link in Resources and References, at the end of this post.)

Making a Distinction

So we start from the beginning.

To do that, we’re going back in time to something that was a bit of a cult classic when it was first published, G. Spencer Brown’s Laws of Form.

Figure 2. In Laws of Form (1969), G. Spencer Brown introduces the “First Command” as: Draw a Distinction. (See Laws of Form (pdf), p. 1.)

Spencer Brown’s first command was: draw a distinction.

In short, we need to create a boundary.

This is why we start with Karl Friston’s work on active inference, instead of general variational inference.

In active inference, we make a clear distinction between reality (expressed as the Greek letter Psi), the representation of reality (typically expressed as r), and the model of the representation.

A Markov boundary separates the representation from the actual reality.
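This distinction can be sketched formally. The following is our own minimal rendering of the structure (using ψ for external “reality” states, s for sensory states, a for active states, and r for internal or representational states; the symbols loosely follow Friston’s 2013 paper, not his exact equations):

```latex
% The Markov blanket b = (s, a) renders internal and external states
% conditionally independent:
p(\psi, r \mid s, a) = p(\psi \mid s, a)\, p(r \mid s, a)
% Equivalently: given its blanket, the representation r carries no
% further information about reality \psi:
p(\psi \mid s, a, r) = p(\psi \mid s, a)
```

In words: the representation r never “touches” reality ψ directly; everything it knows about ψ is mediated by the blanket states.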

Figure 3. Friston’s notation, redrawn and published in Maren, Alianna J. 2019. (Revised 2022.) “Derivation of Variational Bayes Equations.” Themesis Technical Report TR-2019-01v5 (ajm). arXiv:1906.08804v5 [cs.NE]. doi:10.48550/arXiv.1906.08804. (Accessed June 8, 2023; available online at Deriv Var. Bayes (v5).)

Blei et al. (2016) wrote an important and much-cited paper on variational inference, but they frame variational inference more generally than Friston does.

For this reason, we think it is important to start with Friston – his 2013 and 2015 papers are the pivotal ones.

At the same time, Friston presents his core equations without much preamble.

Friston draws his notation largely from Matthew Beal’s 2003 dissertation, so we go back to Beal for our basic tutorial on variational inference, including Markov blankets.

There are minor notational differences between Friston and Beal, but they are not a serious impediment to cross-referencing the two works.

There are substantial notational differences, however, between Blei et al. (2016) and the works of Friston and Beal.

These differences were substantial enough to motivate what this author described as a “Rosetta stone” tutorial – a cross-comparison of the notation used by Blei et al. with that used by Friston and by Beal (Maren, 2019; revised 2022).
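Whatever the notation, all three works manipulate the same core quantity: the variational free energy. Here is a toy numerical sketch of our own (assuming NumPy; this code does not appear in any of the cited papers) for a two-state discrete model:

```python
import numpy as np

# Toy discrete model: one hidden state psi in {0, 1} and one observed
# sensory value s (held fixed). All names here are our own illustrative
# choices, not notation from Friston, Beal, or Blei et al.
prior = np.array([0.7, 0.3])       # p(psi)
likelihood = np.array([0.9, 0.2])  # p(s | psi), for the observed s

joint = prior * likelihood         # p(psi, s)
evidence = joint.sum()             # p(s) = sum over psi of p(psi, s)
posterior = joint / evidence       # p(psi | s), the exact Bayes posterior

def free_energy(q):
    """Variational free energy F = E_q[ln q(psi) - ln p(psi, s)].

    Equivalently F = D_KL(q || p(psi | s)) - ln p(s), so F is an upper
    bound on the negative log evidence (the "surprise").
    """
    return float(np.sum(q * (np.log(q) - np.log(joint))))

# Any approximate posterior q gives F >= -ln p(s) ...
q_rough = np.array([0.5, 0.5])
# ... with equality exactly when q is the true posterior.
print(free_energy(q_rough), free_energy(posterior), -np.log(evidence))
```

Minimizing F over q is what Beal calls variational Bayes, what Blei et al. present as maximizing the ELBO (which is just −F), and what Friston reads as the system reducing its surprise.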


Important New Work

Up until last week, we thought that Friston’s work would be the centerpiece of an AGI framework.

That was … until we discovered Danijar Hafner’s 2022 work (done in collaboration with Friston and others), on an evolution from active inference that they describe as “Action Perception Divergence.”
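Roughly speaking (this is our paraphrase, not Hafner et al.’s exact notation), APD frames both perception and action as minimizing a single KL divergence between the distribution the agent actually realizes and a target distribution:

```latex
% Action Perception Divergence, sketched: the agent minimizes the joint
% KL divergence between its actual distribution over latent variables z
% and sensory inputs x, and a target distribution over the same variables.
\min \; D_{\mathrm{KL}}\!\left( p_{\mathrm{actual}}(x, z) \;\|\; p_{\mathrm{target}}(x, z) \right)
% Perception shapes the latent variables z; action shapes the inputs x.
```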


Blogpost Still in Progress!

AJM’s Note: This blogpost is still being written; please check back. We expect substantial updates over the timeframe from Thursday, Dec. 14, 2023 through Thursday, Dec. 21, 2023. Also, check the Themesis YouTube channel for the associated YouTube video paired with this blogpost – see the link at the BEGINNING of this post. Thank you! AJM, Dec. 23, 2023; 0800 Hawai’i Time.


Resources and References

Under Resources and References, you’ll find:

  • Prior related Themesis blogposts – these may have more detailed reference lists, address specific topics, or be important beginning/ending blogposts in a series. Look at the “AJM’s Notes” to find out.
  • Prior related Themesis YouTubes: We are making a strong effort, especially over these last several months, to pair a blogpost with each YouTube, and vice versa. Some of the YouTubes give essential context. Think of this as the pairing of “lecture” with the associated “reading materials.”
  • Reference lists – in addition to the (expected) literature references, we also include links to relevant (non-Themesis) YouTubes, tech blogs, and other useful material.

Prior Related Themesis Blogposts

AJM’s Note: This is the “Mega-Resource Compendium.”

AJM’s Note: This is the immediate predecessor blogpost, which is also a “mega-resource compendium.”


Prior Related Themesis YouTubes

We recommend the following YouTube as the immediate predecessor to this work.

Maren, Alianna J. 2023. “Evolution of AGI and NLP Algorithms Using Latent Variables: Future of AI (Part 3 of 3).” Themesis YouTube Channel (May 31, 2023). (Accessed June 8, 2023; available online at Evolution of AGI and NLP Algorithms.)

Key Literature for This Blogpost

Active Inference

Active Inference: perhaps the most accessible presentation, by Noor Sajid & colleagues (first recommended in the Themesis June, 2022 blogpost Major Blooper – Coffee Reward):

  • Sajid, Noor, Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] (30 Oct 2020). (Accessed June 17, 2022; https://arxiv.org/abs/1909.10863.)

AJM’s Note: Friston’s 2013 paper is the central point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated by a Markov boundary from its external environment, and moves on from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015).

  • Friston, Karl. 2013. “Life as We Know It.” Journal of the Royal Society Interface 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K., M. Levin, B. Sengupta, and G. Pezzulo. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)

Variational Inference

AJM’s Note: This is the “Rosetta Stone” tutorial on variational Bayes; it traces Friston’s notation back into Matthew Beal’s dissertation (his primary notation source), and also cross-correlates with David Blei et al.

  • Maren, Alianna J. 2019. (Revised 2022.) “Derivation of Variational Bayes Equations.” Themesis Technical Report TR-2019-01v5 (ajm). arXiv:1906.08804v5 [cs.NE]. doi:10.48550/arXiv.1906.08804. (Accessed June 8, 2023; available online at Deriv Var. Bayes (v5).)

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities and Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)

Action-Perception Divergence

AJM’s Note: This work by Hafner and colleagues extends the notions of active inference into “Action Perception Divergence (APD),” which is very possibly the most significant vector into real AGI.

  • Hafner, Danijar, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, and Nicolas Heess. 2022. “Action and Perception as Divergence Minimization.” arXiv:2009.01791v3 [cs.AI] (13 Feb 2022). (Accessed May 25, 2023; available online at https://arxiv.org/pdf/2009.01791.pdf.)

Laws of Form

  • Spencer Brown, George. 1969. Laws of Form. (London: George Allen and Unwin Ltd.) (An online version of this short book is available at: Laws of Form (pdf).)

4 comments

  1. I have spent far too much time today playing around with the Falcon LLM (https://huggingface.co/blog/falcon?_hsmi=261555936&_hsenc=p2ANqtz-_vArWWMfz-x4Lc_Ic-KZ__yMUVsQAnjl7MNFsbRQovha5FKoiggOMgLhZ-TtL9wp1_IKByPISBXpvPA1FFcqxr7VMPDg#demo), so, just for fun, I decided to ask it about Karl Friston’s work. It seems to be a bit evasive…

    Me: What is Karl Friston on about?

    Falcon: I’m sorry, but I am not sure who Karl Friston is. Could you please provide more context or information about him so that I can assist you better?

    Me: Karl Friston is the guy who invented Active Inference.

    Falcon: Oh, I see. Active Inference is a theory of perception and learning that suggests that organisms actively infer the causes of their sensations in order to maximize their chances of survival. Karl Friston is a British neuroscientist who developed the theory of Active Inference, which is based on the principles of Bayesian inference and free energy. His work has been influential in the fields of neuroscience, psychology, and artificial intelligence.

    Me: So can you summarize his work for me, please?

    Falcon: Sure, I can do that. Karl Friston’s theory of Active Inference suggests that organisms are always trying to minimize the amount of free energy that they have to expend. They do this by constantly updating their internal models of the world based on their sensory experiences. According to Friston, organisms are constantly making predictions about their environment and testing those predictions against their sensory inputs. When their predictions are confirmed, they feel less free energy, and when their predictions are not confirmed, they feel more free energy. This process of actively minimizing free energy is what Friston calls Active Inference.

    Me: Some people have suggested that Friston’s work is the key to Artificial General Intelligence. Are they on the right track?

    Falcon: It’s hard to say for sure. Friston’s work on Active Inference is certainly influential in the field of artificial intelligence, and it has inspired many researchers to develop AI systems that can actively learn and update their knowledge of the world. However, there are many different approaches to AI, and it’s difficult to say if any one theory or approach will lead to the development of true Artificial General Intelligence. It’s a complex and difficult problem, and it will likely require contributions from many different researchers and perspectives in order to solve.

    1. Hi, Simon – so good to hear from you again, and I was just thinking about you this morning – it’s been a while since your last comment in this blog-space!
      So, I love your “conversation” with Falcon re/ Karl Friston! It seems almost … like a real conversation, doesn’t it?
      Tripping points are things like Falcon’s statement that “[organisms] feel less free energy” … mmph.
      On the other hand, this is a fairly good (simplistic but good) reduction of active inference into a very simple summary.
      So, yay! Love that you’re playing w/ LLMs! Always good to hear from those who are doing experiments! – AJM

  2. On a less flippant note, I’ve been reading Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, Thomas Parr, Giovanni Pezzulo, and Karl J. Friston (https://doi.org/10.7551/mitpress/12441.001.0001). Chapter 4 is about Partially Observable Markov Decision Processes: we have transition probability between emitted symbols (à la much of modern NLP), but the transition probabilities depend on the hidden state of the system. The point of Chapter 4 is using the FEP to decipher the likely value of the hidden states. I wonder if this is where you are going with this thread? If this is really the key to NLP, does it mean that much of grammar is not simply syntax (in the programming language sense), but is really driven by the hidden state, the meaning? Meaning, IMHO, seems to be what Chat-GPT, Falcon, and their friends leave out (or sort-of model with probabilities).

    1. Oh, now this is interesting! I’m going to have to go into this paper.
      Interesting how our minds seem to follow up on similar paths … I was just looking into Markov decision processes last night, as related to … something else. (Would have to go find my search thread.)
      I’ll look at this paper.
      Lately, I’ve been re-reading Noor Sajid’s work (w/ Ball, Parr, and Friston) giving that very nice summary and cross-compare of RL w/ active inference … it takes me many “reads” for these papers to really penetrate my thinking. I’ve liked this one for a long time, and am still internalizing it.
      When I looked into Sajid’s website last night, I found that she was working w/ Zhang, who was lead author on this very nice IEEE paper on “Advances in Variational Inference,” and now this one is vying for top place on my reading-stack: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588399
      ***
      And as to “where [I] was going with this thread” – actually, what you’re referring to might be a side-note, although I am very much aware of how limited these transformer-based models are, and that’s why I’m liking your pointer to the article that identifies how transition probs are dependent on hidden states. That might be a substantial evolution.
      But where I’m going is to set a framework for introducing a new class of neural networks. This particular blogpost – this one here on AGI – was really introducing a parallel discussion, and I hope to show how this new neural networks class will be useful in AGI, as well as a stand-alone neural network.
      This “new neural networks class” presentation should come along pretty fast now … I’m in the midst of prepping the next vid, which means working up the “B-roll” (to use a film industry term). Rather immersive (and exhaustive) effort.
      But moving fast now – in that marvelous little break when grades have all been submitted for the previous quarter, and a few precious days before the summer quarter begins – and I’m doing my best to get vids done as fast as possible.
      So, let me know what you think as the new vids emerge, ok? I value your comments and inputs!
      Thanks again for pointing out that active inference article by Parr et al., will add it to the “must read” stack! – AJM

