Variational Free Energy and Active Inference: Pt 4

Today, we interpret the terms q(Psi | r) and p(Psi, s, a, r | m) in Friston’s (2013) “Life as We Know It” (Eqn. 2.7) and Friston et al. (2015) “Knowing One’s Place” (Eqn. 3.2). This discussion moves forward from where we left off in the previous post, identifying how Friston’s notation builds on Beal’s (2003) Ph.D. dissertation on variational inference.

To quickly recap, here is Figure 4 from last week’s post, posted here as our new Figure 1. In this figure, we juxtapose the notations offered by both Friston (2013) and Beal (2003).

Figure 1. Juxtaposing the notation used by Friston (2013) and Beal (2003). First presented in last week’s post as Figure 4.

AJM’s Note: There is an extensive reference list at the end of this post. The references cited directly in this post are identified in “This Week’s Read-Alongs” – the first portion of Resources and References.

We notice that the function F(q(x),theta) in Beal’s notation is formally the same as Friston’s function F(s,a,lambda) – the difference is that Friston’s has a negative sign in front. (See the last post for more discussion on this.)
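For readers who want that sign flip on a single line, here is our own LaTeX transcription – a paraphrase of both sources (using Beal’s hidden variables x, data y, and parameters theta), not a verbatim quote of either author:

$$
\mathcal{F}_{\mathrm{Beal}}\big(q(x),\theta\big) \;=\; \int q(x)\,\ln\frac{p(x,y\,|\,\theta)}{q(x)}\,dx \;\le\; \ln p(y\,|\,\theta), \qquad F_{\mathrm{Friston}} \;=\; -\,\mathcal{F}_{\mathrm{Beal}} .
$$

That is, maximizing Beal’s lower bound on the log evidence is the very same operation as minimizing Friston’s (variational) free energy.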

Our goal – over this entire blogpost series – is to understand and interpret Eqn. 2.7 in Friston (2013), “Life as We Know It.” To recap, we re-present this free energy equation, originally shown in Figure 1 of the first blogpost in this series.

Figure 2 (earlier presented as Figure 1 in the first blogpost in this series). Lemma 2.1, presented as Eqn. 2.7, from Karl Friston, 2013, “Life as We Know It.” (See full citation and link in the Resources at the end.)

Also, as we mentioned in that first blogpost – and in the two subsequent ones – we’re reinforcing our studies with Matthew Beal’s 2003 Ph.D. dissertation as a primary resource, because he gives very clear and lucid explanations, and appears to be the reference-source for Friston’s notation.


It’s All a Matter of Perspective

The single most important step in understanding and interpreting Friston’s Eqn. 2.7 is to locate our perspective inside the Markov blanket surrounding the “representation” r. (The “representation” system is the internal system, and is denoted as lambda in Friston (2013) and as r in Friston et al. (2015).)

“Outside of a dog, a book is a man’s best friend … inside of a dog, it’s too dark to read.”

Evolution of this aphorism – erroneously attributed to Groucho Marx – is in The Quote Investigator.

The whole thing comes down to understanding q(Psi | r).

See Friston’s explanation of his notation q(Psi | lambda) in the following Figure 3 (presented as Figure 7, the last figure in the previous blogpost – this is where we left off last week).

Figure 3 (earlier presented as Figure 7 in the previous blogpost in this series). Friston (2013) and Friston et al. (2015) explain the meaning of q(Psi | lambda) as representing the posterior density over external states.

Friston describes the “internal states as parameterizing some arbitrary (variational) density or Bayesian beliefs q(Psi | r) about external states.”

That is, q is our set of beliefs, and they are about the external states Psi, based on (conditioned by) what we know about the internal states r. (We’ve dropped the “tilde” notation that Friston et al. introduced in 2015.)
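Said a bit more formally (our gloss, not Friston’s wording): for each value of the internal states r, q(· | r) is an ordinary, normalized probability density over the external states Psi,

$$
q(\psi\,|\,r) \;\ge\; 0, \qquad \int q(\psi\,|\,r)\,d\psi \;=\; 1 \quad \text{for each fixed } r .
$$

The internal states do not contain the external states; they merely index a family of beliefs about them.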

So the important thing is that we have a system in which the external states are separated from the internal states (the “representation”) via a Markov blanket.

We illustrate this in the following Figure 4, taken from Maren (2019), and adapted from Friston (2013) and Friston et al. (2015).

Figure 4. Friston (2013; also Friston et al. 2015) presents a system in which the external states Psi are separated from the internal (representational) states – denoted r in Friston et al. (2015) and lambda in Friston (2013) – by sensory and active elements s and a, respectively. Figure taken from Maren (2019), and based on Friston’s works (op. cit.).

A Totally Frivolous Example

So admittedly, I was having the hardest time understanding q(Psi | r).

Then, my mind sort of blurry with the whole thing, I lay down for a nap, and this totally frivolous example came to mind – absolutely out of nowhere.

Imagine a large country estate. The estate used in the TV series Downton Abbey is a good example.

Figure 5. Highclere Castle, Hampshire, England. Photo by Martin John Bishop, uploaded to Wiki Creative Commons on 9 June 2006.

When we take a “God’s eye” view of the 5000-acre Highclere Castle estate, we see the grounds as well as the castle itself. This is illustrated in Figure 6.

Figure 6. “Where You can Walk” at Downton Abbey (Highclere Castle estate map).

With our “God’s eye perspective,” we see all of the surrounding lawns, woods, outlying buildings and roads, etc. This is the external system, or Psi.


“Inside of a Dog …”

We’ll play a little perspectives game.

If we are able to take a God’s-eye view of things, we see the entire estate, as shown in the previous figure.

However, the castle walls are our Markov blanket.

We’ll say that if we are in a position to see Psi, the castle grounds, then we cannot see inside Downton Abbey itself.

Conversely, if we are inside of the Abbey, we can’t see directly into the external states or Psi.

Instead, we must rely on our sensing apparatus, s.

So let’s say that when Downton Abbey (Highclere Castle) was originally constructed, the walls were still pretty open – and the castle itself became a representation of the external states. That is, the kitchen was a representation of the foods produced in the surrounding area, the drawing room was a representation of the social environs, etc.

As the castle neared completion, the walls (the Markov blanket) became firmer, and the only way in which the people inside the Abbey could know the state of the outside was through some sensory apparatus.

Figure 7. Cover art of John Lunn’s Downton Abbey soundtrack. Reproduction covered under Fair Use.

To elaborate: suppose that the butler, Mr. Carson, is very busy with managing the Abbey. Likewise, the housekeeper, Mrs. Hughes, is also very busy managing household details. Neither of them gets out of the Abbey much. Instead, they rely on information coming in from the outside to form their own representation of the external states, or Psi.

This is where we get the notion of q(Psi | r).

We’ve shifted our perspective from God’s-eye to inside the Abbey; that is, we can see all of the inside (r, or lambda, depending on whether you’re reading Friston et al. (2015) or Friston (2013)).

However, to “see” outside the Abbey – we need to rely on sensory information, because we no longer have that God’s-eye view.

So let’s try a more specific example.

I love the scene in Downton Abbey S1E7, when Mrs. Patmore, recovering from cataract surgery, finds an unexpected ally in the visiting cook, who insists that the cook ALWAYS gets to order the pantry supplies – thus helping to usurp a prerogative that the housekeeper, Mrs. Hughes, had taken.

Assessing supplies is an r activity – it’s monitoring the current representation. Knowing what is coming in on a regular basis, and getting information about what is available, is an s (sensing) activity – it’s sensory information. Ordering new pantry items is an a activity; it’s an action on the environment. Together, s and a contribute not only to updating the internal representation r, but also to updating a probability estimate of the external system, Psi.

That is, this flow of goods and information contributes to building q(Psi | r).

The term q(Psi | r) denotes a density over a (set of) hidden variable(s) – the external states Psi themselves are hidden from us.

From inside the Abbey, one cannot directly see the estate and the surrounding farms. The external state is now “hidden.” Instead, information has to come in from the outside; and there can even be new sensory/action mechanisms introduced, as happens in Downton Abbey S1E7, when a phone is installed at the Abbey.

As news comes in about the assassination of Archduke Ferdinand, and the members of the Abbey realize that England is going to war (WWI), this is information coming in that updates q(Psi | r). That is, from inside the Abbey, we might infer from war preparations that the cost of food will go up. We might infer that the Abbey will be short-staffed, as young men leave to fight in the war. We don’t directly observe the outside, but we do have incoming information that lets us update our understanding of the external states, or q(Psi | r).
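To make that update concrete, here is a toy numerical sketch – ours, not Friston’s, with states and likelihood numbers invented purely for this Downton Abbey illustration – of how a single sensory datum s updates a categorical belief q(Psi | r) via Bayes’ rule, and how the updated belief can then drive an action a:

    # Toy update of q(Psi | r): a categorical belief over hidden external states.
    # All states and numbers below are invented for this illustration.

    external_states = ["peace", "war"]

    # Prior belief held inside the Abbey, before the news arrives: q(Psi | r)
    q = {"peace": 0.90, "war": 0.10}

    # Likelihood of the sensory datum s = "news of the assassination"
    # under each hidden external state: p(s | Psi)
    likelihood = {"peace": 0.05, "war": 0.60}

    # Bayes' rule: posterior is proportional to likelihood * prior
    unnormalized = {psi: likelihood[psi] * q[psi] for psi in external_states}
    evidence = sum(unnormalized.values())  # p(s) under the current beliefs
    q_new = {psi: unnormalized[psi] / evidence for psi in external_states}

    print(q_new)  # approx {'peace': 0.43, 'war': 0.57} -- belief shifts toward "war"

    # An 'a' (action) step, in the spirit of the pantry example:
    if q_new["war"] > 0.5:
        print("Order extra pantry stock.")  # action on the environment, driven by q(Psi | r)

Notice that nothing outside the Abbey is observed directly; only the datum s moves the belief – which is exactly the role that the Markov blanket forces s to play.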


Back to the Equations – and Defining p

We’ve just clarified – via a ridiculously frivolous example – how we understand the internal states r as “parameterizing some arbitrary (variational) density or Bayesian beliefs q” about the external states. (Specifically, q(Psi | r).)

Now, we turn our attention to the second term in the K-L divergence equation that Friston has composed to describe a Markov-blanketed system: p(Psi, s, a, r | m). (Refer back to Figure 1.)

In this term, p is a model – a model conditioned on the parameter (set) m. (If you’re reading Beal (2003), the parameter set is theta.)

Further, we start off with a model of “everything in the entire universe.” That is, our model p is of the external system Psi, the sensing and active modalities s and a, and the internal representation itself, r.

Moreover, even though we have previously introduced a parameterized Bayesian belief (set) q(Psi | r), we are now thinking of the full set of variables Psi, s, a, and r as being independent of each other – or at least, independent enough that we can envision them as being separately and distinctively described by our model p(Psi, s, a, r | m).

We can take a big gulp and swallow this, because our next step is to separate the model probability p into distinct probabilities: p(Psi | m) and p(s, a, r | m).

This is the crucial step that lets us move forward into the variational free energy equation.
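In symbols (our sketch of this step – treating s, a, and r as fixed and known from inside the blanket, so that p(s, a, r | m) behaves as a constant under the integral over Psi):

$$
p(\psi, s, a, r\,|\,m) \;=\; p(\psi\,|\,m)\;p(s, a, r\,|\,m),
$$

so that

$$
D_{KL}\big[\,q(\psi\,|\,r)\,\big\|\,p(\psi, s, a, r\,|\,m)\,\big] \;=\; D_{KL}\big[\,q(\psi\,|\,r)\,\big\|\,p(\psi\,|\,m)\,\big] \;-\; \ln p(s, a, r\,|\,m).
$$

The first term compares our beliefs with the model’s account of the external world; the second is (minus the log of) the model’s probability of everything at, or inside, the blanket.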


From Kullback-Leibler to Free Energy

We are just on the (quivering, tumultuous) brink of that which we’ve longed for … for the past two months of blog-posting. (And also, clarifying Friston enough to FINALLY do an update on the 2019 arXiv paper.)

Here’s what happens next:

We can rewrite the Kullback-Leibler equation with the separation of the model probabilities into two terms: p(Psi | m) and p(s, a, r | m).

We can then manipulate the K-L divergence in two different ways.

One of those manipulations leads to an equation that is formally correct, but is – according to all pundits – notoriously difficult (and/or time-consuming; same thing) to compute.

The other version leads to something that is more computationally tractable. THIS is what gives us our variational free energy.

We’re going to summarize the highlights here; the details are in Maren (2019), an arXiv publication which will be updated shortly. (And then, all the blogposts in this and the prior series will be updated to reflect the update.)
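As a preview of those highlights, here are the two refactorings in their standard textbook form – in deliberately simplified notation (we suppress a and r, and follow the generic variational treatment rather than quoting Maren (2019) or Friston (2013) verbatim). Writing F for the variational free energy of beliefs q(psi), given sensory data s:

$$
F \;=\; \mathbb{E}_{q}\big[\ln q(\psi) - \ln p(\psi, s)\big] \;=\; D_{KL}\big[\,q(\psi)\,\big\|\,p(\psi\,|\,s)\,\big] - \ln p(s) \;=\; D_{KL}\big[\,q(\psi)\,\big\|\,p(\psi)\,\big] - \mathbb{E}_{q}\big[\ln p(s\,|\,\psi)\big].
$$

The middle form requires the true posterior p(psi | s) – that is the formally-correct-but-notoriously-hard-to-compute version. The right-hand form needs only the prior and the likelihood, both of which the generative model supplies directly – and that is the computationally tractable one.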

An Equations-Infographic: The Variational Free Energy

The following Figure 8 shows how this decomposition of the original K-L divergence can be accomplished in two possible ways.

Figure 8. The original Kullback-Leibler divergence of Friston (2013) Eqn. 2.7 can be refactored in two different ways. One of them (the left-hand side) is more computable than the other. Figure from Maren (2019).

Not that I want to cut things off just as we’re starting to have fun – but we’re going to “go to press” with this, and pick up next week on interpreting these two different “refactored” versions of the original K-L divergence. We’ll identify which is more computationally tractable, and why. We’ll also start an example that leads us to active inference.


It’s been lovely to share this journey with you. The next blogposts in this series will take us through an active inference example – using something other than Downton Abbey!


Use that Opt-In, described below, to make sure you’re informed of all future posts and YouTubes – and I’ll see you soon!

Alianna J. Maren, Ph.D.

Founder and Chief Scientist, Themesis, Inc.


How to Stay Informed

This is the fourth in the blogpost series on Variational Free Energy and Active Inference. We’re anticipating weekly posts, and a few YouTubes as well. To be informed as soon as these blogs / YouTubes come out, please do an Opt-In with Themesis.

To do this, go to www.themesis.com/themesis.

(You’re on www.themesis.com right now. You could just hit that “About” button and you’ll be there.)

Scroll down. There’s an Opt-In form. DO THAT.

And then, please, follow through with the “confirmation” email – and then train yourself and your system to OPEN the emails, and CLICK THROUGH. That way, you’ll be current with the latest!

Thank you! – AJM



Resources & References

This Week’s Read-Alongs

The “Resources & References” list is, alas, getting long.

To get a focus, we’re identifying the material specific to this week’s post, and further identifying where each item sits – “Bunny Trails,” “Blue Squares,” or “Black Diamonds.”

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis. (Blue Squares)
  • Friston, K. 2013. “Life as We Know It.” (Black Diamonds)
  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” (Black Diamonds)
  • Maren, Alianna J. 2019 (updated 2022). “Derivation of the Variational Bayes Equations.” Themesis, Inc. Technical Report TR-2019-01v5 (ajm); arXiv:1906.08804v4 [cs.NE] 30 Jul 2019. (V4 accessed Nov. 1, 2022; https://arxiv.org/pdf/1906.08804.pdf.)

Following the protocol that we introduced in last week’s blogpost, we are now grouping resources & references according to difficulty level – from Bunny Trails to Blue Squares to Double Black Diamond.

Almost all of Friston’s works are double black.

I’m putting a couple of early blogposts about Friston in at the Bunny Trails level; some bio & auto-bio materials (thank you, Simon!) and a couple of (relatively) easy-to-read articles (thank you, Biswa and Simon!) are in the Blue Squares section.


Bunny Trails – Decent Introductory Source Stuff

CAPTION: The Bunny. We decided (see Kullback-Leibler 2.5/3, or “Black Diamonds”) that we needed trail markings for the resource materials.

Bunny slopes: the introductory materials. Some are web-based blogposts or tutorials – good starter sets. All Themesis blogposts and YouTubes are in this category.


Bio and Autobio Articles re/ Friston

AJM’s Note: A big shout-out and thank you to Simon Crase, who suggested these two articles in a comment to last week’s post – much appreciated, Simon!

Friston – Semi-Bio:

  • Fridman, Lex, interviewing Karl Friston. (2020). “You Are Your Own Existence Proof (Karl Friston) | AI Podcast Clips with Lex Fridman.” Lex Fridman YouTube Channel series (July 1, 2020). (Accessed Oct. 18, 2022; https://www.youtube.com/watch?v=k8Zomsf3uBI)

Friston – Semi-Auto-Bio:


Related Themesis Blogposts

The Variational Free Energy and Active Inference Series:

In the third post in this series, we connect Friston’s notation to that of Beal (2003):

In the second post in this series, we identified that Friston’s (and previously, Beal’s) use of the P and Q notation in the Kullback-Leibler divergence was reversed from how most authors use those variables:

In the first post in this series, we presented our overarching goals (re-presented in this post as Figure 2), and provided interpretations for two key terms, ergodicity and surprise.

The Kullback-Leibler/Free Energy/Variational Inference Series; just the kick-off post for the entire thing:

Prior (much older) Blogpost on the Kullback-Leibler Divergence:

Older posts on Friston:

  • Maren, Alianna J. 2019. “Interpreting Karl Friston: Round Deux.” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 31, 2019). (Accessed Oct. 10, 2022; https://www.aliannajmaren.com/2019/07/31/interpreting-karl-friston-round-deux/ )
  • Maren, Alianna J. 2017. “How to Read Karl Friston (In the Original Greek).” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 27, 2017). (Accessed Oct. 10, 2022; http://www.aliannajmaren.com/2017/07/27/how-to-read-karl-friston-in-the-original-greek/ )

The following (2016) blogpost is useful mostly because it has some links to good tutorial references:


Related Themesis YouTubes

  • For prior Themesis YouTubes on statistical mechanics as it relates to AI, particularly the notion of the “Donner Pass(es) of AI,” see the Resources & References in the prior blogpost of this series.

CAPTION: Intermediate Reading/Viewing: Requires preliminary knowledge of both concepts and notation. Not trivially easy, but accessible – often advanced tutorials.

Matthew Beal and David Blei

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities as well as Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9; doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)

Karl Friston & Colleagues

AJM’s Note: ANYTHING written by Friston is “double black diamond.” That said, a few papers are a BIT more accessible than others.

AJM’s Note: Friston’s 2010 paper is largely conceptual. (Meaning, equations-free.) Not to be blown off; he’s establishing the context for future works.

  • Friston, K. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11(2): 127-138. (Accessed Oct. 13, 2022; online access.)

AJM’s Note: Thanks and a shout-out to Biswa Sengupta, who reminded me of this excellent (and fairly readable) 2016 paper.

  • Sengupta, Biswa, Arturo Tozzi, Gerald K. Cooray, Pamela K. Douglas, and Karl J. Friston. 2016. “Towards a Neuronal Gauge Theory.” PLoS Biol 14(3): e1002400. doi:10.1371/journal.pbio.1002400. (Accessed Oct. 18, 2022; pdf)

AJM’s Note: Thanks and a shout-out to Simon Crase, who pointed out this 2022 article by Friston et al., which promises to be a much more accessible read. Simon’s recommendation came in just as I was prepping this post, so I haven’t read it yet, much less had a chance to integrate it here – look for that integration, and the refs, starting in the next post in this series.

  • Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios A. Pavliotis, and Thomas Parr. 2022. “The Free Energy Principle Made Simpler but Not Too Simple.” arXiv:2201.06387v2 [cond-mat.stat-mech] (Jan. 28, 2022) doi:10.48550/arXiv.2201.06387. (Accessed Oct. 19, 2022; pdf)

Kullback & Leibler – Orig. Work

AJM’s Note: Kullback and Leibler. Their original paper. The one that started all of this.



CAPTION: Double Black Diamond: Expert-only! These books, tutorials, blogposts, and vids are best read and watched AFTER you’ve spent a solid time mastering fundamentals. Otherwise, a good way to not only feel lost, but hugely insecure.

Friston & Co.

AJM’s Note: This Friston (2005) paper is his most-cited paper for his personal genesis of active inference, and seems to be the earliest in which he presents a fully-fleshed notion of how “both inference and learning rest on minimizing the brain’s free energy, as defined in statistical physics.” He refers also to a Hinton et al. (1995) paper, but several papers published between 2004 and 2006 establish the genesis timeframe for Bayesian interpretations of perception.

AJM’s Note: This paper by Knill & Pouget (2004) was published just prior to Friston’s 2005 paper; both dealing with Bayesian modeling of brain processes. Friston cites this in his 2012 works.

AJM’s Notes: These two Friston papers are useful and important predecessors to Friston (2013). These two, in turn, also cite useful and important predecessor works – by both Friston and colleagues as well as others. (See above.) It’s still TBD how deep we need to read back into the earliest works in order to understand the ones addressed in this (blogpost) course of study.

Active Inference: perhaps the most accessible presentation, by Noor Sajid & colleagues (first recommended in the Themesis June, 2022 blogpost Major Blooper – Coffee Reward):

  • Sajid, N., Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )

AJM’s Note: Friston’s 2013 paper is the central point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated by a Markov blanket from its external environment, and moves on from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015).

  • Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface. 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12:20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)


… And Something to Go with This Week’s Theme …

The big thing that we’re dealing with in this study of variational free energy is the juxtaposition of opposites. It’s a continually-evolving dynamic tension between using the free energy metaphor (from statistical mechanics) and the notion that we can create “hidden variables” or “latent variables,” and draw on prior observations to form estimates for these latent variables. (This comes from a combination of the overall notion of a latent variable combined with basic elements of Bayesian theory.)

Dynamic tension is not a new thing.

The Nag Hammadi text The Thunder, Perfect Mind was written roughly two thousand years ago; the surviving Coptic copy dates to the 3rd or 4th century AD. It was found, along with the other Nag Hammadi texts, in 1945, and was not translated until recently.

Thunder, Perfect Mind

“I am the whore and the holy woman
I am the wife and the virgin
I am he the mother and the daughter
I am the limbs of my mother …”

From The Thunder, Perfect Mind – an early Nag Hammadi text; article by Dr. Hal Taussig (July 19, 2020), in The Thunder, Perfect Mind.

The entire presentation of The Thunder is an expression of oppositions and contrasts.

For those who are seeking language to convey a non-binary gender identification, the fact that the speaker in Thunder is dominantly feminine but occasionally uses a masculine pronoun (see above) may be both interesting and comforting.

Here’s one translation of the original text: http://gnosis.org/naghamm/thunder.html

Here’s a translation with an interesting/useful commentary and history at the end: https://diotima-doctafemina.org/translations/coptic/the-thunder-perfect-mind/

This is perhaps the most useful translation, commentary, and study: The Thunder, Perfect Mind: A New Translation and Introduction

And here’s a shorter exposition: https://earlychristiantexts.com/the-thunder-perfect-mind/

“I learned that when the feminine divine was in the house, every facet of the human experience was celebrated with equal gratitude.”

Regena Thomaschauer, aka “Mama Gena.” https://mamagenas.com/how-to-get-out-of-your-head/

3 comments

  1. Dr. A J, Thanks for an interesting discussion of p and q. I think I have q sorted: Ptolemy’s model of the Solar System is one version of q, Copernicus another, Kepler a third, and so on. Fitting the data can be seen as maximizing ELBO, or minimizing Free Energy, or Good Old Fashioned Least Squares.
    But p? I’m reminded of Plato’s cave: “The shadows represent the fragment of reality that we can normally perceive through our senses, while the objects under the sun represent the true forms of objects that we can only perceive through reason”-Wikipedia.
    ‘…we start off with a model of “everything in the entire universe.” That is, our model p is of the external system Psi, the sensing and active modalities s and a, and the internal representation itself, r’. IMHO we have to acknowledge that “p” is unknowable: it describes Plato’s “true forms of objects” (which is why KF takes expectations over q not p). Where I’m confused still is “m”, the model that parameterizes p. What is it? In Plato’s allegory there are actors moving the shapes that generate the shadows. Maybe “m” is a parameter that controls whether the actors are male or female, Oscar winners or wannabees, but surely that is unknowable.

    1. Oh, how totally funny – I love your comments and analogies, Simon!
      You are always a more creative (and I think more insightful) thinker than I am!
      So – w/r/t the parameter set “m.”
      I’ll confess that I have mischief afoot – so my mind is already made up about “m.” (To be revealed in coming posts.)
      The thing is … when we look at the Variational literature, most often, when authors take that next step and introduce a potential set of models, they rely on the good old “family of exponentials” that is so very useful … but also limited.
      Something new coming up soon!
      But in the meantime … LOVED your analogies! – AJM

