Variational Free Energy and Active Inference: Pt 3

When we left off in our last post, we’d determined that Friston (2013) and Friston et al. (2015) reversed the P and Q notation commonly used for the Kullback-Leibler divergence.

Just as a refresher, we’re posting those last two images again.

The following Figure 1 was originally Figure 5 in last week’s Part 2 of this series.

We’ve established that the more common notation (referencing the venerable Wiki, as well as other published sources) typically uses P for the data probability, and Q for the model.

Figure 1. (Reproducing Figure 5 from last week’s Part 2 of this series.) Summary of the notation most typically used for the Kullback-Leibler divergence, when it is used to measure the divergence between a model Q (or q(x)) and a data probability set P (or p(x)).

Now, let’s go back to Friston (2013), and to be inclusive, also to Friston et al. (2015).

Figure 2. (Originally published as Figure 6 in Part 2 of this series.) Friston is using the notation q(psi|lambda) to denote the “data” or “observations” when we use the K-L divergence to measure the divergence between a set of observations and the model of that set. (And just as a reminder: “lambda” = “r”; Friston et al. made that notational change in 2015.)

We can see that the typical notation is reversed here, when we compare the notation from Friston’s work (Fig. 2) to the more typical notation (Fig. 1).
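
Why does this matter? Because the K-L divergence is not symmetric: D-KL(P||Q) and D-KL(Q||P) are generally different numbers. Here’s a minimal numerical sketch (Python with NumPy; the toy distributions are ours, purely illustrative, and appear in neither paper):

```python
import numpy as np

# Toy discrete distributions, chosen only for illustration.
# In the conventional assignment (Fig. 1): P = "data," Q = "model."
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

def kl_divergence(a, b):
    """D_KL(a || b) = sum_i a_i * ln(a_i / b_i), for discrete distributions."""
    return float(np.sum(a * np.log(a / b)))

print(kl_divergence(p, q))  # D_KL(P || Q): approx. 0.0851
print(kl_divergence(q, p))  # D_KL(Q || P): approx. 0.0920
```

The two directions disagree, so when an author swaps which distribution gets called P and which gets called Q, the quantity being computed genuinely changes. Keeping the roles straight is not just pedantry.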


AJM’s Quick Note: We have an extensive Resources & References (R&R) list at the end. Starting with this post, we have a bullet list of the important papers referenced in this post at the beginning of that list. They are identified by the R&R “difficulty section” to which they belong: “Bunny Trails,” “Blue Squares,” or “Black Diamonds.”


An Intentional Step

Now, that notation reversal was not a goof on the part of Friston (2013) or Friston et al. (2015).

Rather, they were picking up the notation established by Matthew Beal in his 2003 Ph.D. dissertation, “Variational Algorithms for Approximate Bayesian Inference.”

Let’s look at an extract from Beal:

Figure 3: Matthew Beal, in his 2003 Ph.D. dissertation, “Variational Algorithms for Approximate Bayesian Inference” (p. 48), establishes use of q(x) for the “data” probability and p(x,y|theta) for the “model” probability.

So, we’re seeing that Friston’s notation is in alignment with Beal’s.

We might also note that Beal has an extensive discussion on Markov networks and Markov blankets in Section 1.1, “Probabilistic Inference,” of his dissertation. This is useful, because Friston picks up on the notion of Markov blankets in his work; we’ll see that reflected in his notation shortly.


M. Beal: Variational Bayes

Our next step is to look a bit more into the notation that Matthew Beal is using. This is because we’re going to make a fairly precise correspondence between Friston’s notation and Beal’s.

Let’s see the two formulations, juxtaposed:

Figure 4: Juxtaposing the notation used by Friston (2013) and Beal (2003).

We notice that the function F(q(x),theta) in Beal’s notation is formally the same as Friston’s function F(s,a,lambda) – the difference is that Friston’s has a negative sign in front.

However, we quickly find the correspondence:

Figure 5: Beal (2003) identifies F(q(x), theta) as the negative of the free energy, and thus removes the negative sign that we see in front of F(s,a,lambda) in Friston (2013); see Fig. 4.

We see that Beal explicitly identifies his F(q(x), theta) as the negative of the free energy; that is, his definition does NOT carry the negative sign that appears in front of the F(s,a,lambda) that we see in Friston (2013).
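
To see this in symbols, here is the standard decomposition, written with Beal’s variables (a sketch of the textbook identity; see Beal’s Chapter 2 for his exact statement):

```latex
\mathcal{F}\bigl(q(x),\theta\bigr)
  = \int q(x)\,\ln\frac{p(x,y\mid\theta)}{q(x)}\,dx
  = \ln p(y\mid\theta)
    - \mathrm{KL}\bigl[\,q(x)\;\big\|\;p(x\mid y,\theta)\,\bigr]
```

Because the K-L term is non-negative, F(q(x), theta) is a lower bound on the log evidence ln p(y|theta); it is the negative of the variational free energy. Friston’s F(s,a,lambda) is the free energy itself, which is why the minus sign sits on his side of the correspondence.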

Thus far, we’ve identified the formal correspondence of the expressions for the free energy in both Beal (2003) and Friston (2013), with a nod also to Friston et al. (2015).

Now, we’d like to more carefully express what the terms q and p actually represent.


Minding Our “P’s” and “Q’s”

So far, we’ve confirmed two things:

  • Friston (2013) and Friston et al. (2015) share a notational reference frame that reaches back to Beal (2003), and
  • In each of these works (and, as we’ll see later, Blei et al. (2016)), P(x) and Q(x) are interchanged with respect to the more traditional meanings.

So for each of these, Q(x) (or q(x)) refers to the data, or the observables, or “that which is being modeled,” and P(x) (or p(x)) refers to the model, and is generally of the form p(x,y|theta), where theta is the {set of} model parameter(s).

The notation that we’ve shown from Beal (2003) is the simplest of what he offers; his Section 3 and following address variational Bayes for a system with a Markov blanket. His notation there is a bit more complex than what we need, because he’s walking us through the E-M (Expectation-Maximization) algorithm for the Markov-blanket system (sketched below).
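
For orientation only, here is the generic E-M iteration, written as coordinate ascent on F(q(x), theta). (This is a textbook-style sketch, not Beal’s Markov-blanket version.)

```latex
\text{E-step:}\quad q^{(t+1)}(x) = p\bigl(x \mid y,\, \theta^{(t)}\bigr)
\qquad
\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta} \int q^{(t+1)}(x)\,\ln p(x,y\mid\theta)\,dx
```

Each step increases (or at least does not decrease) F(q(x), theta), the same functional we have been comparing across the two papers.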

Instead, for our purposes, we can do a more straightforward notational comparison.

We start with Beal’s definitions.

Figure 6: Beal (2003) identifies the hidden variables as x, and the observed variables as y. The parameter (set) is given as theta. A generative model produces the data y using theta, where we need to integrate over the hidden variables x.

Now, what is interesting here is that although we could indeed “observe” the observed variables y(i) directly, what Beal presents is a probability distribution p(y|theta); to obtain this distribution, we use a generative model that integrates a probability function over the hidden variables x(i) for a given (set of) model parameters theta.
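
In symbols, the marginalization that Beal describes is:

```latex
p(y\mid\theta) = \int p(x, y\mid\theta)\,dx
```

The integral runs over all configurations of the hidden variables x; the generative model p(x,y|theta) is the thing being integrated.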

Friston uses, in a corresponding sense, p(Psi, s, a, lambda | m). (See Figure 4.)

Similarly, Friston uses q(Psi | lambda) as notation, whereas Beal uses q(x).

In both cases, q(*) refers to the hidden variables.

Summary: We have the correspondence from Friston-to-Beal as follows (a symbolic sketch appears after the list):

  • Psi is the hidden variable, and for Friston it is conditioned upon lambda (or, later, in Friston et al. (2015), upon r, for “representation”). Beal describes q(x) as creating an auxiliary variable.
  • The model parameter for Beal is theta, and for Friston is m.
  • Lambda (or r), s, and a are all observable variables, corresponding to y in Beal’s notation.
  • To represent the probability distribution q over the hidden variables, Friston works with q(Psi | lambda) and Beal with q(x).
  • To represent the probability distribution p over the full set of hidden and observed variables, Friston works with p(Psi, s, a, lambda | m) and Beal with p(x, y|theta).
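
Substituting these correspondences into Beal’s lower bound gives the promised symbolic sketch (ours, not an equation transcribed from either paper):

```latex
\underbrace{\int q(x)\,\ln\frac{p(x,y\mid\theta)}{q(x)}\,dx}_{\text{Beal: }\ \mathcal{F}(q(x),\,\theta)}
\quad\longleftrightarrow\quad
\underbrace{\int q(\psi\mid\lambda)\,\ln\frac{p(\psi,s,a,\lambda\mid m)}{q(\psi\mid\lambda)}\,d\psi}_{\text{Friston: }\ -F(s,a,\lambda)}
```

with x ↔ psi, y ↔ (s, a, lambda), and theta ↔ m.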

Figuring Out What Is “Observable” (in Friston’s Markov-Blanket Space)

Well, my head just gets a bit spinny when it comes to keeping track of variables across multiple papers. (Things get even more interesting when, next week or the week thereafter, we pull in Blei et al. (2016).)

In particular, in reading Friston’s works, I ask myself – why does the “external space” (the external states, to use Friston’s terms) comprise a “hidden variable”? After all, this is … simply … extrinsic. It’s right there, to measure, etc.

See Friston’s explanation of his notation q(Psi | lambda) in the following Figure 7.

Figure 7. Friston (2013) and Friston et al. (2015) explain the meaning of q(Psi | lambda) as representing the posterior density over external states.
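
Written out (our paraphrase of the standard decomposition, using Friston’s symbols), the free energy splits as:

```latex
F(s,a,\lambda)
  = -\ln p(s,a,\lambda\mid m)
  + \mathrm{KL}\bigl[\,q(\psi\mid\lambda)\;\big\|\;p(\psi\mid s,a,\lambda,m)\,\bigr]
```

Minimizing F with respect to the internal states (lambda) drives the K-L term toward zero, so q(psi | lambda) comes to approximate the posterior density over the external states. That is the sense in which the external states are “hidden”: the system behind the Markov blanket never accesses psi directly; it can only model psi via its internal states.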

(AJM’s Note: Next week – an example. Right now, I need to pause. This is a big “grading week,” and I would love, just LOVE, to continue … but “Read-and-Review” (aka, grading) needs to be done.)


How to Stay Informed

This is the third in the blogpost series on Variational Free Energy and Active Inference. We’re anticipating weekly posts, and a few YouTubes as well. To be informed as soon as these blogs / YouTubes come out, please do an Opt-In with Themesis.

To do this, go to www.themesis.com/themesis.

(You’re on www.themesis.com right now. You could just hit that “About” button and you’ll be there.)

Scroll down. There’s an Opt-In form. DO THAT.

And then, please, follow through with the “confirmation” email – and then train yourself and your system to OPEN the emails, and CLICK THROUGH. That way, you’ll be current with the latest!

Thank you! – AJM



Resources & References

This Week’s Read-Alongs

The “Resources & References” list is, alas, getting long.

To get a focus, we’re identifying the material specific to this week’s post, and further identifying where they are – “Bunny Trails,” “Blue Squares,” or “Black Diamonds.”

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis. (Blue Squares)
  • Friston, K. 2013. “Life as We Know It.” (Black Diamonds)
  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” (Black Diamonds)

Following the protocol that we introduced in last week’s blogpost, we are now grouping resources & references according to difficulty level – from Bunny Trails to Blue Squares to Double Black Diamond.

Almost all of Friston’s works are double black.

I’m putting a couple of early blogposts about Friston in at the Bunny Trails level; some bio and auto-bio materials (thank you, Simon!) and a couple of (relatively) easy-to-read articles (thank you, Biswa and Simon!) are in the Blue Squares section.


Bunny Trails – Decent Introductory Source Stuff

CAPTION: The Bunny. We decided (see Kullback-Leibler 2.5/3, or “Black Diamonds”) that we needed trail markings for the resource materials.

Bunny slopes: the introductory materials. Some are web-based blogposts or tutorials – good starter sets. All Themesis blogposts and YouTubes are in this category.


Bio and Autobio Articles re/ Friston

AJM’s Note: A big shout-out and thank you to Simon Crase, who suggested these two articles in a comment to last week’s post – much appreciated, Simon!

Friston – Semi-Bio:

  • Fridman, Lex, interviewing Karl Friston. (2020). “You Are Your Own Existence Proof (Karl Friston) | AI Podcast Clips with Lex Fridman.” Lex Fridman YouTube Channel series (July 1, 2020). (Accessed Oct. 18, 2022; https://www.youtube.com/watch?v=k8Zomsf3uBI)

Friston – Semi-Auto-Bio:


Related Themesis Blogposts

The Variational Free Energy and Active Inference Series:

The Kullback-Leibler/Free Energy/Variational Inference Series; just the kick-off post for the entire thing:

Prior (much older) Blogpost on the Kullback-Leibler Divergence:

Older posts on Friston:

  • Maren, Alianna J. 2019. “Interpreting Karl Friston: Round Deux.” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 31, 2019). (Accessed Oct. 10, 2022; https://www.aliannajmaren.com/2019/07/31/interpreting-karl-friston-round-deux/ )
  • Maren, Alianna J. 2017. “How to Read Karl Friston (In the Original Greek).” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 27, 2017). (Accessed Oct. 10, 2022; http://www.aliannajmaren.com/2017/07/27/how-to-read-karl-friston-in-the-original-greek/ )

The following (2016) blogpost is useful mostly because it has some links to good tutorial references:


Related Themesis YouTubes

  • For prior Themesis YouTubes on statistical mechanics as it relates to AI, particularly the notion of the “Donner Pass(es) of AI,” see the Resources & References in the prior blogpost of this series.

CAPTION: Intermediate Reading/Viewing: Requires preliminary knowledge of both concepts and notation. Not trivially easy, but accessible – often advanced tutorials.

Matthew Beal and David Blei

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities and Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670 (Accessed June 28, 2022; pdf.)

Karl Friston & Colleagues

AJM’s Note: ANYTHING written by Friston is “double black diamond.” That said, a few papers are a BIT more accessible than others.

AJM’s Note: Friston’s 2010 paper is largely conceptual. (Meaning, equations-free.) Not to be blown off; he’s establishing the context for future works.

  • Friston, K. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11 (2): 127-138. (Accessed Oct. 13, 2022; online access.)

AJM’s Note: Thanks and a shout-out to Biswa Sengupta, who reminded me of this excellent (and fairly readable) 2016 paper.

  • Sengupta, Biswa, Arturo Tozzi, Gerald K. Cooray, Pamela K. Douglas, and Karl J. Friston. 2016. “Towards a Neuronal Gauge Theory.” PLoS Biol 14(3): e1002400. doi:10.1371/journal.pbio.1002400. (Accessed Oct. 18, 2022; pdf)

AJM’s Note: Thanks and a shout-out to Simon Crase, who pointed out this 2022 article by Friston et al. that promises to be a much more accessible read – Simon’s recommendation came in just as I was prepping this post, so I haven’t read this yet, much less had a chance to integrate references into this post – look for the integration and refs starting in the next post in this series.

  • Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios A. Pavliotis, and Thomas Parr. 2022. “The Free Energy Principle Made Simpler but Not Too Simple.” arXiv:2201.06387v2 [cond-mat.stat-mech] (Jan. 28, 2022). doi:10.48550/arXiv.2201.06387 (Accessed Oct. 19, 2022; pdf)

Kullback & Leibler – Orig. Work

AJM’s Note: Kullback and Leibler. Their original paper. The one that started all of this.
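
  • Kullback, S., and R.A. Leibler. 1951. “On Information and Sufficiency.” Annals of Mathematical Statistics 22 (1): 79-86. doi:10.1214/aoms/1177729694.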



CAPTION: Double Black Diamond: Expert-only! These books, tutorials, blogposts, and vids are best read and watched AFTER you’ve spent a solid time mastering fundamentals. Otherwise, a good way to not only feel lost, but hugely insecure.

Friston & Co.

AJM’s Note: This Friston (2005) paper is his most-cited paper for his personal genesis of active inference, and seems to be the earliest in which he presents a fully-fleshed notion of how “both inference and learning rest on minimizing the brain’s free energy, as defined in statistical physics.” He refers also to a Hinton et al. (1995) paper, but several papers published between 2004 and 2006 establish the genesis timeframe for Bayesian interpretations of perception.

AJM’s Note: This paper by Knill & Pouget (2004) was published just prior to Friston’s 2005 paper; both deal with Bayesian modeling of brain processes. Friston cites this in his 2012 works.

AJM’s Notes: These two Friston papers are useful and important predecessors to Friston (2013). These two, in turn, also cite useful and important predecessor works – by both Friston and colleagues as well as others. (See above.) It’s still TBD as to how deep we need to go in reading back into the earliest works, in order to understand the ones addressed in this (blogpost) course of study.

Active Inference: perhaps the most accessible presentation, by Noor Sajid & colleagues (first recommended in the Themesis June 2022 blogpost Major Blooper – Coffee Reward):

  • Sajid, N., Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )

AJM’s Note: Friston’s 2013 paper is the central point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated by a Markov blanket from its external environment, and moves on from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015).

  • Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface. 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)


… And Music to Go with This Week’s Theme …

“Carry on Wayward Son.” Was picking it up while watching Reacher, S1 E6 “Papier” a couple of evenings ago.

Not even going to try to interpret the lyrics in the context of this week’s post. (That one’s up to you.)

“Carry on Wayward Son,” by Kansas (Official Video)
