Variational Free Energy and Active Inference: Pt 2

Our intention with this post is to cover not only the notion, but the notation, used by Karl Friston in his 2013 paper, “Life as We Know It.” (Actually, we’re addressing a very small notational subset – albeit one that needs to be treated with great care and caution.)

To do this, we’re also discussing the same equations, as presented by Friston et al. (2015) in “Knowing One’s Place.”

One of the things that will help us carefully delineate our specific notation-of-interest will be to recollect the Kullback-Leibler divergence.

Figure 1. The basic Kullback-Leibler divergence equation. Kullback and Leibler, 1951. (See full citation in Resources and References.)

We’re showing the original Kullback-Leibler divergence equation in Figure 1, above.

This equation expresses the divergence in terms of two different “distributions,” f(x) and g(x). In this original context, neither f nor g is designated specifically as the probability of an observation or as the model prediction.

However, for most applications, we use f(x) to refer to the actual observations, or the data, and g(x) to refer to the model.
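
To make that convention concrete, here is a minimal numerical sketch (ours, not taken from any of the papers under discussion) of the discrete form of the divergence in Figure 1, with f playing the role of the data distribution and g the model; the specific numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical discrete distributions over three states (made-up numbers):
f = np.array([0.20, 0.50, 0.30])   # f(x): the "data" or observed distribution
g = np.array([0.25, 0.45, 0.30])   # g(x): the model's predicted distribution

# Kullback-Leibler divergence, D(f || g) = sum_x f(x) * ln( f(x) / g(x) )
kl_fg = np.sum(f * np.log(f / g))

print(kl_fg)   # a small positive number; it is zero only when f and g agree everywhere
```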

Now, we cut to the equation presented in Friston et al. (2015),

Figure 2. The free energy equation, as presented by Friston et al. (2015) in “Knowing One’s Place.”

Figure 2 presents an equation extracted from Eqn. 3.2 of Friston et al. (2015), “Knowing One’s Place.” Earlier, in describing Eqn. 2.2 of that work, he says that the “ergodic density [p(~x)] is the solution to the Fokker–Planck equation describing the evolution of the probability density over states. It is straightforward to show that ln p(~x) = L(~x) is the solution to the Fokker–Planck equation [49].” (Ref. [49] is to Friston and Ao (2012); it is included in the Black Diamonds section of Resources and References, at the end of this post.)

Re-interpreted (into much simpler terms): p(x) (imagine a tilde over the x) is a model of the actual probability density of states in the external system. This model is obtained via the Fokker–Planck diffusion equation.

More specifically, p(Psi|s,a,r) is a model of the external system Psi, as conditioned or dependent on the sensory signals s from Psi to the internal system, on the action signals a from the internal system to the external system Psi, and on the internal system r, which is a representation of the external system Psi. (For more on Psi, s, a, and r, see the previous post, and we will also pick up on this theme in next week’s post, and will insert that link once that post is complete.)

Note on Notation: If we were to stay with Friston’s notation, there would be tildes (“~“) above each of these terms. He started adding the tildes in 2015, with “Knowing One’s Place” (Friston et al.; see Section 2). He uses the tilde to refer to “generalized states.”

As mathematicians and physicists like to say, “we can simplify the notation without loss of meaning.” Unless we’re deliberately replicating one of Friston’s equations in full, we’ll skip the tildes in our discussions.

Friston et al.’s (2015) presentation is a follow-on to the earlier focus of Friston (2013) that we began addressing in the last post. In that post, the key equation that we were addressing was Eqn. 2.7, shown below in Figure 3.

Figure 3. Friston’s Eqn. 2.7 (in Lemma 2.1), from Friston (2013); “Life as We Know It.”

The Kullback-Leibler Divergence, Revisited

Our focus is on the FIRST PART of the LAST EQUATION of Eqn. 2.7, presented in Fig. 3 above. We see that Friston defines the free energy, F(s,a,lambda) as a “functional of an arbitrary (variational) density q(psi|lambda) that is parameterized by internal states.”

Lambda denotes the internal states of the system; those that are partitioned from the external part of the system via that ubiquitous Markov blanket. (In 2015, Friston et al. use R & r instead of Lambda and lambda. Forewarned.)

AJM’s Note: Friston uses the term “functional” precisely; a function is a function of variables; a functional is a function of functions. (See a useful ResearchGate discussion on the difference between function & functional.)
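
(A toy Python illustration of that distinction, with made-up names and numbers: q below is an ordinary function of a variable x, while entropy_functional takes the whole function q as its argument and returns a single number.)

```python
import numpy as np

def q(x):
    """An ordinary function: maps a state index x to a probability."""
    return [0.2, 0.5, 0.3][x]

def entropy_functional(q_fn):
    """A functional: its argument is itself a function (a discrete density),
    and it returns the single number H[q] = -sum_x q(x) ln q(x)."""
    probs = np.array([q_fn(x) for x in range(3)])
    return -np.sum(probs * np.log(probs))

print(entropy_functional(q))   # approximately 1.03
```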

When we look at this equation in Fig. 3, we see that the free energy F is being defined by a Kullback-Leibler divergence equation.

Digression: We’ve Now LEFT “Life as We Know It”

When we see that last equation in Fig. 3, we right away get clued into something very important: we are no longer in the statistical-mechanics world.

Those of us coming from the perspective of physics or physical chemistry have a very well-ordered view of free energy. We understand how free energy is formulated in either a macro-sense (e.g., Carnot cycles, efficiencies of engines, etc.) or a micro-sense, based solidly on statistical mechanics.

At this point, we are leaving the world of “real” statistical mechanics behind. There are no particles in a box, where each particle has its own little “energy state.”

Instead, the equations that are true in thermodynamics – and which represent very real measurables in the very real, solid, physical world – are now abstractions.

Actually, they are metaphors.

I’ve referenced this YouTube in prior blogposts, and am linking to it again because, for those of us who went through a good and thorough sheep-dipping in the traditional approach to free energy, Eqn. 2.7 of Friston (2013) – and all such similar works – comes as a bit of a shock.

Maren, Alianna J. 2021. “The AI Salon: Statistical Mechanics as a Metaphor.” Themesis, Inc. YouTube Channel (Sept. 2, 2021). (Accessed Oct. 18, 2022; https://www.youtube.com/watch?v=D0soPGtBbRg&t=2s )

The free energy equation, in particular, is called upon because it is framed as the difference between one term and something that looks enormously like entropy. (See that last part of the last eqn. in Fig. 3.)

It’s very easy to take that analogy and run with it. Free energy, instead of being a thermodynamic function, is now something that is “defined” as this difference. It remains only to fill in the blanks – exactly WHAT is that variable in the entropy term? There’s definitely a probability of something, and the formalism looks like entropy – and in fact it is being called entropy – but it’s not a real thermodynamic entropy, and so we want to be clear that we are now in some magical metaphorical land that has no real correlation with physical systems.


Returning to the Kullback-Leibler Divergence

Ahh … I just had to do that. Now let’s get back to business.

We have a Kullback-Leibler divergence in the first part of the last eqn of Fig. 3 (Eqn. 2.7 from Friston, 2013).

Actually, we have the negative of the Kullback-Leibler divergence; see Figure 1 for the original. In the Kullback-Leibler divergence, the multiplier variable (e.g., f(x)) is also the numerator in the ratio that we’re taking the log of.

Not a problem. Flip the terms (numerator and denominator), change the sign of the whole equation; and get the same thing.
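
In symbols (our own rendering, keeping the f/g notation of Figure 1 rather than Friston’s q and p), that “flip and change the sign” step is just the identity:

```latex
-D_{KL}(f \parallel g)
  = -\int f(x)\,\ln\frac{f(x)}{g(x)}\,dx
  = \int f(x)\,\ln\frac{g(x)}{f(x)}\,dx .
```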

Friston is doing this because, when we go through sufficient mathematical steps to reach the last part of Eqn. 2.7 (in Fig. 3), we get the -H[q(psi|mu)] term (where mu will be the (set of) parameters that can be varied; this is, after all, variational free energy). That negative sign in front of the last term is important, because in thermodynamics, the free energy F = H – (T)S. (We’ll absorb the T to get a reduced form, so that the newly-created “reduced” equation is F = H – S, and we want that minus sign in front of the S.)
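
As a quick sanity check on that decomposition (ours, with made-up numbers; q is a normalized toy density and p is a toy joint-density slice, so nothing here is tied to Friston’s actual generative model), the “KL-style” form and the “energy minus entropy” form of that last line agree numerically:

```python
import numpy as np

# Toy, hypothetical densities over three states psi (not from the paper):
q = np.array([0.20, 0.50, 0.30])   # variational density q(psi | lambda); sums to 1
p = np.array([0.10, 0.30, 0.15])   # joint density p(psi, s, a, lambda | m) at fixed s, a, lambda
                                   # (need not sum to 1, since it is a slice of a joint)

# Form 1: the KL-style expression, F = sum_psi q * ln( q / p )
F_kl_style = np.sum(q * np.log(q / p))

# Form 2: "energy minus entropy", F = E_q[ -ln p ] - H[q]
energy  = np.sum(q * (-np.log(p)))   # the information-theorists' "E" term
entropy = -np.sum(q * np.log(q))     # the information-theorists' "H" term
F_decomposed = energy - entropy

print(F_kl_style, F_decomposed)      # the two forms give the same number
```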

And, to make things extra-special-delightful, Friston is using the notation typically used by information theorists, who denote the enthalpy H as E, and the entropy S as H … I believe it was Shannon who concocted that little bit, and WHY IN HELL HE COULDN’T LEAVE WELL ENOUGH ALONE … !!!

But in the information theoretics world, the free energy F = E – H, whereas in the physical chemistry world, the free energy F = H – S. (And this leads directly to one of my rants on notation … which I’ll put on a shelf until later. But I will indulge. It’s coming.)

OK, that was almost another digression. Let’s pull back to the Kullback-Leibler.

We want to figure out what Friston means by q(x). He has words … we’ll get to them later. Let’s look at the equation first.

We begin with the Kullback-Leibler divergence, once again.

We’re doing this because we want to see the K-L divergence in its more typical current form, with P (or p) and Q (or q) notation, so we can compare with Friston (and Friston et al.’s) notation.

Figure 4. The definition of the Kullback-Leibler divergence, in integral form, from the venerable Wiki, which really presents a good discussion and uses the P and Q notation most commonly found these days.

There is an accepted and conventional usage of P and Q notation these days; one good example is given in a MATLAB (MathWorks) tutorial, which states that for its function KLD:

KLD = getKullbackLeibler(P,Q)

“Compute Kullback-Leibler divergence of probability distribution Q from probability distribution P. P represents the “true” distribution of data, observations, or a theoretical distribution, whereas Q typically represents a theory, model, description, or approximation of P.”

MathWorks description of the function KLD, which computes the Kullback-Leibler divergence.

Now, this is not news to anyone. We’re presenting this simply to cover our bases – to reinforce the fact that when we use the K-L divergence to measure how well a model fits a data (or “observables”) distribution, the data is most typically denoted as P (or p(x)) and the model is denoted as Q (or q(x)).

We summarize this in the following Figure 5.

Figure 5. Summary of the notation most typically used for the Kullback-Leibler divergence, when it is used to measure the divergence between a model Q (or q(x)) and a data probability set P (or p(x)).
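
As a minimal sketch of that convention in practice (ours, using SciPy rather than the MATLAB-style function quoted above), P holds the data and Q holds the model; swapping them gives a different number, since the K-L divergence is not symmetric.

```python
import numpy as np
from scipy.special import rel_entr   # elementwise p * ln(p / q)

P = np.array([0.20, 0.50, 0.30])   # "true" / data distribution (hypothetical numbers)
Q = np.array([0.25, 0.45, 0.30])   # model / approximating distribution (hypothetical numbers)

D_PQ = np.sum(rel_entr(P, Q))   # D_KL(P || Q): divergence of the model Q from the data P
D_QP = np.sum(rel_entr(Q, P))   # D_KL(Q || P): a different value -- the divergence is asymmetric

print(D_PQ, D_QP)
```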

Now, let’s go back to Friston (2013), and to be inclusive, also to Friston et al. (2015).

Life in Friston-Land

Let’s take a look at the notation used by Friston (2013) and also Friston et al. (2015).

At the risk of being terribly pedantic, this was (earlier) a source of confusion to me, and thus could be a potential pitfall to others.

Figure 6. Friston is using the notation q(psi|lambda) to denote the “data” or “observations” when we use the K-L divergence to measure the divergence between a set of observations and the model of that set. (And just as a reminder: “lambda” = “r”; Friston et al. made that notational change in 2015.)

We can see that the typical notation is reversed here, when we compare the notation from Friston’s work (Fig. 6) to the more typical notation (Fig. 5).

Friston (and Friston et al.) use q(psi|lambda) to denote the “data” or “observations,” and p(psi, s, a, lambda|m) to denote the model.

Now that we’ve clarified how Friston (2013) and Friston et al. (2015) are using the variables q and p in their work, let’s look at how Friston actually defines these terms.

Actually, we’re looking at his definition of q(psi-with-tilde|r-with-tilde).

Friston states that [we are interpreting] “internal states as parameterizing some arbitrary (variational) density of Bayesian beliefs q(psi-with-tilde|r-with-tilde) about external states.”

Figure 7. Friston’s definition of the variable q, specifically q(psi-with-tilde|r-with-tilde) (Friston et al., 2015).

Finding New Meaning in “Life as We Know It”

We’re going to assume that Karl Friston didn’t just wake up some morning and arbitrarily decide to reverse notation.

And in fact, he didn’t. And neither did Matthew Beal (2003), on whose work Friston’s notation is based.

To understand what just happened, though, we need to step back and take another look at that notion of a system separated from its surround by a Markov blanket.

This is key, and we will pick up with this theme in next week’s blogpost.


(AJM’s Note: Not to deliberately indulge in a cliff-hanger, but once again – we’ve got to hit that publication deadline.)


How to Stay Informed

This is the second in the blogpost series on Variational Free Energy and Active Inference. We’re anticipating weekly posts, and a few YouTubes as well. To be informed as soon as these blogs / YouTubes come out, please do an Opt-In with Themesis.

To do this, go to www.themesis.com/themesis.

(You’re on www.themesis.com right now. You could just hit that “About” button and you’ll be there.)

Scroll down. There’s an Opt-In form. DO THAT.

And then, please, follow through with the “confirmation” email – and then train yourself and your system to OPEN the emails, and CLICK THROUGH. That way, you’ll be current with the latest!

Thank you! – AJM



Resources & References

Following the protocol that we introduced in last week’s blogpost, we are now grouping resources & references according to difficulty level – from Bunny Trails to Blue Squares to Double Black Diamond.

This week, we’ve pared down the references – and will start rebuilding towards the paper under discussion, that “Life as We Know It” (2013) paper.

Almost all of Friston’s works are double black. Love the man. Think he’s genius. But that “double-black” rating is just what is so.

I’m putting a couple of early blogposts about Friston, as well as some bio & auto-bio materials (thank you, Simon!), in at the Bunny Trails level, and a couple of (relatively) easy-to-read articles (thank you, Biswa and Simon!) are in the Blue Squares section.


Bunny Trails – Decent Introductory Source Stuff

CAPTION: The Bunny. We decided (see Kullback-Leibler 2.5/3, or “Black Diamonds”) that we needed trail markings for the resource materials.

Bunny slopes: the introductory materials. Some are web-based blogposts or tutorials – good starter sets. All Themesis blogposts and YouTubes are in this category.


Bio and Autobio Articles re/ Friston

AJM’s Note: A big shout-out and thank you to Simon Crase, who suggested these two articles in a comment to last week’s post – much appreciated, Simon!

Friston – Semi-Bio:

  • Fridman, Lex, interviewing Karl Friston. (2020). “You Are Your Own Existence Proof (Karl Friston) | AI Podcast Clips with Lex Fridman.” Lex Fridman YouTube Channel series (July 1, 2020). (Accessed Oct. 18, 2022; https://www.youtube.com/watch?v=k8Zomsf3uBI)

Friston – Semi-Auto-Bio:


The Notions of “Surprise” and “Ergodicity”

AJM’s Note: We covered this last week; please see last week’s post for some very good introductory (readable) articles regarding the information-theoretic notion of “surprise” and the physical chemistry notion of “ergodicity.” (Accessed Oct. 18, 2022; https://themesis.com/2022/10/13/variational-free-energy-and-active-inference-pt-1/ )


Related Themesis Blogposts

The Variational Free Energy and Active Inference Series; first one in the series:

The Kullback-Leibler/Free Energy/Variational Inference Series; just the kick-off post for the entire thing:

Prior (much older) Blogpost on the Kullback-Leibler Divergence:

Older posts on Friston:

  • Maren, Alianna J. 2019. “Interpreting Karl Friston: Round Deux.” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 31, 2019). (Accessed Oct. 10, 2022; https://www.aliannajmaren.com/2019/07/31/interpreting-karl-friston-round-deux/ )
  • Maren, Alianna J. 2017. “How to Read Karl Friston (In the Original Greek).” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 27, 2017). (Accessed Oct. 10, 2022; http://www.aliannajmaren.com/2017/07/27/how-to-read-karl-friston-in-the-original-greek/ )

The following (2016) blogpost is useful mostly because it has some links to good tutorial references:


Related Themesis YouTubes

  • For prior Themesis YouTubes on statistical mechanics as it relates to AI, particularly the notion of the “Donner Pass(es) of AI,” see the Resources & References in the prior blogpost of this series.

Blue Squares – Intermediate Reading/Viewing

CAPTION: Intermediate Reading/Viewing: Requires preliminary knowledge of both concepts and notation. Not trivially easy, but accessible – often advanced tutorials.

Matthew Beal and David Blei

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities as well as Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.); Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)

Karl Friston & Colleagues

AJM’s Note: ANYTHING written by Friston is “double black diamond.” That said, a few papers are a BIT more accessible than others.

AJM’s Note: Friston’s 2010 paper is largely conceptual. (Meaning, equations-free.) Not to be blown off; he’s establishing the context for future works.

  • Friston, K. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11(2): 127-138. (Accessed Oct. 13, 2022; online access.)

AJM’s Note: Thanks and a shout-out to Biswa Sengupta, who reminded me of this excellent (and fairly readable) 2016 paper.

  • Sengupta, Biswa, Arturo Tozzi, Gerald K. Cooray, Pamela K. Douglas, and Karl J. Friston. 2016. “Towards a Neuronal Gauge Theory.” PLoS Biol 14(3): e1002400. doi:10.1371/journal.pbio.1002400. (Accessed Oct. 18, 2022; pdf)

AJM’s Note: Thanks and a shout-out to Simon Crase, who pointed out this 2022 article by Friston et al. that promises to be a much more accessible read – Simon’s recommendation came in just as I was prepping this post, so I haven’t read this yet, much less had a chance to integrate references into this post – look for the integration and refs starting in the next post in this series.

  • Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios A. Pavliotis, and Thomas Parr. 2022. “The Free Energy Principle Made Simpler but Not Too Simple.” arXiv:2201.06387v2 [cond-mat.stat-mech] (Jan. 28, 2022). doi:10.48550/arXiv.2201.06387. (Accessed Oct. 19, 2022; pdf)

Kullback & Leibler – Orig. Work

AJM’s Note: Kullback and Leibler. Their original paper. The one that started all of this.

  • Kullback, S., and R.A. Leibler. 1951. “On Information and Sufficiency.” Annals of Mathematical Statistics 22 (1): 79-86.



Double Black Diamond – Expert-Only

CAPTION: Double Black Diamond: Expert-only! These books, tutorials, blogposts, and vids are best read and watched AFTER you’ve spent a solid time mastering fundamentals. Otherwise, a good way to not only feel lost, but hugely insecure.

Friston & Co.

AJM’s Note: This Friston (2005) paper is his most-cited paper for his personal genesis of active inference, and seems to be the earliest where he presents a fully-fleshed notion of how “both inference and learning rest on minimizing the brain’s free energy, as defined in statistical physics.” He refers also to a Hinton et al. (1995) paper, but several papers published between 2004 and 2006 establish the genesis timeframe for Bayesian interpretations of perception.

AJM’s Note: This paper by Knill & Pouget (2004) was published just prior to Friston’s 2005 paper; both dealing with Bayesian modeling of brain processes. Friston cites this in his 2012 works.

AJM’s Notes: These two Friston papers are useful and important predecessors to Friston (2013). These two, in turn, also cite useful and important predecessor works – by both Friston and colleagues as well as others. (See above.) It’s still TBD as to how deep we need to go in reading back into the earliest works, in order to understand the ones addressed in this (blogpost) course of study.

Active Inference: perhaps the most accessible presentation, by Noor Sajid & colleagues (first recommended in the Themesis June, 2022 blogpost Major Blooper – Coffee Reward):

  • Sajid, N., Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )

AJM’s Note: Friston’s 2013 paper is the central point for theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated from its external environment by a Markov blanket, and moves on from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015).

  • Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface. 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K.; Levin, M.; Sengupta, B.; Pezzulo, G. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)


… And Music to Go with This Week’s Theme …

“Gamble Everything for Love.” Only because I’m listening to it right now (on the Spotify Blacklist soundtrack) while writing this post.

“Gamble Everything for Love,” Ben Lee (Official Video).