Variational Free Energy: Getting-Started Guide and Resource Compendium

Many of you who have followed the evolution of this variational inference discussion (over the past ten blogposts) may be wondering where to start. This would be particularly true for readers who are not yet familiar with the variational-anything literature, and who would like to begin with the easiest, most intuitive and explanatory articles possible, then gently ease into the mathematical hard core.

Thus, even though this blogpost presents a resource-compendium, we focus first on a Getting Started Guide.

Then, the compendium identifies resources that we’ve mentioned over the previous ten weeks, on the topics of:

  • The Kullback-Leibler (KL) divergence,
  • Free energy, especially its role in variational systems,
  • Karl Friston’s work on variational free energy for systems enclosed within a Markov blanket,
  • Matthew Beal’s Ph.D. dissertation as a primary resource for studying Friston’s works,
  • David Blei et al., again as a resource on variational inference (with an important tutorial paper), and
  • Supporting topics, such as “ergodicity” and “surprise.”

How to Stay Informed

This is a follow-on to the blogpost series on Variational Free Energy and Active Inference, as well as the prior series on Kullback-Leibler, Free Energy, and All Things Variational. We’re anticipating weekly posts, and a few YouTubes as well. To be informed as soon as these blogs / YouTubes come out, please do an Opt-In with Themesis.

To do this, go to www.themesis.com/themesis.

(You’re on www.themesis.com right now. You could just hit that “About” button and you’ll be there.)

Scroll down. There’s an Opt-In form. DO THAT.

And then, please, follow through with the “confirmation” email – and then train yourself and your system to OPEN the emails, and CLICK THROUGH. That way, you’ll be current with the latest!

THANK YOU!!! – AJM


Most-Recommended Starting Reads

Figure 1. “Start here” with a selection of accessible/intuitive introductions &/or tutorials for the Kullback-Leibler divergence, free energy, and variational inference, as well as related topics. (Photo courtesy Dreamstime: Photo 88741587 © Megalomaniac | Dreamstime.com)

This section serves those who want to dive in. Most of the recommendations are for materials that provide an intuitive and/or contextual overview.

Begin with the End in Mind – Active Inference

“Begin with the end in mind” is Habit #2 from Stephen Covey’s highly regarded The 7 Habits of Highly Effective People.

For us, our end-game is active inference.

So, even if we can’t absorb an entire paper on the first pass, starting with active inference gives us context for everything else that we’ll study. I’d suggest starting with this review and comparison of active inference vs. reinforcement learning, by Noor Sajid et al.

  • Sajid, N., Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )

Abstract:

Active inference is a first principle account of how autonomous agents operate in dynamic, non-stationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this paper, we provide: 1) an accessible overview of the discrete state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning; 2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI gym baseline …

Sajid, N., et al. See reference.

Explanatory (Intuitive) Reads

AJM’s Note: This article, by Scott Alexander, is humorous – but it is also a very good starting point. It is very well-written, and takes an equations-free approach to interpreting the “free energy principle” (or FEP, as it has come to be called), and from there, active inference. (Also available as a LessWrong blogpost.)

  • Alexander, Scott. 2018. “God Help Us, Let’s Try to Understand Friston on Free Energy.” Slate Star Codex (Mar. 4, 2018). (Accessed Nov. 17, 2022; link.)

Alexander mentions Peter Freed’s report in the Research Digest of Neuropsychoanalysis (2010) 12(1). Peter and his paper-of-the-month club … tried and didn’t succeed. But that was 2010; this is now. There are a number of very fine interpretations that have appeared since then.

AJM’s Note: This article, from the Headbirths blog (author unknown), is longish, but worth the read:

The author of the prior Headbirths article cites the following, which are also good.

  • Solopchuk, Oleg. 2018a. “Intuitions on Predictive Coding and the Free Energy Principle.” Medium.com (Jun 28, 2018). (Accessed Nov. 17, 2022; link.)
  • Solopchuk, Oleg. 2018b. “Tutorial on Active Inference.” Medium.com (Sept. 14, 2018). (Accessed Nov. 17, 2022; link.)

Solopchuk was inspired by a previous tutorial by Bogacz:

  • Bogacz, Rafal. 2017. “A Tutorial on the Free-Energy Framework for Modelling Perception and Learning.” J. Math. Psychology 76(Part B) (February 2017): 198-211. (Accessed Nov. 17, 2022; link.)

Compendium Organization

Following the protocol that we introduced earlier in this series, we are now grouping resources & references according to difficulty level – from Bunny Trails to Blue Squares to Double Black Diamond.

In this Compendium, we’re including ALL of the resources that we’ve identified throughout the two five-part blogpost series: the first began with the Kullback-Leibler divergence (and had Parts 1.5 and 2.5 in addition to Parts 1, 2, & 3), and the second covered Variational Free Energy and Active Inference (Parts 1 – 5). And of course, the kick-off for the whole thing was the upfront admission that I’d made a major blooper in interpreting notation in one of my earlier papers. (Now rectified, thank God!)

Almost all of Friston’s works are double black. Love the man. Think he’s a genius. But that “double-black” rating is just what is so.

I’m putting a couple of early blogposts about Friston in at the Bunny Trails level, along with some bio & auto-bio materials (thank you, Simon!); a couple of (relatively) easy-to-read articles (thank you, Biswa and Simon!) are in the Blue Squares section.


Bunny Trails – Decent Introductory Source Stuff

AJM’s Note: This article, by Scott Alexander, is humorous – but it is also a very good starting point. It is very well-written, and takes an equations-free approach to interpreting the “free energy principle” (or FEP, as it has come to be called), and from there, active inference. (Also available as a LessWrong blogpost.)

  • Alexander, Scott. 2018. “God Help Us, Let’s Try to Understand Friston on Free Energy.” Slate Star Codex (Mar. 4, 2018). (Accessed Nov. 17, 2022; link.)

Alexander mentions Peter Freed’s report in the Research Digest of Neuropsychoanalysis (2010) 12(1). Peter and his paper-of-the-month club … tried and didn’t succeed. But that was 2010; this is now.

  • Freed, P. 2010. “Research Digest.” Neuropsychoanalysis 12(1): 103-106. doi:10.1080/15294145.2010.10773634. (Accessed Nov. 17, 2022; image of first page only.)
CAPTION: The Bunny. We decided (see Kullback-Leibler 2.5/3, or “Black Diamonds”) that we needed trail markings for the resource materials.

Bunny slopes: the introductory materials. Some are web-based blogposts or tutorials – good starter sets. All Themesis blogposts and YouTubes are in this category.


Bio and Autobio Articles re/ Friston


AJM’s Note: This is an article that I’ve recommended to my students; it does a pretty good job of introducing Friston to the world-at-large.

  • Raviv, S. 2018. “The Genius Neuroscientist Who Might Hold the Key to True AI.” Wired, vol. 11 (November): 127–138. https://www.wired.com/story/karl-friston-free-energy-principle-artificial-intelligence/.

AJM’s Note: A big shout-out and thank you to Simon Crase, who suggested these two articles in a comment to last week’s post – much appreciated, Simon!

Friston – Semi-Bio:

  • Fridman, Lex, interviewing Karl Friston. (2020). “You Are Your Own Existence Proof (Karl Friston) | AI Podcast Clips with Lex Fridman.” Lex Fridman YouTube Channel series (July 1, 2020). (Accessed Oct. 18, 2022; https://www.youtube.com/watch?v=k8Zomsf3uBI)

Friston – Semi-Auto-Bio:


Bios re/ Two Important “Self-Organization” Scientists – Onsager & Prigogine

AJM’s Note: What follows are some very good biographical sketches of scientists mentioned earlier in this series (and sometimes cited later in the “Black Diamond” section). These are readable and give good insights into their thoughts as they developed their new insights and theories.

Ilya Prigogine

  • Kondepudi, Dilip, Tomio Petrosky, and John A. Pojman. 2017. “Dissipative Structures and Irreversibility in Nature: Celebrating 100th Birth Anniversary of Ilya Prigogine (1917–2003).” Chaos: An Interdisciplinary Journal of Nonlinear Science 27: 104501. doi:10.1063/1.5008858. (Accessed Oct. 10, 2022; https://aip.scitation.org/doi/10.1063/1.5008858 )

For a bit about Lars Onsager, read:



Variational Inference (PDF of MS PPT Presentation by David Blei)

One of the “big things” that we want to understand is variational inference. David Blei (with Alp Kucukelbir and Jon McAuliffe) has written an excellent tutorial; it is in the Black Diamonds section.

For a more visual overview, check out Blei’s PPT, suggested in Part 1 of the Kullback-Leibler series:


The Notion of “Entropy”

AJM’s Note: I found this tutorial by Jake Tae just a few weeks ago, and have wanted to insert it here … it is a remarkably clean and lucid read. I was searching for accessible tutorials on entropy; this fits the bill – and I like that Jake talks about information entropy as well as regular, “thermodynamic” entropy.

AJM’s Note: Mr. Tae has done an OUTSTANDING job of addressing not only the entropy topic, but a whole lot more — I’m giving the link to his GitHub site below, and will be reading a good deal more of his blogposts in the near future.
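As a quick companion to those reads – this is not drawn from Tae’s tutorial, just a minimal sketch of the standard definition – here is the discrete Shannon entropy in a few lines of Python:

    import math

    def shannon_entropy(p, base=2):
        """Shannon entropy H(p) = -sum_i p_i * log(p_i) for a discrete distribution p."""
        return -sum(p_i * math.log(p_i, base) for p_i in p if p_i > 0)

    # A fair coin carries one full bit of entropy; a heavily biased coin carries much less.
    print(shannon_entropy([0.5, 0.5]))   # 1.0
    print(shannon_entropy([0.9, 0.1]))   # ~0.469

The same formula, with Boltzmann’s constant out front and the probabilities taken over microstates, is the regular “thermodynamic” (Gibbs) entropy – which is exactly the information-to-physics connection discussed in the tutorial above.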

Naturally, I also like my own blogpost on entropy:

The Notion of “Surprise”

AJM’s Note: For someone who is not an information-theory person, this is an excellent and lucid two-part tutorial on the notion of “surprise.”

  • Bernstein, Matthew N. 2020a. “What Is Information? (Foundations of Information Theory: Part 1)” Matthew Bernstein GitHub Blog Series (June 13, 2020). (Accessed Oct. 11, 2022; https://mbernste.github.io/posts/self_info/ )
  • Bernstein, Matthew N. 2020b. “Information Entropy (Foundations of Information Theory: Part 2)” Matthew Bernstein GitHub Blog Series (August 07, 2020). (Accessed Oct. 11, 2022; https://mbernste.github.io/posts/entropy/ )
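To keep the key definition in hand while reading Bernstein’s posts: “surprise” (surprisal, or self-information) is just the negative log-probability of an outcome. A minimal sketch:

    import math

    def surprisal(p, base=2):
        """Self-information ('surprise') of an outcome with probability p, in bits."""
        return -math.log(p, base)

    # Rare events are more 'surprising' than common ones.
    print(surprisal(0.5))    # 1.0 bit
    print(surprisal(0.01))   # ~6.64 bits

Averaging surprisal over all outcomes gives back the entropy from the previous section, and Friston’s “surprise” in the free energy literature is this same quantity, evaluated on an agent’s sensory states.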

AJM’s Note: I like this article for its attempt to explain Friston’s notion of “surprise” in simple terms.



Related Themesis Blogposts

The Variational Free Energy and Active Inference Series:

In the third post in this series, we connect Friston’s notation to that of Beal (2003):

In the second post in this series, we identified that Friston’s (and previously, Beal’s) use of the P and Q notation in the Kullback-Leibler divergence was reversed from how most authors use those variables:

In the first post in this series, we presented our overarching goals (re-presented in this post as Figure 2), and provided interpretations for two key terms, ergodicity and surprise.

The Kullback-Leibler/Free Energy/Variational Inference Series; just the kick-off post for the entire thing:

The Major Blooper – Coffee Reward Post:

Prior (much older) Blogpost on the Kullback-Leibler Divergence:

Older post on entropy:

There is an entire series of posts discussing entropy in neural networks; this one is useful because it focuses on entropy, which is a key element in the free energy metaphor used in variational inference.

Older posts on Friston:

  • Maren, Alianna J. 2019. “Interpreting Karl Friston: Round Deux.” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 31, 2019). (Accessed Oct. 10, 2022; https://www.aliannajmaren.com/2019/07/31/interpreting-karl-friston-round-deux/ )
  • Maren, Alianna J. 2017. “How to Read Karl Friston (In the Original Greek).” Alianna J. Maren blogpost series; www.aliannajmaren.com (July 27, 2017). (Accessed Oct. 10, 2022; http://www.aliannajmaren.com/2017/07/27/how-to-read-karl-friston-in-the-original-greek/ )

The following (2016) blogpost is useful mostly because it has some links to good tutorial references:



CAPTION: Intermediate Reading/Viewing: Requires preliminary knowledge of both concepts and notation. Not trivially easy, but accessible – often advanced tutorials.

Alianna Maren

At the risk of being excessively self-promotional, I’m putting my own tutorial first. The reason is that if you’re going to read Friston (2013) or Friston et al. (2015), you’ll likely need a bit of an exposition to take you from Friston’s rather terse equations to a more in-depth understanding.

The best way to understand Friston’s take on variational free energy is to go back to Matthew Beal’s dissertation. (See the next sub-section.)

And the best way to understand Friston and Beal together is to read my own work, in which I cross-correlate their notation. (This is best done while reading the five-part blogpost sequence on variational free energy; see the Related Themesis Blogposts section, earlier, under Bunny Trails.)

This arXiv paper is one that I wrote, mostly for myself, to do the notational cross-correspondences.

  • Maren, Alianna J. 2022. “Derivation of the Variational Bayes Equations.” Themesis Technical Report TR-2019-01v5 (ajm). arXiv:1906.08804v5 [cs.NE] (4 Nov 2022). (Accessed Nov. 17, 2022; pdf.)

This arXiv paper (Themesis Technical Report) is the source for the famous two-way deconstruction of the Kullback-Leibler divergence for a system with a Markov blanket, which is central to Friston’s work.

Figure 3. Friston’s Eqn. 2.7 from “Life as We Know It” models a system which contains both external states (Psi) and internal states (r), separated by a Markov boundary (s and a). The Kullback-Leibler divergence presented at the top can be deconstructed in two different ways. Each of the results is a valid equation; one is more computationally useful than the other. For simplicity, the modeling parameter m is not shown in the model probability distribution p(Psi, s, a, r). Figure is extracted from Maren (2022), and was discussed in the previous blogpost (Part 5), and introduced in the blogpost prior to that one (Part 4).
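For readers who want the algebra in hand before opening the paper: in generic variational-inference notation – with q standing for the internal states’ beliefs about the external states Psi, and with the conditioning on r and the model parameter m suppressed for readability, so this is a sketch rather than a verbatim transcription of Friston’s Eqn. 2.7 – the two deconstructions look like this:

    F = \mathbb{E}_{q(\psi)}\!\left[\,\ln q(\psi) - \ln p(\psi, s, a, r)\,\right]
      = D_{\mathrm{KL}}\!\left[\,q(\psi)\,\|\,p(\psi \mid s, a, r)\,\right] \;-\; \ln p(s, a, r)
      = D_{\mathrm{KL}}\!\left[\,q(\psi)\,\|\,p(\psi)\,\right] \;-\; \mathbb{E}_{q(\psi)}\!\left[\,\ln p(s, a, r \mid \psi)\,\right]

The first rearrangement shows that minimizing F places an upper bound on surprise, -ln p(s, a, r); the second (“complexity minus accuracy”) is usually the more computationally convenient form, since it never requires the intractable posterior.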

Matthew Beal and David Blei

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities as well as Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. (Accessed Oct. 13, 2022; pdf.)
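For readers meeting the term for the first time: in a graphical model, the Markov blanket of a node is its parents, its children, and its children’s other parents; conditioned on the blanket, the node is statistically independent of everything outside it. A toy sketch (my own illustration, not code from Beal or Friston), using a dictionary that maps each node to its set of parents:

    def markov_blanket(node, parents):
        """Markov blanket of `node` in a DAG given as {node: set_of_parents}."""
        children = {c for c, ps in parents.items() if node in ps}
        co_parents = {p for c in children for p in parents[c]} - {node}
        return parents.get(node, set()) | children | co_parents

    # Toy graph loosely echoing Friston's setup: external psi and action a drive sensation s,
    # which in turn drives the internal state r.
    dag = {"psi": set(), "a": set(), "s": {"psi", "a"}, "r": {"s"}}
    print(markov_blanket("psi", dag))   # {'s', 'a'} -- the blanket screens psi off from r
    print(markov_blanket("r", dag))     # {'s'}

The point Friston builds on is that the blanket states (sensory and active) are exactly what separate an organism’s internal states from the external world.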

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)
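To anchor the terminology before diving in: Blei et al. frame variational inference as maximizing the evidence lower bound (ELBO), which is just the negative of the variational free energy used above. The standard identity, in generic notation (latent variables z, observations x):

    \ln p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\,\ln p(x, z) - \ln q(z)\,\right]}_{\mathrm{ELBO}(q)} \;+\; D_{\mathrm{KL}}\!\left[\,q(z)\,\|\,p(z \mid x)\,\right]

Because the KL term is non-negative, the ELBO is a lower bound on the log evidence, and maximizing it over q simultaneously tightens the bound and pulls q toward the true posterior.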

Karl Friston & Colleagues

AJM’s Note: ANYTHING written by Friston is “double black diamond.” That said, a few papers are a BIT more accessible than others.

AJM’s Note: Friston’s 2010 paper is largely conceptual. (Meaning, equations-free.) Not to be blown off; he’s establishing the context for future works.

  • Friston, K. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11(2): 127-138. (Accessed Oct. 13, 2022; online access.)

AJM’s Note: Thanks and a shout-out to Biswa Sengupta, who reminded me of this excellent (and fairly readable) 2016 paper.

  • Sengupta, Biswa, Arturo Tozzi, Gerald K. Cooray, Pamela K. Douglas, and Karl J. Friston. 2016. “Towards a Neuronal Gauge Theory.” PLoS Biol 14(3): e1002400. doi:10.1371/journal.pbio.1002400. (Accessed Oct. 18, 2022; pdf)

AJM’s Note: Thanks and a shout-out to Simon Crase, who pointed out this 2022 article by Friston et al., which promises to be a much more accessible read. Simon’s recommendation came in just as I was prepping this post, so I haven’t read it yet, much less had a chance to integrate references into this post; look for the integration and refs starting in the next post in this series.

  • Friston, Karl, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios A. Pavliotis, and Thomas Parr. 2022. “The Free Energy Principle Made Simpler but Not Too Simple.” arXiv:2201.06387v2 [cond-mat.stat-mech] (Jan. 28, 2022). doi:10.48550/arXiv.2201.06387. (Accessed Oct. 19, 2022; pdf)

Kullback & Leibler – Orig. Work

AJM’s Note: Kullback and Leibler. Their original paper. The one that started all of this.
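For quick reference: for discrete distributions P and Q, the Kullback-Leibler divergence of Q from P is D_KL(P || Q) = sum_i p_i ln(p_i / q_i). It is not symmetric – which is exactly why the P-versus-Q notation convention (flagged earlier, in the Related Themesis Blogposts section) matters. A minimal numeric check:

    import math

    def kl_divergence(p, q):
        """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats, for discrete distributions."""
        return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

    p = [0.7, 0.2, 0.1]
    q = [0.5, 0.3, 0.2]

    print(kl_divergence(p, q))   # ~0.085
    print(kl_divergence(q, p))   # ~0.092 -- swapping the arguments changes the answer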


Neural Networks and Statistical Mechanics

AJM’s Note: Transformers. The hottest thing in AI/ML. And yet ANOTHER thing that totally rests on statistical mechanics. I find the blogposts by Mattias Bal to be excellent! So … transformers were NOT a part of the original blogpost series topics (K-L, free energy, variational Bayes), but they fit in nicely, given their dependency on stat mech. Also, I want to keep track of the links.

AJM’s Note: In their abstract, Bahri et al. refer to more topics than most of us can absorb. Still, their review provides a cogent and useful summary of how deep learning (and its foundational element, the restricted Boltzmann machine) all rest solidly on a statistical mechanics foundation. Need further confirmation? All the authors are gainfully employed – at Google (Google Brain), or at Stanford U., or a combination thereof. And yes, being published in a “condensed matter” physics journal, it is indeed something of a “heavy” read. (OK, couldn’t resist the pun.) Have a look, in your capacious spare time.

Possible follow-ups:



CAPTION: Double Black Diamond: Expert-only! These books, tutorials, blogposts, and vids are best read and watched AFTER you’ve spent a solid time mastering fundamentals. Otherwise, a good way to not only feel lost, but hugely insecure.

Friston & Co.

AJM’s Note: This Friston (2005) paper is his most-cited paper for his personal genesis of active inference, and seems to be the earliest where he presents a fully-fleshed notion of how “both inference and learning rest on minimizing the brain’s free energy, as defined in statistical physics.” He refers also to a Hinton et al. (1995) paper, but several papers published between 2004 and 2006 establish the genesis timeframe for Bayesian interpretations of perception.

AJM’s Note: This paper by Knill & Pouget (2004) was published just prior to Friston’s 2005 paper; both deal with Bayesian modeling of brain processes. Friston cites this in his 2012 works.

AJM’s Notes: These two Friston papers are useful and important predecessors to Friston (2013). These two, in turn, also cite useful and important predecessor works – by both Friston and colleagues as well as others. (See above.) It’s still TBD as to how deep we need to go in reading back into the earliest works, in order to understand the ones addressed in this (blogpost) course of study.

Active Inference: perhaps the most accessible presentation, by Noor Sajid & colleagues (first recommended in the Themesis June, 2022 blogpost Major Blooper – Coffee Reward):

  • Sajid, N., Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] 30 Oct 2020. (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )

AJM’s Note: Friston’s 2013 paper is the central point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated by a Markov boundary from its external environment. Moves on from there. This blogpost series is largely focused on this paper, buttressed with Friston et al. (2015).

  • Friston, Karl. 2013. “Life as We Know It.” Journal of the Royal Society Interface 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. Very interesting read!

  • Friston, K., M. Levin, B. Sengupta, and G. Pezzulo. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)

Syllabi and Compendia

Jared Tumiel has compiled an excellent list; the only difficulty here is that many of his suggestions are definitely at the Black Diamond level, and I think we need to sort things out in order-of-difficulty.

That said, if you’re wanting to cross-check what you’ve read so far against someone else’s suggestions, check out Tumiel’s Syllabus, first recommended in Part 1.5 of the Kullback-Leibler series.

AJM’s Note: This lovely little article by Damian Ejlli is a perfect example of “double black diamond.” It is a perfect read – if you ALREADY know statistical mechanics … and quantum physics … and Bayesian methods.


Books

At some time, we’re each going to want to establish our own little collection. Here are some top-rated reads:

  • Feynman, R.P. 1972, 1998. Statistical Mechanics: A Set of Lectures. Reading, MA: Addison-Wesley; Amazon book listing.
  • Jaynes, E.T. 2003. Probability Theory: The Logic of Science. Cambridge, UK: Cambridge University Press; Annotated edition (June 9, 2003). Amazon book listing. (Honesty note: I don’t have this in my collection yet; it’s available from Amazon for between $30 – $100++, approx.; will be able to provide a better review once I have my own copy.)
  • Sethna, James. 2006. Statistical Mechanics: Entropy, Order Parameters, and Complexity. Oxford, England: Oxford University Press. (Accessed Sept. 7, 2022; https://sethna.lassp.cornell.edu/StatMech/EntropyOrderParametersComplexity20.pdf )

A classic, mostly here as a reference:

  • Nicolis, Gregoire, and Ilya Prigogine. (1977). Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations. (New York: Wiley)

Also, while this is highly self-promotional, I have a book-in-progress on statistical mechanics, designed to be the gentlest-possible “on-ramp” for those who need to get a little stat mech in order to read the important works in AI and machine learning.

  • Maren, Alianna J. (In progress.) Statistical Mechanics, Neural Networks, and Artificial Intelligence. (Various chapter drafts and supporting materials available at: http://www.aliannajmaren.com/book/.)


… And Something to Go with This Week’s Theme …

Something meditative.
