The Kullback-Leibler Divergence, Free Energy, and All Things Variational – Part 2 of 3

Free energy is the universal solvent of AI (artificial intelligence). It is the single underlying rule or principle that makes AI possible.

Actually, that’s a simplification.

There are THREE key things that underlie AI – whether we’re talking deep learning or variational methods. These are:

  • Free energy – which we’ll discuss in this post,
  • Latent variables – we’ll have some links to prior discussions at the end of this post (see the YouTubes), and return to this topic later, and
  • Bayesian probabilities – which is what we use to express how our “observable” variables both depend on and give rise to our “latent” variables.
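To make that last bullet a bit more concrete, here is Bayes’ rule in standard notation (my symbols, not anything specific to this post), with x standing for the observable variables and z for the latent ones:

p(z \mid x) \;=\; \frac{p(x \mid z)\, p(z)}{p(x)}

The likelihood p(x | z) captures how the latent variables give rise to the observables; the posterior p(z | x) captures how the observables inform our beliefs about the latents.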

All of these are important.

But what we really need to understand is that all three of these concepts or notions show up in the two major forms of AI that involve low-level “statistical” or “signal-level” data. (This is as opposed to symbolic AI, which is the stuff of knowledge graphs, etc.)

So, all three show up in BOTH deep learning AND in variational methods. However, they show up DIFFERENTLY in each.

If we can get this into our heads, we’ve made substantial progress in understanding both of these AI realms.


A Quick Free Energy Review

CAPTION: Figure 1: Free energy is the linear combination of two terms: enthalpy and (the negative of the) entropy.

The free energy equation is dimorphic – it can be expressed in two different ways. For our purposes today, we’ll use the simple linear-combination equation shown in Figure 1, above.

The important thing about free energy is that, when we use it (as we are doing) as a model in neural networks or variational methods, the thing that we can adjust or play with is the enthalpy – that is, the energy associated with the various elements.

The enthalpy term is the only term that has adjustable parameters in it. This is true whether we are using the free energy in its pure statistical mechanics role, or as an inspirational model in artificial intelligence.
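For readers who want Figure 1 in symbols, here is the standard thermodynamic form (my notation, not read directly off the figure):

F \;=\; H \;-\; T\,S

Here H is the enthalpy (the term that carries the adjustable parameters), S is the entropy, and T is the temperature; in most neural-network treatments T is simply set to one, so the free energy is just enthalpy minus entropy.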


Free Energy in Energy-Based Neural Networks

In any of the energy-based neural networks (the Hopfield network, the Boltzmann machine, deep learning architectures), the thing that we can tune / tweak / play with to our hearts’ content is the set of connection weights. That’s our set of tunable parameters.

The entropy, on the other hand, is a completely parameter-free term. There is NOTHING that we can tweak – once we set our training in motion.

Thus, the way in which we set up our entropy is through our selection of the training/testing data.
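To make “only the weights are tunable” concrete, here is a minimal Python sketch of a Hopfield/Boltzmann-style energy (enthalpy) term; the function and variable names are my own illustrative choices, not taken from any particular library, and I have included bias terms alongside the connection weights:

    import numpy as np

    def hopfield_energy(weights, biases, states):
        """Hopfield/Boltzmann-style energy (the 'enthalpy' term).

        weights : (N, N) symmetric connection-weight matrix -- the tunable parameters.
        biases  : (N,) per-unit bias (external field) terms -- also tunable.
        states  : (N,) unit activations, e.g. +1/-1 spins.
        """
        interaction = -0.5 * states @ weights @ states   # -1/2 * sum_ij w_ij s_i s_j
        field = -biases @ states                         # -sum_i b_i s_i
        return interaction + field

    # Toy usage: three spins, random symmetric weights, zero biases.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 3))
    W = 0.5 * (W + W.T)          # symmetrize
    np.fill_diagonal(W, 0.0)     # no self-connections
    b = np.zeros(3)
    s = np.array([1.0, -1.0, 1.0])
    print(hopfield_energy(W, b, s))

Everything trainable lives in W and b; the entropy term has no parameters to adjust at all.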

More on this another day – this is just to give a bit of context for how free energy is actually used in at least one of the AI applications.


Free Energy in Variational Methods

The way that free energy shows up in variational methods is more as a mathematical expression: it is introduced as a way to “decouple” the Kullback-Leibler divergence into two terms, followed by a little mathematical rearrangement. (And pulling on a little theorem here or there.)

The end result is something that looks like a free energy equation, but is really using the notion of free energy more as poetic inspiration.
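For readers who want to see the decoupling in symbols, here is the standard step in conventional notation (my symbols: q is the approximating distribution, p the generative model, x the observed data, z the latent variables):

\ln p(x) \;=\; \Big( \mathbb{E}_{q(z)}\big[\ln p(x,z)\big] \;-\; \mathbb{E}_{q(z)}\big[\ln q(z)\big] \Big) \;+\; D_{\mathrm{KL}}\big( q(z) \,\|\, p(z \mid x) \big)

The bracketed quantity is the evidence lower bound (ELBO). Its negative, F = \mathbb{E}_{q(z)}[-\ln p(x,z)] - H[q], has exactly the energy-minus-entropy shape of Figure 1: the first term plays the enthalpy role, and H[q] (the entropy of q) plays the entropy role. That is the sense in which the result “looks like” a free energy.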

CAPTION: Maren, Alianna J. 2021. “The AI Salon: Statistical Mechanics as a Metaphor.” Themesis YouTube Channel (Sept. 2, 2021). (Accessed Sept. 11, 2022; https://www.youtube.com/watch?v=D0soPGtBbRg&t=1s )

The Ising Equation

The Ising equation is the best-known and simplest form of the free energy equation, from a statistical mechanics perspective. It has been the basis for roughly fifty years of energy-based neural networks.

That’s right.

For nearly fifty years, the fundamental equation underlying the Little-Hopfield neural network, the Boltzmann machine (whether original or restricted), and all forms of “deep” architecture has been the same.
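For reference, here is the Ising energy in its conventional textbook form (standard notation, not drawn from any particular source cited in this post):

E(\mathbf{s}) \;=\; -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j \;-\; \sum_i h_i\, s_i, \qquad s_i \in \{-1, +1\}

The couplings J_{ij} play the role of connection weights, and the external-field terms h_i play the role of biases; the associated Boltzmann distribution, p(\mathbf{s}) \propto \exp(-E(\mathbf{s})/T), is what the Hopfield network, the Boltzmann machine, and their deep descendants all build on.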

There have been advances in training methods, neural network layering, and implementation.

However, the underlying equation has not changed.

Not in the past fifty years.

Wondering if perhaps the AI realm has stalled out?

Check out this Themesis vid.

CAPTION: Maren, Alianna J. 2021. “Statistical Physics Underlying Energy-Based Neural Networks.” Themesis, Inc. YouTube channel. (August 26, 2021) (Accessed Sept. 15, 2022; https://www.youtube.com/watch?v=ZazyMS-IDg8&t=324s )

To your health, well-being, and outstanding success!

Alianna J. Maren, Ph.D.

Founder and Chief Scientist, Themesis, Inc.



Weekly Recommended Read

Recommended to me just an hour ago by my fellow-faculty colleague in Northwestern’s Master of Science in Data Science Program, Dr. Syamala Srinivasan (https://www.linkedin.com/in/syamala-srinivasan-613a2514/):

Harari, Yuval Noah. 2018. Sapiens: A Brief History of Humankind. New York: Harper Perennial. (Accessed online Sept. 15, 2022; https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari/dp/0062316117/ref=sr_1_1?crid=1OYTOVS70WRBS&keywords=sapiens&qid=1663269214&s=books&sprefix=sapiens+%2Cstripbooks%2C179&sr=1-1 )

Humans think in stories, and we try to make sense of the world by telling stories.

Yuval Harari, 21 Lessons for the 21st Century

Have fun, darlings! – AJM



References

NOTE: The references, blogposts, and YouTubes included here will be useful across all blogposts in this Kullback-Leibler / free energy / variational inference series. These references are largely replicated across all the blogposts in the series. (A little more added each week.)

Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. pdf.

Blei, D.M. 2017. “Variational Inference: Foundations and Applications.” (Presented May 1, 2017, at the Simons Institute.) http://www.cs.columbia.edu/~blei/talks/Blei_VI_tutorial.pdf

Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf. )

Blei, D.M., Andrew Ng, and Michael Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993-1022. (Accessed June 28, 2022; https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf )

Friston, K., M. Levin, B. Sengupta, and G. Pezzulo. 2015. “Knowing one’s place: a free-energy approach to pattern regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. pdf.

Friston, K. 2013. “Life as we know it.” Journal of The Royal Society Interface 10. doi:10.1098/rsif.2013.0475. pdf.

Friston, K. 2010. “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience 11(2): 127-138. online access.

Kullback S, and R.A. Leibler. 1951. “On Information and Sufficiency.” Ann. Math. Statist. 22(1):79-86. (Accessed June 28, 2022; https://www.researchgate.net/publication/2820405_On_Information_and_Sufficiency/link/00b7d5391f7bb63d30000000/download )

Maren, Alianna J. 2019. “Derivation of the Variational Bayes Equations”  arXiv:1906.08804v4 [cs.NE] (Themesis Technical Report TR-2019-01v4 (ajm).) doi:10.48550/arXiv.1906.08804. (Accessed 2022 Sept. 6; https://arxiv.org/abs/1906.08804 )


Related Blogposts

Maren, Alianna J. 2022. “The Kullback-Leibler Divergence, Free Energy, and All Things Variational (Part 1.5 of 3).” Themesis, Inc. Blogpost Series (www.themesis.com). (Sept. 8, 2022) (Accessed Sept. 15, 2022; https://themesis.com/2022/09/08/the-kullback-leibler-divergence-free-energy-and-all-things-variational-part-1-5-of-3/ )

Maren, Alianna J. 2022. “The Kullback-Leibler Divergence, Free Energy, and All Things Variational (Part 1 of 3).” Themesis, Inc. Blogpost Series (www.themesis.com). (June 28, 2022) (Accessed Sept. 6, 2022; https://themesis.com/2022/06/28/the-kullback-leibler-divergence-free-energy-and-all-things-variational-part-1-of-3/ )

Maren, Alianna J. 2022. “Major Blooper – Coffee Reward to First Three Finders.” Themesis, Inc. Blogpost Series (www.themesis.com). (June 2, 2022) (Accessed Sept. 6, 2022; https://themesis.com/2022/06/02/major-blooper-coffee-reward/ )

Maren, Alianna J. 2022. “How Backpropagation and (Restricted) Boltzmann Machine Learning Combine in Deep Architectures.” Themesis, Inc. Blogpost Series (www.themesis.com). (January 5, 2022) (Accessed June 28, 2022; https://themesis.com/2022/01/05/how-backpropagation-and-restricted-boltzmann-machine-learning-combine-in-deep-architectures/ )

Maren, Alianna J. 2022. “Entropy in Energy-Based Neural Networks.” Themesis, Inc. Blogpost Series (www.themesis.com). (April 4, 2022) (Accessed Aug. 30, 2022; https://themesis.com/2022/04/04/entropy-in-energy-based-neural-networks-seven-key-papers-part-3-of-3/ )

Maren, Alianna J. 2014. “The Single Most Important Equation for Brain-Computer Information Interfaces.” Alianna J. Maren Blogpost Series (www.aliannajmaren.com). (November 28, 2014) (Accessed Aug. 30, 2022; https://www.aliannajmaren.com/2014/11/28/the-single-most-important-equation-for-brain-computer-information-interfaces/ )


Related YouTubes

Maren, Alianna J. 2021. “Statistical Physics Underlying Energy-Based Neural Networks.” Themesis, Inc. YouTube channel. (August 26, 2021) (Accessed Sept. 15, 2022; https://www.youtube.com/watch?v=ZazyMS-IDg8&t=324s )

Maren, Alianna J. 2021. “The AI Salon: Statistical Mechanics as a Metaphor.” Themesis, Inc. YouTube channel. (Sept. 2, 2021) (Accessed Aug. 30, 2022; https://www.youtube.com/watch?v=D0soPGtBbRg )

Maren, Alianna J. 2021. “Statistical Mechanics of Neural Networks: The Donner Pass of AI.” Themesis, Inc. YouTube channel. (Sept. 15, 2021) (Accessed Aug. 30, 2022; https://www.youtube.com/watch?v=DjKiU3qRr1I )


Books, Syllabi, and Other Resources

Feynman, R.P. 1972, 1998. Statistical Mechanics: A Set of Lectures. Reading, MA: Addison-Wesley; Amazon book listing.

Sethna, James. 2006. Statistical Mechanics: Entropy, Order Parameters, and Complexity. Oxford, England: Oxford University Press. (Accessed Sept. 7, 2022; https://sethna.lassp.cornell.edu/StatMech/EntropyOrderParametersComplexity20.pdf )

I just found (for last week’s post) this fabulous little syllabus put together by Jared Tumiel; it’s spot-on, very well-organized, and has enough in it to keep most of us busy for the next several years. It’s hosted on his GitHub site.

Tumiel, Jared. 2020. “Spinning Up in Active Inference and the Free Energy Principle: A Syllabus for the Curious.” Jared Tumiel’s GitHub Repository. (Oct. 14, 2020.) https://jaredtumiel.github.io/blog/2020/10/14/spinning-up-in-ai.html

I just found this lovely little article by Damian Ejlli. It is a perfect read – if you ALREADY know statistical mechanics … and quantum physics … and Bayesian methods. (So, that rules out … a HUGE number of potential readers.) Other than that, perfectly useful.

Ejlli, Damian. 2021. “Three Statistical Physics Concepts and Methods Used in Machine Learning.” Towards Data Science (Oct. 18, 2021). (Accessed Sept. 11, 2022; https://towardsdatascience.com/three-statistical-physics-concepts-and-methods-used-in-machine-learning-f9cc9f732c4 )

Comments

  1. Dr. A J, thanks for the links, especially Jarad Tumiel’s syllabus.
    I was able to use the Free Energy Principle to interpret something yesterday. I had to release a wasp that had strayed into our bedroom. The poor thing had an inadequate model q(…), which included the rule: if your sensory signals show you something (the outside), then your action of flying will allow you to get there. There is no concept of glass in its q(…)…

    1. Hey, Simon – thanks for chiming in, and so good to hear from you again!
      So, first – kudos on “releasing” the wasp instead of killing it. What we do in the microcosm … I’m not saying that it will end the war in Ukraine, or save the many, many lives that are being impacted by a hurricane in Japan, in Southeast Asia, or in Alaska (and that’s just the Pacific, and this week’s hurricane-hits) … but every act of kindness counts.
      And so glad you’ve found Jared’s “syllabus” useful! (I’m going to have some more comments on it soon.)
      So here’s the thing. Poor wasp. Poor model “P” (because NOTICE – you and I are both sort of caught in that notational trap; Friston & Co. use “P” for model, and so did Beal, and so did Blei et al. – and the rest of the world uses “Q” for model) and BOY did I screw up when I wrote my prior-to-most-recent arXiv paper (which I still need to go fix …)
      So the wasp had a model “P” that didn’t encompass the separation. It was stuck. It would never, NEVER have updated its model. You had to rescue the poor thing.
      This is where I think it is useful for all of us to start exercising the notion that Friston advocates, of making that distinction between “reality” (“Psi”), the representation of reality (“Q”), and then the model of the representation (“P”).
      It’s not that I can say this with certainty, but … if we start exercising our minds to think in terms of this framework, we may start giving more attention to the way in which we create our reality-representation.
I had to be very careful, in my most recent work (the latest on a variational approach to getting parameters for a 2-D CVM – https://arxiv.org/abs/2209.04087 ) that I was pulling a “representation” that hit certain criteria.
      And that’s food for thought, as we start playing with how our sensing and acting interact with our representation-making.
      Could go on, but it’s best if I just close and get some sleep!
      Good to talk w/ you, always! – AJM

