Kullback-Leibler, Etc. – Part 3 of 3: The Annotated Resources List

I thought it would be (relatively) straightforward to wrap this up. Over the past several posts in this series, we’ve discussed the Kullback-Leibler (K-L) divergence and free energy. In particular, we’ve described free energy as the “universal solvent” for artificial intelligence and machine learning methods.

This next (and last) post in this series was intended to discuss variational methods, as the logical continuation of those prior two fundamentals.

There was a minor glitch.

It turned into a major glitch.

You might recall that, several posts back, I admitted to a MAJOR BLOOPER. It was notation-based, and it was in my arXiv paper on the Derivation of Variational Bayes Equations. (Full Chicago-style reference in the References section, below.)

Essentially, in my tight focus on cross-correlating the x, y, and z notations used (collectively) by Friston, Beal, and Blei, I’d messed up the BIG notation differences between P and Q – not so much internal to those three authors, but between how they used those terms versus how the rest of the world used them.

Major mess-up, and I’m in the midst of resolving it – and reworking the notation cross-coupling is taking MUCH MORE TIME than anticipated.

(It is always, I swear, ALWAYS, the notation that causes major glitches. Just saying.)
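Since the P-versus-Q mix-up is the crux here, a concrete illustration may help. In most of the variational literature, Q is the approximating distribution and P is the target – and the direction matters, because the K-L divergence is not symmetric. A minimal sketch in plain Python, with made-up toy distributions:

```python
import math

def kl(a, b):
    """Discrete K-L divergence D(a || b) = sum_i a_i * log(a_i / b_i)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# Toy distributions: P plays the "true" target, Q the approximation.
P = [0.7, 0.2, 0.1]
Q = [0.5, 0.3, 0.2]

print(kl(P, Q))  # D(P || Q)
print(kl(Q, P))  # D(Q || P) -- a different number; K-L is not symmetric
```

Swap the arguments and you get a different number – which is exactly why a scrambled P/Q convention is more than a cosmetic error.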

So earlier, I’d thought that for this week’s post, I could simply check my sources, clean up and re-upload that arXiv paper, and write up a final post.

My friends, life is never so easy, is it?


So Instead – a Curated Resource List

Typically, when I “go dark” for a while, I’m deep-immersed in some huge project, and just can’t pull myself out of it … and then weeks, sometimes months, go by.

The last such period was just before starting the Kullback-Leibler blog sequence in earnest. (Although yes, there were a few light posts before that).

I was immersed in getting my own paper done … very much down the rabbit-hole.

New policy – both for Themesis and for myself, personally. I (we) are going to do our very damn best to get something up each week. Even if we just have to be super-honest and say that there’s nothing that we can share – because we’re deep into something very gnarly.

(Sometimes, a moment of honesty is the best thing to share.)

This week is ALMOST of that nature. I’m back to reading Friston again, along with Beal and Blei as back-ups.

The notation is MUCH more obscure than I remember it being – and I remember spending about four to five years writing that last Derivations paper, all in an effort to unsnarl the notation across Friston/Beal/Blei.

Apparently, I was not all that successful, the last time.

So in lieu of any attempts to unsnarl things this week, we’re going to take that reference list (which has been getting longer by the week) and repost it with annotations and “double black diamond” warnings.


DO THIS for the Weekly Blogpost Notifications

We’re doing our best to get you a “weekly word.” Usually a blogpost. Sometimes a YouTube. Sometimes something else – a survey, a timely, useful SOMETHING.

Make sure that you know when these come out.

Go to www.themesis.com/themesis.

(You’re on www.themesis.com right now. You could just hit that “About” button and you’ll be there.)

Scroll down.

There’s an Opt-In form.

DO THAT.

CAPTION: Find the Themesis Opt-In form at: http://www.themesis.com/themesis/

Then follow up with the emails – train yourself and your system to OPEN the emails, and CLICK THROUGH.

Best way to get “tail feathers.” (And the birds with the biggest and best tail feathers get the most rewards!)

CAPTION: Rooster, from my back yard back on the Big Island. Photo courtesy A.J. Maren, 2022.

The Plan, Going Forward

There’s only one thing to do, moving forward.

It’s to take the basic Friston paper – I’m going to center on Friston’s 2013 Life as We Know It – and work through it slowly, carefully, and meticulously.

I’m cross-correlating PRIMARILY with Matthew Beal’s 2003 dissertation. (See all of these in the References below.) Beal has a blue square rating – readable, if you’re careful and you know what you’re doing.

Friston is ALWAYS double black diamond. Mostly due to (you guessed it!) NOTATION. And the man is not trying to be deliberately obscure – I truly believe that. It’s just that communicating from the Friston-space to the rest of reality is a venture into the arcane.

(OK. That was a vent. I’m over it. For now.)

Oh, and you’ll see me make frequent references to Blei. Specifically, to Blei, Kucukelbir, and McAuliffe (2016), Variational Inference: A Review for Statisticians. They’re not among the Friston citations.

For good reason – Friston wrote Life as We Know It prior to 2013 (its formal publication date). He has well over 100 references – some 130-plus. But prescient as the man can be, he could not have foreseen (in 2013 or earlier) the excellent tutorial that Blei et al. were to publish some three years later.

So … Beal (2003) because it’s the precursor to a lot of Friston’s thinking and notation. Sort of a fundamental pre-read.

Blei et al. (2016) to round things out.

The combination DOES pose a notational conundrum.

That’s why I wrote the (to-be-revised) Derivation paper.

We’ll get this unscrambled, folks. It just might take a while.

And … while I will number the posts in the NEW series (starting next week), I will NOT number them as “Part X of Y,” because … as you saw from how this three-post series expanded from three posts to five (with posts 1.5 and 2.5 interspersed among the main ones), we would undoubtedly start having increasingly small values of epsilon as we approached some arbitrarily final post.

So we’ll do this slowly and carefully, and just number as we go along. OK?

See you in the next one!

Have fun, darlings! – AJM


P.S. – As you’ll see in the References section, below – we’ve re-organized. Things are now grouped under the “ease of read” labels of “Bunny Trail,” “Blue Square,” and “Double Black Diamond.”

Not that I’ve ever skied. Not in my life. But we need some fairly well-known warning labels.



References

Look for the labels. We’re now organized – from Bunny Trails to Blue Squares to Double Black Diamond.


Bunny Trails – Decent Introductory Source Stuff

CAPTION: The Bunny. We decided (see Kullback-Leibler 2.5/3, or “Black Diamonds”) that we needed trail markings for the resource materials.

Bunny slopes: the introductory materials. Some are web-based blogposts or tutorials – good starter sets. All Themesis blogposts and YouTubes fall in this category.


AJM’s Note: I found this tutorial by Jake Tae just a few weeks ago, and have wanted to insert it here … it is a remarkably clean and lucid read. I was searching for accessible tutorials on entropy; this fits the bill – and I like that Jake talks about information entropy as well as regular, “thermodynamic” entropy.

AJM’s Follow-On Note #1: Jake describes his entropy post as a “random post on the topic of randomness.” Way too often, people will describe the entropy of a system in terms of “randomness.” Maybe they even speak of the “disorder” of the system. It is much more precise to talk about finding the point at which there is maximal distribution among the available microstates. Just a shift in emphasis, but an important one. And I owe this excellent and fine distinction to another physical chemist, who made the point in an online tutorial – that I haven’t been able to re-find. (I’m still looking, and will update this post should I ever re-discover that useful source.)

AJM’s Follow-On Note #2: When we talk about information entropy, or Shannon entropy – it is very important to mentally “connect the dots.” Most information-entropy presentations or tutorials totally ignore the fact that what we’re talking about is the entropy of a system that has come to equilibrium. That’s why the entropy that is reported is (potentially) different from the maximal entropy of that system – the entropy that we would have if there were not an “enthalpy” term pulling the free energy minimum away from the negative entropy minimum. So we need to get back to the notion of free energy, and the equilibrium point – which is the minimum in the free energy – to get a good context.
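That pull away from maximum entropy can be seen in a few lines of code. A minimal sketch (plain Python; the microstate energies are made up): the Boltzmann distribution is the free-energy-minimizing equilibrium, and its entropy falls short of the uniform-distribution maximum whenever the energies differ.

```python
import math

def boltzmann(energies, T):
    """Equilibrium (free-energy-minimizing) distribution: p_i ∝ exp(-E_i / T)."""
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)  # partition function
    return [w / Z for w in weights]

def entropy(p):
    """Shannon entropy, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

energies = [0.0, 1.0, 2.0, 3.0]      # hypothetical microstate energies
p_eq = boltzmann(energies, T=1.0)

S_eq = entropy(p_eq)
S_max = math.log(len(energies))      # the uniform distribution maximizes entropy

# The energy ("enthalpy") term pulls the equilibrium distribution away
# from the uniform, maximum-entropy one, so S_eq < S_max:
print(S_eq, S_max)
```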

Read THIS AJM blogpost to get a bit more insight:

AJM’s Third and Final Follow-On Note: Despite my minor twitches, above, Mr. Tae has done an OUTSTANDING job of addressing not only the entropy topic, but a whole lot more — I’m giving the link to his GitHub site below, and will be reading a good deal more of his blogposts in the near future.

AJM’s Note: Blei’s material is typically at the blue-square level or higher. However, this PDF from his lecture is fairly accessible – it has LOTS of visuals, which makes the going much easier than usual.


Related Themesis Blogposts

The Kullback-Leibler/Free Energy/Variational Inference Series:

Prior Blogpost on the Kullback-Leibler Divergence

Prior Blogposts on Entropy and Free Energy in Neural Networks – a Small Selection


Related Themesis YouTubes


Blue Squares – Intermediate Reading/Viewing

CAPTION: Intermediate Reading/Viewing: Requires preliminary knowledge of both concepts and notation. Not trivially easy, but accessible – often advanced tutorials.

AJM’s Note: Karl Friston’s 2013 “Life as We Know It” referenced Beal’s 2003 dissertation. Friston’s notation is largely based on Beal’s. Friston introduces the notion of Markov blankets as a key theme in discussing how life (or biological self-organization) necessarily emerges from a “random dynamical system that possesses a Markov blanket.” Beal’s Section 1 discusses both Bayesian probabilities and Markov blankets. Reading Beal’s work is a very useful prerequisite for getting into anything by Friston. It helps that Beal does his best to present material in a tutorial style. We’ll start with Markov blankets in the next post.

  • Beal, M. 2003. Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. pdf.
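Since Markov blankets will be our starting point, a concrete definition may help: in a Bayesian network, the Markov blanket of a node is its parents, its children, and its children’s other parents (the “co-parents”); conditioned on the blanket, the node is independent of everything else in the graph. A tiny sketch, with a hypothetical toy DAG:

```python
def markov_blanket(node, parents):
    """parents: dict mapping each node to the set of its parents in a DAG.
    The Markov blanket of a node = its parents, its children, and its
    children's other parents (co-parents)."""
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

# Toy DAG: a -> c, b -> c, c -> d
parents = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
print(markov_blanket("c", parents))  # the blanket of c is {a, b, d}
```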

AJM’s Note: I refer to the Blei et al. tutorial because it is very solid and lucid. If we’re trying to understand variational ANYTHING (variational Bayes, variational inference, variational autoencoders, etc.), Blei et al. make some comments and offer a perspective that is very complementary to that given by Beal in his 2003 dissertation.

  • Blei, D.M., A. Kucukelbir, and J.D. McAuliffe. 2016. “Variational Inference: A Review for Statisticians.” arXiv:1601.00670v9. doi:10.48550/arXiv.1601.00670. (Accessed June 28, 2022; pdf.)
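The identity at the heart of the Blei et al. tutorial – the log evidence decomposes into the ELBO plus a K-L term, so maximizing the ELBO minimizes the K-L divergence from the approximation q(z) to the posterior p(z|x) – can be checked numerically on a toy discrete model (a sketch; all the numbers are made up):

```python
import math

# Tiny discrete model: latent z in {0, 1}, one fixed observation x.
# The joint p(x, z) is specified directly (hypothetical numbers).
p_joint = {0: 0.3, 1: 0.1}                        # p(x, z)
p_x = sum(p_joint.values())                       # evidence p(x)
p_post = {z: p_joint[z] / p_x for z in p_joint}   # posterior p(z | x)

q = {0: 0.6, 1: 0.4}  # an arbitrary approximating distribution

# ELBO(q) = E_q[log p(x, z)] - E_q[log q(z)]
elbo = sum(q[z] * (math.log(p_joint[z]) - math.log(q[z])) for z in q)
# D_KL(q(z) || p(z | x))
kl = sum(q[z] * math.log(q[z] / p_post[z]) for z in q)

# The identity: log p(x) = ELBO(q) + KL(q || p(z|x))
print(math.log(p_x), elbo + kl)
```

Because the K-L term is non-negative, the ELBO is a lower bound on log p(x) – hence the name.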

AJM’s Note: My arXiv paper – the one that needs to be revised. The one that still has the P and Q notation scrambled. (This will be fixed. Soon.)

AJM’s Note: Kullback and Leibler. Their original paper. The one that started all of this.

AJM’s Note: Transformers. The hottest thing in AI/ML. And yet ANOTHER thing that totally rests on statistical mechanics. The blogposts by Mattias Bal are excellent! So … transformers were NOT a part of the original blogpost series topics (K-L, free energy, variational Bayes), but they fit in nicely, given their dependency on stat mech. Also, I want to keep track of the links.

AJM’s Note: In their abstract, Bahri et al. refer to more topics than most of us can absorb. Still, their review provides a cogent and useful summary of how deep learning (and its foundational element, the restricted Boltzmann machine) all rest solidly on a statistical mechanics foundation. Need further confirmation? All the authors are gainfully employed – at Google (Google Brain), or at Stanford U., or a combination thereof. And yes, being published in a “condensed matter” physics journal, it is indeed something of a “heavy” read. (OK, couldn’t resist the pun.) Have a look, in your capacious spare time.


Books – Especially Classics / Good-to-Read

AJM’s Introductory Note: Some of you want to do a systematic, strong study of fundamentals. These are books that are frequently cited, and people actually DO read them.

AJM’s Note re/ Feynman: Feynman is noted for his exceptionally lucid presentations. High school students have been known to read his books. (Brilliant high school students, of course – and looking for status points when video games aren’t enough.)

  • Feynman, R.P. 1972, 1998. Statistical Mechanics: A Set of Lectures. Reading, MA: Addison-Wesley; Amazon book listing.

AJM’s Note re/ Sethna: What puts this book on my list is a Sethna comment on p. 3 of his book: “Science grows through accretion, but becomes potent through distillation.” What a great way to express our understanding of science! (See the end of Kullback-Leibler – Part 1.5 of 3 for my original reference to him.)


Double Black Diamond – Expert-Only

CAPTION: Double Black Diamond: Expert-only! These books, tutorials, blogposts, and vids are best read and watched AFTER you’ve spent solid time mastering fundamentals. Otherwise, they’re a good way to wind up not only feeling lost, but hugely insecure.

Friston & Co.

AJM’s Note: ANYTHING written by Friston is “double black diamond.” That said, a few papers are a BIT more accessible than others.

AJM’s Note: Friston’s 2010 paper is largely conceptual. (Meaning, equations-free.) Not to be blown off; he’s establishing the context for future works.

  • Friston, K. 2010. “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience 11(2): 127-138. Online access.

AJM’s Note: Friston’s 2013 paper is the starting point for the theoretical (and mathematical) development of his notions on free energy in the brain, and in any living system. He starts with the notion of a system separated by a Markov blanket from its external environment, and moves on from there. The forthcoming blogpost series will focus on this paper.

  • Friston, K. 2013. “Life as we know it.” Journal of The Royal Society Interface 10. doi:10.1098/rsif.2013.0475. pdf.

AJM’s Note: Friston and colleagues, in their 2015 paper “Knowing One’s Place,” show how self-assembly (or self-organization) can arise out of variational free energy minimization. A very interesting read!

  • Friston, K., M. Levin, B. Sengupta, and G. Pezzulo. 2015. “Knowing one’s place: a free-energy approach to pattern regulation.” J. R. Soc. Interface 12: 20141383. doi:10.1098/rsif.2014.1383. (Accessed Oct. 3, 2022; pdf.)

Syllabi and Other Resources

AJM’s Note: This fabulous little syllabus was put together by Jared Tumiel; it’s spot-on, very well-organized, and has enough in it to keep most of us busy for the next several years. It’s hosted on his GitHub site. It is double-black-diamond rated because it lists numerous papers which are themselves double-black.

AJM’s Note: This lovely little article by Damian Ejlli is a perfect example of “double black diamond.” It is a perfect read – if you ALREADY know statistical mechanics … and quantum physics … and Bayesian methods.



And In Keeping with This Week’s Theme …

Right Back Where We Started From by Maxine Nightingale – a song that couldn’t be more appropriate:
