Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3)

Seven key papers give us the evolutionary timeline for energy-based neural networks, up through and including deep learning. The timeline begins with William Little’s 1974 work on the first energy-based neural network, continues with John Hopfield’s 1982 expansion of Little’s concepts, and runs up through the deep learning architectures described by Hinton and Salakhutdinov in 2006 and 2012.

Figure 1. Evolutionary timeline for energy-based neural networks, summarized in seven key papers, from 1974 to 2012.

The deep learning systems described by Hinton and Salakhutdinov, beginning in 2006, use multiple layers of the (restricted) Boltzmann machine, with a little backprop for fine-tuning the connection weights. These architectures helped launch the current generation of advanced neural network-based AI systems, including convolutional neural networks (CNNs) and long short-term memory networks (LSTMs).

Have you tried to read any Hinton papers? 

I’ll bet that you have – and have put them down. That was because all papers about energy-based neural networks are really discussions of statistical physics.

If you’ve picked up ANY Hinton paper, or a paper by someone in his camp (LeCun, Bengio, etc.), you’ve probably seen something called an “energy equation.” And you’ve probably picked up a reference to something called the “partition function.” 

Terms such as “energy equation” or “partition function” are an absolute give-away that you’re out of the prairie grass country and solidly in the Sierra Nevada mountains of advanced AI. That is, you’re beyond the realm of simple calculus (which is enough to get you through backpropagation), and into the realm of statistical physics-based AI.
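To make those two terms a bit more concrete, here is the generic textbook form from statistical mechanics (not a quotation from any of the seven papers): the “energy equation” assigns an energy E(x) to every possible state x of the network, and the “partition function” Z is the normalizing sum that turns those energies into probabilities of finding the network in each state:

```latex
P(x) = \frac{e^{-E(x)/T}}{Z},
\qquad
Z = \sum_{x'} e^{-E(x')/T}
```

Here T plays the role of a temperature; lower-energy states are exponentially more probable. Every energy-based neural network paper is, at heart, a variation on this theme.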

Watch this video for a quick study of how energy-based neural networks have evolved, from the original work by Little (1974) and Hopfield (1982) to the basic Boltzmann machine, and on to the restricted Boltzmann machine and deep learning.

Statistical Physics Underlying Energy-Based Neural Networks

Now, in that video, I promise you a list of the seven key papers showing how energy-based neural networks have evolved over time.

Here’s a quick summary of the authors and the timeline for these crucial papers:

  • William A. Little. 1974. “The Existence of Persistent States in the Brain.”
  • John J. Hopfield. 1982. “Neural Networks and Physical Systems with Emergent Collective Computational Abilities.”

(For the full references for the first two papers, see the end of this post. For the full references for all seven papers, see the last of this three-post series.)

Now, let’s have a bit closer look.

In this post, we address the first two papers in this series: the one by Little (1974) and the one by Hopfield (1982), which builds on Little’s work.

Figure 2. This post covers the first two papers in this series on seven key papers in energy-based neural networks: the 1974 work by William Little and the 1982 work by John Hopfield.

W.A. Little (1974) – The Genesis of Energy-Based Neural Networks

The first energy-based neural network was conceived by William A. Little and published in Mathematical Biosciences in 1974.

Naturally, you have to understand statistical physics in order to get started with this article. Key phrases that tell you this is truly a stat-phys paper are:

  • “Ising spin system,” which is another phrase for the Ising model (which I often refer to as the Ising equation),
  • “state of persistent order,” where any kind of persistent state or ordered state is a reference to this kind of model, and
  • “spin system,” where “spin system” is almost ALWAYS a reference to something that can be described with the Ising model.

Partial Abstract:

We show that given certain plausible assumptions the existence of persistent states in a neural network can occur only if a certain transfer matrix has degenerate maximum eigenvalues. The existence of such states of persistent order is directly analogous to the existence of long range order in an Ising spin system; while the transition to the state of persistent order is analogous to the transition to the ordered phase of the spin system. 

The essential summary: Little and, eight years later, John Hopfield essentially propose the same system. There is a “system” of nodes, and each node can be in one of two states. (That’s where this notion of “spin systems” comes in – they can have either an “up” or a “down” spin; essentially, they can be in one of two states.)

There are connections between the nodes.

It is this idea of nodes that can be “on” or “off” (“up” or “down” spin), and their connections that led these physicists to suggest that such a physics-based system could be used to model brain states, following the notion that neurons were either active or inactive, and that the neurons connected to each other.

The notion of being “on” or “off” related to whether or not a node aligned its “spin” with the direction imposed by an external (e.g., magnetic) field. If a node was in a low-energy state, then it lined up with the magnetic field. If it was in a high-energy state, then its orientation opposed the magnetic field. There’s a lot more that we can say about this analogy, but that would be a paper in its own right – and just now, we’re trying for a fast skim.
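If you would like to see that “energy” idea in code, here is a minimal sketch (my own illustration, not code from either paper; the function name network_energy and the tiny two-node example are made up for this post) of the kind of energy function that the Little-Hopfield style of network minimizes. Each node is a +1/-1 “spin,” and the energy is lower when connected nodes agree with the sign of the weight between them:

```python
import numpy as np

def network_energy(state, weights, bias=None):
    """Energy of a binary (+1/-1) state vector under symmetric weights.

    E = -1/2 * sum_ij w_ij * s_i * s_j  -  sum_i b_i * s_i
    Lower energy means the spins/nodes are better "aligned" with their
    connections (and with any external field, represented by the bias).
    """
    state = np.asarray(state, dtype=float)
    energy = -0.5 * state @ weights @ state
    if bias is not None:
        energy -= np.asarray(bias, dtype=float) @ state
    return energy

# Tiny illustrative example: two nodes joined by a positive (excitatory) weight.
# Aligned states (+1, +1) or (-1, -1) have lower energy than misaligned ones.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(network_energy([+1, +1], W))   # -1.0  (low energy: aligned)
print(network_energy([+1, -1], W))   # +1.0  (high energy: misaligned)
```

The optional bias term plays the role of the external field mentioned above: a node that lines up with it contributes low energy, and one that opposes it contributes high energy.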

The work by Little is not as well-known as that by Hopfield, largely because Hopfield studied the memory capacity of this kind of system – and the actual reporting of those experimental results caught a lot more attention.

For all practical purposes, though, the Little and the Hopfield models are the same, which is why this neural network is most appropriately referred to as the “Little-Hopfield neural network.”

J.J. Hopfield (1982) – The First Functioning Energy-Based Neural Network

John Hopfield expanded on Little’s work.

One of the most important things that Hopfield did was to characterize the storage properties of the neural network that he described. He found that the total number of patterns that could be stored (that is, retrieved by inputting a partial pattern or a noisy version of the pattern) was limited to roughly 15% of the total number of nodes (Hopfield reported about 0.15N). So, if you created a ten-node Hopfield-style neural network, it could reliably store only one or two distinct patterns before the stored patterns started to get corrupted.
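To see what “storing and retrieving patterns” looks like in practice, here is a minimal sketch (again my own illustration, not Hopfield’s code; it uses the standard textbook Hebbian storage rule and asynchronous sign updates usually associated with his 1982 network). A few random patterns are imprinted on the weights, and one of them is then recovered from a noisy cue:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_hopfield(patterns):
    """Hebbian ("outer product") storage rule; patterns are +1/-1 vectors."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W

def recall(W, state, steps=5):
    """Asynchronous updates: each node flips to match the sign of its input."""
    state = state.copy()
    for _ in range(steps):
        for i in rng.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Store a few random patterns in a 100-node network, then retrieve one
# from a noisy version of itself (about 20% of the bits flipped).
patterns = rng.choice([-1, 1], size=(3, 100))
W = train_hopfield(patterns)
noisy = patterns[0] * rng.choice([1, -1], size=100, p=[0.8, 0.2])
print(np.array_equal(recall(W, noisy), patterns[0]))   # usually True
```

Push the number of stored patterns much higher relative to the number of nodes, and recall starts to fail – which is exactly the capacity limit that Hopfield characterized.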

Another important outcome of Hopfield’s work was that he was in communication with other researchers in the emerging West Coast neural network community. Specifically, his highlighting of this very basic energy-based neural network was one of the sparks that led to the next stage: the evolution of the Boltzmann machine.

Partial Abstract:

Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons). The physical meaning of content-addressable memory is described by an appropriate phase space flow of the state of a system. A model of such a system is given, based on aspects of neurobiology but readily adapted to integrated circuits. The collective properties of this model produce a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.

The Essential Summary: In order to understand how energy-based neural networks work, it is essential that we understand the Hopfield (Little-Hopfield) neural network.

Figure 3. The Hopfield (Little-Hopfield) neural network, from Chapter 9: The Hopfield Neural Network and the Restricted Boltzmann Machine: Two Energy-Based Neural Networks, in Statistical Mechanics, Neural Networks, and Artificial Intelligence, by A.J. Maren. (In preparation).

One of the best places to find a more thorough explanation of energy-based neural networks is the book (in preparation) Statistical Mechanics, Neural Networks, and Artificial Intelligence, by A.J. Maren. Draft chapters are available; see Chapter 9: The Hopfield Neural Network and the Restricted Boltzmann Machine: Two Energy-Based Neural Networks for the most pertinent material.

References

William A. Little. 1974. “The Existence of Persistent States in the Brain.” Mathematical Biosciences 19, nos. 1-2 (February): 101-120. https://doi.org/10.1016/0025-5564(74)90031-5. (Accessed Oct. 05, 2022; available online at: http://wexler.free.fr/library/files/little%20(1974)%20the%20existence%20of%20persistent%20states%20in%20the%20brain.pdf) (Online citation source courtesy of Simon Crane – a big shout-out and thank you to Simon!)

John J. Hopfield. 1982. “Neural Networks and Physical Systems with Emergent Collective Computational Abilities.” Proc. National Academy of Sciences 79, no. 8 (May): 2554-8. https://doi.org/10.1073/pnas.79.8.2554.

Alianna J. Maren. Statistical Mechanics, Neural Networks, and Artificial Intelligence. (Manuscript in preparation.) Draft chapters are available online: http://www.aliannajmaren.com/book/book-table-of-contents-linked/ .

(Note: All references are given using the Author-Date form of Chicago style. https://www.chicagomanualofstyle.org/tools_citationguide/citation-guide-2.html)
