Latent Variables Enabled Effective Energy-Based Neural Networks: Seven Key Papers (Part 2 of 3)


The key problem with the Little/Hopfield neural network was its limited memory capacity. This problem was resolved when Ackley, Hinton, and Sejnowski introduced the notion of latent variables, creating the Boltzmann machine.

Seven key papers define the evolution of energy-based neural networks. Previously, we examined the first two papers, by Little and Hopfield, respectively. (See "Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3).")

In this post, we examine the third and fourth papers in this set, as shown in Figure 1.

Figure 1: Latent variables enabled effective energy-based neural networks. The third paper in the set of seven key papers on energy-based neural networks introduced the Boltzmann machine (Ackley, Hinton, and Sejnowski, 1985). An architectural refinement introduced by Smolensky (1986) yielded the restricted Boltzmann machine.

The first of these, by Ackley, Hinton, and Sejnowski (1985), introduced the Boltzmann machine. The second, by Smolensky (1986), introduced the restricted Boltzmann machine (RBM), which later became a core building block of early deep learning architectures such as deep belief networks.

This blogpost accompanies the YouTube video "Statistical Physics Underlying Energy-Based Neural Networks," published on the Themesis, Inc. YouTube channel on August 26, 2021.

Ackley, Hinton, and Sejnowski (1985): Latent Variables Provided the Key Breakthrough for Energy-Based Neural Networks

The first real breakthrough came when Ackley, Hinton, and Sejnowski (1985) took Hopfield’s basic neural network and introduced one crucial new dimension: the notion of latent variables.

Latent variables were those whose values were not prescribed by the training data. Instead, their job was to learn the inherent features characterizing the different training data sets. Using these features (very small patterns within the training data), the network could replicate the original (training) pattern, even if the input data was partial or noisy.

Figure 2. Comparing (a) the Hopfield (Little-Hopfield) neural network with (b and c) the Boltzmann machine. The key difference is the introduction of latent variables, shown in (b) on the right-hand side of an imaginary dividing wall, and in (c) as a “hidden layer.” Note that in the Boltzmann machine, as in the Hopfield neural network, all nodes connect to each other.

The existence of hidden variables, or latent variables, made it possible to store many more patterns within a neural network.

This was a huge breakthrough.
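To make the role of the latent variables a little more concrete, the Boltzmann machine can be described by an energy function over binary units, where the full set of units is split into visible units (those clamped to the training data) and hidden (latent) units. The following is a minimal sketch, in one common sign convention; the notation is chosen here for illustration rather than taken verbatim from the original paper.

```latex
% Energy of a Boltzmann machine state s = (v, h), where v are the visible
% units and h are the hidden (latent) units, all binary; w_{ij} are the
% symmetric connection weights and \theta_i are the unit biases.
E(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i \theta_i\, s_i

% The network visits states with Boltzmann (Gibbs) probabilities at
% temperature T, so low-energy states (the stored patterns) are the
% most probable:
P(\mathbf{s}) \propto e^{-E(\mathbf{s})/T}
```

Because the hidden units enter the energy in exactly the same way as the visible ones, the network can use them to encode features of the training patterns that the visible units alone could not capture.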

Smolensky (1986): A Crucial Step in Architecture Refinement

Paul Smolensky (1986) suggested one crucial refinement that made the basic Boltzmann machine much more effective. Essentially, he proposed that the only connections should be those between the visible nodes (the nodes whose values can be presented as part of a training pattern) and each of the hidden nodes, or latent variables, with no connections within either group. This greatly reduced the number of connections. It also meant that we could envision the structure of this restricted Boltzmann machine (RBM) as isomorphic with the Multilayer Perceptron (MLP).
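To see what this restriction looks like in practice, here is a minimal NumPy sketch; the sizes, variable names, and initialization are purely illustrative and are not from Smolensky's paper. A general Boltzmann machine needs one symmetric weight matrix over all units, whereas the restricted version keeps only a visible-to-hidden weight matrix, giving the bipartite energy E(v, h) = -(v·W·h) - b·v - c·h.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 4          # illustrative sizes

# --- General Boltzmann machine: one symmetric weight matrix over ALL units,
#     so every unit (visible or hidden) can connect to every other unit.
n_total = n_visible + n_hidden
W_full = rng.normal(scale=0.1, size=(n_total, n_total))
W_full = (W_full + W_full.T) / 2    # symmetric weights
np.fill_diagonal(W_full, 0.0)       # no self-connections

# --- Restricted Boltzmann machine: only visible-to-hidden connections remain,
#     so a single (n_visible x n_hidden) matrix describes the whole network.
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)             # visible biases
c = np.zeros(n_hidden)              # hidden biases

def rbm_energy(v, h):
    """Bipartite RBM energy: E(v, h) = -(v.W.h) - b.v - c.h."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

# Example: energy of one random binary configuration.
v = rng.integers(0, 2, size=n_visible).astype(float)
h = rng.integers(0, 2, size=n_hidden).astype(float)
print("RBM energy of a random (v, h) configuration:", rbm_energy(v, h))
```

Because there are no hidden-to-hidden (or visible-to-visible) connections, the hidden units are conditionally independent given the visible units, and vice versa, which is what allows the RBM to be drawn in the MLP-like, layered fashion shown in Figure 3.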

This structure is shown in Figure 3, which presents a Multilayer Perceptron on the left, a Boltzmann machine in the center (with enough connections drawn to illustrate the idea, although not all connections are shown), and a restricted Boltzmann machine on the right. The diagrams for the MLP and the RBM are essentially the same.

This meant that it was possible to compare MLPs and RBMs on the same kind of task, e.g., classification.

Figure 3. Comparison of three simple neural network architectures: (a) a Multilayer Perceptron (MLP), with an input layer, a hidden layer, and an output layer. (b) The same conceptual architecture in a Boltzmann machine; the numbers of nodes are the same as in the MLP. The difference is that in a Boltzmann machine, every node connects to every other node. (c) A restricted Boltzmann machine, in which every input and output node connects only to the nodes in the hidden layer (the latent variable nodes).

References

Ackley, D.H., Hinton, G.E., and Sejnowski, T.J. (1985). “A Learning Algorithm for Boltzmann Machines.” Cognitive Science 9, no. 1 (January–March): 147-169. https://doi.org/10.1016/S0364-0213(85)80012-4.

Smolensky, P. (1986). “Information Processing in Dynamical Systems: Foundations of Harmony Theory.” In Parallel Distributed Processing, Vol. 1, edited by D.E. Rumelhart and J.L. McClelland, 194-281. Cambridge, MA: MIT Press. (Available online at: https://www.researchgate.net/publication/239571798_Information_processing_in_dynamical_systems_Foundations_of_harmony_theory, accessed Nov. 11, 2021.)

Previous Related Blogposts

Maren, A.J. (2021). “Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3).” Themesis Blogpost Series (November 5). https://themesis.com/2021/11/05/seven-key-papers-part-1-of-3/
