Latent variables enabled effective energy-based neural networks.
The key problem with the Little/Hopfield neural network was its limited memory capacity. This problem was resolved when Ackley, Hinton, and Sejnowski introduced the notion of latent variables, creating the Boltzmann machine.
Seven key papers define the evolution of energy-based neural networks. Previously, we examined the first two of these papers, by Little and Hopfield, respectively. (See "Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3).")
In this post, we examine the third and fourth papers in this set, as shown in Figure 1.
The first of these, by Ackley, Hinton, and Sejnowski (1985), introduced the Boltzmann machine. The second, by Smolensky (1986), introduced the restricted Boltzmann machine (RBM), originally under the name "harmonium"; the RBM later became a core building block of early deep learning architectures.
Ackley, Hinton, and Sejnowski (1985): Latent Variables Provided the Key Breakthrough for Energy-Based Neural Networks
The first real breakthrough came when Ackley, Hinton, and Sejnowski (1985) took Hopfield’s basic neural network and introduced one crucial new dimension: the notion of latent variables.
Latent variables are those whose values are not prescribed by the training data. Instead, their job is to learn the inherent features that characterize the training data. Using these features, which are small sub-patterns within the training data, the network can reproduce an original (training) pattern even when the input data is partial or noisy.
The existence of these hidden, or latent, variables made it possible to store many more patterns within a neural network.
This was a huge breakthrough.
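For readers who want the underlying mathematics, the 1985 paper defines a single global energy over all units, visible and hidden alike, and lets the network settle into low-energy states stochastically. A standard way of writing this (the symbols here follow common convention rather than necessarily matching the paper's exact notation, with binary unit states s_i, weights w_ij, thresholds theta_i, and temperature T) is:

E(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j + \sum_i \theta_i s_i, \qquad P(\mathbf{s}) \propto e^{-E(\mathbf{s})/T}

At thermal equilibrium, the network visits each global state with a probability given by this Boltzmann distribution, which is what gives the model its name. The latent (hidden) units simply appear as additional components of the state vector whose values are never clamped by the training data.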
Smolensky (1986): A Crucial Step in Architecture Refinement
Paul Smolensky (1986) suggested one crucial step that made the basic Boltzmann machine much more effective. Essentially, he suggested that the only connections should be those between the visible nodes (the nodes that can be presented as part of a training pattern) and the hidden nodes, or latent variables; there would be no visible-to-visible or hidden-to-hidden connections. This restriction reduced the number of connections. It also meant that we could envision the structure of this restricted Boltzmann machine (RBM) as isomorphic with that of the multilayer perceptron (MLP).
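To make the restriction concrete, here is a minimal NumPy sketch of an RBM's energy function and one block-Gibbs sampling step. This is an illustration under generic RBM conventions; the sizes, the variable names W, b, and c, and the random initialization are mine, not Smolensky's. The key consequence of the bipartite structure is visible in the code: each hidden unit depends only on the visible layer, so an entire layer can be sampled in a single pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and names (W, b, c); these follow generic RBM
# conventions, not notation taken from Smolensky (1986).
n_visible, n_hidden = 6, 3

# The only weights are visible-to-hidden: a single matrix W.
# There are no visible-visible or hidden-hidden connections.
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible-unit biases
c = np.zeros(n_hidden)    # hidden-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    """Energy of a joint (visible, hidden) binary configuration."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

# Because the connection graph is bipartite, the hidden units are
# conditionally independent of one another given the visible units
# (and vice versa), so a whole layer can be sampled in one step.
def sample_hidden(v):
    p_h = sigmoid(c + v @ W)
    return (rng.random(n_hidden) < p_h).astype(float)

def sample_visible(h):
    p_v = sigmoid(b + W @ h)
    return (rng.random(n_visible) < p_v).astype(float)

v = rng.integers(0, 2, size=n_visible).astype(float)  # a toy input pattern
h = sample_hidden(v)
v_reconstructed = sample_visible(h)
print("E(v, h) =", energy(v, h))
```

With all-to-all connections, as in the original Boltzmann machine, this one-pass layer sampling is not possible; settling toward equilibrium requires many sequential stochastic updates, which is a large part of why the restricted architecture is so much more efficient to train and run.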
The restricted structure is shown in Figure 3, which depicts a multilayer perceptron on the left, a Boltzmann machine in the center (with enough connections drawn to illustrate the idea, although not all connections are shown), and a restricted Boltzmann machine on the right. The diagrams for the MLP and the RBM are essentially the same.
This meant that it was possible to compare MLPs and RBMs on the same kind of task, e.g., classification.
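One way to see the correspondence concretely is to compare the RBM's "upward" inference step with a sigmoid MLP hidden layer. Using generic symbols (sigma for the logistic function, W for the visible-to-hidden weights, c for the hidden biases, again not the original papers' notation), the two have exactly the same functional form:

p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(c_j + \sum_i W_{ij}\, v_i\Big) \quad \text{(RBM hidden unit, stochastic)}

h_j = \sigma\Big(c_j + \sum_i W_{ij}\, v_i\Big) \quad \text{(MLP hidden unit, deterministic)}

The difference lies in how the value is used: the RBM treats it as the probability that a stochastic binary unit turns on, while the MLP treats it as a deterministic activation passed forward. Structurally, the two networks are layer-for-layer the same.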
References
Ackley, D.H., Hinton, G.E., and Sejnowski, T.J. (1985). "A Learning Algorithm for Boltzmann Machines." Cognitive Science 9, no. 1 (January–March): 147-169. https://doi.org/10.1016/S0364-0213(85)80012-4.
Smolensky, P. (1986). "Information Processing in Dynamical Systems: Foundations of Harmony Theory." In Parallel Distributed Processing, Vol. 1, edited by D.E. Rumelhart and J.L. McClelland, 194-281. Cambridge, MA: MIT Press. (Available online at: https://www.researchgate.net/publication/239571798_Information_processing_in_dynamical_systems_Foundations_of_harmony_theory, accessed Nov. 11, 2021.)
Previous Related Blogposts
Maren, A.J. (2021). "Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3)." Themesis Blogpost Series (November 5). https://themesis.com/2021/11/05/seven-key-papers-part-1-of-3/