A New Neural Network Class: Creating the Framework (The Future of AI, Part 2 of 3)

We want to identify how and where the next “big breakthrough” in AI will occur. We use three tools, or approaches, to do this:

  • Phylogenetic Etymology: This is the “what-led-to-what” storyline of neural network evolution; in this blogpost, we pay particular attention to the evolution of energy-based neural networks,
  • Problem Statements: We get a sense of what a given neural network can do by carefully reading and assessing the problem statements in the key originating publications, and
  • The Logical Topology of Neural Networks: As we update this original (1991) architecture-based classification of different kinds of neural networks, we gain a general perspective on not only their architectures, but also their key operating equations; this gives us an overview of the diverse neural network classes – and how a new class might fit in.

The Quick Overview

Get the quick overview with this YouTube #short:

Get the overview in under a minute: Maren, Alianna J. 2023. “Logical Topology: Six Classes of Neural Networks #short.” Themesis YouTube Channel (March 23, 2023). https://www.youtube.com/shorts/UZye842ELS0

The Full YouTube


Maren, Alianna J. 2023. “A New Neural Network Class: Creating the Framework.” Themesis, Inc. YouTube Channel (May 16, 2023). (Accessed May 16, 2023; available at https://www.youtube.com/watch?v=KHuUb627POs.)


“Logical Topologies” – A Framework for a New Neural Networks Class

Many years ago, when neural networks were still young – and dinosaurs roamed the earth – the big challenge for neural network practitioners was to make sense of the huge variety of neural networks that were popping up – seemingly from all over.

I was the first to create an over-arching organizational structure for all these neural networks.

Back then, I called it a “logical topology for neural networks.”

There were six major kinds of neural networks back then – organized according to their inherent architectures.

Here’s that original logical topology:

Figure 1. The original “logical topology of neural networks,” from Maren, Alianna J. 1991. “A Logical Topology of Neural Networks.” Proceedings of the Second Workshop in Neural Networks – Academia, Industry, NASA, & Defense (WNN-AIND 91) (Auburn, GA; Feb. 14-16, 1991). doi:10.13140/RG.2.1.3503.2806. (Accessed Feb. 16, 2023; available online from ResearchGate at: https://www.researchgate.net/publication/281121286_A_Logical_Topology_of_Neural_Networks, and from the Alianna J. Maren website at: http://www.aliannajmaren.com/Downloads/Logical-topology-neural-networks.pdf.)

We’re jumping ahead in our story-arc here; we were going to introduce a little follow-on to Part 1 (Kuhnian Normal and Breakthrough Moments), and we were going to do an extended Part 2 series on “Problem Statements” – identifying just what the problems were that breakthrough inventors addressed …

BUT … we can create more structure and context if we go back in time, and examine:

  • How the earliest important neural networks related to each other,
  • How they evolved, and
  • The key elements defining the breakthroughs.

Previously in Our Story …

Our first step was to identify Kuhnian “paradigm-breaking” inventions that provided the crucial underpinnings for two major kinds of neural networks.

This YouTube video presented these two major breakthroughs.

Maren, Alianna J. 2023. “The Future of AI (Part 1 of 3): Kuhnian Normal and Breakthrough Moments.” Themesis, Inc. YouTube Channel (Feb. 1, 2023). (Available at: https://www.youtube.com/watch?v=Tpe5m-mSheQ&t=179s.)

These two big breakthroughs were:

  • Backpropagation – invented by Paul Werbos in 1974, not widely recognized until the first International Neural Network Conference in 1987, and yielding the classic Multilayer Perceptron (MLP).
  • Energy-based neural networks – invented by William Little in 1974, expanded and espoused by John Hopfield in 1982 (yielding the Little-Hopfield network), then made effective by David Ackley, Geoffrey Hinton, and Terrence Sejnowski in 1985 (the Boltzmann machine), and refined (following a suggestion by Smolensky in 1986) into the restricted Boltzmann machine.

Latent Variables – Key to Breakthroughs

The key insight that made these two breakthroughs possible was the introduction of latent variables – along with a training method that allowed these latent variables to learn features characterizing the input/output pattern pairs (for MLPs), or supporting recall of the entire original pattern (for Boltzmann machines, whether simple or restricted).

It’s not that the scientists didn’t recognize the value of latent variables – the key breakthroughs were in finding workable training methods.

We discuss these methods, in the context of reviewing each of the major neural network architectures, as shown in Figure 1, “The original ‘logical topology of neural networks.'”

The first such training method was backpropagation.

The second, applied by Ackley, Hinton, and Sejnowski to train Boltzmann machines, was simulated annealing. That training approach has since been succeeded by contrastive divergence, which Hinton introduced in 2002.

All the steps since then have used combinations of backpropagation (or some similar stochastic gradient descent training) together with contrastive divergence.
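To make that training-method distinction concrete, here is a minimal NumPy sketch of a single contrastive-divergence (CD-1) update for a binary restricted Boltzmann machine. The variable names, array shapes, and learning rate are our own illustrative assumptions – a sketch of the technique, not code from Hinton’s 2002 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.05):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    v0    : (n_visible,) binary input pattern
    W     : (n_visible, n_hidden) weight matrix
    b_vis : (n_visible,) visible biases
    b_hid : (n_hidden,) hidden biases
    """
    # Positive phase: hidden-unit probabilities given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)

    # Approximate gradient: data statistics minus reconstruction statistics.
    W     += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
    b_vis += lr * (v0 - v1_prob)
    b_hid += lr * (h0_prob - h1_prob)
    return W, b_vis, b_hid
```

In practice the update is averaged over mini-batches; the deep architectures referenced later (e.g., Hinton et al., 2012) stack several such layers and then fine-tune the whole stack with backpropagation.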


Class (a): Multilayer Perceptrons (MLPs)

The key point here is not just that Multilayer Perceptrons, or MLPs, are a certain kind of (multilayered) neural network architecture, or that they are trained with some kind of stochastic gradient descent method.

Rather, the important point is that when Paul Werbos figured out how to incorporate latent variables into the (pre-existing) ADALINE-style neural network, he made it possible for the ADALINE/MADALINE to grow into the Multilayer Perceptron.

Figure 2. Key take-aways for understanding Multilayer Perceptron (MLP) neural networks. (Slide taken from Maren, Alianna J. 2023. “The Future of AI (Part 3.2 of 3): The Pre-Matrix.” Accessed April 4, 2023; available at https://www.youtube.com/watch?v=SjXHIvD9jtY.)

The progenitor of the Multilayer Perceptron was the ADALINE (Adaptive Linear Neuron) / MADALINE (Many ADALINEs) architecture, developed by Bernard Widrow and Ted Hoff at Stanford University in 1959.

Actually, that is a bit of a misstatement, because the simple Perceptron was the architectural progenitor for the Multilayer Perceptron. However, the learning algorithm for the ADALINE (MADALINE) was based on minimizing the least-squared error across the output(s) – which sounds to us suspiciously like the idea behind backpropagation. (When we step back and look at this, Werbos’s innovation is less bolt-of-lightning and more the logical evolution, or “next sensible thing.”)
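To spell out that least-squares connection, here is a minimal sketch of the Widrow-Hoff (LMS, or “delta rule”) update used by the ADALINE. The learning rate, data shapes, and function name are illustrative assumptions on our part; see Caceres (2020) for a full tutorial implementation.

```python
import numpy as np

def adaline_train(X, y, lr=0.01, epochs=50):
    """Widrow-Hoff (LMS) rule: nudge the weights along the negative gradient
    of the squared error between the linear output and the target."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            out = x_i @ w + b      # linear activation (no hidden layer)
            err = y_i - out        # signed error on this example
            w += lr * err * x_i    # delta rule: error times input
            b += lr * err
    return w, b
```

Backpropagation keeps the same squared-error objective, but uses the chain rule to push the error signal back through one or more layers of latent (hidden) units.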

For a very useful (and fast-read) tutorial on the ADALINE, check out an article by Pablo Caceres (2020).

The key thing to remember is that the ADALINE (or its many-headed cousin, the MADALINE) could only learn linearly-separable classes; the evolution to learning non-linearly-separable classes (such as the XOR problem) was made possible through the use of latent variables, which let the neural network learn the features characterizing different classes.
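As a concrete illustration of that point, the sketch below hard-codes a two-hidden-unit network that separates the XOR classes – something no single-layer ADALINE can do. The weights are hand-picked for clarity rather than learned, so treat this purely as an illustration of what the latent (hidden) units buy us.

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

# Hand-picked weights: hidden unit 1 fires for "x1 OR x2",
# hidden unit 2 fires for "x1 AND x2"; the output fires for "OR but not AND".
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1.0, -2.0])
b_out = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = step(X @ W_hidden + b_hidden)   # latent "features" of each input
print(step(hidden @ w_out + b_out))      # -> [0. 1. 1. 0.], the XOR labels
```

Learning those hidden-layer weights automatically, rather than hand-picking them, is exactly what backpropagation made possible.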

There is new work going on in this basic class. For example, Hinton (2023) has recently introduced his “forward-forward” algorithm. However, it remains within this neural network class, rather than creating a new category.

Class (b): Single-Layer, Laterally-Connected Networks

If you’ve read some prior Themesis blogposts, or watched the Themesis YouTubes on energy-based neural networks, you’ll recall that the main problem with the (Little-)Hopfield neural network was that it had a very limited memory capacity; essentially, its memory was about 14% of the total number of neurons. So, a ten-neuron network might possibly “recall” one pattern – on a good day, maybe two patterns. Not so good.
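For readers who want to see that capacity limit in action, here is a minimal sketch of Hebbian storage and recall for a Little-Hopfield network with ±1 neurons. The function names, number of update sweeps, and asynchronous update order are our own illustrative choices.

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian outer-product rule: W = (1/N) * sum_p (x_p x_p^T), zero diagonal.

    patterns: array of shape (num_patterns, N) with entries +1 / -1.
    """
    n = patterns.shape[1]
    W = (patterns.T @ patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, probe, sweeps=5, seed=0):
    """Asynchronous updates: flip one neuron at a time toward lower energy
    E(s) = -0.5 * s @ W @ s, until the state settles into an attractor."""
    rng = np.random.default_rng(seed)
    s = probe.astype(float)
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

# Rule of thumb: reliable recall holds for only about 0.14 * N stored patterns;
# beyond that, the retrieved states blur into spurious mixtures.
```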

The Boltzmann machine solved that problem with the introduction of latent variables. A Boltzmann machine uses latent variables to learn the “features” characteristic of different patterns; it has a storage capacity of about 60% of the total number of neurons (Kojima et al., 1995).

Figure 3. The restricted Boltzmann machine (RBM) is based on the Hopfield neural network, with the introduction of latent variables. (Slide taken from Maren, Alianna J. 2023. “The Future of AI (Part 3.2 of 3): The Pre-Matrix.” Accessed April 4, 2023; available at https://www.youtube.com/watch?v=SjXHIvD9jtY.)

The Little-Hopfield neural network is the foundation for the Boltzmann machine, and the subsequent restricted Boltzmann machine (RBM).

I’ve discussed this neural network class extensively; the best way to get the background is to consult the prior blogposts (listed in the Resources and References below). The most important formal articles are included in the References for this blogpost.


Summing Up

Our real goal is to establish a new class of neural networks – one with a new architecture and a new set of capabilities. This new class would be substantially different from anything else that exists right now.

To do this, we’re assembling a set of “key insights” that characterize breakthroughs across different neural networks classes. This organized “set of insights” will become our “Matrix” – a tabulation of neural network classes vs. the defining features that we can ascribe to each class.

The “Matrix” will come later.

Right now, we’re just starting – and so we’re calling this work the “Pre-Matrix.”

We’ve investigated the two most widely-used classes of neural networks: one based on the Multilayer Perceptron (typically trained with some form of stochastic gradient descent), and the other based on laterally-connected architectures (the Hopfield neural network, evolving into the Boltzmann machine and the restricted Boltzmann machine). This latter class is built on a statistical mechanics model, and is trained by minimizing a free energy function.

The key features, common to both classes of networks, are:

  • Neurons (also called “nodes”), neural connections, and neural activations,
  • “Hidden” nodes or latent variables, and
  • A quantity that gets minimized in the course of training – a summed-squared error or a free energy function (see the sketch just after this list).
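Here is a minimal sketch of those two minimized quantities side by side: the summed-squared error used in MLP-style training, and the free energy of a binary restricted Boltzmann machine. The function names and array shapes are illustrative assumptions.

```python
import numpy as np

def summed_squared_error(targets, outputs):
    """Quantity minimized by backpropagation-style (MLP) training."""
    return 0.5 * np.sum((targets - outputs) ** 2)

def rbm_free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM for a visible pattern v:
    F(v) = -b_vis . v - sum_j log(1 + exp(b_hid_j + (v @ W)_j)).
    Energy-based training lowers this value for patterns in the training set."""
    return -(v @ b_vis) - np.sum(np.log1p(np.exp(v @ W + b_hid)))
```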

In the next blogpost, we’ll investigate a new class of neural networks – the “vector-matching” networks. These were originally devised by Teuvo Kohonen, and evolved to become the mainstays of natural language processing (NLP) methods.


“Live free or die,” my friend!*

* “Live free or die” – attrib. to U.S. Revolutionary War general John Stark. https://en.wikipedia.org/wiki/Live_Free_or_Die

Alianna J. Maren, Ph.D.

Founder and Chief Scientist

Themesis, Inc.



Prior Related Blogposts (with References)

Predecessor Blogpost

How Backpropagation Is Used WITH the Boltzmann Machine for Deep Learning

How to Compare MLPs with Boltzmann Machines and RBMs

The three-part blogpost series on “Seven Key Papers” (containing more detailed descriptions of the seven key papers identified here, including YouTube links):



Resources and References

Organized by Neural Network Class


Overview: The Logical Topology of Neural Networks

  • Maren, Alianna J. 1991. “A Logical Topology of Neural Networks.” Proceedings of the Second Workshop in Neural Networks – Academia, Industry, NASA, & Defense (WNN-AIND 91) (Auburn, GA; Feb. 14-16, 1991). doi:10.13140/RG.2.1.3503.2806. (Accessed Feb. 16, 2023; available online at https://www.researchgate.net/publication/281121286_A_Logical_Topology_of_Neural_Networks.)


Class (a): Multilayer Perceptrons

  • Caceres, Pablo. 2020. “The ADALINE – Theory and Implementation of the First Neural Network Trained With Gradient Descent.” Pablo Caceres’ GitHub blogpost series (10 March 2020). (Accessed Feb. 14, 2023; available at The ADALINE.)
  • Hinton, Geoffrey. 2023. “The Forward-Forward Algorithm: Some Preliminary Investigations.” Preprint. (Accessed Jan. 7, 2023; available at https://www.cs.toronto.edu/~hinton/FFA13.pdf.)
  • Schmidhuber, Jurgen. 2022. “Who Invented Backpropagation?” AIBlog@SchmidhuberAI. (Accessed March 28, 2023; available online at https://people.idsia.ch/~juergen/who-invented-backpropagation.html#BP2.)
  • Werbos, Paul J. 1974. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis (Applied Mathematics), Harvard University. (Available online at https://www.researchgate.net/publication/279233597_Beyond_Regression_New_Tools_for_Prediction_and_Analysis_in_the_Behavioral_Science_Thesis_Ph_D_Appl_Math_Harvard_University.)
  • Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In R. Drenick, F. Kozin, (eds): System Modeling and Optimization: Proc. IFIP. (Springer). PDF. (Note: Werbos’s first application of backpropagation to neural networks, making more specific thoughts from his 1974 doctoral dissertation).
  • Werbos, Paul J. 1994. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. (New York: Wiley.)
  • Widrow, B. 1960. “An Adaptive ‘ADALINE’ Neuron Using Chemical ‘Memistors.'” Stanford University Technical Report No. 1553-2. (Technical Report for the Office of Naval Research; Oct. 17, 1960.) (Accessed March 28, 2023; available online at https://isl.stanford.edu/~widrow/papers/t1960anadaptive.pdf.)
  • Widrow, Bernard, and Michael A. Lehr. 1990. “30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation.” Proceedings of the IEEE 78 (9): 1415–1442. (Accessed March 28, 2023; available online at https://ieeexplore.ieee.org/document/58323.)

Class (b): Laterally-Connected Networks (Free Energy-Minimizing Networks, Including the Boltzmann Machine and the RBM)

  • Ackley, David H., Geoffrey E. Hinton, and Terrence J. Sejnowski. 1985. “A Learning Algorithm for Boltzmann Machines.” Cognitive Science 9 (1) (January–March): 147-169. https://doi.org/10.1016/S0364-0213(85)80012-4.
  • Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014): 2672–2680. (Accessed Jan. 31, 2023; available at https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.)
  • Hinton, Geoffrey E. 2002. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14 (8) (August 2002): 1771-1800. doi: 10.1162/089976602760128018. (Accessed April 3, 2022; available at https://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf.)
  • Hinton, Geoffrey, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: Four Research Groups Share Their Views.” IEEE Signal Processing Magazine 29 (6) (Nov. 2012): 82-97. doi: 10.1109/MSP.2012.2205597. (Accessed Jan. 31, 2023; available at https://www.cs.toronto.edu/~hinton/absps/DNN-2012-proof.pdf.)