Kuhnian Normal and Breakthrough Moments: The Future of AI (Part 1 of 3)

Over the past fifty years, there have been only a few Kuhnian “paradigm shift” moments in neural networks. These include:

  • 1974 – Paul Werbos invents backpropagation
  • 1974 – William Little invents what becomes known as the Little-Hopfield neural network, with John Hopfield expanding understanding of this network in 1982
  • 1985 – Ackley, Hinton, and Sejnowski invent the Boltzmann machine, with the restricted Boltzmann machine (RBM) following a suggestion by Smolensky in 1986.

Thereafter, the major steps in energy-based neural networks have largely involved combining the Boltzmann machine learning algorithm (revised and now known as “contrastive divergence”) with the backpropagation method (or a similar form of gradient descent).

Figure 1. The Kuhnian paradigm shifts, defining key inventions for energy-based neural networks, occurred between 1974 and 1986.

This blogpost accompanies the Themesis, Inc. YouTube video, “Kuhnian Normal vs. Breakthrough Moments: The Future of AI.”


What Happened During These Early “Paradigm Shifts”

Each paradigm-shift invention introduced a new branchpoint in the phylogenetic evolution of neural networks.

We start with the initial idea that there could be such a thing as a computational neural network, an idea advanced by Frank Rosenblatt in 1958.

Figure 2. Frank Rosenblatt (1958) proposed the original notion for a computational neural network, which he called a “perceptron.”

Yes, Major Advances Since 1987 …

Of course there have been noteworthy achievements since 1987.

  • In 2002, Hinton invented contrastive divergence, the key algorithm enabling deep learning.
  • In 2006, Hinton and Salakhutdinov cracked the “neural networks winter” with their Science article describing their first deep learning architecture.
  • Salakhutdinov and Hinton published a detailed exposition of their method in 2012. (See detailed references and links in Resources and References at the end of this post.)

We’ve also had GANs (generative adversarial networks), introduced by Ian Goodfellow et al. in 2014.

Essentially, there are two major learning algorithms and computational architectures in neural networks:

  • Multilayer Perceptrons (typically classifier architectures), trained with backpropagation (or some other stochastic gradient-descent algorithm); the key invention was due to Werbos in 1974 (a minimal backpropagation sketch follows this list), and
  • Boltzmann machines (typically autoencoder architectures), trained using contrastive divergence; the key inventions were due to Little in 1974, then Hopfield in 1982, then Hinton (as senior investigator, with his student David Ackley and colleague Terrence Sejnowski) in 1985, with the refinement into restricted Boltzmann machines following a suggestion by Smolensky in 1986.
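To make the first bullet concrete, here is a minimal sketch, in Python with numpy, of a tiny multilayer perceptron trained by backpropagation. The layer sizes, data, and learning rate are placeholders chosen only for illustration; none of this is drawn from the papers cited in this post.

```python
# Minimal backpropagation sketch for a two-layer perceptron (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                      # placeholder inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # placeholder targets

W1 = 0.1 * rng.standard_normal((3, 8)); b1 = np.zeros(8)   # hidden layer
W2 = 0.1 * rng.standard_normal((8, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(200):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backward pass: push the output error back through each layer (squared-error loss).
    d_out = (y_hat - y) * y_hat * (1.0 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1.0 - h)

    # Gradient-descent updates of weights and biases.
    W2 -= lr * h.T @ d_out / len(X);  b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_hid / len(X);  b1 -= lr * d_hid.mean(axis=0)
```

The essential point, unchanged since Werbos’s formulation, is the backward pass: the output error is converted into gradients for every layer’s weights.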

We have two primary ways of combining these two major algorithms:

  • Deep learning architectures, where the two algorithms support each other, and
  • Generative adversarial networks, where the two algorithms compete against each other.

Deep Learning

Figure 3. Deep learning architectures use layers of Boltzmann machines, supported by backpropagation for weight-refinement. For details, see Alianna J. Maren, 2022. “How Backpropagation and (Restricted) Boltzmann Machine Learning Combine in Deep Architectures.” (See Related Blogposts for full citation.)
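To make Figure 3’s building block concrete, here is a minimal sketch, in Python with numpy, of a single contrastive-divergence (CD-1) update for one restricted Boltzmann machine. In a deep architecture of the kind described above, updates like this would pretrain each layer in turn before backpropagation refines the whole stack. The sizes and data below are placeholders, not taken from any cited paper.

```python
# Minimal CD-1 (contrastive divergence) sketch for one restricted Boltzmann machine.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.1 * rng.standard_normal((n_visible, n_hidden))   # visible-to-hidden weights
a = np.zeros(n_visible)                                # visible biases
b = np.zeros(n_hidden)                                 # hidden biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

v0 = rng.integers(0, 2, size=(20, n_visible)).astype(float)  # placeholder binary data

for step in range(100):
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one Gibbs step (reconstruct visibles, then re-infer hiddens).
    p_v1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b)

    # CD-1 update: data-driven statistics minus reconstruction-driven statistics.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)
```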

GANs (Generative Adversarial Networks)

The other primary way in which backpropagation is used with a Boltzmann machine is in GANs (generative adversarial networks), developed by Goodfellow and colleagues in 2014.

Figure 4. Generative adversarial networks (GANs) use both Boltzmann machines and Multilayer Perceptrons, working “adversarially” to create effective neural network structures.
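As a sketch of the adversarial pattern only (not the exact architecture of any cited paper), the snippet below, written with PyTorch, pits a small generator against a small discriminator, each expressed here as a multilayer perceptron. The data, layer sizes, and learning rates are hypothetical placeholders.

```python
# Minimal adversarial training loop: discriminator vs. generator (illustrative only).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2    # hypothetical sizes

generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
discriminator = nn.Sequential(
    nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(n=64):
    # Stand-in "real" data: points from a shifted 2-D Gaussian (placeholder only).
    return 0.5 * torch.randn(n, data_dim) + 2.0

for step in range(1000):
    # Discriminator step: label real samples 1 and generated samples 0.
    real = real_batch()
    fake = generator(torch.randn(real.size(0), latent_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(real.size(0), 1)) \
           + bce(discriminator(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label generated samples as 1.
    fake = generator(torch.randn(64, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The push-pull between the two loss terms is the “adversarial” element: the discriminator gets better at telling real from generated samples, which in turn forces the generator to produce more convincing samples.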

Recent Innovations: Transfer and Diffusion Learning

Recently, we’ve seen transfer learning and diffusion methods; both are designed to improve how deep learning (DL) architectures and GANs work. These are also part of the “Kuhnian normal” evolution.

Figure 5. Transfer learning and diffusion learning help us use one previously-trained neural network structure to rapidly train a new one.
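As one hedged example of the transfer-learning pattern (assuming a recent PyTorch and torchvision install; the backbone, class count, and data below are placeholders chosen only for illustration): a previously trained network is frozen, and only a small new output layer is trained for the new task.

```python
# Minimal transfer-learning sketch: reuse a pretrained backbone, train a new head.
import torch
import torch.nn as nn
from torchvision import models

# Load a previously trained network (resnet18 used purely as an example backbone).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the previously learned weights so only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new, untrained head for a hypothetical 10-class task.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on stand-in data.
images = torch.randn(8, 3, 224, 224)     # placeholder image batch
labels = torch.randint(0, 10, (8,))      # placeholder labels
loss = loss_fn(backbone(images), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```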

Quick Recap and Summary

When we back up to a satellite-level view of neural networks, as they’ve evolved over the last 50 years, we see two primary methods, used either in combination (deep learning) or “adversarially” (GANs).

The core equations have remained the same.

Most importantly, Boltzmann machines have relied on the Ising equation, from statistical mechanics.

The formulation of this equation has been consistent since 1974. The only real difference in the basic equation is that, starting around 1986 (with Smolensky’s suggestion of a “restricted” Boltzmann machine), the Ising equation has specifically identified interactions between the visible and hidden nodes, together with individual activation strengths (“biases”) for both the visible and hidden nodes.

This is shown in Hinton et al.’s 2012 paper.

Figure 6. Hinton and colleagues (2012) use an Ising equation that shows explicit interaction energies for visible-to-hidden connections, and individual activation energies (“biases”) for each of the visible and hidden nodes. The same equation was also used by Hinton and Salakhutdinov in 2006. (See Resources and References for full citations and links.)
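For reference, the restricted Boltzmann machine energy that Figure 6 describes is usually written in the form below (standard notation; symbols may differ slightly from the cited papers):

```latex
E(\mathbf{v}, \mathbf{h}) \;=\;
  - \sum_{i \in \mathrm{visible}} a_i \, v_i
  \;-\; \sum_{j \in \mathrm{hidden}} b_j \, h_j
  \;-\; \sum_{i, j} v_i \, w_{ij} \, h_j
```

Here v_i and h_j are the visible and hidden node states, a_i and b_j are their individual biases, and w_ij are the visible-to-hidden interaction weights; the first two sums are the activation (“bias”) terms, and the last sum is the visible-hidden interaction term described above.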

What This Means

We have two primary neural network methods, each with its own associated learning algorithm.

We can combine them in two different ways: cooperatively (deep learning) or competitively (adversarial).

We’ve built enormous structures using each of these methods, and garnered numerous successful applications.

By now, we are NOT LIKELY to gain significant future advances with either of these two (combined) approaches.

That is, adding more layers will not substantially change our world.

We’re ready for the next Kuhnian paradigm-shifting breakthrough.

Our next step is to figure out the where and how of this next breakthrough.

It is NOT likely to be a further twitch or tweak to either deep learning or GANs.

For useful branchpoints in the phylogenetic evolution of neural networks, we need to look into earlier neural network history, to the period when our first paradigm-shifting breakthroughs occurred.

This means we need to assess the big ideas, the problem statements, that were on the minds of the inventors in the 1974 to 1986 timeframe.

Tune in next week as we turn our attention to how these inventors phrased their problem statements.


P.S. – Well, What About ChatGPT?

Billions and billions of parameters served.

It does some things well; other things are … “reductio ad absurdum.” (See last week’s post, “Reductio ad Absurdum.”)

Adding more billions (of parameters) will not make it remarkably smarter.

We’ve just exposed its weakness – it’s an amalgam of the masses.

No real intelligence there; let’s just keep moving on.



“Live free or die,” my friend!*

* “Live free or die” – attrib. to U.S. Revolutionary War general John Stark. https://en.wikipedia.org/wiki/Live_Free_or_Die

Alianna J. Maren, Ph.D.

Founder and Chief Scientist

Themesis, Inc.



Prior Related Blogposts (with References)

The three-part blogpost series on “Seven Key Papers” (containing more detailed descriptions of the seven key papers identified here, including YouTube links):

How Backpropagation Is Used WITH the Boltzmann Machine for Deep Learning

And, of Course, ChatGPT (See Resources and References)



Resources and References

This Week’s Read-Alongs

These are the primary works specifically identified in today’s post.

  • Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Generative Adversarial Nets.” Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014); 2672–2680. (Accessed Jan. 31, 2023; available at https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.)
  • Hinton, Geoffrey. 2023. “The Forward-Forward Algorithm: Some Preliminary Investigations.” Preprint. (Accessed Jan. 7, 2023; available at https://www.cs.toronto.edu/~hinton/FFA13.pdf.)
  • Hinton, Geoffrey, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: Four Research Groups Share Their Views.” IEEE Signal Processing Magazine 29 (6) (Nov. 2012); 82-97. doi: 10.1109/MSP.2012.2205597. (Accessed Jan. 31, 2023; available at https://www.cs.toronto.edu/~hinton/absps/DNN-2012-proof.pdf.)
  • Hinton, G.E., and R. R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786) (July 28, 2006): 504–507. doi: 10.1126/science.1127647. (Accessed April 5, 2022; available at https://www.cs.toronto.edu/~hinton/science.pdf.)
  • Salakhutdinov, Ruslan, and Geoffrey Hinton. 2012. “An Efficient Learning Procedure for Deep Boltzmann Machines.” Neural Computation 24(8) (August, 2012): 1967–2006. doi: 10.1162/NECO_a_00311. (Accessed April 3, 2022; available at https://www.cs.cmu.edu/~rsalakhu/papers/neco_DBM.pdf.)

Matthias Bal’s Blogposts

AJM’s Note: Transformers. The hottest thing in AI/ML. And yet ANOTHER thing that totally rests on statistical mechanics. I found this pair of blogposts by Matthias Bal on transformers when doing a prior blogpost series on the K-L divergence, free energy, and variational Bayes. This pair of posts also supports our study of transformers for NLP methods, such as BERT and the GPT series.
