When a Classifier Acts as an Autoencoder, and an Autoencoder Acts as a Classifier (Part 1 of 3)

One of the biggest mental sinkholes in which AI students can get trapped is not quite understanding the fundamental difference between how our two basic “building block” networks operate: the Multilayer Perceptron (MLP), trained with backpropagation (or any form of gradient descent learning), and the (restricted) Boltzmann machine (RBM), trained with contrastive divergence.

It’s easy to see why this happens. First, a lot of students are being run at a rapid pace through an AI program that is often more applications-driven than theory-centric. This isn’t bad – it’s the fastest way to a new job and a new career – but it leaves students with holes in their “AI understanding” armor.

Second, most students move (as rapidly as they can) into some deep learning application – one that uses layers of neural network nodes. When they start learning neural networks, they get well-grounded in the MLP basics. Then, there’s a brief mention of the problems inherent in adding layers to an MLP (vanishing and exploding gradients, etc.), and the need for a different method. At some point, the name “Hinton” might be briefly mentioned. There may be a faint nod in the direction of (restricted) Boltzmann machines. But the students are so busy getting their application together, using the twin miracles of TensorFlow and Keras, that they skip the step of understanding (restricted) Boltzmann machines.

And who can blame them?

The third big factor is: all too often, the professors don’t understand Boltzmann machines. The textbook authors don’t understand Boltzmann machines.

Ergo, the simple and expedient thing is to focus on building the application and gloss over that theoretical armor-hole.

Thus – we have a situation where hundreds, thousands, and possibly tens of thousands of people label themselves as “AI professionals,” but have serious knowledge-gap weaknesses.

It’s like bleeding out from multiple puncture wounds, each one getting through one of those gaping armor-holes.

Let’s see if we can address this, ok?

It’s not going to be that hard – and it won’t take long – and if you’re in the situation that I’ve described, you’ll emerge from this series of posts feeling – and actually being – much more confident in your understanding.


First Step Contrast-and-Compare: Neural Network Architectures

Perhaps the easiest way to get insight is to start by visualizing two network architectures: a classic, simple Multilayer Perceptron (MLP), and a classic, simple Boltzmann machine in both its original and restricted forms. These are shown in Figure 1.

Figure 1. The image on the far left shows a simple three-layer Multilayer Perceptron (MLP). The image in the center shows a Boltzmann machine. Note that all nodes are connected to each other. The image on the right shows a restricted Boltzmann machine (RBM), arranged to look like a classifier or MLP. Note that the configurations on the left and right seem to be identical.

Here are the key points:

  • An MLP MUST be arranged in layers – at least three layers. The “hidden nodes” (latent variables, in machine-learning-speak) comprise the middle layer.
  • A Boltzmann machine DOES NOT have to be arranged in layers. However, a basic (non-restricted) Boltzmann machine has every node connecting to every other node – a spaghetti of node connections. (See the middle image in Figure 1.)
  • A restricted Boltzmann machine has many of those connections removed – and it can be REDRAWN so that it LOOKS LIKE an MLP – but it is not, and never will be, an MLP. (Wolf in sheep’s clothing and all that.) See the sketch just below this list for how the two connectivity patterns differ.
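
To make that wolf-in-sheep’s-clothing point concrete, here is a minimal NumPy sketch of the two connectivity patterns. The node counts and weight names are just illustrative choices of mine, and bias terms are left out to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- MLP: strictly layered, feed-forward only ----------------------
# Illustrative sizes: 6 inputs, 4 hidden nodes, 3 outputs.
W_in_hid  = rng.normal(size=(6, 4))   # input  -> hidden weights
W_hid_out = rng.normal(size=(4, 3))   # hidden -> output weights

def mlp_forward(x):
    """One forward pass; activations flow in one direction only."""
    h = np.tanh(x @ W_in_hid)          # hidden-layer activations
    return np.tanh(h @ W_hid_out)      # output-layer activations

# --- RBM: one bipartite weight matrix, no feed-forward layering ----
# 6 visible (pattern) nodes, 4 hidden (latent-variable) nodes.
W_vis_hid = rng.normal(size=(6, 4))    # visible <-> hidden weights
# There is NO visible-visible and NO hidden-hidden weight matrix:
# that absence is the "restriction."

def rbm_hidden_probs(v):
    """P(h = 1 | v): hidden units depend only on visible units."""
    return 1.0 / (1.0 + np.exp(-(v @ W_vis_hid)))

def rbm_visible_probs(h):
    """P(v = 1 | h): the SAME weights are used in the reverse direction."""
    return 1.0 / (1.0 + np.exp(-(h @ W_vis_hid.T)))
```

The thing to notice: the MLP only ever pushes activations forward, through two different weight matrices, while the RBM uses one shared visible-to-hidden weight matrix in both directions. That shared, bidirectional matrix is what survives no matter how the picture is redrawn.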

We can get a better insight into those last two points when we look at how a Boltzmann machine (and its derivative, the restricted Boltzmann machine) evolved from a Hopfield neural network. This is shown in Figure 2.

Figure 2. A Hopfield neural network (left-most image) has every node connecting to every other node. A Boltzmann machine (central image) is essentially a Hopfield neural network with a set of new nodes – ones that are NOT defined by the training patterns. These are the “latent variables” or “hidden nodes.” Every node still connects to every other node. In a restricted Boltzmann machine (right-most image), a lot of those connections are removed. Now, those original “pattern” nodes can ONLY connect to the latent variable nodes, and the latent variable nodes can ONLY connect to the original pattern nodes – and not to any other latent variable node.
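
If it helps to see Figure 2 as connectivity masks rather than pictures, here is a small sketch (node counts are arbitrary): a 1 in row i, column j means “node i connects to node j.”

```python
import numpy as np

n_pattern, n_latent = 4, 2             # illustrative node counts
n_total = n_pattern + n_latent

# Hopfield network: every pattern node connects to every other pattern node.
hopfield_mask = np.ones((n_pattern, n_pattern)) - np.eye(n_pattern)

# Boltzmann machine: add latent (hidden) nodes; still fully connected.
boltzmann_mask = np.ones((n_total, n_total)) - np.eye(n_total)

# Restricted Boltzmann machine: keep ONLY pattern <-> latent connections.
rbm_mask = np.zeros((n_total, n_total))
rbm_mask[:n_pattern, n_pattern:] = 1   # pattern -> latent
rbm_mask[n_pattern:, :n_pattern] = 1   # latent  -> pattern
```

Printing rbm_mask shows the structure directly: the only non-zero entries sit in the pattern-to-latent and latent-to-pattern blocks, which is exactly the “restriction” in a restricted Boltzmann machine.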

This blogpost will be continued as we look at how the backpropagation learning method (or, more generally, stochastic gradient descent) requires the MLP architecture, and how the energy equation and contrastive divergence (for the restricted Boltzmann machine) indicate the RBM structure. We’ll play some contrast-and-compare, and increase our insight into both types of neural networks.
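
Since the energy equation will carry the weight of the argument next time, here it is in advance – the standard RBM energy, written as a one-line function (the variable names are my own; a and b are the visible and hidden biases, W the visible-to-hidden weights):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Standard RBM energy: E(v, h) = -a.v - b.h - v.W.h,
    where v is the visible (pattern) vector, h is the hidden (latent) vector,
    W holds the visible-to-hidden weights, and a, b are the bias vectors."""
    return -(a @ v) - (b @ h) - (v @ W @ h)
```

Notice that the only cross-terms are visible-times-hidden; there are no visible-visible or hidden-hidden terms. That is the energy-equation way of saying what Figure 2 says in pictures.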

To your health and outstanding success!

Alianna J. Maren, Ph.D.

Founder and Chief Scientist, Themesis, Inc.



Previous Related Blogs

Maren, Alianna J. (2022).  “Entropy in Energy-Based Neural Networks – Seven Key Papers (Part 3 of 3).” Themesis Blogpost Series (April 4, 2022). (Accessed April 3, 2022.) https://themesis.com/2022/04/04/entropy-in-energy-based-neural-networks-seven-key-papers-part-3-of-3/

Maren, Alianna J. (2021). “Latent Variables Enabled Effective Energy-Based Neural Networks: Seven Key Papers (Part 2 of 3).” Themesis Blogpost Series (November 16, 2021). (Accessed April 3, 2022.)  https://themesis.com/2021/11/16/latent-variables-enabled-effective-energy-based-neural-networks-seven-key-papers-part-2-of-3/

Maren, Alianna J. (2021). “Seven Key Papers for Energy-Based Neural Networks and Deep Learning (Part 1 of 3).” Themesis Blogpost Series (November 5, 2021). (Accessed April 3, 2022.)  https://themesis.com/2021/11/05/seven-key-papers-part-1-of-3/


Good Vibes – Music by Nadia Boulanger

Marina Thibeault. 2020. “Nadia Boulanger Trois Pièces- Marina Thibeault, viola & Corey Hamm, piano.” Marina Thibeault YouTube channel. (Nov. 5, 2020). (Accessed April 21, 2022.) https://www.youtube.com/watch?v=0jOEe8b7LRc

Famous Salonnières & Professors

In this section (and for the next few blogposts), we’ll look at one of the most under-rated teachers of music in recent times, and how she directly influenced some of the most well-known recent and contemporary composers.

Nadia Boulanger and her sister, Lili, were highly influential figures. More on these two fascinating women to come.

2 comments

  1. Dr. A.J., Thanks for the article. Since you’ve also written about Nadia Boulanger, you might be interested in the following anecdote. The Australian composer Peggy Glanville-Hicks “…lay in wait on the pavement opposite Nadia’s house (where she could be seen better) at 36 Rue Ballu (now Place Lili Boulanger) and bombarded Boulanger with notes pleading to be taken on as a student…in all weathers, outside the house” [https://books.google.co.nz/books?id=MZ50ojGMfKYC&pg=PA24&lpg=PA24&dq=peggy+glanville+hicks+nadia+boulanger+applied&source=bl&ots=L1h5A2lXMr&sig=ACfU3U0wfq24_GwP3w_qRRguxyPVwnnDqA&hl=en&sa=X&ved=2ahUKEwiY_vnm2aj3AhWaRmwGHciXC8Y4ChDoAXoECBYQAw#v=onepage&q=peggy%20glanville%20hicks%20nadia%20boulanger%20applied&f=false]. It took 2 months of this to convince Boulanger that she was serious.

    1. This is fabulous, Simon – thank you so much! I’ll be certain to follow up and share! (And yes, credit you w/ the inspiration!) – much appreciated! – AJM

