First, Make Sure that You Understand Discriminative Neural Networks
Discriminative neural networks are the kind that you train with a known, pre-labeled training and testing data set. Essentially, you “have the answers in the back of the book” – and you train the network by having it constantly check its outputs against those answers.
(In contrast, generative AI methods apply to neural networks where you DON’T know the answers in advance; you DO NOT have a pre-labeled training data set. Instead, the network discovers “features” and attempts to build a model that makes sense based on the features that are “inherent” across a very large set of training data – data that has been assembled, but NOT pre-labeled!)
Learning AI is like taking the Oregon Trail from Elm Grove (near St. Louis, MO) through the prairie grass country to Ft. Laramie in Wyoming, and from there, crossing the Sierra Nevada mountains to reach the “Gold Coast” of AI in California.
The first stage of this journey is to get to Ft. Laramie. This means learning basic neural networks – the discriminative networks – based on the Multilayer Perceptron (MLP) and on stochastic gradient descent training methods (including backpropagation).
You can’t skip this step, because the stochastic gradient descent (SGD) methods continue to pop up in the generative AI methods. Specifically, the Adam optimizer (an evolution from SGD) is used in transformers. So you really need the groundwork provided by a thorough study of the methods used in discriminative neural networks.
Here’s a very good Analytics Vidhya blogpost on various kinds of optimizers.
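If it helps to see the distinction in code: here is a minimal NumPy sketch (our own illustrative functions, not taken from that blogpost) of a single parameter-update step for vanilla SGD and for Adam, followed by a toy usage that minimizes a simple quadratic.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Vanilla stochastic gradient descent: step against the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and of its square (v), with bias correction; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v

# Toy usage: minimize f(w) = (w - 3)^2 with each optimizer; df/dw = 2(w - 3).
w_sgd = w_adam = 0.0
m = v = 0.0
for t in range(1, 501):
    w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3), lr=0.05)
    w_adam, m, v = adam_step(w_adam, 2 * (w_adam - 3), m, v, t, lr=0.05)
print(round(w_sgd, 3), round(w_adam, 3))         # both should end up close to 3.0
```

The point to notice: SGD uses one global learning rate, while Adam adapts the effective step size per parameter from those two moving averages – which is why it shows up again when you get to transformers.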
If you are going to start at the beginning, then you need to learn:
- The Multilayer Perceptron (MLP) neural network structure,
- The backpropagation algorithm – and then the broader set of stochastic gradient descent algorithms and various optimizers, and
- Descendants of the MLP, such as convolutional neural networks (CNNs).
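On that last bullet: the step from the MLP to the CNN amounts to replacing a fully-connected layer with a small filter that slides across the input. Here is a minimal NumPy sketch (function and variable names are our own, purely illustrative) of that convolution operation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2-D convolution (strictly, cross-correlation, as in
    most deep-learning libraries): slide the kernel over the image and sum
    the elementwise products in each window."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy usage: a 1x2 horizontal-difference kernel picks out the vertical edge
# between the dark (0) and bright (1) halves of a tiny image.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
edge_kernel = np.array([[-1.0, 1.0]])
print(conv2d_valid(img, edge_kernel))   # nonzero only at the edge column
```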
This YouTube playlist covers MLP neural networks, and the last two vids in the sequence specifically help you evaluate the Sum-Squared Error (SSE) for your trained neural network, using the simple XOR (exclusive-OR) network as a reference.
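As a concrete reference point, here is a hedged NumPy sketch of the forward pass for a 2-2-1 XOR network – the weights are hand-picked illustrative values of our own, not taken from the playlist – along with the SSE computation that those last two vids discuss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR truth table: the four input patterns and their target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])

# Hand-picked weights for a 2-2-1 MLP that approximates XOR:
# hidden unit 0 acts roughly like OR, hidden unit 1 roughly like AND,
# and the output unit computes roughly "OR and not AND".
W_hidden = np.array([[20.0, 20.0],     # one row per hidden unit
                     [20.0, 20.0]])
b_hidden = np.array([-10.0, -30.0])
W_output = np.array([20.0, -20.0])
b_output = -10.0

hidden = sigmoid(X @ W_hidden.T + b_hidden)    # shape (4, 2)
outputs = sigmoid(hidden @ W_output + b_output)

sse = np.sum((targets - outputs) ** 2)         # Sum-Squared Error over all patterns
print("outputs:", np.round(outputs, 3))        # approximately [0, 1, 1, 0]
print("SSE:", sse)                             # very small for these weights
```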
This next YouTube playlist takes you through the backpropagation algorithm, up through training the output-to-hidden layer connection weights. (The next layer of weight training has not yet been added to the playlist.)
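For the hidden-to-output weights that the playlist does cover, the update comes from one application of the chain rule. A minimal sketch, continuing the sigmoid-plus-SSE conventions of the XOR example above (notation is ours; some authors define the error with a factor of 1/2, which drops the 2 below):

```python
# Gradient of the SSE with respect to one output unit's incoming weights,
# for a single training pattern, with a sigmoid output activation:
#   E      = (target - output)^2
#   output = sigmoid(net),  net = W_output . hidden + b_output
#   dE/dW  = dE/d_output * d_output/d_net * d_net/dW
#          = -2 * (target - output) * output * (1 - output) * hidden
def output_weight_gradient(target, output, hidden):
    delta = -2.0 * (target - output) * output * (1.0 - output)
    return delta * hidden                      # one gradient entry per hidden unit

# One SGD step on the hidden-to-output weights for that pattern would then be:
#   W_output = W_output - learning_rate * output_weight_gradient(target, output, hidden)
```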
Generative AI is Like Going over the Mountains
In terms of our Oregon Trail of AI analogy, generative AI starts at Ft. Laramie.
The journey from Elm Grove to Ft. Laramie was pretty rough, but fairly straightforward. The most math that you needed was the chain rule from differential calculus. (To do the backpropagation method, you use the chain rule … again, and again, and again. Boring, but very doable.)
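For reference, that repeated chain-rule step looks like this for a single hidden-to-output weight $w_{ij}$, with SSE error $E$, sigmoid output $o_j = \sigma(\text{net}_j)$, net input $\text{net}_j = \sum_i w_{ij} h_i$, and target $t_j$ (generic notation, ours):

```latex
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial o_j}\,
    \frac{\partial o_j}{\partial \text{net}_j}\,
    \frac{\partial \text{net}_j}{\partial w_{ij}}
  = -2\,(t_j - o_j)\; o_j (1 - o_j)\; h_i
```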
In contrast, generative AI rests on a combination of three distinct methods, as mentioned previously:
- The reverse Kullback-Leibler divergence,
- Bayesian conditional probabilities, and
- Statistical mechanics.
Just as it is important to find the best route INTO a mountain range – one that takes you most safely and directly through the best passes – it’s important to start your study of generative AI in the right place, that is, with the reverse Kullback-Leibler divergence. This leads naturally to inserting Bayesian conditional probabilities. Then, the final (major, theoretical) step is that we deconstruct the resulting equation and interpret it using the methods of statistical mechanics.
This is most evident when we look at variational inference.
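To make that concrete: in variational inference we approximate an intractable posterior $p(z \mid x)$ with a simpler distribution $q(z)$, and we minimize the reverse Kullback-Leibler divergence $D_{KL}(q \,\|\, p)$. Applying Bayes’ rule to $p(z \mid x)$ gives the standard decomposition (a common textbook form, in our notation):

```latex
\log p(x) \;=\;
\underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]}_{\text{ELBO}}
\;+\;
\underbrace{D_{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)}_{\text{reverse KL} \;\ge\; 0}
```

Minimizing the reverse KL divergence is therefore the same as maximizing the ELBO – and the ELBO is, up to a sign, a (variational) free energy, which is exactly where the statistical mechanics enters.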
Start Learning Generative AI Here
The best place to START your understanding of generative AI is at the beginning – which means going back to the Boltzmann machine.
Start here, with this YouTube vid that identifies key breakthroughs in the development of neural networks, particularly the evolution of generative AI methods.
Your next step is to do a contrast-and-compare between the restricted Boltzmann machine (RBM) and the classic Multilayer Perceptron, or MLP. You can get that solidly and thoroughly with THIS YouTube vid.
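The key structural difference to keep in mind during that contrast-and-compare: an MLP defines a feedforward input-to-output mapping, while a restricted Boltzmann machine defines an energy over joint configurations of visible units $v$ and hidden units $h$, and a probability distribution derived from that energy (the standard form, in our notation):

```latex
E(v, h) = -\,a^{\top} v - b^{\top} h - v^{\top} W h,
\qquad
p(v, h) = \frac{e^{-E(v, h)}}{Z},
\qquad
Z = \sum_{v,\,h} e^{-E(v, h)}
```

Here $a$ and $b$ are bias vectors, $W$ is the visible-to-hidden weight matrix, and $Z$ is the partition function – the statistical-mechanics quantity that keeps reappearing throughout generative AI.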
Then, do the generative AI deep-dive by dedicating a weekend, and watching two video playlists on generative AI.
Here’s the first video playlist on Generative AI Basics.
There is a corollary playlist, on Learning Energy-Based Neural Networks.
There’s a bit of overlap between these two playlists. We recommend that you listen to both. For example, you could take a weekend, and listen to one full playlist on one day, and the second playlist the next day. (Or do one full playlist on one weekend, and the next full playlist on the second.) This way, you’ll reinforce the important points.
If You’re Really Serious …
If you’re serious, we offer three deeper study options:
- The Kullback-Leibler divergence – for a VERY detailed cross-compare of what this divergence is, and of how various authors use different notations (mind-boggling, and needing serious straightening-out before you can cross-compare), go to the Themesis Resources page, look for the “Generative AI” link, and you’ll find a paper on the Kullback-Leibler divergence. All 20-some pages of it. It will help you with notation. It will SERIOUSLY help you. So if you’re serious, start there. (A short forward-vs-reverse definition follows right after this list.)
- Book chapters – three chapters from the book-in-progress, Statistical Mechanics, Neural Networks, and Artificial Intelligence; see the “Book” link on the Themesis Resources page, and
- “Introduction to Generative AI” – a three-week Themesis Short Course. This is actually called Top Ten Terms in Statistical Mechanics – but the bonus sections evolved into a full presentation on generative AI. To the best of our knowledge, this is the most straightforward and direct way of learning the fundamental theory elements underlying all forms of generative AI. You can join at any time, and we start new cohorts on a regular basis. (Those who have “Opted-In” with Themesis – scroll down to see the Opt-In form – will get email invitations to join upcoming cohorts.)
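As promised in the first bullet above, here are the two directions of the divergence in one consistent notation (ours) – the notational differences across authors are exactly what that paper untangles:

```latex
D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
  \quad \text{(forward KL: expectation taken under } P\text{)}

D_{KL}(Q \,\|\, P) = \sum_{x} Q(x) \log \frac{Q(x)}{P(x)}
  \quad \text{(reverse KL: expectation taken under } Q\text{)}
```

In variational inference, $Q$ plays the role of the approximating distribution and $P$ the true posterior, so it is the reverse form that gets minimized – consistent with the ELBO decomposition shown earlier.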
If Too Much is Not Enough …
There’s more.
We have a longish (60-pg, approx.) tutorial on variational inference. (It needs a bit of tweaking, and what we’ve presented here is enough to fill a quarter’s worth of study.) We’ll put the link here when it’s ready.
But, the real, serious, super-important thing is:
We’re actively developing AGI.
No kidding.
It’s really real.
We lay out the basic AGI architecture in one of our recent YouTube vids. (We presented this vid earlier in this blogpost, but didn’t really mention the AGI aspect.)
AGI is much more complex than generative AI.
Generative AI involves multiple disciplines, as you saw when we mentioned the three “fundamental” elements (the reverse Kullback-Leibler divergence, etc.).
AGI involves even MORE disciplines.
It starts with an understanding of how to build ontologies – such as Google’s Knowledge Graph. Then, we need a new concept: activation ontologies – a means of “activating” ontology components depending on the “stimulus” from a signal layer. (This “signal layer” could come from a transformer, or from a number of other possibilities – including a very simple neural network.) This is a new thing, and it’s on our drawing boards right now.
Also, we need a means of connecting the ontologies and the signals – and this requires a wee bit of statistical mechanics that we’ve encapsulated in a new kind of neural network that we’ve named the CORTECON(R) (COntent-Retentive, TEmporally-CONnected neural network).
CORTECONs(R) are the magical sticky-glue that will connect signal-layer elements (e.g., items processed by a transformer) to the ontologies, and back again. They will enable transformer-based methods to access real “knowledge,” and stop making silly mistakes.
We’ve spent several months building a framework in which to introduce CORTECONs(R), and provide context for them with regard to other neural networks.
This YouTube playlist captures that framework.
Once this framework positions CORTECONs(R) vis-a-vis other neural networks, we introduce CORTECONs(R) via their own YouTube playlist.
This playlist has it all:
- Architecture and equations (at least, a simple presentation of the key equations – no derivations here),
- Code – real, working object-oriented Python code, and
- A worked example. (Simple, but real.)
We even have an “Executive/Board/Investors Briefing” for those who are equations-and-code averse, and a light-hearted and fluffy “sandwich” analogy.
There is some VERY serious statistical mechanics involved.
We’ve been doing this for years. (About ten years, really – we’ve got a LONG stealth history.) We give links to some of our key papers on the Resources page – same page that had the Generative AI and Book resources; just scroll down a bit more.
What’s Next with Themesis (and How to Stay Informed)
With CORTECONs now introduced, we’re moving back to the realm of AGI, and building out components of a very basic, very rudimentary, yet conceptually-complete AGI.
We’ll be reporting on this as time goes on.
To be sure that you get updates, go to our Opt-In page, which you’ll find under the “About” menu button.
Scroll down, find the Opt-In form, fill it in, check your email for the confirmation, and confirm.
Be certain to MOVE THE THEMESIS EMAILS to your preferred folder. We’re pretty strict about our email subscribers these days; those who don’t open their emails find themselves dropped. Nemesis-style justice; the milder version.
If you REALLY want to stay on our email list (which is a privilege, not an obligation), then CLICK on some buttons occasionally. That tells us:
- That you’re alive and you’re paying attention, and
- What interests you – so we can respond to those interests.
The real fun – with AGI – is just ahead. Join us for an uproarious, wild, fun journey.