Recently (as of late December 2024), there has been a huge hoopla about OpenAI’s o3, and also about Gemini 2 – with the question percolating (once again): is this AGI? Are we heading towards “superintelligence”?
Let’s pull back a moment.
OpenAI’s o3 uses strategic chain-of-thought reasoning (see Wang et al., Sept. 2024, for a good exposition). This is a step beyond straightforward chain-of-thought reasoning (Wei et al., 2023, all from Google Research, Brain Team). As Wei et al. note, “a chain of thought — a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning.”
In OpenAI’s o-series models, the chain-of-thought reasoning is trained with reinforcement learning (see the OpenAI o1 Contributors’ note, quoted below).
Reinforcement learning is our preferred means for implementing narrow AI, or ANI (“artificial narrow intelligence”).
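For a concrete sense of just how narrow “narrow” is, here is a minimal, purely illustrative sketch of tabular Q-learning – the canonical reinforcement learning method – on an invented five-state corridor task. (The environment, reward, and hyperparameters are all made up for illustration.) The agent becomes excellent at this one tiny task, and at nothing else.

```python
import random

# Toy five-state corridor: start at state 0, reward only for reaching state 4.
# Actions: 0 = left, 1 = right.  A deliberately tiny, narrow task.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q[state][action]
alpha, gamma, epsilon = 0.1, 0.9, 0.1          # illustrative hyperparameters

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for episode in range(300):
    s, done, steps = 0, False, 0
    while not done and steps < 200:
        if random.random() < epsilon:                       # explore occasionally
            a = random.randrange(2)
        else:                                                # exploit (random tie-break)
            best = max(Q[s])
            a = random.choice([i for i in (0, 1) if Q[s][i] == best])
        s_next, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])   # Q-learning update
        s, steps = s_next, steps + 1

print("Learned Q-values (the agent now 'knows' exactly one thing):")
print([[round(q, 2) for q in row] for row in Q])
```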
But First, These Results
OpenAI’s o3 vs. its earlier o1 – here are the results at a glance, illustrated for:
- CodeForces competition code, and
- ARC-AGI-1 Pub (using the “Low” trained versions of o1 & o3).
Sources for these results:
OpenAI’s “12 Days of OpenAI” YouTube series:
- ARC-AGI-1 Pub (by François Chollet), showing results for o3 Low (Tuned) vs. o1 Low (Tuned).
- CodeForces (competition code), showing results for versions of o3 and o1.
Chain-of-Thought Reasoning: Some Brief Notes
The OpenAI o1 Contributors note that: “Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).”
Wei et al. (2023) show that chain-of-thought prompting substantially improves reasoning performance over standard prompting, without any additional training of the underlying model.
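To make the distinction Wei et al. draw concrete, here is a minimal sketch of standard prompting vs. chain-of-thought prompting. (The call_llm helper is a hypothetical placeholder for whatever LLM client you actually use; the exemplar problems are adapted from the Wei et al. paper.)

```python
# Hypothetical helper; substitute your LLM client of choice.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to an actual LLM API")

question = "The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples now?"

# Standard prompting: the exemplar shows only the final answer.
standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many balls does he have?\n"
    "A: 11\n"
    f"Q: {question}\nA:"
)

# Chain-of-thought prompting (Wei et al.): the exemplar spells out the intermediate steps.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many balls does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    f"Q: {question}\nA:"
)

# The chain-of-thought exemplar nudges the model to emit its own intermediate
# reasoning steps before committing to an answer.
# answer = call_llm(cot_prompt)
```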
Beyond Performance
Just because we like being contrarian (and because we really believe this is so), we offer a radical notion:
“Intelligence is a combination of both behavior and performance.”
– Alianna J. Maren, 2024.
Looking down the road – specifically, the entire year of 2025 – we can envision more and more powerful NARROW AI capabilities – very impressive, albeit narrowly-focused, performances.
Essentially, we’ll see the birth of numerous AI idiot savants. They’ll be good for certain tasks – but we probably won’t let them cross the street without holding their hands. And they’ll still need shoes with velcro closures.
We need to expand our notion (and definition) of intelligence beyond what Turing offered in 1950. (Although the word “behavior” was used in the original description of the Turing test, what was actually being evaluated was a text-based conversation between the AI and a single human.)
We need to think beyond an AI’s ability to perform mathematical or abstract (e.g., pattern-based) reasoning, or to write code, or do any of the other current benchmarks.
Instead of being narrowly performance-focused, we need to introduce another dimension into evaluating an AI’s actual intelligence.
This is a notion that has been around for over a decade, with some of the earliest work along these lines done by Friston and colleagues, most notably a kick-off paper by Schwartenbeck et al. (including Friston) in 2013.
The Key Tradeoff for Intelligent Behavior: Exploration vs. Exploitation
What is really interesting is that, within the free energy and active inference literature, the discussion of “exploration vs. exploitation” is not new; it dates from the same year in which Friston set out active inference in its full form (2013 for both works).
Schwartenbeck and colleagues, working with Friston in 2013, describe this tradeoff. Their abstract is worth reproducing in full:
“This paper reviews recent developments under the free energy principle that introduce a normative perspective on classical economic (utilitarian) decision-making based on (active) Bayesian inference. It has been suggested that the free energy principle precludes novelty and complexity, because it assumes that biological systems—like ourselves—try to minimize the long-term average of surprise to maintain their homeostasis. However, recent formulations show that minimizing surprise leads naturally to concepts such as exploration and novelty bonuses. In this approach, agents infer a policy that minimizes surprise by minimizing the difference (or relative entropy) between likely and desired outcomes, which involves both pursuing the goal-state that has the highest expected utility (often termed “exploitation”) and visiting a number of different goal-states (“exploration”). Crucially, the opportunity to visit new states increases the value of the current state. Casting decision-making problems within a variational framework, therefore, predicts that our behavior is governed by both the entropy and expected utility of future states. This dissolves any dialectic between minimizing surprise and exploration or novelty seeking.”
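A deliberately crude numeric sketch of that last point may help. (The numbers and the scoring rule below are invented for illustration; this is not Friston’s actual expected-free-energy formulation, just the “expected utility plus entropy of future states” intuition from the abstract.) When policies are scored on both terms, a policy that spreads over novel states can outrank a pure “exploit” policy even though its expected utility is lower.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Each (hypothetical) policy induces a distribution over three future outcome
# states; each state has an (invented) utility.
policies = {
    "exploit": ([0.95, 0.04, 0.01], [1.0, 0.2, 0.0]),  # beeline for the known reward
    "explore": ([0.40, 0.35, 0.25], [1.0, 0.2, 0.0]),  # spreads over novel states
}

for name, (p_outcomes, utilities) in policies.items():
    expected_utility = sum(p * u for p, u in zip(p_outcomes, utilities))
    # Invented score: expected utility (exploitation) plus outcome entropy
    # (exploration / novelty bonus) -- a stand-in for the full formulation.
    value = expected_utility + entropy(p_outcomes)
    print(f"{name:8s} expected utility = {expected_utility:.2f}, "
          f"entropy = {entropy(p_outcomes):.2f}, combined value = {value:.2f}")
```

Run it and the “explore” policy wins on the combined score (roughly 1.55 vs. 1.18) despite its lower expected utility – which is exactly the point the abstract makes: behavior is governed by both the entropy and the expected utility of future states.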
2025 Predictions
Prediction #1: We will get tired of narrow-focused (and narrow-minded) AIs very soon. We’ll shift from “Oh, wow!” and “Gee, whiz!” to “But can it sing? Can it dance?”
Prediction #2: We, as a community, will start creating more comprehensive qualifiers of artificial GENERAL intelligence, and perhaps lean more towards the levels of behavior and performance exhibited in biology than towards highly abstract, cognitively taxing “benchmark” studies.
Prediction #3: A certain population within the AI community – those who are now learning reinforcement learning – will (grudgingly, unhappily, and VERY grumpily) start learning not only variational inference, but active inference.
And once we get a sizeable number of folks to settle down and learn variational methods (up to and including active inference, and then beyond, to renormalising generative models, or RGMs), we’ll see the limitations of THAT approach – because once again, active inference (even expanded into RGMs) will not be all that is needed for “general intelligence.”
A step in that direction, certainly.
But not all.
What to Read/Watch/Do Right Now
Start with something that is fairly accessible. Most of the language is in plain, understandable English, and the trade-offs – the contrast-and-compare between active inference and reinforcement learning – are well presented and substantially clear.
I’m referring, of course, to that notable paper for which Noor Sajid was the lead author, and contributing authors included Philip Ball, Thomas Parr, and (of course) Karl Friston. See Sajid et al. (2020).
References and Resources
Chain-of-Thought (Strategic and Straightforward)
- OpenAI o1 Contributors. 2024. “Learning to Reason with LLMs.” OpenAI.com (September 12, 2024). (Accessed Dec. 27, 2024; available at OpenAI.)
- Wang, Yu, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, and Ting Liu. 2024. “Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation.” arXiv:2409.03271v1 [cs.AI] (5 Sep 2024). (Accessed Dec. 27, 2024; available at arXiv.)
- Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2023. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv:2201.11903v6 [cs.CL] (10 Jan 2023). (Accessed Dec. 27, 2024; available at arXiv.)
“Exploration vs. Exploitation”
AJM’s Note: To the best of my knowledge, this paper is the one that introduces the “exploration vs. exploitation” dialectic.
- Schwartenbeck, Philipp, Thomas FitzGerald, Raymond J. Dolan, and Karl Friston. 2013. “Exploration, Novelty, Surprise, and Free Energy Minimization.” Front. Psychol. 06 (October 2013). (Accessed Dec. 27, 2024; available at Front Psychol.)
AJM’s Note: Probably the best starting place for comparing active inference with reinforcement learning is this tutorial/review by Sajid et al.
- Sajid, Noor, Philip J. Ball, Thomas Parr, and Karl J. Friston. 2020. “Active Inference: Demystified and Compared.” arXiv:1909.10863v3 [cs.AI] (30 Oct 2020). (Accessed 17 June 2022; https://arxiv.org/abs/1909.10863 )
Early Friston Works
- Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface. 10. doi:10.1098/rsif.2013.0475. (Accessed Oct. 13, 2022; pdf.)