
The Brilliance Nobody Wanted
Wolfram has been building something close to true machine intelligence for over three decades. So why did a chatbot that occasionally makes things up capture the world's imagination, and its money, instead?
Let's begin with an uncomfortable observation. In 2009, a brilliant and somewhat eccentric physicist named Stephen Wolfram launched a computational engine that could correctly answer almost any structured question you put to it. Mathematics, chemistry, geography, finance: ask it anything within its domain and it would give you a precise, sourced, verifiably correct answer. No guessing. No hallucination. No confident nonsense dressed up in fluent prose.
The world said: interesting. And largely moved on.
Then, in late 2022, OpenAI released ChatGPT, a system that, by its makers' own technical admission, predicts statistically plausible sequences of words rather than computing verified answers. It can, and regularly does, invent facts. It has no formal reasoning engine. Ask it to evaluate a complex integral and it may get the answer wrong with complete confidence. And yet within two months it had 100 million users, triggered a multi-hundred-billion-dollar investment frenzy, and convinced half of Silicon Valley that AGI was two years away.
What on earth is going on?
Two Entirely Different Machines
To understand the paradox, it helps to be clear about what these technologies actually are, because they are genuinely, fundamentally different beasts.
Wolfram Language, the engine behind Wolfram Alpha and Mathematica, is a symbolic computation system. It works through formal rules. When you ask it to solve a differential equation, it actually solves it, using centuries of accumulated mathematical knowledge encoded explicitly into the system. The answer is either correct or it isn't, and the system knows the difference. This is the tradition of deterministic, verifiable computation stretching back to the earliest days of computing.
Large language models like ChatGPT or Claude are something else entirely. They are statistical pattern engines trained on vast quantities of human text. They have learned, with extraordinary sophistication, how humans write, reason, explain, and express themselves. They can produce text that is contextually appropriate, tonally nuanced, and uncannily human. But at their core they are not reasoning; they are completing patterns. The difference matters enormously, and also, as it turns out, barely at all to most people.
Wolfram / Symbolic AI
- Deterministic — same input, same output
- Verifiably correct within its domain
- Requires structured, precise queries
- Narrow but deeply reliable
- Feels like a sophisticated tool
Large Language Models
- Probabilistic — outputs vary
- Can hallucinate convincingly
- Accepts messy, natural language
- Broad but unreliable in specifics
- Feels like a conversation with a mind
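To make the contrast concrete, here is a minimal sketch in Python. SymPy stands in here purely for illustration; Wolfram's actual engine is the Wolfram Language, and this is a sketch of the idea, not of that system. The point is that the symbolic route produces an answer the machine itself can verify.

```python
# A minimal sketch of symbolic (deterministic) computation, using SymPy
# as an illustrative stand-in for an engine like Wolfram Language.
import sympy as sp

x = sp.symbols("x")
f = sp.exp(-x**2) * x

# The integral is *solved* by formal rules: same input, same output,
# every time.
antiderivative = sp.integrate(f, x)   # -> -exp(-x**2)/2

# Crucially, the result can be checked mechanically: differentiate it
# and confirm we recover the original integrand. No trust required.
assert sp.simplify(sp.diff(antiderivative, x) - f) == 0

# An LLM, by contrast, would *predict* the text of an answer: fluent
# and plausible, but unverified unless something like this checks it.
```

That final assertion is the whole difference: the symbolic system carries its own proof of correctness, while a language model's output is only as good as whatever checks you bolt on afterwards.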
The Competence Paradox
Here is the irony that Wolfram himself has noted, with what one imagines is considerable wryness: his tools actually know things are true. Language models merely sound like they know things are true. And in the attention economy, the latter is worth considerably more.
This is not, when you think about it, entirely surprising. Wolfram's tools reward expertise. To unlock their real power, you need to understand the domain you're querying, and ideally know the Wolfram Language itself. It is powerful in the way a concert grand piano is powerful — extraordinary in the hands of someone who knows what they're doing, and mildly impressive as a curiosity to everyone else.
ChatGPT required nothing. You typed as you would to a friend. It wrote your cover letter, explained your legal document, summarised the meeting you half-slept through, and composed a birthday poem for your aunt in the style of Keats. The barrier to feeling the value was approximately zero.
We tend to reward things that feel intelligent over things that are intelligent — and those are not always the same thing.
— The uncomfortable truth about how we value technology
What Financial Markets Are Actually Pricing
The investment frenzy around generative AI tells us something important about how capital markets actually function, something that business school textbooks are somewhat reluctant to say directly.
Markets are not, in the short to medium term, pricing correctness or technical rigour. They are pricing narrative, addressable emotion, and perceived disruption at scale. The question investors were asking was not "which technology produces the most reliable outputs?" It was "which technology makes the most people feel like the future has arrived?"
Wolfram's answer to that question was always: physicists, mathematicians, and engineers. OpenAI's answer was: everyone with a keyboard. One of those is a niche enterprise software story. The other is a story about replacing cognitive labour across the entire economy. The narrative differential is worth, apparently, several hundred billion dollars.
This is not to say the market is wrong, exactly. LLMs have demonstrated genuine, transformative utility across an almost comically wide range of tasks. But there is something clarifying about observing that the less technically rigorous system attracted orders of magnitude more capital than the more rigorous one, simply because it was more emotionally accessible.
The Deeper Insight
The Wolfram vs. LLM story is a precise case study in how human perception mediates the relationship between genuine capability and perceived value. We are not, it turns out, primarily rational evaluators of technical merit. We are story-seeking creatures who respond to fluency, relatability, and the feeling of intelligence far more than to verifiable correctness.
For those of us working in digital — building products, crafting communications, shaping user experiences — this is not a peripheral observation. It is arguably the central one. The product that feels right, in the right moment, to the right person, will almost always outperform the product that is technically superior but harder to emotionally access.
Wolfram built a better engine. OpenAI built a better mirror. And we, apparently, prefer to look at ourselves.
So What Does This Mean in Practice?
For businesses navigating the AI landscape right now, the lesson cuts both ways. The hype around generative AI is real and the utility is genuine, but the technology that attracted the most capital is also the one most prone to elegant, confident failure. The businesses that will extract the most long-term value are those that understand when to use the fluent mirror and when to reach for the rigorous engine.
The smartest applications we're seeing emerge are hybrid: LLMs for natural language interface, contextual understanding, and creative generation; symbolic or structured systems for verification, computation, and anything where correctness is non-negotiable. The art is in the orchestration.
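As a rough sketch of that orchestration, under stated assumptions: `ask_llm` below is a hypothetical placeholder for whichever LLM provider you wire in, and SymPy again stands in for the rigorous structured system. The fluent layer translates the messy question; the symbolic layer either solves it verifiably or fails loudly.

```python
# A hybrid-orchestration sketch: LLM as natural-language front door,
# symbolic engine as the verifier. `ask_llm` is hypothetical.
import sympy as sp

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider of choice."""
    raise NotImplementedError("wire this to a real LLM API")

def solve_with_verification(question: str) -> str:
    # 1. The fluent mirror: turn a messy natural-language question
    #    into a structured equation string, e.g. "x**2 - 4".
    expression = ask_llm(
        "Rewrite this question as a single SymPy expression equal to "
        f"zero, returning only the expression: {question}"
    )
    # 2. The rigorous engine: parse and solve symbolically. This step
    #    either succeeds verifiably or raises; it cannot hallucinate.
    solutions = sp.solve(sp.sympify(expression), sp.symbols("x"))
    # 3. Optionally hand the verified result back to the LLM to wrap
    #    a friendly explanation around a correct core.
    return f"Verified solutions: {solutions}"
```

The design choice that matters is the boundary: nothing the LLM produces is treated as an answer until the structured system has computed or checked it.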
But perhaps the most useful thing the Wolfram paradox teaches us has nothing to do with technology selection at all. It teaches us that in any market, whether financial, attentional, or commercial, the gap between what is best and what wins is almost always mediated by accessibility, narrative, and the emotional resonance of the first impression.
Stephen Wolfram built something that was, in important ways, genuinely smarter. He just forgot to make it feel that way. In digital, that is rarely a mistake you get to make twice.