The real danger of AI isn’t hyper-intelligence, it’s human stupidity

Bing, ChatGPT, and SoundHound Chat AI icons on an Android homescreen

Rita El Khoury / Android Authority

AI continues to be 2023’s enduring tech buzzword, with ChatGPT, Bard, and the like generating headlines and, just occasionally, powering a shiny new use case that might improve some aspect of our lives a little, too.

Thankfully, AI hasn’t taken over the world. In fact, the looming threat of a fast-paced AI takeover has perhaps receded a little, at least for the time being. Instead, I’ve become increasingly concerned that the bigger threat comes from the fact that humans don’t really understand AI very well at all. Whether we’re asking asinine questions or finding a way to offload our work, there’s a risk that we replace our own critical thinking with an alternative that’s not yet equipped for it.

What AI really is (and what it isn’t)

The problem is that AI isn’t really intelligent, not yet anyway; these models are just very good at fooling us into believing they are. The clue is in the name ChatGPT (the GPT bit is important too). But whether it’s Bard, Bing, or similar, these are large language models (LLMs) that essentially specialize in generating human-like text. What that means, at a very crude level, is that they are exceedingly good at statistically modeling the next likely word (or token) in a sentence. Thanks to the swathes of training data, that same statistical modeling isn’t just good at writing sentences; it becomes much more creative and useful.
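
To make that crude framing concrete, here’s a deliberately simplified toy sketch in Python (an illustration only, nothing like a real LLM under the hood): count which word tends to follow which in a tiny corpus, then “generate” text by always emitting the most likely continuation. Real models learn these statistics over tokens with billions of parameters rather than raw counts, but the predict-the-next-word framing is the same.

```python
# A deliberately crude sketch (plain Python, no real LLM involved): count which
# word tends to follow which in a tiny "training corpus", then "generate" text
# by always picking the statistically most likely next word.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count next-word frequencies for every word in the corpus.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start: str, length: int = 8) -> str:
    words = [start]
    for _ in range(length):
        options = next_word_counts[words[-1]]
        if not options:
            break
        # Always pick the most frequent continuation (greedy decoding).
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # -> "the cat sat on the cat sat on the"
```

Notice how quickly the output starts looping: with so little data, the “most likely next word” keeps circling back on itself, a miniature version of the repetition you sometimes see from much bigger models.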

What these models certainly aren’t, despite their often impressive responses, is general-purpose intelligence (though AGI is the goal). In fact, there’s no analysis or critical thinking when an AI spews out a sonnet or generates working code. The fact that LLMs are seemingly very good at a wide range of things was a happy accident discovered back around the time of GPT-2. With today’s much more massive datasets, models are even better at conjuring accurate responses from a wider range of inputs.

Large language models specialize in generating human-like text. Correct answers are a bonus.

To elaborate on why this is, consider what an LLM does when you ask it to name the planets in the solar system. It doesn’t scour its memory for an answer; there is no database-like entry to look up. Rather, it takes your input tokens and produces a statistically likely string of text based on its training data. In other words, the more often the model saw Mars, Earth, and Saturn in sentences about planets during training, the more likely it is to generate those words when it encounters a similar discussion in the future. It’s a simulation of genuine knowledge, but it’s not the way you or I learn. Likewise, if the training data mostly consisted of pre-2006 articles, your LLM may incorrectly insist that Pluto is a planet too (sorry, Pluto).
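
If you want to see that next-token behavior first-hand, the sketch below assumes the openly available Hugging Face transformers and PyTorch packages, with the small GPT-2 model standing in for the far larger models behind ChatGPT and Bard. It simply prints the probabilities the model assigns to possible continuations of a planets prompt; nothing is being looked up.

```python
# A small illustration with the openly available GPT-2 model (a stand-in for
# far larger models): the model assigns a probability to every possible next
# token, and "generation" is just picking from that distribution repeatedly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The planets of the solar system are Mercury, Venus, Earth,"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, sequence_length, vocab_size)

# Turn the final position's scores into a probability distribution over the
# ~50,000-token vocabulary, then show the five most likely continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(prob):.3f}")
```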

This situation is somewhat complicated by Bard and Bing, which can access data from the internet, but the guiding principle remains the same: LLMs are primarily designed to generate readable text outputs that humans would give the thumbs up to. Producing a correct answer is a bonus, which can be (and has been) incentivized through reinforcement training, but at no stage does the model “think” about the correct answer to your query. Hence their all-too-common mistakes and their inability to respond to basic questions such as “What is the time?”

Mathematics is another very good example to help understand this point. LLMs don’t calculate like a traditional computer; there’s no number-crunching processor guaranteeing a correct answer. Nor do they function like our brains. Instead, LLMs perform math in essentially the same way they generate text, outputting the most statistically likely next token, but that’s not the same as actually calculating the answer. However, the fascinating revelation is that the more data you provide an LLM, the better it becomes at simulating how to do math (among other things). This is why GPT-3 and GPT-4 are orders of magnitude better than GPT-2 at simple two- and three-digit arithmetic and score much higher on a wide variety of tests. It has nothing to do with the models being more capable from a traditional number-crunching perspective; rather, they were trained on so much more data.
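
As an admittedly extreme illustration of that difference, here’s a toy Python sketch: a “model” that has only memorized the sums present in its training text can parrot them back, but it has no machinery for anything it hasn’t seen, whereas genuine calculation generalizes to any input. Real LLMs are far more sophisticated pattern matchers than a lookup table, but the contrast with actual computation is the point.

```python
# A toy contrast (not a real LLM): answers recalled from "training text" versus
# answers actually computed. The lookup "model" only ever echoes what it has seen.
training_text = ["2 + 2 = 4", "7 + 5 = 12", "30 + 40 = 70"]

# Build a prompt -> continuation table from the training examples.
memorized = {}
for line in training_text:
    prompt, answer = line.rsplit("= ", 1)
    memorized[prompt + "="] = answer

def parrot_math(prompt: str) -> str:
    # No calculation happens here, only recall of previously seen text.
    return memorized.get(prompt, "???")

print(parrot_math("7 + 5 ="))      # '12'  -- looks like it can add
print(parrot_math("123 + 456 ="))  # '???' -- it never actually learned to add

# Genuine computation, by contrast, works for any input:
print(123 + 456)                   # 579
```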

AIs will increase in power, but at the moment they’re far from general-purpose problem solvers.

It’s the same for writing essays, generating code, and all the other seemingly miraculous emergent LLM capabilities. There’s a simulation of effort and thought, but the results are still text-based probabilities, which is why you’ll often see repetitive styles and examples, as well as factual errors. Still, this “in-context” learning capability makes LLMs incredibly powerful and adaptable to a wide range of use cases.

However, if you want an extremely capable and robust AI for math, physics, or other science experiments, then you have to train the model very differently from a large language model. Those familiar with the broader landscape will already know that OpenAI offers various other models, such as DALL·E for image generation and Whisper for speech-to-text transcription. So while GPT-4 and, eventually, GPT-5 will undoubtedly continue to improve in accuracy and the range of things they can do, they are still language models at heart.

Let’s stop asking AI such stupid questions

Siri versus ChatGPT

Robert Triggs / Android Authority

So, back to the headline: we really need a better understanding of these strengths and pitfalls before putting AI to work.

Hopefully, it’s clear by now that it would be foolish to ask an AI to write your science coursework. It’s unlikely to handle equations correctly, and even then it will produce a formulaic response. And it would be downright irresponsible to take financial advice from one. But even seemingly more banal questions can be problematic. While it might be fun to tease out its musings on controversial topics or trick it into a wrong answer, sharing what amounts to a probabilistic text string as anything close to a genuine opinion is beyond ignorant.

Let’s not surrender our critical thinking to an upmarket text predictor.

If you ask a chatbot for a preference or to make a comparison, it isn’t drawing on its own thoughts, a vast vault of human knowledge, or even a collective opinion hidden inside its dataset. Instead, it’s statistically modeling what it determines to be the optimum text response to your query, which is very different from thinking up a genuine answer. That’s why these models are guardrailed to filter out queries and responses they really aren’t built for. Even if you can tease out such a response, it should almost certainly be ignored.

In a nutshell, we shouldn’t confuse a human-like response with human-like thought. That isn’t to diminish how impressive these AI simulacra are, or the swathes of emerging use cases they’re genuinely useful for. But ultimately, there are many more exciting and existential AI topics to muse on than their preferences in fast food chains and designer brands. Let’s not surrender our critical thinking to an upmarket text predictor.
