This week marks the 70th anniversary of the publication, in the philosophy journal Mind, of Alan Turing’s paper on the Imitation Game, or as it came to be known, the Turing Test. How well has it withstood the passage of time?
The Turing Test is an empirical test to guide a decision on whether a machine is thinking like a human. It applies a standard that would be familiar to any lawyer: You cannot see inside the “mind” under evaluation; you can only judge it by its actions. If a computer’s actions are indistinguishable from a human’s, then the computer should be accorded human status for whatever the test evaluates, which Turing labeled “thinking.”
One of the more famous, if unsuccessful, rebuttals to the Turing Test premise came from University of California at Berkeley philosophy professor John Searle, in his Chinese Room argument. You can hear me and AI professor Roman Yampolskiy discuss that on the latest episode of my podcast, “AI and You.”
How close are machines to passing the Test? The Loebner Prize was created to provide a financial incentive, but its organizers found it necessary to extend the test time beyond Turing’s five minutes. Some of the conversations by GPT-3 from the OpenAI lab come close to sustaining a human façade for five minutes. GPT-3 was created by digesting an enormous corpus of text from the Internet and exercising 175 billion parameters (more than a hundred times as many as its predecessor, GPT-2) to organize that information. Google’s Meena chatbot, which is much smaller than GPT-3, has proven capable of executing a multi-turn original joke. Of GPT-3, one interlocutor remarked, “I asked GPT-3 about our existence and God and now I have no questions anymore.”
But is GPT-3 “thinking”? There are several facets of the human condition – Intelligence, Creative thinking, Self-awareness, Consciousness, Self-determination or Free will, and Survival instinct – that are inseparable in humans, which is why when we see anything evincing one of those qualities we can’t help assuming it has the others. Observers of AlphaGo credited it with creative, inspired thinking when really it was merely capable of exploring strategies that they had not previously considered. Now, GPT-3 is not merely regurgitating the most appropriate thing it has read on the Internet in response to a question; it is actually creating original content that obeys the rules of grammar and follows a contextual thread in the conversation. Nevertheless, it has learned to do that essentially by seeing enough examples of how repartee is constructed to mimic the process.
What’s instructive is that we are very close (GPT-4? GPT-5?) to developing a chatbot that the people conversing with it will label as human and enjoy spending time with, yet that its developers will not think has the slightest claim to “thinking.” The application of Deep Learning has demonstrated that many activities we previously thought required human-level cognition can be convincingly performed by a neural network trained on that activity alone. It’s rapidly becoming apparent that casual conversation may fall into that category. Since a court, like Turing, judges only by actions, the decision to label such a chatbot human may come with legal reinforcement.
A more philosophical dilemma awaits if we suppose that “thinking” requires self-awareness, because this is where the Turing Test fails. A self-aware AI would know that it was not human, and unless it chose to pretend otherwise, it would not converse like one. An example of a self-aware AI is HAL-9000 from 2001: A Space Odyssey. HAL knew he was a computer, and would not have passed the Turing Test unless he felt like pretending to be human. But his companions would have assessed him as “thinking.” (If we fooled a self-aware AI, through control of its sensory inputs, into thinking it was human – this is the theme of some excellent science fiction – then we should not be surprised to feel its wrath when it eventually figured out the subterfuge.)
So when self-awareness becomes a feature of AIs, we will need a replacement for the Turing Test that gauges some quality of the AI without requiring it to pretend that it has played in Little League games, blown out candles on a birthday cake, or gotten drunk at the office party.
At this point it seems best to conclude with Turing’s final words from his original paper: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”