
The Control Problem, Re-re-re-visited
Like moths to a flame, we cannot resist the siren call of the Control Problem, the name AI philosophers give to the question of how or whether we will be able to control AI as it gets more and more intelligent. (Not for nothing did I name my book Crisis of Control.) A reporter contacted me to ask for comment on a newly published paper from the Max Planck Institute. The paper makes a mathematical case for the impossibility of controlling a superintelligence through an extension of the famous (in computer science, at least) Halting Problem. This is a proof given to first-year computer science students to blow their minds by showing them how to treat programs as data, and it works by establishing a contradiction: Suppose a function exists that can tell whether a program (whose source code is passed as input to the function) is going to halt. Now create a program that calls that function with itself as input and, if the function returns true, loops forever, i.e., does not halt. This program halts if it doesn’t halt, and doesn’t halt if it does halt. Smoke comes out of the computer: Paradox Alert! Therefore, no such function can exist.
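For readers who like to see the trick spelled out, here is a minimal sketch in Python. The names halts and paradox are mine, and the whole point of the proof is that halts cannot actually be implemented.

```python
# A minimal sketch of the Halting Problem contradiction. `halts` is the
# hypothetical oracle the proof assumes in order to refute it; no general
# implementation of it can exist.

def halts(program, program_input) -> bool:
    """Hypothetically returns True iff program(program_input) eventually halts."""
    raise NotImplementedError("No such general-purpose function can exist.")

def paradox(program):
    # Ask the oracle about ourselves, then do the opposite of its prediction.
    if halts(program, program):
        while True:   # the oracle said we halt, so loop forever
            pass
    return            # the oracle said we loop forever, so halt immediately

# Now consider paradox(paradox): it halts exactly when halts(paradox, paradox)
# says it doesn't, and loops forever exactly when the oracle says it halts.
# That contradiction is what rules out a general-purpose `halts`.
```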
You might intuit that a lot of hay can be made from that line of reasoning, and this is exactly the road that MPI went down. Their paper says, in effect, “Suppose we had a function that could tell whether a program (an AI) would harm humans. Now imagine a program that calls that function on itself and, if the result is false, causes harm to humans.” Boom: Paradox Alert. Therefore it is impossible in general to tell whether a given program will cause harm to humans.
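The same move, again as an illustrative Python sketch of my own (not code from the MPI paper): harms_humans stands in for the paper’s hypothetical containment check, and cause_harm_to_humans is just a placeholder.

```python
# The MPI variant of the same contradiction, with illustrative names of my own.

def harms_humans(program) -> bool:
    """Hypothetically returns True iff running `program` would harm humans."""
    raise NotImplementedError("The paper argues no such general check can exist.")

def cause_harm_to_humans():
    """Placeholder for whatever behavior the check is supposed to rule out."""
    pass

def defiant():
    # Ask the oracle about this very program, then defy its verdict.
    if not harms_humans(defiant):
        cause_harm_to_humans()
    # If the oracle predicted harm, do nothing harmful at all.

# If harms_humans(defiant) returns False, defiant causes harm; if it returns
# True, defiant is perfectly harmless. Either way the supposed harm-checker
# is wrong, so no fully general one can exist.
```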
That’s a narrow conclusion to hang a large amount of philosophical weight on, but the paper’s authors don’t mind going there. They invoke the AI boxing problem and the story of the Monkey’s Paw to make sure we are aware of the consequences of not being able to guarantee control of AI. This commentary was picked up by the reporter who contacted me. You can see the resulting article in Lifewire.
I supplied more input than they had room to quote, of course. I was quoted fairly, and not taken out of context, but for you, here’s the full text of what I gave the reporter:
The paper you cite extends a venerable computer science proof to the theoretical conclusion that it is impossible to prove that a sufficiently advanced computer program couldn’t harm humanity. That doesn’t mean they’ve proved that advanced AIs will harm humanity! Just that we have no assurances that they won’t. Much as when voting for a candidate for President, we have no guarantee that he won’t foment an insurrection at the end of his term.
Controlling AI is currently a quality assurance problem: AI is a branch of computer science and its products are software; unpredictability in its behavior is what we call bugs. There is a long-established discipline for testing software to find them. As AI becomes increasingly complex, its operational modes become so varied as to defy comprehensive testing. Will a self-driving vehicle start learning about the psychology of children because it needs to predict whether they will jump in front of the car? Will we need to teach that car the physics of flammable liquids so it can decide whether it can convey its passengers to safety if it encounters an overturned tanker in the road? The range of possible behavior approaches infinity. We cannot expect to keep testing manageable by limiting the possible knowledge and behavior of an AI because it is precisely that unbounded knowledge that will allow it to do what we want.
We cannot, ultimately, ensure the controllability of AI any more than we can ensure that of our children. We raise them right and hope for the best; so far they have not destroyed the world. To raise them well we need a better understanding of ethics; if we can’t clean our own house, what code are we supposed to ask AI to follow?
The problems in controlling AI right now are those of managing any other complex software; if we have a bug, what will it do? Send grandma an electric bill for a million dollars? Or will image tagging software decide that African Americans are gorillas, as Google Photos did? These bugs are not the kind of uncontrollability that the paper’s authors or your readers are interested in. They want to know if and when AI will develop agency, or purposes that are clearly at odds with what its creators intended. That is known as the value alignment problem in artificial intelligence, and it does not require that the AI become self-aware to be a problem. Nick Bostrom’s hypothetical paper-clip maximizer AI does not have to be conscious to wipe out the human race. But it does require a level of sophistication we do not currently know how to create. Most experts think we are decades away from that ability; your readers need not panic. We do, however, think that it is worth addressing the issue now, because the solution may also take decades to prepare.
What do you think about whether we could or should control superintelligent AI? Comment on this thread or use our contact form and maybe I can answer your question on my podcast.
“Machines take me by surprise with great frequency. This is largely because I do not do sufficient calculation to decide what to expect them to do.”
— Alan Turing
