What Comes After the Turing Test?

Over the weekend, the news broke that a “supercomputer” program called “Eugene Goostman”—an impersonation of a wisecracking, thirteen-year-old Ukrainian boy—had become the first machine to pass the Turing Test. Kevin Warwick, a professor of cybernetics at the University of Reading, who administered the test, wrote, “In the field of Artificial Intelligence there is no more iconic and controversial milestone than the Turing Test, when a computer convinces a sufficient number of interrogators into believing that it is not a machine but rather is a human.” Warwick went on to call Goostman’s victory “a milestone” that “would go down in history as one of the most exciting” moments in the field of artificial intelligence.

The considerable hype around the announcement—nearly every tech blog and newspaper reported on the story—ignored a more fundamental question: What, exactly, is Eugene Goostman, and what does “his” triumph over the Turing Test really mean for the future of A.I.?

Here’s what Eugene Goostman isn’t: a supercomputer. It is not a groundbreaking, super-fast piece of innovative hardware but simply a cleverly coded piece of software, heir to a program called ELIZA that was first developed—as a joke—in the nineteen-sixties. Users would type in their personal problems, and ELIZA, a crude simulation of a patient but nondirective therapist, would spit back responses—for example, “How does that make you feel about your family?”—without understanding the first thing about what the user had said. Many users were too narcissistic (or at least too naïve) to realize that the person they were “talking” to via teletype, an ancestor of text messaging, was a computer. Sooner or later, though, ELIZA would respond in ways that were ungrammatical, confusing, or just plain nonsense.
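
To see how thin the trick is, consider a minimal sketch, in Python, of the kind of pattern matching ELIZA relied on. The rules and phrasings below are invented for illustration; Joseph Weizenbaum’s actual script was far larger, but the mechanism was identical.

```python
import random
import re

# A toy ELIZA-style responder. The rules below are invented for
# illustration; the real script was far larger, but the mechanism was
# the same: match a surface pattern, echo a fragment of the user's
# words back inside a canned template, and understand nothing at all.

REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "your": "my"}

def reflect(fragment):
    """Swap first- and second-person words so echoed fragments read naturally."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())

RULES = [
    (r"\bi am (.*)", ["How long have you been {0}?", "Why do you say you are {0}?"]),
    (r"\bi feel (.*)", ["Why do you feel {0}?", "Do you often feel {0}?"]),
    (r"\bmy (mother|father|family)\b", ["Tell me more about your {0}.",
                                        "How does that make you feel about your {0}?"]),
]

DEFAULTS = ["Please go on.", "What does that suggest to you?", "I see."]

def respond(user_input):
    text = user_input.lower()
    for pattern, templates in RULES:
        match = re.search(pattern, text)
        if match:
            fragments = [reflect(group) for group in match.groups()]
            return random.choice(templates).format(*fragments)
    # No pattern matched: fall back on a stock deflection.
    return random.choice(DEFAULTS)

print(respond("I am upset about my family"))
# e.g. "How long have you been upset about your family?"
```

Fed the therapy-adjacent confessions it expects, a program like this can seem uncannily attentive; anything outside its rule table falls through to a stock deflection, and that is precisely where the illusion breaks.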

Goostman can maintain the illusion of human conversation for longer than ELIZA could, and its lexicon extends beyond psychiatry and family conflict. Unlike ELIZA, Goostman has a working grasp of grammar and a “personality,” engineered to fool judges into thinking he is a flippant adolescent.

Here, for example, is how a slightly earlier version of the program responded last night when I asked about its reading habits:

Marcus: Do you read The New Yorker?

Goostman: I read a lot of books … So many—I don’t even remember which ones.

Marcus: You remind me of Sarah Palin.

Goostman: If I’m not mistaken, Sarah is a robot, just as many other “people.” We must destroy the plans of these talking trash cans!

Clever—or so it initially seems.

What Goostman’s victory really reveals, though, is not the advent of SkyNet or cyborg culture but rather the ease with which we can fool others. A postmortem of Goostman’s performance from 2012 reports that the program succeeded by executing a series of “ploys” designed to mask the program’s limitations. When Goostman is out of its depth—which is most of the time—it attempts to “change the subject if possible … asking questions, steer[ing] the conversation, [and] occasionally throw[ing] in some humour.” All these feints show up even in short conversations like the one above.
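
To make the deception concrete, here is a hypothetical sketch of that ploy logic. Everything in it is my own invention rather than the Goostman team’s code; it shows only the shape of the trick.

```python
import random

# A hypothetical sketch of the "ploys" the 2012 postmortem describes:
# when the program has no real answer, which is most of the time, it
# changes the subject, asks a question back, or reaches for a joke.
# The phrasings and the tiny knowledge table are inventions for
# illustration, not the Goostman team's code.

SUBJECT_CHANGES = ["By the way, do you like video games?", "Anyway, where are you from?"]
COUNTER_QUESTIONS = ["Why do you ask?", "And what do you think about it yourself?"]
JOKES = ["My guinea pig says hello, by the way."]

# Stand-in for the program's narrow, pattern-matched "knowledge."
KNOWLEDGE = {"odessa": "I live in Odessa. Have you ever been there?"}

def reply(question):
    text = question.lower()
    for topic, canned_answer in KNOWLEDGE.items():
        if topic in text:
            return canned_answer
    # Out of its depth: deflect rather than admit it.
    return random.choice(SUBJECT_CHANGES + COUNTER_QUESTIONS + JOKES)

print(reply("Do you read The New Yorker?"))  # deflects; the question never registers
```

Nothing in that loop understands a word of the question; it merely decides how to dodge it.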

It’s easy to see how an untrained judge might mistake wit for reality, but once you understand how this sort of system works, the constant misdirection and deflection become obvious, even irritating. The illusion, in other words, is fleeting.

The nicest thing one can say about Eugene Goostman is that his win on Saturday should be seen as encouraging news for anyone trying to build video games. If Goostman can fool a third of its judges, the creation of convincing computer-based characters in interactive games—the next generation of Choose Your Own Adventure storytelling—may be a lot easier than anyone realized.

In terms of practical significance for artificial intelligence, though, passing the Turing Test means little. As I wrote last year on this site:

The winners aren’t genuinely intelligent; instead, they tend to be more like parlor tricks, and they’re almost inherently deceitful. If a person asks a machine “How tall are you?” and the machine wants to win the Turing test, it has no choice but to confabulate. It has turned out, in fact, that the winners tend to use bluster and misdirection far more than anything approximating true intelligence.

Goostman, like ELIZA, relies mainly upon pattern recognition, not genuine understanding. It is a refinement of an old idea, not a fundamental change in artificial intelligence.

Almost nobody in A.I. these days seems to aim for what Alan Turing himself envisioned—a flexible, general-purpose intelligence of the sort that human beings have, which allows any ordinary individual to master a vast range of tasks, from tying one’s shoes to holding a conversation to acing tenth-grade biology. In the years since Turing, many machines have mastered individual tasks, like playing chess (IBM’s Deep Blue exceeded even the best humans) and even “Jeopardy!” (IBM’s Watson). But each such program is tailored to a particular task, and none has possessed the sort of broad intelligence that characterizes humans. No existing combination of hardware and software can learn completely new things at will the way a clever child can.

As I have learned from two decades of work as a cognitive scientist, the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers. So, in the hope of channelling that energy towards a project that might bring us closer to true machine intelligence—and in an effort to update a sixty-four-year-old test for the modern era—allow me to propose a Turing Test for the twenty-first century: build a computer program that can watch any arbitrary TV program or YouTube video and answer questions about its content—“Why did Russia invade Crimea?” or “Why did Walter White consider taking a hit out on Jesse?” Chatterbots like Goostman can hold a short conversation about TV, but only by bluffing. (When asked what “Cheers” was about, it responded, “How should I know, I haven’t watched the show.”) But no existing program—not Watson, not Goostman, not Siri—can currently come close to doing what any bright, real teenager can do: watch an episode of “The Simpsons,” and tell us when to laugh.

Gary Marcus, a professor of cognitive science at NYU and the author of “Guitar Zero,” is a co-editor of the forthcoming book “The Future of the Brain.”