The Pith of Performance: Is the Turing Test Tough Enough?

In the recent GDAT class, we covered machine learning (ML) applied to performance data analysis and particularly the application of so-called support vector machines. In that section of the course I have to first explain what the word "machine" means in the context of ML. These days the term machine refers to software algorithms, but it has its roots in the development of AI and the history of trying to build machines that can think. That notion of intelligent machines goes back more than sixty years to Alan Turing, who was born a hundred years ago today.

The Turing Test (TT) was introduced as "the imitation game" in Turing's paper "Computing Machinery and Intelligence," Mind, Vol. 59, No. 236, pp. 433-460 (1950):

The new form of the problem can be described in terms of a game which we call the "imitation game." It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." The interrogator is allowed to put questions to A and B.

Although TT is relatively easy to state, it may be harder than Turing anticipated to implement in a definitive way, given distributed computer technologies that didn't exist in Turing's time. In some sense, the Web is the largest human distributed supercomputer ever devised. IBM's "Watson" machine recently beat human champions on the Jeopardy TV game show. Machines today can even ask you (the human) questions as part of checking their own "understanding"; consider, for example, voice-activated phone assistants that attempt to direct you through the usual labyrinth of menu options.

On Turing's birthday, it seems both fitting and interesting to reflect on where things stand more than sixty years after he proposed his test.

So, what does it mean to not be able to distinguish between a human and a machine? It would seem to imply that the machine will always tend to produce expected responses. There seems to be a bias in TT toward testing how the machine handles unexpected input from the human: how we might catch it out, so to speak.

But what if the response from the machine is unexpected? How do you decide whether an unexpected response is valid or is proof that the machine has failed the test? In fact, this even raises the question: what constitutes a proof? Is the machine demonstrating that it's actually smarter than us, or did TT just uncover a bug in the machine? A similar unexpected response is part of Arthur C. Clarke's 2001: A Space Odyssey, where the HAL 9000 machine decides to mutiny.

So, sixty years on, we are indeed interacting with machines in our daily lives without always knowing it:

I've already mentioned speech-recognizing assistants in phone menus. Since these machines are still brittle, they can't handle my unexpected responses. I often provide persistently unexpected answers to force the machine to bypass itself to a human. This type of machine seems to consistently fail TT.
Credit card fraud detection seems to be much more reliable. These machines employ ML to look for exceptions to historical purchase patterns, something humans could never do given the sheer deluge of data. The machine then calls my machine (my voice mail) and asks me to call the fraud unit. In the past twenty years, I think I've received one or possibly two false positives. This type of machine seems to consistently pass TT, in the sense that I'm not aware that a machine is doing most of the work.
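
To make "exceptions to historical purchase patterns" a little more concrete, here is a deliberately naive sketch. It is not how any real fraud unit works; it assumes a single customer's past purchase amounts and flags a new amount that lies more than a chosen number of standard deviations from their mean. The function name, the threshold, and the sample figures are all hypothetical.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score rule).

    Illustrative assumption only: real systems use far richer features
    (merchant, location, timing) and trained models.
    """
    mu = mean(history)
    sigma = stdev(history)  # needs at least two past purchases
    if sigma == 0:
        # Every past purchase was identical, so any deviation is an exception.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical purchase history for one card, in dollars.
past_purchases = [20, 25, 22, 30, 18, 27]
ordinary = is_suspicious(past_purchases, 24)    # close to the usual pattern
unusual = is_suspicious(past_purchases, 900)    # far outside the usual pattern
```

Even this toy rule shows the trade-off behind the false positives mentioned above: lowering the threshold catches more fraud but calls my voice mail more often.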

It seems to me that a truly intelligent machine will also produce unexpectedly valid responses. Very little seems to be said about this aspect, perhaps because a huge effort is usually required just to reach the TT threshold with anything that resembles machine intelligence, let alone exceed it.

Here are some examples of what I mean:

GPS. When I got my first car navigator unit, I had to learn to stop fighting its advice. Its routes were better than the ones I knew, but it took a while to get used to that response.
Digits of π. The discovery of the BBP formula in 1995 came from a machine, or at least with the assistance of a machine. The first surprise is that it computes the n-th digit of π without calculating the preceding digits, something previously thought to be unachievable due to the random nature of the digits. The second surprise is that the formula works in hexadecimal digits: not the typical working base of a number theorist, but de rigueur for a machine.
A Ramanujan machine? Srinivasa Ramanujan was a mathematical prodigy, but not a calculating savant. He was a kind of human Turing machine. Like a Turing tape, Ramanujan used a simple chalk and slate that got erased every alternate turn. He derived many unexpected and deep results in number theory, but the paths he took to get there were permanently lost as he derived them. The best contemporary number theorists in Britain at the time had great difficulty in formally proving many of Ramanujan's results. What if Ramanujan had been a real machine?
What constitutes a proof? TT has ramifications for what it means to prove something mathematically. What is acceptable? To how many experts? I'm reminded of Andrew Wiles' proof of Fermat's Last Theorem. After seven years' work, with consultation among some of his peers, a bug was finally discovered in his original proof. It took him another two years to patch it up. The proof of the Kepler sphere-packing conjecture was originally rejected because it incorporated a computer program, which, in itself, is Turing-ironic.
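
The digit-extraction surprise in the BBP example above is easy to try for yourself. The formula is π = Σ_{k≥0} 16^(-k) [4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)]. Multiplying by 16^n and keeping only fractional parts isolates the hexadecimal digit at position n+1 after the point, and modular exponentiation keeps every intermediate number small. What follows is a minimal sketch of that standard trick, not the discoverers' own code; the function names are mine, and because it relies on double-precision floats it should only be trusted for modest n.

```python
def _bbp_series(j, n):
    """Fractional part of sum over k >= 0 of 16**(n - k) / (8*k + j)."""
    total = 0.0
    # Terms with k <= n: 16**(n - k) is a huge integer, but only its
    # remainder modulo (8k + j) affects the fractional part.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; add them
    # until they fall below double-precision resolution.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0

def pi_hex_digit(n):
    """Hex digit of pi at zero-based position n after the hexadecimal point."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]

# pi in hexadecimal begins 3.243f6a88...
first_eight = "".join(pi_hex_digit(n) for n in range(8))
```

Note that pi_hex_digit(7) never looks at the digits in positions 0 through 6; each call starts from scratch, which is exactly the unexpected part.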

Performance analysis has its roots in the comparison between man and machine. Can a man or woman be made to work as efficiently as a machine? Often this question arises in the more Draconian context of assembly-line workers: in the USA during the early twentieth century, and at Foxconn in China today. Time and motion performance analyses, instituted by Frederick Winslow Taylor, were ultimately published in his book The Principles of Scientific Management.


Saturday, June 23, 2012
