Google’s Instant Translation

In case they weren’t working on enough already, Reuters reports that Google is working on the ability to instantly translate documents. BabelFish on steroids, if you will.

How would you build such a thing? You might think that Google’s programmers would try to break down languages into sophisticated formulae depicting sentence structure and grammatical rules and that kind of thing. Wrong. If I’m reading the article correctly, Google’s essentially trying to do the whole thing through pattern recognition.

This means that when you feed this sentence into the Google translator —

Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
ché la diritta via era smarrita.

— the program doesn’t particularly care that you’re trying to translate the opening stanza of Dante’s Inferno. It doesn’t care about the Middle Ages or Biblical allusions or Italian grammar. All it cares about is the fact that, say, 76.4% of all translated documents containing this sentence render it in English the way Henry Wadsworth Longfellow did:

Midway upon the journey of our life
I found myself within a forest dark,
For the straightforward pathway had been lost.

In other words, Google is treating language translation as one big black box. Let the computer figure out the complex algorithms that magically turn Italian into English; all Google cares about is the result. (Of course, you understand that I’m drastically simplifying things here.)
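To make the black box idea a little more concrete, here is a minimal sketch in Python of "pick whatever translation shows up most often." It is emphatically not Google's actual system, which the article doesn't describe in any detail. The tiny corpus, its counts, and the names `PARALLEL_CORPUS`, `build_table`, and `translate` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy parallel corpus: (source sentence, translation seen in some document).
# These pairs and how often they repeat are invented for illustration only.
PARALLEL_CORPUS = [
    ("nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("nel mezzo del cammin di nostra vita", "In the middle of the journey of our life"),
    ("mi ritrovai per una selva oscura", "I found myself within a forest dark"),
]


def build_table(pairs):
    """Count how often each source sentence was rendered as each translation."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source.lower()][target] += 1
    return table


def translate(table, sentence):
    """Return (most common translation, its share of all sightings), or None.

    No grammar and no dictionary are involved: the function only counts.
    """
    candidates = table.get(sentence.lower())
    if not candidates:
        return None
    best, count = candidates.most_common(1)[0]
    return best, count / sum(candidates.values())


if __name__ == "__main__":
    table = build_table(PARALLEL_CORPUS)
    print(translate(table, "Nel mezzo del cammin di nostra vita"))
```

A real system would have to cope with sentences it has never seen before, presumably by matching smaller pieces, but the counting principle is the part that matters here.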

This is more or less the same way that the Google search engine works, and it’s proven to be a major breakthrough. When you type “Britney Spears” into the Google search box, the computer logic behind the scenes has no idea what or who a “Britney Spears” is. All it cares about is that people who are looking for that particular term are very likely looking for the website of the pop star and not some obscure brand of English toothpick. And the more searching and clicking web surfers do, the more aggregate data Google has to fine-tune its searching algorithms. The end result? Well, Google works. It’s a zillion times more effective than any other search engine, and in an astoundingly large percentage of searches you find what you’re looking for.

So how accurate is the Google translator? The Reuters article tactfully states that “the quality is not perfect” and “it is an improvement on previous efforts at machine translation,” which is a nice way of saying it kind of sucks at the moment.

But the good thing about pattern recognition is that it improves dramatically the more patterns you feed it. And, hey hey, wouldn’t ya know it — Google’s right in the middle of scanning the complete collections of a number of libraries around the world. Certainly there must be hundreds of thousands of source documents and their miscellaneous translations in the Google databases now that are just ripe for analysis.

But not only is such a system likely to improve with time — it could theoretically adapt to changes in the language too. For instance, the Google algorithm might notice that words which were once translated as “colored people” and “Negroes” are now being translated as “blacks” and “African-Americans” instead.
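One way such adaptation could work, and this is purely my own guess rather than anything Google has described, is to let older translations count for less than newer ones. The sketch below uses a simple half-life weighting. The sample data, the 20-year half-life, and the function name `preferred_translation` are all assumptions made up for the example.

```python
from collections import defaultdict

# (year of document, translation used) for a single source phrase.
# Invented toy data: an older rendering gradually gives way to a newer one.
OBSERVATIONS = [
    (1950, "wireless set"),
    (1955, "wireless set"),
    (1960, "wireless set"),
    (1995, "radio"),
    (2000, "radio"),
]


def preferred_translation(observations, current_year, half_life=20.0):
    """Pick the translation with the highest recency-weighted score.

    Each sighting loses half its weight every `half_life` years, so a few
    recent documents can outvote a larger number of old ones.
    """
    scores = defaultdict(float)
    for year, translation in observations:
        if year > current_year:
            continue  # ignore documents that do not exist yet
        age = current_year - year
        scores[translation] += 0.5 ** (age / half_life)
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(preferred_translation(OBSERVATIONS, 1965))
    print(preferred_translation(OBSERVATIONS, 2005))
```

A plain majority vote over this data would stick with the older term forever, since it appears three times to the newer term's two. The decay is what lets the newer usage win.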

Some will wonder if such an effort can ever truly succeed. Could you build a perfect translator from a machine that doesn’t understand anything it’s translating?

I’m no neurological expert, but I can’t see why not. I’m a big believer in the Turing test, which basically posits that any machine that can fool you into thinking you’re talking to a human is, for all practical purposes, intelligent. (See my earlier screed about Turing tests and mind uploading.) In the same way, it seems to me that if you feed enough data into a pattern recognition algorithm, eventually that algorithm will be much more capable of linguistic translation than any human could ever be.

Of course, there will be limitations. The Google search engine is vulnerable to various efforts at “Google bombing” and social engineering. Witness sex advice columnist Dan Savage’s successful campaign to redefine the word “santorum” from the last name of a conservative U.S. senator to — well, you look up “santorum” on Google and see what the number one result is.

There’s also a well-known phenomenon where the top Google search results end up reinforcing their status because, well, they’re right there and everyone clicks on them. It’s an endless loop. One can imagine a similar scenario happening once we start relying on Google’s translator as a replacement for human translators rather than as a supplement to them.
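The endless loop is easy to see in a toy simulation. The sketch below assumes, purely for illustration, that results are ranked by total clicks and that each position on the page collects a fixed number of clicks per round. The numbers in `POSITION_CLICKS` and the function name `simulate` are invented, and real ranking involves far more than click counts.

```python
# Clicks collected per round by the 1st, 2nd, and 3rd result on the page.
# Invented numbers: the only assumption is that higher positions get more.
POSITION_CLICKS = [600, 300, 100]


def simulate(initial_clicks, rounds):
    """Rank pages by total clicks, then hand out clicks by position.

    `initial_clicks` maps page name to starting click count and may hold
    at most len(POSITION_CLICKS) pages. Returns the totals after `rounds`.
    """
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        ranking = sorted(clicks, key=clicks.get, reverse=True)
        for position, page in enumerate(ranking):
            clicks[page] += POSITION_CLICKS[position]
    return clicks


if __name__ == "__main__":
    # Three pages that start out almost tied.
    print(simulate({"a": 11, "b": 10, "c": 9}, rounds=10))
```

Page "a" starts a single click ahead, and after ten rounds it is thousands ahead. Nothing about the pages themselves changed. Swap "pages" for "translations that human translators copy from the machine" and you have the same worry.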

And it appears that day is coming sooner rather than later. What I can’t wait for is the day when Google opens up the API to its translation service, allowing people to do homebrew instantaneous translation mashups on the fly. That will be fun.