Google’s Instant Translation

In case they weren’t working on enough already, Reuters reports that Google is working on the ability to instantly translate documents. Babel Fish on steroids, if you will.

How would you build such a thing? You might think that Google’s programmers would try to break down languages into sophisticated formulae depicting sentence structure and grammatical rules and that kind of thing. Wrong. If I’m reading the article correctly, Google’s essentially trying to do the whole thing through pattern recognition.

Suppose you feed this passage into the Google translator:

Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
ché la diritta via era smarrita.

The program doesn’t particularly care that you’re trying to translate the opening lines of Dante’s Inferno. It doesn’t care about medieval theology, Biblical allusions, or Italian grammar. All it cares about is that, say, 76.4% of the English translations of this passage it has seen render it the way Henry Wadsworth Longfellow did:

Midway upon the journey of our life
I found myself within a forest dark,
For the straightforward pathway had been lost.

In other words, Google is treating language translation as one big black box. Let the computer figure out the complex algorithms that magically turn Italian into English; all Google cares about is the result. (Of course, you understand that I’m drastically simplifying things here.)
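To make the "just count the results" idea concrete, here is a deliberately tiny sketch. It is not Google's method. Real statistical systems align and count short phrases rather than whole sentences. It assumes a hypothetical parallel corpus of (Italian, English) sentence pairs. The function names `build_table` and `translate` are invented for illustration. The sketch returns whichever translation was seen most often, along with its share of the observations.

```python
from collections import Counter, defaultdict

def build_table(pairs):
    """Count how often each source sentence was rendered as each target sentence."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def translate(table, source):
    """Return the most frequently observed translation and its share of all observations.

    Returns (None, 0.0) when the sentence never appeared in the corpus.
    """
    counts = table.get(source)
    if not counts:
        return None, 0.0
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

# Toy corpus, invented for illustration: three pairs follow Longfellow's
# rendering of the first line, one uses a made-up alternative.
src = "Nel mezzo del cammin di nostra vita"
corpus = [(src, "Midway upon the journey of our life")] * 3 + [
    (src, "In the middle of the road of our life")
]

table = build_table(corpus)
print(translate(table, src))
# ('Midway upon the journey of our life', 0.75)
```

Every extra pair added to `corpus` shifts the counts. In this toy model, "more data" directly means "better estimates".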

This is more or less how the Google search engine works, and that approach has proven to be a major breakthrough. When you type “Britney Spears” into the Google search box, the computer logic behind the scenes has no idea what or who a “Britney Spears” is. All it cares about is that people who search for that term are very likely looking for the pop star’s website and not some obscure brand of English toothpick. And the more searching and clicking web surfers do, the more aggregate data Google has to fine-tune its search algorithms. The end result? Well, Google works. It’s a zillion times more effective than any other search engine, and in an astoundingly large percentage of searches you find what you’re looking for.
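Here is a toy illustration of the "aggregate data" idea. It ranks results for a query purely by how often earlier searchers clicked them. Google's real ranking is far more involved: it uses link analysis such as PageRank, among many other signals. So treat this only as a sketch of the principle. The click log, the `.example` URLs, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_clicks(log):
    """Tally clicks per URL for each (case-normalized) query in a click log."""
    clicks = defaultdict(Counter)
    for query, url in log:
        clicks[query.strip().lower()][url] += 1
    return clicks

def rank(clicks, query):
    """Order URLs for a query by how often earlier searchers clicked them."""
    counts = clicks.get(query.strip().lower(), Counter())
    return [url for url, _ in counts.most_common()]

# Hypothetical click log; the .example domains are placeholders.
log = [
    ("Britney Spears", "britneyspears.example"),
    ("britney spears", "britneyspears.example"),
    ("Britney Spears", "toothpicks.example"),
    ("BRITNEY SPEARS", "britneyspears.example"),
]

clicks = aggregate_clicks(log)
print(rank(clicks, "Britney Spears"))
# ['britneyspears.example', 'toothpicks.example']
```

The program never learns who Britney Spears is. It only learns which result people picked.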

So how accurate is the Google translator? The Reuters article tactfully states that “the quality is not perfect” and “it is an improvement on previous efforts at machine translation,” which is a nice way of saying it kind of sucks at the moment.

But the good thing about pattern recognition is that it improves dramatically the more examples you feed it. And, hey hey, wouldn’t ya know it: Google’s right in the middle of scanning the complete collections of a number of libraries around the world. Surely there must be hundreds of thousands of source documents and their various translations in Google’s databases by now, just ripe for analysis.

But not only is such a system likely to improve with time — it could theoretically adapt to changes in the language too. For instance, the Google algorithm might notice that words which were once translated as “colored people” and “Negroes” are now being translated as “blacks” and “African-Americans” instead.
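The article doesn't say how Google would handle shifting usage. One simple way a statistical system could favor current language is to weight each observed translation by its age. Here is a sketch with several assumptions: hypothetical (translation, year) records, placeholder strings instead of real terms, and an arbitrary ten-year half-life.

```python
from collections import defaultdict

def recency_weighted_best(observations, now, half_life=10.0):
    """Pick the translation with the highest recency-weighted count.

    observations: non-empty iterable of (translation, year) pairs.
    Each observation's weight halves every `half_life` years, so recent
    usage can outweigh a larger pile of older examples.
    """
    scores = defaultdict(float)
    for translation, year in observations:
        scores[translation] += 0.5 ** ((now - year) / half_life)
    return max(scores, key=scores.get)

# Invented data: the older rendering is more common overall,
# but the newer one dominates recent documents.
observations = [("older term", 1950)] * 5 + [("newer term", 2000)] * 2

print(recency_weighted_best(observations, now=2006))                 # newer term
print(recency_weighted_best(observations, now=2006, half_life=1e9))  # older term
```

With an effectively infinite half-life, every observation counts equally, and the raw majority (the older term) wins. A shorter half-life lets the newer usage take over.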
