David Louis Edelman

Google’s Instant Translation

In case they weren’t working on enough already, Reuters reports that Google is working on the ability to instantly translate documents. BabelFish on steroids, if you will.

How would you build such a thing? You might think that Google’s programmers would try to break down languages into sophisticated formulae depicting sentence structure and grammatical rules and that kind of thing. Wrong. If I’m reading the article correctly, Google’s essentially trying to do the whole thing through pattern recognition.

This means that when you feed this sentence into the Google translator —

Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
ché la diritta via era smarrita.

— the program doesn’t particularly care that you’re trying to translate the opening stanzas of Dante’s Inferno. It doesn’t care about the Renaissance or Biblical allusions or Italian grammar. All it cares about is the fact that 76.4% of all Italian documents with this sentence translate it into English the way Henry Wadsworth Longfellow did:

Midway upon the journey of our life
I found myself within a forest dark,
For the straightforward pathway had been lost.

In other words, Google is treating language translation as one big black box. Let the computer figure out the complex algorithms that magically turn Italian into English; all Google cares about is the result. (Of course, you understand that I’m drastically simplifying things here.)
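To make the black-box idea concrete, here is a minimal sketch of translation by counting. It is not Google's method, only a toy under loud assumptions: the four-entry `PARALLEL_CORPUS`, the helper names `build_table` and `translate`, and the alternative English rendering are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (source sentence, human translation) pairs.
# A real system would draw on a vastly larger set of aligned documents.
PARALLEL_CORPUS = [
    ("Nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("Nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("Nel mezzo del cammin di nostra vita", "Midway upon the journey of our life"),
    ("Nel mezzo del cammin di nostra vita", "In the middle of the journey of our life"),
]


def build_table(pairs):
    """Count how often each source sentence was translated each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(table, source):
    """Return (most frequent translation, its share), or None if the source is unseen."""
    counts = table.get(source)
    if not counts:
        return None
    target, hits = counts.most_common(1)[0]
    return target, hits / sum(counts.values())


table = build_table(PARALLEL_CORPUS)
result = translate(table, "Nel mezzo del cammin di nostra vita")
```

Nothing in the lookup knows any Italian. It returns whichever rendering human translators chose most often, along with the share of the votes that rendering received.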

This is more or less the same way that the Google search engine works, and it’s proven to be a major breakthrough. When you type “Britney Spears” into the Google search box, the computer logic behind the scenes has no idea what or who a “Britney Spears” is. All it cares about is that people who are looking for that particular term are very likely looking for the website of the pop star and not some obscure brand of English toothpick. And the more searching and clicking web surfers do, the more aggregate data Google has to fine-tune its searching algorithms. The end result? Well, Google works. It’s a zillion times more effective than any other search engine, and in an astoundingly large percentage of searches you find what you’re looking for.

So how accurate is the Google translator? The Reuters article tactfully states that “the quality is not perfect” and “it is an improvement on previous efforts at machine translation,” which is a nice way of saying it kind of sucks at the moment.

But the good thing about pattern recognition is that it improves dramatically the more patterns you feed it. And, hey hey, wouldn’t ya know it — Google’s right in the middle of scanning the complete collections of a number of libraries around the world. Certainly there must be hundreds of thousands of source documents and their miscellaneous translations in the Google databases now that are just ripe for analysis.

But not only is such a system likely to improve with time — it could theoretically adapt to changes in the language too. For instance, the Google algorithm might notice that words which were once translated as “colored people” and “Negroes” are now being translated as “blacks” and “African-Americans” instead.
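One way such adaptation could work, sketched under assumptions: weight each observed translation by how recent it is, so newer usage gradually outvotes older usage. The `OBSERVATIONS` data, the 20-year `half_life`, and the function name `preferred_wording` are hypothetical, and a neutral example (older texts saying "wireless" where newer ones say "radio") stands in for real data.

```python
from collections import defaultdict

# Hypothetical observations: (year of the translation, English wording chosen).
OBSERVATIONS = [
    (1935, "wireless"), (1940, "wireless"), (1950, "wireless"),
    (1995, "radio"), (2005, "radio"),
]


def preferred_wording(observations, current_year, half_life=20.0):
    """Pick the wording with the highest recency-weighted count.

    Each observation's weight halves every `half_life` years.
    Observations dated after `current_year` are ignored.
    """
    scores = defaultdict(float)
    for year, wording in observations:
        if year > current_year:
            continue
        age = current_year - year
        scores[wording] += 0.5 ** (age / half_life)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

A plain count would keep favoring "wireless" three to two. With the decay applied, the two recent observations outweigh the three old ones when the function is called for 2007.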

Some will wonder if such an effort can ever truly succeed. Could you build a perfect translator from a machine that doesn’t understand anything it’s translating?

I’m no neurological expert, but I can’t see why not. I’m a big believer in the Turing test, which basically posits that any machine that can fool you into thinking it’s intelligent is in fact intelligent. (See my earlier screed about Turing tests and mind uploading.) In the same way, it seems to me that if you feed enough data into a pattern recognition algorithm, eventually that algorithm will be much more capable of linguistic translation than any human could ever be.

Of course, there will be limitations. The Google search engine is vulnerable to various forms of “googlehacking” and social engineering. Witness sex advice columnist Dan Savage’s successful campaign to redefine the word “santorum” from the last name of a socially conservative U.S. senator to, well, you look up “santorum” on Google and see what the number one result is.

There’s also a well-known phenomenon where the top Google search results end up reinforcing their status because, well, they’re right there and everyone clicks on them. It’s an endless loop. One can imagine a similar scenario once we start relying on Google translators as a replacement for human translators rather than as a supplement to them.

And it appears that day is coming sooner rather than later. What I can’t wait for is the day when Google opens up the API to its translation service, allowing people to do homebrew instantaneous translation mashups on the fly. That will be fun.

Comments

  1. Jetse on March 31, 2007 at 3:38 pm

    Hmm, Google instant translation as Searle’s Chinese Room? Echoes of Peter Watts’s Blindsight mixed with Charlie Stross’s Accelerando, where the evil, subconscious Google offspring will take over the world.

    The end is near
    As I get another beer.

    (and that rhymes!)

  2. David Louis Edelman on March 31, 2007 at 3:59 pm

    I’m not sure I buy Searle’s Chinese Room thing. My personal belief is that the “magic” of human thinking is much overrated. Our consciousness really isn’t much more than big, supercomplicated Google algorithms bouncing against each other.

    And btw, good to have you reading and commenting on here now, Jetse. If I had known that all I needed to do to get your ear was mention Krokus, I would have done it a long time ago. :-)

  3. Jetse on April 1, 2007 at 6:46 am

    I don’t think there’s any “magic” in human thinking: it’s just a process we don’t fully understand as yet. Interestingly, I just read the lead article in the latest New Scientist about a ‘quantum gravity computer’ (not a quantum computer, but something even stranger than that), which also, among other things, rehashed Roger Penrose’s argument, which goes a bit like this:

    a) quantum gravitational processes are uncomputable;

    and

    b) quantum gravitational processes might lie behind human consciousness;

    thus

    c) human consciousness cannot be understood by humans themselves.

    To which my reaction is: mwah. Penrose is a superb mathematician, but I think he’s taking some very fanciful steps here.

    For one, as far as I understand, decoherence prevents any quantum mechanical processes from manifesting on an everyday scale. So why *human* brains would suddenly have overcome this, and the brains of countless other species haven’t: well, I don’t see that.

    Like you, I don’t think there’s anything ‘magical’, or ‘uncomputable’ about human thinking: there are some processes behind it we don’t fully comprehend (not sure if they’re supercomplicated Google algorithms, but I’d buy that over ‘intractable’ anytime), but will over time.

    I mean if my thoughts are facilitated by quantum gravity, then all that headbanging in my youth should have completely demolished it. Still, I think. Or I think that I think. Have I become a Chinese Room?

    😉

  4. Kevin Pirkl on April 20, 2007 at 11:52 am

    I have compared Google’s page translation against the drop-in textarea translation services page, and the results are the same with embedded HTML. You can already create screen-scraping alternatives, like the one I have at http://zombiebob.blogspot.com/2007/04/google-translator-mashup-another-gen.html, that provide simple translation services (Google STS) deliverable in an AJAX style.

    I do note that Google Translate has improved its HTML handling, though, and has some safe HTML entity decoder embedded for re-rendering the returned text. It works pretty well…

    Now all I need to do is build in a more intelligent parser for inline HTML handling to ensure that HTML markup gets left alone and that the content is properly parsed and the original HTML context is not lost. That is the hard part of web translation services.

    Zombie Bob
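The "leave the markup alone" idea from the comment above can be sketched with Python's standard `html.parser` module. This is an illustration only, not the commenter's code or Google's: the class `TextOnlyTranslator`, the function `translate_html`, and the pluggable `translate` callback are hypothetical names, and any real translation call would have to be supplied by the reader.

```python
from html import escape
from html.parser import HTMLParser


class TextOnlyTranslator(HTMLParser):
    """Re-emit HTML as parsed, passing only text nodes through translate()."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # hypothetical callback: str -> str
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tag such as <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")  # end tags are re-emitted in lowercase

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            self.out.append(data)  # script/style bodies and whitespace stay untouched
        else:
            # Entities were decoded by the parser, so re-escape the translated text.
            self.out.append(escape(self.translate(data), quote=False))


def translate_html(markup, translate):
    """Translate the text content of `markup` while leaving tags in place."""
    parser = TextOnlyTranslator(translate)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With `str.upper` standing in for a translator, `translate_html('<p>Hello <b>world</b></p>', str.upper)` keeps the tags and changes only the text. The sketch also shows why this is the hard part: each text node is translated in isolation, so a sentence split by an inline tag loses its context.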

  5. Maxwell on July 16, 2008 at 7:14 am

    Hello David,

    Great blog!
    Please feel free to visit my new website, if it is of any interest.
    It’s dedicated to providing a huge range of translation
    services, articles, case studies and information about the
    different language solutions on offer.

    You can find it at http://www.translationservices-uk.com/

    Hope to see you there and keep up the good work.

    TS

  6. Audrey on November 28, 2009 at 9:36 am

    Yes, but that’s still flawed. Americans may translate ‘colored people’ into ‘African-Americans,’ but it would be wrong to refer to a born-and-bred Ethiopian as African-American when such a person may never have been to America. Americans may have the edge through sheer numbers, and this may skew the results. (By the way, British black people are called ‘Afro-Caribbeans.’)

    Of course, the translation tool will still be useful for strictly literal translations, but not as much for literary ones, where allusion, wordplay/double meanings, and diction matter when creating multiple levels of meaning. A businessman might learn something useful; an arts student would be better off looking up that Longfellow.

  7. […] is another example. Its purpose: Translate text accurately from one language to the other. The mechanism: Traverse documents that have already been translated by humans, and try to compile the “best […]

  8. Daniel W on March 1, 2011 at 1:59 pm

    Regarding the human mind, isn’t it quite possible that it basically is one (very) big adaptive pattern recognizer?

    It would be clever if Google Translate could recognize source material that matches its own and other machine translations too well, because that probably indicates that the text was already machine translated once, and would thus not be a useful contribution as to how humans translate.
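A rough sketch of the filter suggested in the comment above, under stated assumptions: compare each candidate translation with what the machine itself would produce, and discard the pair if the two are nearly identical. The function names, the `machine_translate` callback, and the 0.95 threshold are hypothetical. The similarity measure is `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def looks_machine_translated(candidate, machine_output, threshold=0.95):
    """Flag a candidate that is nearly identical to the machine's own output."""
    # Normalize case and whitespace so trivial differences don't hide a match.
    a = " ".join(candidate.lower().split())
    b = " ".join(machine_output.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


def filter_training_pairs(pairs, machine_translate, threshold=0.95):
    """Keep only (source, target) pairs whose target differs enough from machine output."""
    return [
        (source, target)
        for source, target in pairs
        if not looks_machine_translated(target, machine_translate(source), threshold)
    ]
```

The threshold is a guess. Set it too low and short, formulaic sentences that humans and machines translate identically would be thrown out along with the machine-made text.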
