Online translation
December 20, 2012
The Internet connects the world, but most people are walled off from each other by language barriers.
So free online machine translators like Google Translate, Microsoft's Bing and Systran are a godsend.
More than 200 million people worldwide turn to Google Translate alone every month, according to Franz Josef Och, who heads the search engine’s machine translation group.
"Most of the translation on the planet is now done by Google Translate," he wrote on the company’s blog earlier this year.
Still, Och has no illusions about the challenges of producing readable machine-translated text, and he believes there is a lot of work left to do.
"If the webpage is in French and you don't speak French, the (machine) translation is not as pleasant as a human translation, but you can understand it," he told DW.
It’s a question of probability
Google Translate takes existing human translations on the web and uses statistical algorithms to compute the most probable translation, based on matches between parallel texts in a given language pair.
This method is based on a breakthrough in machine translation made by a team of IBM physicists in the 1990s. Microsoft’s Bing, Google’s competitor, uses the same method.
A good example of how both Google and Bing work is the translation of the German Christmas greeting, "Wir wünschen Euch ein frohes Weihnachtsfest und einen guten Rutsch ins neue Jahr!"
Google and Bing produce, "We wish you a Merry Christmas and a Happy New Year!"
But Systran, one of the world’s oldest machine translation systems, produces an awkward, word-for-word translation: "We wish you a glad Christmas and a good slide in the new year!"
The translation Google and Bing both produce is the most probable one: it outranked all other matches in the two search engines’ databases.
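The "most probable match" idea can be sketched in a few lines. This is a minimal illustration, not how Google or Bing actually work: it assumes a tiny, hypothetical table of how often each English rendering was found aligned with the German greeting (the counts are invented), estimates probabilities by relative frequency, and picks the highest.

```python
from collections import Counter

# Hypothetical counts of English renderings found aligned with the German
# greeting in a parallel corpus. These numbers are invented for illustration.
SOURCE = "Wir wünschen Euch ein frohes Weihnachtsfest und einen guten Rutsch ins neue Jahr!"

observed = Counter({
    "We wish you a Merry Christmas and a Happy New Year!": 940,
    "We wish you a merry Christmas and a good start to the new year!": 55,
    "We wish you a glad Christmas and a good slide in the new year!": 5,
})


def translation_probabilities(counts):
    """Relative-frequency estimate of P(english | german) from the counts."""
    total = sum(counts.values())
    return {english: n / total for english, n in counts.items()}


def most_probable(counts):
    """Return the candidate translation with the highest estimated probability."""
    probs = translation_probabilities(counts)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    ranked = sorted(translation_probabilities(observed).items(), key=lambda kv: -kv[1])
    for english, p in ranked:
        print(f"{p:.3f}  {english}")
    print("Best:", most_probable(observed))
```

Real statistical systems of this kind score short phrases rather than looking up whole sentences, and they combine such translation probabilities with a model of fluent target-language text; the whole-sentence lookup here only keeps the example small.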
One of the main advantages of machine translation based on linguistic rules is that it gives users more control over the results, according to Aljoscha Burchardt, a machine translation expert at the German Research Center for Artificial Intelligence (DFKI) in Berlin.
"You know what the rule-based system is doing, but you need experts for going into a new subject domain, which requires a new lexicon and a new way of constructing sentences," he told DW.
Probability-based systems like Google and Bing don’t need such experts, but their quality depends on how much translated text exists in a given domain.
"Certain subject domains produce good results, such as parliamentary debates where tons of text translated by EU or UN professionals exist in many language combinations," Burchardt said.
The trouble with Google Translate
The quality of Google's output declines drastically in subject areas such as zoology or hunting, where little translated text is available for parallel matching, he noted.
Worse still, many words have two or more meanings, and which one is meant depends heavily on context.
With the word "bank," one could be referring to a "savings bank" or a "river bank," but the user can’t "bank" on Google picking the latter for a hunting text, such as "The ducks emerged from the river and the hunter could see them by the bank."
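One way to see why context matters is a toy word-sense picker. The sketch below uses a simplified Lesk-style overlap count, a textbook technique swapped in for illustration; it is not what Google Translate does. The sense inventory and cue words are hypothetical.

```python
import re

# Hypothetical sense inventory: each sense of "bank" is described by a few
# cue words. Both the senses and the cue lists are made up for illustration.
SENSES = {
    "savings bank": {"money", "account", "deposit", "loan", "interest", "savings"},
    "river bank": {"river", "water", "shore", "ducks", "fishing", "mud", "reeds"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence.

    On a tie (including no overlap at all), max() keeps the first sense
    listed, which acts as a crude "default meaning".
    """
    context = set(tokenize(sentence))
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    text = "The ducks emerged from the river and the hunter could see them by the bank."
    print(disambiguate(text))
```

In the hunting sentence, "ducks" and "river" both point to the riverside sense. A system with little hunting-domain text has few such cues to draw on, which is the weakness Burchardt describes.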
Statistical machine translation searches billions of translated sentences for parallel matches in a split second. The UN and EU are major sources of professional-quality translations, but the web also holds plenty of poor human and machine translations, and a machine can't make judgement calls about which ones to trust.
In addition, Google can’t distinguish between human translations and those generated by its own system.
"Lots of people are putting Google Translate on the web. As a result, our system 'learns' from our own translation, which obviously we don't want," Google's Och said.
His team is working on algorithms to weed out bad machine translations.
Google is also improving its translations with help from users, who tend to translate text into their native language. A new feature asks them to pick the better translation from two or more options.
"The good thing is that people have common sense that helps them use Google Translate, because even if it formulates something somewhat weirdly, you can still understand what is meant," said Och.