Translating the Internet

Since its earliest days, the Internet filled us with the hope of uniting all of humanity. With information traveling at the speed of light, we thought, geographic location wouldn’t matter, and anyone who shared our interests would be within reach.

But there’s an age-old problem working against our utopian dreams of the web uniting the world: the language barrier. After all, it doesn’t matter what you have access to if you can’t read it.

In the first couple of decades of the Internet, we had a simple, if unsustainable, solution: most people used English, even if it wasn’t their native language.

Ethan Zuckerman, a co-founder of the multilingual blog network Global Voices, observed this phenomenon as recently as 2004. He was at dinner in Amman, Jordan, with a couple dozen bloggers who were chatting away in Arabic.

“But almost all of them were blogging in English at that point,” Zuckerman explains. “Out of that group of people that I had dinner with, a lot of those people blog in Arabic now. And I’ve gone back and talked to some of them… and one said to me, ‘When we were trying this in 2004 there were very few Arabic speakers online, and we just couldn’t write for that audience. But now our friends, our peers, our neighbors are all online. That’s who we want to reach.’”

The numbers support this anecdote. According to Internet World Stats, the number of Arabic-speaking Internet users has increased by more than 2,000 percent over the past decade. Chinese will soon replace English as the most-used language on the web, and dozens of other languages are growing rapidly. In one sense this is great: the more people who come online, the better. But as they join the web in different languages, how do we stop the Internet from fracturing along language lines?

Many think a big part of the solution will be machine translation. Translation software has been around for decades with a mediocre track record, but Google’s translation service, Google Translate, is producing impressive results and improving quickly.

“What we do is use hundreds of billions of words that Google infrastructure has access to,” says Michael Galvez, Project Manager at Google Translate. Google’s computers scour the web, suck in all that text, analyze it, and learn how people actually write. Google combines that information with high-quality translation transcripts to make a pretty amazing machine translator. Check out this article from a Spanish newspaper translated into English. Not bad, eh?

But some language combinations work much better than others, and even when the translation’s good, it’s never perfect.

“Google Translate is good at helping you get what is called the gist, or essentially the essence of what the other person is communicating,” says Google’s Michael Galvez.

I’m skeptical that the gist will be enough. Much of what we read on the web is beautifully written or full of nuance, and software will never be able to translate that. So some translation projects, like a new website called Meedan.net, are still using good ol’ humans.

“The idea is a Wikipedia-style approach to translation,” says Meedan founder Ed Bice. Meedan uses a mix of human and machine translation to present articles, blog posts, and comments about the Middle East in hopes of bridging the gap between the Arabic- and English-speaking worlds.

The comments following an article like this one show that the presentation of translated text is another important issue to tackle. Google Translate essentially wipes out the foreign language, showing you web pages only in your own language. Meedan instead places the English and Arabic side by side. That layout adds value beyond the translations themselves, especially when comments bounce back and forth between the two languages.

Internet thinkers say both machine translation and human translation projects will continue to improve rapidly over the next decade. Few are eager to predict when, if ever, a Star Trek-style universal translator will emerge. But as more and more of the web moves away from English, I have a feeling we’ll be relying on these services more and more. After all, 73 percent of the Internet right now is not in English.