
Multilingual Interactions through Machine Translation—Numbers from Socl

October 4th, 2013 by andresmh

For the past two years, social media platforms have been rolling out machine translation in the hopes of enabling multilingual interactions. However, the people interacting on these platforms often know each other already (i.e., they are friends) and have a language in common. But what happens when machine translation is used to facilitate interactions among strangers, who perhaps have common interests but not a common language?

The earliest social media platform to enable machine translation was probably Facebook, which began autotranslating conversations on Facebook Pages (a good place to start, given that Pages are more likely to bring together speakers of different languages). Google+ and Twitter later released similar features, enabling, for example, Spanish-speaking Twitter users to read the tweets from the now toppled Egyptian president Muhammad Morsi, translated from Arabic to Spanish:



How often do these types of multilingual interactions occur, though? Ethan Zuckerman posed a similar question when wondering how often people use their browsers’ machine translation to pay attention to content outside their immediate reach.

With that in mind, we decided to look into some numbers using data from our own social media platform: Socl, which started offering machine translation last year. Socl, like Twitter, often brings together strangers who might not speak the same language, for example:


Multilingualism in Socl
In the 3 months of Socl data we looked at, we found more than 6,000 multilingual posts: threads like the one above, where the language of the thread-starter or of one or more of the comments differed from the rest. These presumably represent people communicating with speakers of other languages through machine translation.
We found that most multilingual threads (85%) contain two languages, and the remaining 15% have 3 or more, up to a handful of threads with 5 languages in a single thread.
Furthermore, the majority of multilingual posts involved English and some other language, with English-Portuguese and English-Spanish being the most common pairings among bilingual threads:
These numbers reflect the demographics of Socl itself, as almost half of the visitors come from outside the US, mainly from Brazil, India, and Germany.
It is important to note that these numbers were produced using automatic language detection, which has improved a lot in the past few years but still fails when dealing with emoticons and other unusual Internet lingo.
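For readers who want to run a similar count on their own data, here is a minimal sketch of the tallying described in this section. It assumes each post in a thread already carries a detected language code, with `None` where detection failed (on an emoticon-only comment, say). The function name `summarize_threads`, the data layout, and the sample threads are all hypothetical; this is not the pipeline used to produce the Socl numbers.

```python
from collections import Counter


def summarize_threads(threads):
    """Summarize multilingual threads.

    `threads` maps a thread id to a list of language codes, one per post
    (thread-starter first, then comments). `None` marks a post whose
    language could not be detected; such posts are ignored (an assumption).
    """
    # Keep only threads in which at least two distinct languages were detected.
    multilingual = {}
    for thread_id, codes in threads.items():
        languages = {code for code in codes if code is not None}
        if len(languages) > 1:
            multilingual[thread_id] = languages

    # How many multilingual threads contain 2, 3, ... languages.
    languages_per_thread = Counter(len(langs) for langs in multilingual.values())

    # Language pairings among bilingual threads, e.g. ("en", "pt").
    pair_counts = Counter(
        tuple(sorted(langs)) for langs in multilingual.values() if len(langs) == 2
    )
    return multilingual, languages_per_thread, pair_counts


# Hypothetical sample data, for illustration only.
sample_threads = {
    "t1": ["en", "pt", "en"],
    "t2": ["en", "en"],
    "t3": ["es", "en", None],
    "t4": ["en", "pt", "de"],
    "t5": [None, "en"],
}

multilingual, languages_per_thread, pair_counts = summarize_threads(sample_threads)
```

Ignoring undetected posts keeps a thread like `t5` from being counted as multilingual just because one comment could not be classified, though it can also undercount threads where the only foreign-language comment went undetected.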
More work is needed to understand the degree to which machine learning can support deep cross-language communication, but providing seamless automatic translations appears to be working across different platforms. That said, language is just one barrier; culture is a much more difficult one to address, especially through algorithmic methods.

Many thanks to Elena Agapie, James van Eaton, and Bruce Haly for helping with this post.
