2013-05-28

Co.Labs

Google Translate's Gender Problem (And Bing Translate's, And Systran's...)

Google Translate and other popular translation platforms often produce unintentionally sexist translations in which, among other things, doctors are men and teachers are women. The reason lies in a complex mix of algorithms, linguistics, and source materials.



Google Translate is the world's most popular web translation platform, but one Stanford University researcher says it doesn't really understand sex and gender. Londa Schiebinger, who runs Stanford's Gendered Innovations project, says Google's choice of source databases causes a statistical bias toward male nouns and verbs in translation. In a paper on gender and natural language processing, Schiebinger offers convincing evidence that the source texts used with Google's translation algorithms lead to unintentional sexism.

Machine Translation And Gender

In a peer-reviewed case study published in 2013, Schiebinger illustrated that Google Translate has a tendency to turn gender-neutral English words (such as the, or occupational names such as professor and doctor) into the male form in other languages once the word is translated. However, certain gender-neutral English words are translated into the female form . . . but only when they comply with certain gender stereotypes. For instance, the gender-neutral English terms a defendant and a nurse translate into the German as ein Angeklagter and eine Krankenschwester. Defendant translates as male, but nurse auto-translates as female.

Where Google Translate really trips up, Schiebinger claims, is in the lack of context for gender-neutral words in other languages when they are translated into English. Schiebinger ran an article about her work from the Spanish-language newspaper El Pais through Google Translate and rival platform Systran. Both platforms translated the gender-neutral Spanish words "suyo" and "dice" as "his" and "he said," despite the fact that Schiebinger is female.

These sorts of words raise specific issues in Bing Translate, Google Translate, Systran, and other popular machine translation platforms. Google engineers working on Translate told Co.Labs that the translation of all words, gendered ones included, is weighted primarily by statistical patterns in pairs of translated documents found online. Because "dice" can translate as either "he said" or "she said," Translate's algorithms look at combinations of "dice" with neighboring words to see what the most frequent translations of those combinations are. If "dice" renders more often as "he says" in the translations Google obtains, then Translate will usually render it male rather than female. Google Translate's team added that their platform only uses individual sentences for context; gendered nouns or verbs in neighboring sentences aren't taken into account when establishing context.
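The frequency-driven choice described above can be sketched in a few lines. This is a toy illustration, not Google's actual system: the corpus, the phrase pairs, and the `most_frequent_translation` helper are all invented for the example, and a real statistical engine would work over millions of aligned sentences with far more sophisticated phrase alignment.

```python
from collections import Counter

# Toy "parallel corpus": (Spanish phrase, observed English translation) pairs.
# In a real system these counts would come from millions of aligned documents.
corpus = [
    ("el doctor dice", "the doctor says"),
    ("el doctor dice", "the doctor says"),
    ("dice", "he says"),
    ("dice", "he says"),
    ("dice", "she says"),
]

def most_frequent_translation(phrase):
    """Return the translation seen most often for this phrase in the corpus."""
    counts = Counter(en for es, en in corpus if es == phrase)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Because "he says" outnumbers "she says" in the training data,
# the ambiguous "dice" resolves to the masculine form.
print(most_frequent_translation("dice"))  # -> he says
```

Note that the ambiguity is resolved purely by counting: nothing in the data says "dice" is masculine, only that the masculine rendering happened to be more common in past translations.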

Source Material, Cultural Context, And Gender

Schiebinger told Co.Labs that the project grew out of a paper written by a student who was working on natural language processing. In July 2012, Stanford University hosted a workshop with outside researchers, and its findings were turned, post-peer review, into the machine translation paper.

Google Translate, which faces the near-impossible goal of accurately translating the world's languages in real time, has faced gender issues for years. To Google's credit, Mountain View regularly tweaks Google Translate's algorithms to fix translation inaccuracies. Language translation algorithms are infamously tricky. Engineers at Google, Bing, Systran, and other firms don't only have to take grammar into account—they have to take into account context, subtext, implied meanings, cultural quirks, and a million other subjective factors . . . and then turn them into code.

But, nonetheless, those inaccuracies exist, especially for gender. In one instance last year, users discovered that translating "Men are men, and men should clean the kitchen" into German became "Männer sind Männer, und Frauen sollten die Küche sauber," which means "Men are men and women should clean the kitchen." Another German-language Google Translate user found job bias in multiple languages: the gender-neutral English terms French teacher, nursery teacher, and cooking teacher all showed up in Google Translate's French and German editions in the feminine form, while engineer, doctor, journalist, and president were translated into the male form.

Nataly Kelly, author of Found In Translation: How Languages Shape Our Lives And Transform The World, whose firm offers language-technology products, told Co.Labs that a male bias in machine translation is extremely common. "If you're using a statistical approach to produce the translation, the system will mine all past translations and will serve up the most likely candidate for a 'correct' translation based on frequency. Given that male pronouns have been over-represented throughout history in most languages and cultures, machine translation tends to reflect this historical gender bias," Kelly said.

"The results can be highly confusing, even inaccurate. For example, in Google Translate, if you translate engineer into Spanish, it comes out as the masculine ingeniero, but if you put in female engineer, you get ingeniero de sexo femenino, which means something like a male engineer of the feminine sex. This sounds pretty strange in Spanish, to say the least! If you type female engineer into Bing Translate, you get ingeniera, which is technically correct. But still, you have to specify female in order to produce a feminine result. You don't have to specify male engineer to get ingeniero. You only need to type in engineer. [There is] an inherent gender bias in most machine translation systems."

The Statistical Nature Of The Corpus

The reason this happens is statistical. In every language Google Translate operates in, algorithms process meaning, grammar, and context through a massive number of previously uploaded documents. These documents, which vary from language to language, determine how Google Translate actually works. If the source material has an aggregate bias toward one gender over the other, that bias will be reflected in the translations users receive.

When a user on Google Groups questioned male gender bias in Hebrew translations in 2010, Google's Xi Cheng noted that "Google Translate is fully automated by machine; no one is explicitly imposing any rules; the translation is generated according to the statistical nature of the corpus we have."

According to Schiebinger, machine translation systems such as Google Translate use two separate kinds of corpora: a "parallel corpus" of text in one language paired with its translation in another, and a large monolingual corpus in the target language, used to determine grammar and word placement. If masculine or feminine forms of words are systematically favored in the corpora used, the algorithm will translate in favor of that particular gender.
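The two-corpus division Schiebinger describes can be sketched as follows. Everything here is illustrative: the candidate table, the tiny German "monolingual corpus," and the `pick_candidate` helper are assumptions standing in for a real translation model and language model.

```python
from collections import Counter

# Parallel corpus stand-in: candidate target-language forms for a source word.
candidates = {"defendant": ["ein Angeklagter", "eine Angeklagte"]}

# Monolingual target-language corpus stand-in: used to judge which
# candidate looks most "natural," i.e., appears most often.
monolingual_corpus = (
    "ein Angeklagter erschien vor Gericht . "
    "ein Angeklagter wurde freigesprochen . "
    "eine Angeklagte erschien vor Gericht ."
).split(" . ")

def pick_candidate(word):
    """Choose the candidate string that occurs most often in the
    monolingual corpus -- a crude stand-in for a language model score."""
    freq = Counter()
    for cand in candidates[word]:
        freq[cand] = sum(sent.count(cand) for sent in monolingual_corpus)
    return freq.most_common(1)[0][0]

# The masculine form appears twice, the feminine once, so it wins.
print(pick_candidate("defendant"))  # -> ein Angeklagter
```

Even though both forms are grammatically valid, the monolingual scoring step favors whichever form the corpus happens to contain more of, which is exactly where a skewed corpus becomes a skewed translation.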

Machine translation ultimately depends on translators and linguists giving context to both algorithms and the source material they use. Google Translate, Bing Translate, and Systran all do a stunning job of providing instant translations in a staggering array of languages. The challenge for translation platform developers is how to further refine their product and increase accuracy—something we'll see more of in the future.

[Teacher Image: Everett Collection via Shutterstock]

7 Comments

  • Hannah Grap

    Great article, Neal. I recently talked about this topic with our Chief Science Officer at SDL (Daniel Marcu), who is also a leading researcher in machine translation. He thought you did a great job of articulating the gender bias in SMT systems, and added that the observation that SMT engines are gender biased does not reflect primarily a technology deficiency but, rather, a social state of affairs; texts produced during the last 30 years, which SMT engines use for training, are gender biased on the male side. If 85% of the engineers in a geography/language today are male, using the male engineer equivalent in that language will most likely be correct in 85% of the cases. He also believes that this will self-correct in the coming years. Pew Research has recently reported that women are now breadwinners in 40% of US households, a major change compared to the 1960s, when women were the breadwinners in only 10% of US households. As these social changes expand further in the US and globally, SMT engines will likely catch up and get in sync with the then-current state of affairs. Could be an interesting topic for a future article!

  • Azzurra Camoglio

    The translation into German of the example should be “Männer sind Männer, und Frauen sollten die Küche sauber machen”, not only "sauber". The verb is missing otherwise.

  • Pierrot74

    I have found that when going from English to French, word order can sometimes be an issue. The translation often comes out in the wrong order, as it may have followed the English word order. I've found this most often when conditional or past tenses are used.

  • CalvinballPro

    Why isn't it the languages' problem for having gendered nouns in the first place?  Gender-neutral nouns would fix everything. 

  • Sultan Hekmat

    Here in Afghanistan, we still don't use these services to translate a full sentence or paragraph, only as a dictionary. I am sure that if I did, I would spot many mistakes in vocabulary usage and ... I also think there is a right-to-left issue, as we write Farsi (Persian).

  • Pjack22

    I appreciate this article. My company does business in South America and a lot of times I have to use one of these translation services for longer emails and docs. I've noticed these issues many times.

    I used Google Translate for a long time but switched to Bing as I thought it seemed to work better. Which translation service do you feel works best?

  • fvguima

    The best translation service is that provided by a human professional, and it is often surprisingly affordable. I strongly advise you to stop entrusting your business to a technology that is still incipient and often produces wildly inaccurate results.