In collaboration with the University of California, Santa Barbara, researchers at the Amazon Web Services Artificial Intelligence Lab have uncovered a substantial prevalence of faulty machine translations across the web, raising concerns about the reliability and quality of content generated through artificial intelligence (AI).

"The low quality of these ... translations indicates they were likely created using machine translation," the authors wrote. "Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web."

Analyzing 6 Billion Sentences Online

According to Tech Xplore, after analyzing over six billion sentences online, the researchers found that more than half had been translated into two or more languages, and that a significant portion of these translations were of poor quality.

Moreover, the study highlighted a concerning trend: as these sentences were translated into more and more languages, up to eight or nine, the quality deteriorated markedly.

In their report, titled "A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism," the researchers warned against training multilingual large language models on both monolingual and bilingual data scraped from the web.

The study revealed that texts are not only being translated by AI but also being created by AI. AI-generated translations were most prevalent in lower-resource languages, such as the African languages Wolof and Xhosa.

The researchers found that highly multi-way parallel translations are of significantly lower quality than two-way parallel translations. This means regions under-represented on the web, such as African countries and other nations whose languages have little online presence, will face greater challenges in building reliable large language models.

Lacking native resources to draw upon, models for these languages must rely heavily on the tainted translations flooding the web.

Mehak Dhaliwal, a former applied science intern at Amazon Web Services, noted that colleagues working on machine translation for low-resource languages had observed a pervasive presence of machine-generated content in their native languages on the internet. Dhaliwal cautioned users to be aware that content they encounter on the web may have been generated by a machine.


Bias in Selecting Content for AI Training

The researchers also identified bias in selecting content for AI training, with machine-generated, multi-way parallel translations dominating the total translated content in lower-resource languages. 

According to the researchers, this content, often simpler and lower in quality, appears to be produced to generate ad revenue, contributing to the potential spread of inaccurate information.

The study's findings underscore the challenges posed by machine-generated translations, highlighting concerns about the accuracy, fluency, and reliability of content generated through AI systems. 

As the prevalence of machine-generated content continues to grow, addressing these issues becomes crucial to ensuring the integrity of information accessible on the web. The study's findings were published on the preprint server arXiv.
