Wikipedia has taken a stance against companies that scrape data from its website, particularly those that use the data to train their AI models without consent from, or compensation to, the organization.

The Wikimedia Foundation's latest statement does not gatekeep the data and information already available on its website; rather, it aims to prevent abusive scraping practices.

For those who still want to access Wikipedia's data for AI training and other purposes, the foundation is offering paid API access.

Wikipedia Wants Companies to Stop Scraping for AI Training

Following a statement earlier this year, the Wikimedia Foundation has published a new blog post detailing its firm stance against the illegal data scraping that some companies carry out on its platform.

The Wikimedia Foundation explained how valuable its platforms are to AI companies: the data available on them is all human-generated content, which is significant for AI training.

It added that human-generated data remains irreplaceable to this day, which has made Wikipedia a target for data scraping over the years.

Want Wikipedia's Data? Pay For API Access

Wikipedia relies on volunteers to deliver its free-to-access content to users worldwide, and the extensive work of building its pages depends on human creativity, effort, and knowledge, which AI cannot replicate.

Moreover, Wikipedia does not rely on ads to fund its continued services; rather, it uses donations from users globally.

For these reasons, the Wikimedia Foundation denounces illegal data scraping and wants AI companies that scrape its data to pay for access through its Enterprise API instead.

The Wikimedia Foundation invites companies to support its endeavors and mission to deliver quality human content and to avoid illegal scraping by following the proper processes.

AI Scraping and Paid API Access

Reddit is one of the most prominent companies to have started charging entities that want to train their AI models on its human-generated content.

Big Tech companies have faced many disputes over illegal data scraping for AI training, and some have been called out for allegedly doing so. Reddit once accused Microsoft of illegally accessing its data to train the software company's AI and has since sought to have Microsoft pay for access to its API.

Last year, Midjourney enforced a company-wide ban on using Stability AI's technologies after it allegedly caught its rival scraping data from its systems, which caused a massive outage of its service.

Data scraping has become a notorious practice in the AI industry in recent years, and some companies have faced court battles over the methods they used to advance their AI models.

That said, some entities offer paid API access as a legitimate alternative to scraping, creating a mutually beneficial relationship: data owners earn from their resources, and AI companies get the data they need for training.