AI training data has been the subject of significant complaints, with multiple suits already filed by creators who claim their work was used without permission, but the most recent discovery is more disturbing: child sexual abuse images found inside a training dataset. A recent study revealed that LAION-5B, a large open dataset best known for being used to train a famous AI platform, contained these illegal and sensitive materials.

Disputes over AI have persisted since the technology debuted, ranging from the unlicensed, unpermitted scraping of online data to the sensitive information included in training sets.

AI Training Data Contains Child Sexual Abuse Images

A new report from the Stanford Internet Observatory (SIO), authored by researcher David Thiel, uncovered more than 1,000 instances of child sexual abuse material (CSAM) in AI training data. The finding corroborates rumors from 2022 claiming that LAION-5B, a dataset made widely available to the public, contained illegal images.

Those earlier rumors (via Bloomberg) centered on fears over how widely accessible the dataset was, concerns now confirmed by the study's findings.

Thiel told Ars Technica that the presence of these child sexual abuse images in AI training data may enable models to create "new, potentially realistic child abuse content."


LAION-5B Dataset Is Used by a Known AI Platform

LAION-5B is a renowned open dataset best known as the training data behind Stable Diffusion 1.5, and the investigation claims these models were trained directly on CSAM.

The LAION-5B dataset comprises billions of images scraped from popular websites including Reddit, WordPress, X, and Blogspot. It also contained material from known adult video sites.

LAION has reportedly taken the datasets offline as part of its "zero tolerance policy" and says it will republish them once they have been verified as safe.

AI's Training Data and Access to Online Info

One of the top issues raised against artificial intelligence has long been safety, because AI models train on the world's massive stores of data, particularly the internet, to produce what they deliver to users. After significant disputes, various companies have taken it upon themselves to make their AI models safer, with OpenAI recently announcing its new "Preparedness Framework."

While many want to use AI for good, the technology also has a dark side: threat actors can exploit it for malicious attacks.

Several countries have launched major investigations into AI, particularly into how it accesses personal data online, and questions over licensing remain unresolved.

The internet holds vast amounts of data and information, but it also has a dark side centered on abusive and illegal content, including the worst imaginable. The discovery of child sexual abuse material in the LAION-5B dataset is especially alarming given that Stable Diffusion 1.5 is known to have trained on it.


Isaiah Richard
