Disturbing revelations have emerged from a recent study showing that popular AI image generators have been trained on thousands of photos of child sexual abuse.

According to the Associated Press, the study, conducted by the Stanford Internet Observatory, points to a critical flaw in the technology's foundation and urges companies to address it promptly.

LAION Database

According to the study, more than 3,200 suspected images of child sexual abuse were discovered in the Large-scale Artificial Intelligence Open Network (LAION) database.

LAION, a significant AI resource, serves as an index for online images and captions and is extensively used to train popular AI image-generating models such as Stable Diffusion.

The Stanford Internet Observatory collaborated with organizations like the Canadian Centre for Child Protection to identify and report illegal content to law enforcement. 

Approximately 1,000 of the identified images were externally validated, prompting swift action: LAION temporarily removed its datasets.

In a statement, the organization emphasized a zero-tolerance policy for illegal content and said it had taken down the datasets out of caution, to ensure they are safe before republishing them. The images in question constitute only a tiny fraction of LAION's index, which comprises around 5.8 billion images.

Nevertheless, the Stanford group contends that these images likely influence the output of AI tools and compound the prior abuse of real victims, whose images may appear repeatedly. One of the prominent users of LAION is Stability AI, a London-based startup that played a role in shaping the dataset's development.

While newer versions of its model, Stable Diffusion, aim to mitigate harmful content, an older version from last year, which Stability AI says it did not release, remains in circulation and is deemed the most popular for generating explicit imagery, according to the study.

The Need for Clean Datasets

The study recommends that users who built training sets from LAION-5B delete them or work with intermediaries to clean the material. It also advocates for removing older versions of models like Stable Diffusion from legitimate platforms, preventing their download and use.

The Stanford report also raises questions about the ethics of feeding any photos of children into AI systems without their families' consent, citing concerns related to the federal Children's Online Privacy Protection Act.

Child safety organizations emphasize the need for clean datasets when developing AI models. They also propose implementing digital signatures, or "hashes," similar to those used to track and take down child abuse material in videos and images, in order to curb the misuse of AI models.
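
As a rough illustration of how hash-based screening works, the sketch below computes a cryptographic fingerprint for each file in a candidate dataset and checks it against a list of known-bad hashes. This is a minimal sketch only: the file names, directory layout, and hash-list file are hypothetical, and production systems such as Microsoft's PhotoDNA rely on perceptual hashes that tolerate resizing and re-encoding rather than exact cryptographic matches.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_known_material(dataset_dir: Path, known_hashes: set[str]) -> list[Path]:
    """Return files whose fingerprints match any hash on the known-bad list."""
    return [
        p
        for p in dataset_dir.rglob("*")
        if p.is_file() and sha256_of_file(p) in known_hashes
    ]


if __name__ == "__main__":
    # Hypothetical inputs: a hash list from a clearinghouse and a local dataset folder.
    # Exact hashes only catch byte-identical copies, which is why real deployments
    # use perceptual hashing instead.
    known = set(Path("known_hashes.txt").read_text().split())
    for match in flag_known_material(Path("dataset"), known):
        print("flagged:", match)
```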

"The most obvious solution is for the bulk of those in possession of LAION‐5B‐derived training sets to delete them or work with intermediaries to clean the material. Models based on Stable Diffusion 1.5 that have not had safety measures applied to them should be deprecated and distribution ceased where feasible," the study's authors noted.
