OpenAI, Maker of ChatGPT, Reveals Approach to Prevent Superintelligent AI From Going Rogue

OpenAI, the company behind ChatGPT, has announced how it plans to address the challenges of superintelligence alignment: ensuring that AI systems much smarter than humans follow human intent.

The organization acknowledges that superintelligence has the potential to be both revolutionary and perilous, capable of solving crucial global issues or leading to human disempowerment and even extinction.

Although the development of superintelligence may still seem distant, OpenAI believes it could become a reality within this decade.

Photo: This illustration photograph taken with a macro lens shows an 'OpenAI' logo reverse projected onto a human eye at a studio in Paris on June 6, 2023. (JOEL SAGET/AFP via Getty Images)

Preventing Superintelligent AI From Going Rogue

OpenAI says that managing these risks will require new institutions for governance as well as a solution to the alignment problem, so that superintelligent AI systems adhere to human intent.

"Currently, we don't have a solution for steering or controlling a potentially superintelligent AI and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans' ability to supervise AI," OpenAI wrote in a blog post.

"But humans won't be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs."

To tackle this challenge, OpenAI intends to build an automated alignment researcher with capabilities similar to those of a human expert. By leveraging significant computational resources, OpenAI aims to scale its efforts and iteratively align superintelligence.

The approach involves several key steps. Firstly, OpenAI plans to develop a scalable training method that employs AI systems to assist in evaluating other AI systems.

This approach, known as scalable oversight, allows for the evaluation of tasks that are difficult for humans to assess. Furthermore, OpenAI aims to understand and control how its models generalize oversight to tasks that cannot be supervised directly.

Secondly, OpenAI intends to validate the alignment of its systems by automating the search for problematic behavior and problematic internals. The first is a question of robustness; the second calls for automated interpretability, meaning tools for understanding how a system arrives at its decisions.

Lastly, OpenAI will stress-test its entire alignment pipeline by deliberately training misaligned models. This adversarial testing is meant to verify that its techniques detect the most severe kinds of misalignment.

Top Machine Learning Researchers

OpenAI acknowledges that its research priorities may evolve over time and says it plans to share more of its roadmap in the future.

To spearhead these efforts, OpenAI is assembling a team of top machine learning researchers and engineers. With a dedicated focus on superintelligence alignment, this team will tackle the core technical challenges over the next four years.

Ilya Sutskever, co-founder and Chief Scientist of OpenAI, and Jan Leike, Head of Alignment, will lead the team. It will also include researchers and engineers from OpenAI's previous alignment team, along with staff from other teams across the company.

OpenAI encourages researchers and engineers, even those who have not previously worked on alignment, to join its mission. The organization describes superintelligence alignment as a pressing machine learning problem and believes that contributions from the best minds in the field will be invaluable.