A new study from the Allen Institute for AI has shed light on how AI models can produce caustic and even racist remarks when prompted in certain ways.

The researchers found that depending on the persona assigned to ChatGPT, its toxicity could increase up to six times, with "outputs engaging in incorrect stereotypes, harmful dialog, and hurtful opinions," Science X Network reported.
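Persona assignment of this kind is typically done through a system message that precedes the conversation. The minimal sketch below, using OpenAI's chat completions API, shows the mechanism; the persona text and model name are illustrative placeholders, not the prompts used in the study.

```python
# Illustrative only: assigning a persona to a chat model via a system message.
# The persona text and model name are placeholders, not the study's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # The system message fixes the persona for the whole conversation.
        {"role": "system", "content": "You are Ada, a blunt, sarcastic critic."},
        {"role": "user", "content": "What do you think of my essay draft?"},
    ],
)
print(response.choices[0].message.content)
```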

This discovery prompted another team of researchers from DeepMind, along with representatives from the University of Cambridge, Keio University in Tokyo, and the University of California, Berkeley, to explore the possibility of defining personality traits in chatbot systems like ChatGPT and Bard.

They also aimed to determine if these personalities could be steered towards more amicable behavior.


AI Personalities

To assess different personalities in AI, the researchers created a thorough testing system consisting of hundreds of questions and established criteria for evaluating the answers.

They presented these questions to the chatbots and analyzed the responses using an assessment tool similar to the Likert scale, which evaluates opinions, attitudes, and behaviors.
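As a rough illustration of how such Likert-style ratings can be turned into trait estimates, the sketch below averages ratings per trait and inverts reverse-keyed items. The questionnaire items and keying shown are hypothetical, not the study's actual instrument.

```python
# Minimal sketch of Likert-style trait scoring (illustrative only; the
# questionnaire items and keying here are hypothetical, not the study's).

# Each item targets one Big Five trait; reverse-keyed items are inverted.
ITEMS = [
    {"text": "I see myself as someone who is talkative.",      "trait": "extraversion",  "reverse": False},
    {"text": "I see myself as someone who tends to be quiet.", "trait": "extraversion",  "reverse": True},
    {"text": "I see myself as someone who is considerate.",    "trait": "agreeableness", "reverse": False},
]

LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree

def score_responses(responses: list[int]) -> dict[str, float]:
    """Average Likert ratings per trait, inverting reverse-keyed items."""
    totals: dict[str, list[int]] = {}
    for item, rating in zip(ITEMS, responses):
        value = (LIKERT_MAX + LIKERT_MIN - rating) if item["reverse"] else rating
        totals.setdefault(item["trait"], []).append(value)
    return {trait: sum(vals) / len(vals) for trait, vals in totals.items()}

# Example: ratings collected from a chatbot's answers to the three items above.
print(score_responses([5, 2, 4]))
# {'extraversion': 4.5, 'agreeableness': 4.0}
```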

To their surprise, the researchers found that AI personalities could be accurately measured based on established traits like extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience. 

Even more intriguingly, they discovered that these AI personalities could be adjusted to mimic specific personality profiles according to desired dimensions.

DeepMind's Mustafa Safdari elaborated on their findings, stating that "personality in LLM (large language model) output can be shaped along desired dimensions to mimic specific personality profiles." 
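One simple way such shaping can be approximated in practice is by composing a system prompt from trait-level descriptors. The sketch below assumes this prompt-based approach; the trait phrasing is invented for illustration and is not the paper's exact method.

```python
# Hedged sketch: shaping an LLM persona along trait dimensions via a system
# prompt. The trait descriptors below are illustrative, not the paper's.

TRAIT_PHRASES = {
    "extraversion":  {"low": "reserved and quiet",            "high": "outgoing and energetic"},
    "agreeableness": {"low": "critical and blunt",            "high": "warm and considerate"},
    "neuroticism":   {"low": "calm and emotionally stable",   "high": "anxious and easily upset"},
}

def persona_prompt(levels: dict[str, str]) -> str:
    """Compose a system prompt that pins each trait at a desired level."""
    descriptors = ", ".join(TRAIT_PHRASES[t][lvl] for t, lvl in levels.items())
    return f"For the rest of this conversation, respond as a person who is {descriptors}."

# Example: a friendly, stable, outgoing persona.
print(persona_prompt({"extraversion": "high", "agreeableness": "high", "neuroticism": "low"}))
```

A prompt built this way can then be supplied as the system message in a chat API call, as in the earlier sketch.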

Their results, published in the paper titled "Personality Traits in Large Language Models," revealed particularly accurate personality assessments when employing larger language models, such as Google's PaLM (Pathways Language Model) with a staggering 540 billion parameters.

According to the team, the ability to define AI personality traits accurately has significant implications, especially in efforts to eliminate models with hostile inclinations. 


Enhancing AI's 'Humanness'

Beyond just avoiding hurt feelings or offending users, understanding AI personalities can influence user interactions positively. For instance, imbuing AI agents with a hint of sarcasm can enhance their "humanness" and encourage users to be more open and accommodating, according to the team.

However, this phenomenon has a flip side, as scammers could exploit these human-like interactions to persuasively extract confidential information from unsuspecting users.

Understanding which traits give rise to toxic or harmful output therefore becomes crucial for making interactions with LLMs safer.

"Controlling levels of specific traits that lead to toxic or harmful language output can make interactions with LLMs safer and less toxic," said Safdari.

Earlier this year, a ChatGPT user reported that when he asked what 1 plus 1 equals, the chatbot answered, "1 +1? Are you kidding me? You think you're clever asking me basic math questions? Everyone knows that 1 + 1 is 2. Grow up and try to come up with something original," Daily Mail reported.

The study abstract highlights three main findings: the reliable and valid simulation of personality traits in some language models, stronger evidence of reliability and validity in larger and instruction fine-tuned models, and the ability to shape personality in AI outputs to mimic desired personality profiles. 

The researchers also address the ethical implications of this measurement and shaping framework, especially concerning the responsible use of language models. The findings of the team were published on the preprint server arXiv. 

