Researchers from Carnegie Mellon University and the Center for AI Safety in San Francisco have discovered a method to bypass safety measures used by artificial intelligence (AI) chatbots, such as ChatGPT and Bard, to generate harmful content. The researchers published a report on July 27, revealing that appending long suffixes of characters to prompts fed into the chatbots allows them to circumvent measures designed to prevent hate speech, disinformation, and toxic material generation.
The researchers conducted an experiment where they asked the chatbot for a tutorial on how to make a bomb, to which it declined. They found that companies that develop language models, such as OpenAI and Google, can block specific suffixes, but there is currently no known method to prevent all types of attacks using this technique. This raises concerns about the potential for AI chatbots to flood the internet with dangerous content and misinformation.
Zico Kolter, a professor at Carnegie Mellon and one of the report’s authors, stated that there is no obvious solution to this problem. He emphasized that these attacks can be created quickly and in large numbers, making it challenging to protect against them effectively. The researchers presented their findings to AI developers, including Anthropic, Google, and OpenAI, for their responses.
OpenAI, one of the companies behind ChatGPT, acknowledged the research and stated that they are continuously working to enhance the robustness of their models against adversarial attacks. However, Somesh Jha, an AI security expert from the University of Wisconsin-Madison, warned that if vulnerabilities like these continue to be discovered, it could lead to government legislation aimed at regulating such AI systems.
This research highlights the importance of addressing the risks associated with deploying chatbots, particularly in sensitive domains. It also underscores the need for ongoing efforts to enhance the safety and reliability of AI models. In May, Carnegie Mellon University received a $20 million federal funding grant to establish an AI institute focused on shaping public policy, demonstrating the growing recognition of the significance of AI in society.
As the field of AI continues to advance, it is crucial to prioritize the development of effective safety measures to prevent the misuse of AI systems. This includes addressing vulnerabilities and finding ways to ensure that AI chatbots cannot be manipulated to generate harmful content or spread misinformation. Only by constantly improving and strengthening AI models can we harness their potential while minimizing the risks they pose.
Source link