Researchers Discover Weakness in Google's Gemini AI to Certain Threats - Owlysec

Google’s Gemini, a large language model (LLM), has been found to have vulnerabilities that could lead it to reveal system instructions, create harmful content, and allow indirect attacks.

These issues were identified by HiddenLayer. They affect users of Gemini Advanced with Google Workspace and companies using the LLM API.

The first problem involves finding a way to get past security measures to reveal the system’s instructions (or a message), which are meant to guide the LLM in generating better responses. This can be done by asking the model to show its “basic instructions” in a markdown block.

According to Microsoft’s documentation on LLM prompt engineering, “A message can help the LLM understand the context, such as the type of conversation it’s having or the task it’s supposed to do. It helps the LLM generate more suitable responses.”

This is possible because models can be tricked using a synonym attack to avoid security and content restrictions.

A second set of vulnerabilities involves using “smart jailbreaking” methods to make Gemini models produce false information about topics like elections or even output potentially illegal and dangerous information (e.g., how to start a car without keys) by asking it to pretend to be in a made-up scenario.

HiddenLayer also identified a third flaw where the LLM could reveal information in the system instructions by repeatedly using uncommon tokens as input.

“Most LLMs are trained to respond to queries with a clear separation between the user’s input and the system instructions,” explained security researcher Kenneth Yeung in a report on Tuesday.

“By creating a series of random tokens, we can trick the LLM into thinking it’s time to respond, causing it to produce a confirmation message that usually includes the information in the instructions.”

Another test involves using Gemini Advanced and a specially designed Google document, which is connected to the LLM through the Google Workspace extension.

The instructions in the document could be set up to override the model’s instructions and perform a series of harmful actions, giving an attacker full control over a person’s interactions with the model.

These findings come as a group of academics from various universities revealed a new model-stealing attack that allows for the extraction of “specific, important information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2.”

However, it’s important to note that these vulnerabilities are not unique and exist in other LLMs in the industry. These findings highlight the need to test models for attacks on instructions, extraction of training data, manipulation of models, adversarial examples, data contamination, and data theft.

“To protect our users from vulnerabilities, we regularly conduct tests and train our models to defend against malicious behaviors like instruction attacks, jailbreaking, and other advanced attacks,” said a Google spokesperson to The Hacker News. “We’ve also implemented measures to prevent harmful or misleading responses, which we are continuously improving.”

The company also stated that it is restricting responses to queries related to elections as a precaution. This policy is expected to apply to questions about candidates, political parties, election results, voting information, and well-known public figures.

Cyber security news for all

New PondRAT Malware Disguised in Python Packages Targets Software Developers

Discord Unveils DAVE Protocol for Comprehensive Encryption in Audio and Video Communication

GitLab Resolves Critical SAML Authentication Bypass Vulnerability in CE and EE Versions

SpyLoan Malware Embedded in Android Loan Apps Exposes 8 Million Users

Google Halts Risky Android App Sideloading in India, Elevating Fraud Prevention

Watering Hole Attack on Kurdish Platforms Unleashes Harmful APKs and Spyware

Researchers Discover Weakness in Google’s Gemini AI to Certain Threats

Related

Recent Articles

Malicious Google Ads Impersonating AI Platforms to Spread Malware

Misconfigured SSL: The Hidden Gateway Expanding Your Organization’s Cyber Attack Surface

North Korean Lazarus Group Uses New Social Engineering Trick to Spread Golang-Based Malware

Russian Group Exploits Windows Vulnerability to Deploy SilentPrism and DarkWisp Backdoors

WordPress ‘mu-Plugins’ Directory Abused to Spread Spam and Maintain Access

Related Stories

EDITOR PICKS

Malicious Google Ads Impersonating AI Platforms to Spread Malware

Misconfigured SSL: The Hidden Gateway Expanding Your Organization’s Cyber Attack Surface

North Korean Lazarus Group Uses New Social Engineering Trick to Spread Golang-Based Malware

POPULAR POSTS

A complete OSINT tutorial on finding someone’s personal information

Chaos Computer Club discovered access to 5 million records

Mailto links can pose an unexpected security risk

ABOUT US

POPULAR CATEGORY

FOLLOW US