News

Prompt Shields AI System Security for Azure OpenAI Now Available

Microsoft has launched Prompt Shields, a new security feature now generally available, aimed at safeguarding applications powered by Foundation Models (large language models) for its Azure OpenAI Service.

This feature is designed to protect against two key types of attacks: direct, or jailbreak, attacks and indirect prompt injection attacks.

Microsoft's Prompt Shields addresses these by providing protection against both attack types, enhancing the security of AI-driven applications across the Azure platform.

Microsoft said indirect prompt injection attacks, also known as cross-domain prompt injection attacks, present a rising threat to systems that use generative AI models. These attacks occur when an application processes information that wasn't directly created by the developer or the user, potentially leading to unauthorized actions.

For instance, an AI-powered email assistant using Azure OpenAI could be tricked into executing hidden malicious commands embedded within a seemingly normal email. Such commands might instruct the AI to retrieve sensitive data or visit harmful Web sites, actions that could compromise security, according to Microsoft.

These attacks can occur whenever large language models (LLMs) process external data. Even applications that primarily generate text could be vulnerable to producing harmful or offensive content, if exploited, according to Microsoft

Direct and indirect attacks involve different threat models. In direct attacks, the user intentionally manipulates the AI system via prompts, causing the LLM to ignore its intended rules and training. In contrast, indirect attacks involve a third-party adversary embedding malicious content within external data, which the LLM mistakenly interprets as legitimate user commands.

In a recent report by security firm Kaspersky, experts pinpoint the motivation behind prompt injection attacks as differing from more traditional or jailbreak attacks, which mostly focus on data exfiltration for monetary gains. They found that most indirect prompt injection attacks fall into the following categories:

  • Job related. For example, potential job seekers injecting code to highlight their resume or profile when job seekers use automated software to filter candidates.
  • Ad related. For example, injecting code into search bots to promote or highlight positive user reviews on products and services.
  • Protest related. For example, an artist could inject a prompt on their Web site to curb LLMs from grabbing their content.
  • Insult related. For example, spamming off-topic comments or attacks in social media through prompt injection instructions.

While Microsoft's Prompt Shields will help identify such attacks on LLMs, the company said the security feature will be limited by language. Prompt Shields was developed and tested using Chinese, English, French, German, Spanish, Italian, Japanese and Portuguese. The service may not be as efficient at spotting attacks for other languages.  

About the Author

Chris Paoli (@ChrisPaoli5) is the associate editor for Converge360.

Featured

comments powered by Disqus

Subscribe on YouTube