What term describes compromising AI by entering prompts that cause it to behave in unintended ways?

Prepare for the AI Governance Exam in AAISM Domain 1. Study with flashcards and multiple-choice questions, each question features hints and explanations. Get ready for your exam!

Multiple Choice

What term describes compromising AI by entering prompts that cause it to behave in unintended ways?

Explanation:
Prompt injection is a prompt-based attack that exploits how AI models interpret user input by injecting instructions into the prompt that steer the model to behave in unintended ways. It works because the model treats the supplied text as guidance, so cleverly crafted prompts can override safeguards, reveal restricted information, or ignore safety rules. This directly captures the idea of compromising AI through prompts. It differs from data poisoning, which corrupts training data to influence behavior over time, and from other terms that describe different threat models. Defenses include keeping system prompts separate from user content, reinforcing guardrails, input validation, and monitoring for jailbreak prompts.

Prompt injection is a prompt-based attack that exploits how AI models interpret user input by injecting instructions into the prompt that steer the model to behave in unintended ways. It works because the model treats the supplied text as guidance, so cleverly crafted prompts can override safeguards, reveal restricted information, or ignore safety rules. This directly captures the idea of compromising AI through prompts. It differs from data poisoning, which corrupts training data to influence behavior over time, and from other terms that describe different threat models. Defenses include keeping system prompts separate from user content, reinforcing guardrails, input validation, and monitoring for jailbreak prompts.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy