Detectors are Glitch’s security mechanisms that analyze content to identify threats, sensitive data, and policy violations. They form the foundation of Glitch’s security model.
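Detectors are configurable security checks that run on input, output, or both, and take a configured action when content crosses their sensitivity threshold.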
Every detector configuration includes:
| Component | Description |
|---|---|
| Type | The specific detector (e.g., prompt_attack, pii/email) |
| Threshold | Sensitivity level (L1-L4) that determines when the detector triggers |
| Action | What happens when triggered: block, flag, or allow |
| Scope | Whether to run on input, output, or both |
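The sketch below shows how the four components in the table might combine when a detector evaluates content. It is not Glitch's API. The class and method names, the numeric 1 to 4 encoding of L1 to L4, and the rule that a detector fires when its reported level is at or above the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"
    FLAG = "flag"
    ALLOW = "allow"


class Scope(Enum):
    INPUT = "input"
    OUTPUT = "output"
    BOTH = "both"


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical model of a detector configuration (not Glitch's real schema)."""
    detector_type: str  # detector name, e.g. "prompt_attack" or "pii/email"
    threshold: int      # 1..4 standing in for L1..L4 (ordering is an assumption)
    action: Action      # what to do when the detector triggers
    scope: Scope        # which side of the exchange the detector inspects

    def runs_on(self, direction: str) -> bool:
        """True if this detector inspects 'input' or 'output' content."""
        return self.scope is Scope.BOTH or self.scope.value == direction

    def triggered_action(self, direction: str, detected_level: Optional[int]) -> Optional[Action]:
        """Return the configured action if the detector triggers, else None.

        Assumed rule: trigger when the detector runs on this direction and
        reports a level at or above the configured threshold.
        """
        if not self.runs_on(direction) or detected_level is None:
            return None
        return self.action if detected_level >= self.threshold else None


if __name__ == "__main__":
    cfg = DetectorConfig("prompt_attack", threshold=2, action=Action.BLOCK, scope=Scope.INPUT)
    print(cfg.triggered_action("input", 3))   # Action.BLOCK
    print(cfg.triggered_action("output", 3))  # None: not scoped to output
    print(cfg.triggered_action("input", 1))   # None: below threshold
```

Returning `None` for "did not trigger" keeps a clean result distinct from a detector that fired but was configured with the `allow` action.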
Glitch organizes detectors into four main categories:
Prompt Defense
Detect prompt injection, jailbreaks, and instruction manipulation attempts.
Content Moderation
Filter harmful, toxic, or inappropriate content.
Data Leakage Prevention
Identify and protect sensitive data like PII and credentials.
Malicious Links
Detect and validate URLs, blocking known malicious domains.
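As an illustration of the Malicious Links category, the sketch below extracts URLs with a simple regular expression and checks each host against a domain blocklist, including subdomains. The blocklist contents, the `find_malicious_urls` name, and the regex are hypothetical stand-ins. This shows the general blocklist technique, not how Glitch implements its detector.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical blocklist; a production system would load this from a threat feed.
BLOCKED_DOMAINS = frozenset({"malware.example", "phish.example"})

# Rough URL matcher: http(s) scheme followed by non-whitespace, non-quote characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_malicious_urls(text: str, blocked: Iterable[str] = BLOCKED_DOMAINS) -> List[str]:
    """Return URLs in `text` whose host is a blocked domain or a subdomain of one."""
    blocked_set = {d.lower() for d in blocked}
    hits = []
    for url in URL_PATTERN.findall(text):
        host = urlsplit(url).hostname or ""  # .hostname is already lowercased
        # Exact match, or a subdomain (the leading dot rejects e.g. "notphish.example").
        if any(host == d or host.endswith("." + d) for d in blocked_set):
            hits.append(url)
    return hits


if __name__ == "__main__":
    sample = "Reset here: https://login.phish.example/reset or read https://docs.python.org/3/"
    print(find_malicious_urls(sample))  # ['https://login.phish.example/reset']
```

Matching on the parsed hostname rather than on the raw string avoids flagging a blocked domain that merely appears in a path or query parameter.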
Glitch uses multiple detection approaches:
Detectors are configured within Policies, which define: