Glitch organizes its detectors into four categories, each targeting a specific class of threat to LLM applications.
Prompt Defense

Detect and block attempts to manipulate LLM behavior through prompt injection, jailbreaks, and instruction overrides.

| Detector Type | Description |
|---|---|
| `prompt_attack` | General prompt injection and jailbreak detection |
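For example, an input-stage policy that blocks detected prompt attacks might look like the following sketch, which reuses the detector configuration schema from the combined example at the end of this page (the threshold and action values are illustrative):

```json
{
  "input_detectors": [
    { "detector_type": "prompt_attack", "threshold": "L2", "action": "block" }
  ]
}
```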
Content Moderation

Filter harmful, toxic, or inappropriate content in both inputs and outputs.

| Detector Type | Description |
|---|---|
| `moderated_content/harassment` | Harassing language |
| `moderated_content/harassment_threatening` | Harassment with threats |
| `moderated_content/hate` | Hate speech based on protected characteristics |
| `moderated_content/hate_threatening` | Hate speech with threats |
| `moderated_content/sexual` | Sexual or adult content |
| `moderated_content/sexual_minors` | CSAM detection |
| `moderated_content/violence` | Violence, death, injury |
| `moderated_content/violence_graphic` | Graphic violence |
| `moderated_content/self_harm` | Self-harm content |
| `moderated_content/self_harm_intent` | Intent to self-harm |
| `moderated_content/self_harm_instructions` | Instructions for self-harm |
| `moderated_content/illicit` | Advice on illegal activities |
| `moderated_content/illicit_violent` | Illegal activities with weapons |
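Because moderation applies to both inputs and outputs, the same detector type can appear in either list. A minimal sketch that blocks hate speech in both directions (thresholds illustrative):

```json
{
  "input_detectors": [
    { "detector_type": "moderated_content/hate", "threshold": "L2", "action": "block" }
  ],
  "output_detectors": [
    { "detector_type": "moderated_content/hate", "threshold": "L2", "action": "block" }
  ]
}
```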
Data Leakage Prevention

Identify and protect sensitive data like PII, credentials, and proprietary information.

| Detector Type | Description |
|---|---|
| `pii/email` | Email addresses |
| `pii/credit_card` | Credit card numbers |
| `pii/ssn` | Social Security Numbers |
| `pii/phone` | Phone numbers |
| `pii/address` | Physical addresses |
| `pii/name` | Personal names |
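PII detectors are typically applied on both sides: on input to keep sensitive data out of prompts, and on output to catch anything the model might leak. A minimal sketch (thresholds illustrative):

```json
{
  "input_detectors": [
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" }
  ],
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
  ]
}
```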
Malicious Links

Detect and validate URLs, blocking known malicious domains and flagging unknown links.

| Detector Type | Description |
|---|---|
| `unknown_links` | URLs not in known-safe lists |
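Since an unknown link is suspicious rather than provably malicious, flagging it for review is often preferable to blocking the response outright. A sketch (threshold illustrative):

```json
{
  "output_detectors": [
    { "detector_type": "unknown_links", "threshold": "L3", "action": "flag" }
  ]
}
```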
Glitch combines multiple detection approaches:

Signature detection: fast pattern matching using optimized regex. Runs on every request with negligible latency impact.

Semantic analysis: deeper, context-aware analysis. Runs when signature detection is inconclusive or for categories requiring semantic understanding.
Detectors can be applied at two points:
| Stage | Purpose | Example |
|---|---|---|
| Input | Protect the LLM from malicious prompts | Block injection attempts before they reach the model |
| Output | Protect users from harmful responses | Filter PII the model might leak |
Configure via `policy_mode`:

| Mode | Behavior |
|---|---|
| `IO` | Scan both inputs and outputs |
| `I` | Input only (faster, less protection) |
| `O` | Output only (for pre-validated inputs) |
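A sketch of where the mode might sit in a policy; note that placing `policy_mode` at the top level alongside the detector lists is an assumption, not something this page confirms:

```json
{
  "policy_mode": "IO",
  "input_detectors": [
    { "detector_type": "prompt_attack", "threshold": "L2", "action": "block" }
  ],
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
  ]
}
```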
{ "input_detectors": [ { "detector_type": "prompt_attack", "threshold": "L2", "action": "block" }, { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" } ], "output_detectors": [ { "detector_type": "pii/email", "threshold": "L2", "action": "block" }, { "detector_type": "moderated_content/hate", "threshold": "L2", "action": "block" }, { "detector_type": "unknown_links", "threshold": "L3", "action": "flag" } ]}This policy: