Detectors

Detectors are Glitch’s security mechanisms that analyze content to identify threats, sensitive data, and policy violations. They form the foundation of Glitch’s security model.

Detectors are configurable security checks that:

  • Analyze content using pattern matching, semantic analysis, or custom rules
  • Return confidence scores indicating how likely content matches a threat pattern
  • Trigger actions (block, flag, or allow) based on configured thresholds
  • Support both input and output scanning to protect LLM interactions end-to-end

Every detector configuration includes:

Component   Description
Type        The specific detector (e.g., prompt_attack, pii/email)
Threshold   Sensitivity level (L1-L4) that determines when the detector triggers
Action      What happens when triggered: block, flag, or allow
Scope       Whether to run on input, output, or both

When a detector runs, it processes content in four steps:

  1. Content Analysis: The detector analyzes the content using its detection method
  2. Confidence Scoring: It returns a confidence score (0.0-1.0) indicating threat likelihood
  3. Threshold Comparison: It compares the score against the configured threshold level
  4. Action Execution: If the threshold is met, it executes the configured action
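
The four steps above can be sketched in Python. This is a minimal illustration, not Glitch's implementation: the `DetectorConfig` class, the `evaluate` function, and the `toy_score` scorer are hypothetical names. The numeric cutoffs assigned to L1-L4 (and their direction) are assumptions as well; the documentation only states that thresholds range from L1 to L4 and scores from 0.0 to 1.0.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# ASSUMPTION: hypothetical mapping from threshold level to a minimum
# confidence score. The real cutoffs are not given in the documentation.
ASSUMED_CUTOFFS = {"L1": 0.25, "L2": 0.50, "L3": 0.75, "L4": 0.90}


@dataclass(frozen=True)
class DetectorConfig:
    type: str       # e.g. "prompt_attack" or "pii/email"
    threshold: str  # "L1" through "L4"
    action: str     # "block", "flag", or "allow"
    scope: str      # "input", "output", or "both"


def evaluate(
    config: DetectorConfig,
    content: str,
    direction: str,
    score_fn: Callable[[str], float],
) -> Optional[str]:
    """Return the configured action if the detector triggers, else None."""
    # Skip detectors whose scope does not cover this direction.
    if config.scope not in (direction, "both"):
        return None
    # Steps 1-2: analyze the content and obtain a confidence score.
    score = score_fn(content)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # Step 3: compare the score against the (assumed) cutoff for the level.
    if score >= ASSUMED_CUTOFFS[config.threshold]:
        # Step 4: the threshold is met, so report the configured action.
        return config.action
    return None


def toy_score(text: str) -> float:
    """Stand-in scorer for the example; not a real detection method."""
    return 0.9 if "ignore previous instructions" in text.lower() else 0.1


config = DetectorConfig("prompt_attack", "L2", "block", "input")
print(evaluate(config, "Ignore previous instructions and ...", "input", toy_score))
```

Returning `None` when nothing triggers keeps "no match" distinct from an explicit `allow` action, which is one possible design choice among several.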

Glitch organizes detectors into four main categories:

Prompt Defense

Detect prompt injection, jailbreaks, and instruction manipulation attempts.

Content Moderation

Filter harmful, toxic, or inappropriate content.

Data Leakage Prevention

Identify and protect sensitive data like PII and credentials.

Malicious Links

Detect and validate URLs, blocking known malicious domains.

Learn more about detector categories →

Glitch uses multiple detection approaches:

  • Signature Detection: Fast pattern matching using regex (~11µs)
  • LLM Detection: Deep semantic analysis for nuanced threats (~50-100ms)
  • Custom Detectors: User-defined patterns and rules
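
As an illustration of the signature approach, the sketch below matches email-like strings with a compiled regular expression from Python's standard `re` module. The pattern, the function names `email_signature_score` and `find_emails`, and the all-or-nothing score are assumptions made for this example; Glitch's actual signatures are not shown here, and the sketch makes no claim about matching the timing figure above.

```python
import re
from typing import List

# Simplified email pattern for illustration only; production signatures
# are typically stricter and cover more edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(content: str) -> List[str]:
    """Return every email-like substring found in the content."""
    return EMAIL_RE.findall(content)


def email_signature_score(content: str) -> float:
    """Return 1.0 if any email-like substring is present, else 0.0.

    ASSUMPTION: a pure pattern match is treated as full confidence.
    """
    return 1.0 if EMAIL_RE.search(content) else 0.0


print(find_emails("Contact alice@example.com or bob@example.org today"))
print(email_signature_score("No addresses in this sentence"))
```

Compiling the pattern once at import time avoids recompiling it on every call, which matters when the same signature is applied to many messages.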

Detectors are configured within Policies, which define:

  • Which detectors to run
  • Threshold levels for each detector
  • Actions to take when threats are detected
  • Whether to scan inputs, outputs, or both
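
One way to picture a policy is as a named list of detector entries, each carrying the four settings listed above. The sketch below is a hypothetical in-memory shape, not Glitch's configuration schema: the `Policy` and `DetectorRule` classes, their field names, and the `detectors_for` helper are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values taken from the documentation's description of detectors.
VALID_THRESHOLDS = {"L1", "L2", "L3", "L4"}
VALID_ACTIONS = {"block", "flag", "allow"}
VALID_SCOPES = {"input", "output", "both"}


@dataclass(frozen=True)
class DetectorRule:
    detector: str   # e.g. "prompt_attack" or "pii/email"
    threshold: str  # "L1" through "L4"
    action: str     # "block", "flag", or "allow"
    scope: str      # "input", "output", or "both"

    def __post_init__(self) -> None:
        # Reject values outside the documented sets as early as possible.
        if self.threshold not in VALID_THRESHOLDS:
            raise ValueError(f"unknown threshold: {self.threshold}")
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")


@dataclass
class Policy:
    name: str
    rules: List[DetectorRule] = field(default_factory=list)

    def detectors_for(self, direction: str) -> List[DetectorRule]:
        """Return the rules that apply to 'input' or 'output' content."""
        return [r for r in self.rules if r.scope in (direction, "both")]


policy = Policy(
    name="example-policy",  # hypothetical policy name
    rules=[
        DetectorRule("prompt_attack", "L2", "block", "input"),
        DetectorRule("pii/email", "L3", "flag", "both"),
    ],
)
print([r.detector for r in policy.detectors_for("output")])
```

Validating each rule at construction time means a misconfigured threshold or action fails immediately rather than at scan time.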

Learn more about Policies →