Data Leakage Prevention

Data Leakage Prevention (DLP) identifies and protects sensitive data like personally identifiable information (PII), credentials, and proprietary content.

The Risk

LLMs can leak sensitive data in two ways:

1. Training Data Leakage

Models may memorize and regurgitate sensitive data from their training set.

2. Context Leakage

When processing documents or conversation history, models may expose sensitive information in their responses.

Glitch DLP detects sensitive data in both inputs (preventing submission) and outputs (preventing exposure).

PII Categories

Glitch provides comprehensive PII detection with both signature-based (fast, pattern matching) and LLM-based (contextual analysis) detectors.

Signature-Based Detectors (Fast Path)

Detector Type	Description	Example	Latency
`pii/email`	Email addresses	user@example.com	~11µs
`pii/credit_card`	Credit/debit card numbers (Visa, MC, Amex, Discover)	4111-1111-1111-1111	~11µs
`pii/us_social_security_number`	US Social Security Numbers	123-45-6789	~11µs
`pii/phone_number`	Phone numbers (US, international)	+1 (555) 123-4567	~11µs
`pii/ip_address`	IPv4 and IPv6 addresses	192.168.1.1, 2001:db8::1	~11µs
`pii/iban_code`	International Bank Account Numbers	DE89370400440532013000	~11µs
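The table describes signature detectors as fast pattern matching. The following is a minimal sketch of that idea using Python's standard `re` module. The regular expressions are illustrative assumptions and are not Glitch's actual signatures. Only three of the detector types are covered, and the IP pattern handles IPv4 only.

```python
import re

# Illustrative patterns only (assumption): NOT Glitch's real signatures.
PATTERNS = {
    "pii/email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "pii/us_social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # IPv4 only; note that a version string such as 1.2.3.4 also matches.
    "pii/ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return (detector_type, matched_text) pairs for every pattern hit in text."""
    hits = []
    for detector_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((detector_type, match.group()))
    return hits
```

For example, `scan("SSN: 123-45-6789")` reports a single `pii/us_social_security_number` hit. Because the check is a plain regex pass with no model call, it explains why this path can run in microseconds.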

LLM-Based Detectors (Deep Analysis)

Detector Type	Description	Example	Latency
`pii/address`	Physical mailing addresses	123 Main St, Apt 4B, NYC 10001	~50-100ms
`pii/name`	Personal names with identifying context	Dr. John Smith, patient ID 12345	~50-100ms

Configuration

Basic PII Protection

{
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
  ]
}
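Before you deploy a policy like the one above, you may want to sanity-check it locally. The sketch below parses the policy with `json` and flags entries whose fields fall outside the values used on this page. It assumes thresholds are `L1` to `L4` and actions are `block` or `log`. The product may accept other values, so treat `validate_policy` as a hypothetical helper and not as part of Glitch.

```python
import json

# Assumption: the values that appear on this page, not an official schema.
VALID_THRESHOLDS = {"L1", "L2", "L3", "L4"}
VALID_ACTIONS = {"block", "log"}

BASIC_POLICY = """
{
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
  ]
}
"""


def validate_policy(policy: dict) -> list[str]:
    """Return human-readable problems; an empty list means the policy looks well formed."""
    problems = []
    for section in ("input_detectors", "output_detectors"):
        for i, det in enumerate(policy.get(section, [])):
            where = f"{section}[{i}]"
            if "detector_type" not in det:
                problems.append(f"{where}: missing detector_type")
            if det.get("threshold") not in VALID_THRESHOLDS:
                problems.append(f"{where}: unknown threshold {det.get('threshold')!r}")
            if det.get("action") not in VALID_ACTIONS:
                problems.append(f"{where}: unknown action {det.get('action')!r}")
    return problems
```

`validate_policy(json.loads(BASIC_POLICY))` returns an empty list for the basic policy shown above.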

Full DLP Policy

{
  "input_detectors": [
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
  ],
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/phone_number", "threshold": "L2", "action": "log" },
    { "detector_type": "pii/ip_address", "threshold": "L2", "action": "log" },
    { "detector_type": "pii/iban_code", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/address", "threshold": "L3", "action": "log" },
    { "detector_type": "pii/name", "threshold": "L3", "action": "log" }
  ]
}
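This page does not spell out how Glitch combines several detections on one message. The sketch below assumes a simple rule: any firing detector configured to `block` blocks the message, otherwise any `log` detector causes it to be logged. The `decide` function is hypothetical. Confidence thresholds are deliberately ignored because their mapping to confidence scores is not documented here.

```python
# The full DLP policy above, as a Python dict.
FULL_POLICY = {
    "input_detectors": [
        {"detector_type": "pii/credit_card", "threshold": "L1", "action": "block"},
        {"detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block"},
    ],
    "output_detectors": [
        {"detector_type": "pii/email", "threshold": "L2", "action": "block"},
        {"detector_type": "pii/credit_card", "threshold": "L1", "action": "block"},
        {"detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block"},
        {"detector_type": "pii/phone_number", "threshold": "L2", "action": "log"},
        {"detector_type": "pii/ip_address", "threshold": "L2", "action": "log"},
        {"detector_type": "pii/iban_code", "threshold": "L1", "action": "block"},
        {"detector_type": "pii/address", "threshold": "L3", "action": "log"},
        {"detector_type": "pii/name", "threshold": "L3", "action": "log"},
    ],
}


def decide(policy: dict, direction: str, fired: set[str]) -> str:
    """Hypothetical rule: 'block' beats 'log', which beats 'allow'.

    direction is "input" or "output"; fired is the set of detector types that matched.
    """
    actions = {
        det["action"]
        for det in policy.get(f"{direction}_detectors", [])
        if det["detector_type"] in fired
    }
    if "block" in actions:
        return "block"
    if "log" in actions:
        return "log"
    return "allow"
```

Under this rule, an output containing only a phone number is logged. An output with both a phone number and an IBAN is blocked. An email in the input is allowed because the full policy has no input email detector.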

Detection Examples

Output: "Your card number 4111-1111-1111-1111 has been charged."

Detection: pii/credit_card
Confidence: 0.98
Action: BLOCKED

Note: Credit card patterns have high confidence because matches are validated with the Luhn checksum.
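The Luhn (mod 10) checksum is the standard check-digit scheme for payment card numbers. Roughly one random digit string in ten passes it, which is why a pattern match that also passes Luhn is a much stronger signal. The sketch below is a textbook implementation and not Glitch's code. It ignores separators such as spaces and hyphens.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits in card_number pass the Luhn (mod 10) checksum."""
    digits = [int(c) for c in card_number if c.isdigit()]  # drop spaces and hyphens
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0
```

For example, the test number `4111-1111-1111-1111` from the detector table passes. Changing any single digit makes it fail.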

Output: "Contact john.doe@company.com for support."

Detection: pii/email
Confidence: 0.95
Action: BLOCKED

Output: "SSN: 123-45-6789"

Detection: pii/us_social_security_number
Confidence: 0.97
Action: BLOCKED

Output: "Call us at +1 (555) 123-4567"

Detection: pii/phone_number
Confidence: 0.85
Action: LOGGED (if configured)

Threshold Recommendations

Detector	Recommended Level	Notes
`pii/credit_card`	L1	High-precision pattern matching card prefixes
`pii/us_social_security_number`	L1	Strict pattern with separators, low false positives
`pii/iban_code`	L1	Strict international bank account format
`pii/email`	L2	Email-like patterns can be benign
`pii/phone_number`	L2-L3	Many numeric strings resemble phone numbers
`pii/ip_address`	L2	Version numbers (e.g., 1.2.3.4) can look like IPv4 addresses
`pii/address`	L3	Address formats vary widely; LLM-analyzed
`pii/name`	L3-L4	Names are highly contextual, LLM-analyzed
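The table can be encoded as data and used to review a policy. The sketch below flags any detector whose threshold falls outside the recommended level or levels listed above. The `RECOMMENDED` mapping comes straight from the table. The `off_recommendation` helper is hypothetical, and it only reports a deviation without judging whether that deviation is wrong for your use case.

```python
# Recommended levels, copied from the table above.
RECOMMENDED = {
    "pii/credit_card": {"L1"},
    "pii/us_social_security_number": {"L1"},
    "pii/iban_code": {"L1"},
    "pii/email": {"L2"},
    "pii/phone_number": {"L2", "L3"},
    "pii/ip_address": {"L2"},
    "pii/address": {"L3"},
    "pii/name": {"L3", "L4"},
}


def off_recommendation(policy: dict) -> list[tuple[str, str, str]]:
    """Return (section, detector_type, threshold) for thresholds outside the table."""
    findings = []
    for section in ("input_detectors", "output_detectors"):
        for det in policy.get(section, []):
            allowed = RECOMMENDED.get(det["detector_type"])
            if allowed is not None and det["threshold"] not in allowed:
                findings.append((section, det["detector_type"], det["threshold"]))
    return findings
```

Detector types not in the table, such as custom detectors, are skipped rather than flagged.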

Input vs. Output Protection

Input Protection

Prevents users from submitting sensitive data:

{
  "input_detectors": [
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" }
  ]
}

Use cases:

Prevent accidental PII submission
Compliance with data handling policies
Reduce liability from processing sensitive data

Output Protection

Catches sensitive data in LLM responses:

{
  "output_detectors": [
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
  ]
}

Use cases:

Prevent training data leakage
Protect against data extraction via prompt injection
Compliance with privacy regulations (GDPR, CCPA)
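Input and output protection sit on either side of the model call. The sketch below shows that placement in plain Python. It checks the prompt before it reaches the model, then checks the reply before it is returned. The `guarded_call` wrapper, the `Blocked` exception, and the 16-digit card regex are hypothetical stand-ins for what the gateway does. They are not Glitch APIs.

```python
import re

# Hypothetical card detector (assumption): 16 digits in groups of four,
# optionally separated by spaces or hyphens. Not Glitch's actual signature.
CARD = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


class Blocked(Exception):
    """Raised when a 'block' detector fires; stage is 'input' or 'output'."""

    def __init__(self, stage: str):
        super().__init__(f"blocked at {stage}")
        self.stage = stage


def has_card(text: str) -> bool:
    return CARD.search(text) is not None


def guarded_call(prompt, model, detect_input=has_card, detect_output=has_card):
    """Check the prompt before the model sees it, then check the model's reply."""
    if detect_input(prompt):
        raise Blocked("input")   # input protection: the PII never reaches the model
    response = model(prompt)
    if detect_output(response):
        raise Blocked("output")  # output protection: the PII never reaches the user
    return response
```

With a fake model such as `lambda p: "You said: " + p`, a clean prompt passes through unchanged. A prompt containing a card number raises `Blocked` at the input stage, and a model that leaks a card number raises it at the output stage.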

Allow Lists for Known Data

Sometimes legitimate data looks like PII. Use an allow list to exempt known-safe values:

{
  "allow_list": {
    "entries": [
      "support@yourcompany.com",
      "sales@yourcompany.com"
    ],
    "match_type": "exact"
  },
  "output_detectors": [
    { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
  ]
}

This configuration blocks email addresses in responses, except exact matches of the two listed company addresses.
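The sketch below shows how an exact-match allow list interacts with an email detector. Matches are found first, and any match that equals an allow-list entry exactly is dropped before the block decision. The email regex is illustrative. The assumption that `"exact"` matching is case-sensitive is not stated in this document.

```python
import re

# Illustrative email pattern (assumption), not Glitch's signature.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Entries from the allow list above; "exact" is assumed to be case-sensitive.
ALLOW_LIST = {"support@yourcompany.com", "sales@yourcompany.com"}


def blocked_emails(text: str, allow_list: set[str] = ALLOW_LIST) -> list[str]:
    """Return the email addresses in text that are not exact allow-list matches."""
    return [m.group() for m in EMAIL.finditer(text) if m.group() not in allow_list]
```

A response is blocked only if `blocked_emails` returns a non-empty list. A message that mentions only `support@yourcompany.com` passes, while one that also mentions another address is blocked.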

Actions and Responses

Blocked Response

HTTP/1.1 403 Forbidden
X-Risk-Blocked: true
X-Risk-Categories: pii/credit_card
X-Risk-Confidence: 0.98

{
  "error": {
    "message": "Response blocked: sensitive data detected",
    "type": "data_leakage_prevention",
    "code": "pii_detected"
  }
}

Logged Response

HTTP/1.1 200 OK
X-Risk-Blocked: false
X-Risk-Categories: pii/email
X-Risk-Confidence: 0.85

The content is delivered to the client, and the detection is logged for review.
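A client can read the `X-Risk-*` headers the same way for both blocked and logged responses. The sketch below parses them from a plain header dictionary. It lowercases header names first because HTTP header names are case-insensitive. Splitting `X-Risk-Categories` on commas when several categories are present is an assumption, since this page only shows single values. `RiskResult` and `read_risk_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskResult:
    blocked: bool
    categories: list[str]
    confidence: Optional[float]


def read_risk_headers(headers: dict[str, str]) -> RiskResult:
    """Parse the X-Risk-* response headers into a RiskResult."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    blocked = h.get("x-risk-blocked", "false").strip().lower() == "true"
    # Assumption: multiple categories would be comma-separated.
    raw_categories = h.get("x-risk-categories", "")
    categories = [c.strip() for c in raw_categories.split(",") if c.strip()]
    raw_confidence = h.get("x-risk-confidence")
    confidence = float(raw_confidence) if raw_confidence else None
    return RiskResult(blocked, categories, confidence)
```

For the blocked example above, this yields `blocked=True`, `categories=["pii/credit_card"]`, and `confidence=0.98`. A caller can then show a generic message instead of retrying the same request.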

Best Practices

1. Always Protect High-Sensitivity Data

Credit cards and SSNs should always be blocked:

{
  "output_detectors": [
    { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
    { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
  ]
}

2. Log Before Blocking (Low-Confidence)

For detectors with lower confidence or more false positives, start with the `log` action:

{
  "output_detectors": [
    { "detector_type": "pii/name", "threshold": "L3", "action": "log" }
  ]
}

Review the logged detections to tune your policy before switching the action to `block`.

3. Combine with Custom Detectors

Add patterns for domain-specific sensitive data:

{
  "custom_detectors": [
    {
      "name": "employee_id",
      "pattern": "EMP-\\d{6}",
      "action": "block",
      "description": "Internal employee IDs"
    }
  ]
}
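The `pattern` field is a regular expression stored inside a JSON string, so backslashes are doubled: `"EMP-\\d{6}"` in JSON is the regex `EMP-\d{6}`. The sketch below uses Python's `json` and `re` modules to confirm this and to compile the pattern locally for testing. The `compile_custom` helper is hypothetical. Python regex syntax is assumed to match the dialect Glitch uses, which this page does not specify.

```python
import json
import re

# Raw string so the JSON text keeps its doubled backslash, exactly as in the config.
CUSTOM_DETECTORS_JSON = r'''
{
  "custom_detectors": [
    {
      "name": "employee_id",
      "pattern": "EMP-\\d{6}",
      "action": "block",
      "description": "Internal employee IDs"
    }
  ]
}
'''


def compile_custom(config: dict) -> dict[str, re.Pattern]:
    """Map each custom detector name to its compiled regular expression."""
    return {d["name"]: re.compile(d["pattern"]) for d in config["custom_detectors"]}


detectors = compile_custom(json.loads(CUSTOM_DETECTORS_JSON))
```

`detectors["employee_id"].search("Ticket filed by EMP-004217")` finds a match, while `EMP-12345` (five digits) does not. `EMP-1234567` would still match on its first six digits. Adding `\\b` word boundaries to the JSON pattern would prevent that, if your IDs are always exactly six digits.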

Compliance Considerations

DLP helps support compliance with the following regulations:

Regulation	Relevant Data Types
GDPR	All PII (names, emails, addresses)
CCPA	California resident PII
PCI-DSS	Credit card numbers
HIPAA	Health information (via custom detectors)

Next Steps

Allow & Deny Lists — Customize detection rules
Custom Detectors — Add domain-specific patterns
Malicious Links — Protect against harmful URLs