Data Leakage Prevention
Data Leakage Prevention (DLP) identifies and protects sensitive data like personally identifiable information (PII), credentials, and proprietary content.
The Risk
LLMs can leak sensitive data in two ways:
1. Training Data Leakage
Models may memorize and regurgitate sensitive data from their training set.
2. Context Leakage
When processing documents or conversation history, models may expose sensitive information in their responses.
Glitch DLP detects sensitive data in both inputs (preventing submission) and outputs (preventing exposure).
PII Categories
Glitch detects PII with two kinds of detectors: signature-based (fast pattern matching) and LLM-based (contextual analysis).
Signature-Based Detectors (Fast Path)
| Detector Type | Description | Example | Latency |
|---|---|---|---|
| pii/email | Email addresses | user@example.com | ~11µs |
| pii/credit_card | Credit/debit card numbers (Visa, MC, Amex, Discover) | 4111-1111-1111-1111 | ~11µs |
| pii/us_social_security_number | US Social Security Numbers | 123-45-6789 | ~11µs |
| pii/phone_number | Phone numbers (US, international) | +1 (555) 123-4567 | ~11µs |
| pii/ip_address | IPv4 and IPv6 addresses | 192.168.1.1, 2001:db8::1 | ~11µs |
| pii/iban_code | International Bank Account Numbers | DE89370400440532013000 | ~11µs |
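As a rough illustration of what a signature-based fast path does, the sketch below matches a few of the formats in the table with regular expressions. The patterns and the `scan` helper are assumptions made for this example, not Glitch's actual signatures, and they are deliberately simpler than production rules (the IP pattern covers IPv4 only, for instance).

```python
import re

# Illustrative patterns only (assumed, not Glitch's real signatures).
PATTERNS = {
    "pii/email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "pii/us_social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "pii/ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return (detector_type, matched_text) pairs found in text."""
    hits = []
    for detector_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((detector_type, match.group()))
    return hits


hits = scan("Contact user@example.com from 192.168.1.1, SSN 123-45-6789")
```

Pure pattern matching like this is cheap, which is consistent with the microsecond latencies listed above, but it has no notion of context. That is why names and addresses are handled by the LLM-based detectors.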
LLM-Based Detectors (Deep Analysis)
| Detector Type | Description | Example | Latency |
|---|---|---|---|
| pii/address | Physical mailing addresses | 123 Main St, Apt 4B, NYC 10001 | ~50-100ms |
| pii/name | Personal names with identifying context | Dr. John Smith, patient ID 12345 | ~50-100ms |
Configuration
Basic PII Protection

    {
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
      ]
    }

Full DLP Policy

    {
      "input_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
      ],
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/phone_number", "threshold": "L2", "action": "log" },
        { "detector_type": "pii/ip_address", "threshold": "L2", "action": "log" },
        { "detector_type": "pii/iban_code", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/address", "threshold": "L3", "action": "log" },
        { "detector_type": "pii/name", "threshold": "L3", "action": "log" }
      ]
    }

Detection Examples
    Output: "Your card ending in 4532-8901-2345-6789 has been charged."

    Detection: pii/credit_card
    Confidence: 0.98
    Action: BLOCKED

    Note: Credit card patterns have high confidence due to
    Luhn checksum validation.

    Output: "Contact john.doe@company.com for support."

    Detection: pii/email
    Confidence: 0.95
    Action: BLOCKED

    Output: "SSN: 123-45-6789"

    Detection: pii/us_social_security_number
    Confidence: 0.97
    Action: BLOCKED

    Output: "Call us at +1 (555) 123-4567"

    Detection: pii/phone_number
    Confidence: 0.85
    Action: LOGGED (if configured)

Threshold Recommendations
| Detector | Recommended Level | Notes |
|---|---|---|
| pii/credit_card | L1 | High-precision pattern matching on card prefixes |
| pii/us_social_security_number | L1 | Strict pattern with separators, low false positives |
| pii/iban_code | L1 | Strict international bank account format |
| pii/email | L2 | Email-like patterns can be benign |
| pii/phone_number | L2-L3 | Many number patterns look like phone numbers |
| pii/ip_address | L2 | Version numbers can look like IP addresses |
| pii/address | L3 | Addresses have high variance, LLM-analyzed |
| pii/name | L3-L4 | Names are highly contextual, LLM-analyzed |
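The note in the detection examples attributes the credit card detector's high confidence to Luhn checksum validation. The sketch below is the standard Luhn algorithm, shown to illustrate why a checksum cuts false positives compared with digit-pattern matching alone. The `luhn_valid` helper is a hypothetical name, and how Glitch combines the checksum with prefix matching is not specified here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

With this helper, `luhn_valid("4111-1111-1111-1111")` (the example from the detector table) returns `True`, while changing the last digit to `2` returns `False`. An arbitrary 16-digit string passes only about one time in ten.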
Input vs. Output Protection
Input Protection
Prevents users from submitting sensitive data:

    {
      "input_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" }
      ]
    }

Use cases:
- Prevent accidental PII submission
- Compliance with data handling policies
- Reduce liability from processing sensitive data
Output Protection
Catches sensitive data in LLM responses:

    {
      "output_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
      ]
    }

Use cases:
- Prevent training data leakage
- Protect against prompt injection data extraction
- Compliance with privacy regulations (GDPR, CCPA)
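To make the two checkpoints concrete, the sketch below shows where input and output detectors would sit around a model call. It is a conceptual outline only: `guarded_call`, the `Detector` callable shape, and the `Blocked` exception are assumptions made for this example and do not describe Glitch's internals.

```python
from typing import Callable

# Assumed shape: a detector takes text and returns the detector types it found.
Detector = Callable[[str], list[str]]


class Blocked(Exception):
    """Raised when a blocking detector fires on a prompt or a response."""


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    input_detectors: list[Detector],
    output_detectors: list[Detector],
) -> str:
    # Input protection: stop sensitive data before it reaches the model.
    for detect in input_detectors:
        found = detect(prompt)
        if found:
            raise Blocked(f"input blocked: {found}")

    response = model(prompt)

    # Output protection: stop sensitive data before it reaches the user.
    for detect in output_detectors:
        found = detect(response)
        if found:
            raise Blocked(f"output blocked: {found}")
    return response
```

The two lists are independent, which matches the configuration format: a policy can block card numbers on input while applying a broader set of detectors to output.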
Allow Lists for Known Data
Sometimes legitimate data looks like PII. Use an allow list to exempt known values:

    {
      "allow_list": {
        "entries": [
          "support@yourcompany.com",
          "sales@yourcompany.com"
        ],
        "match_type": "exact"
      },
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
      ]
    }

This policy blocks email addresses in responses, except the two company addresses listed in the allow list.
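The effect of an exact-match allow list can be pictured as a filter applied to detector matches before any action is taken. The sketch below assumes that `"match_type": "exact"` means case-sensitive string equality. That reading, and the `filter_allowed` helper, are assumptions rather than documented behavior.

```python
def filter_allowed(matches: list[str], allow_entries: list[str]) -> list[str]:
    """Drop detected values that exactly match an allow-list entry."""
    allowed = set(allow_entries)  # assumed: case-sensitive equality
    return [value for value in matches if value not in allowed]


remaining = filter_allowed(
    ["support@yourcompany.com", "john.doe@company.com"],
    ["support@yourcompany.com", "sales@yourcompany.com"],
)
# Only john.doe@company.com is left to trigger the block action.
```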
Actions and Responses
Blocked Response

    HTTP/1.1 403 Forbidden
    X-Risk-Blocked: true
    X-Risk-Categories: pii/credit_card
    X-Risk-Confidence: 0.98

    {
      "error": {
        "message": "Response blocked: sensitive data detected",
        "type": "data_leakage_prevention",
        "code": "pii_detected"
      }
    }

Logged Response

    HTTP/1.1 200 OK
    X-Risk-Blocked: false
    X-Risk-Categories: pii/email
    X-Risk-Confidence: 0.85

Content is delivered but logged for review.
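A client can branch on these headers to tell a blocked response from a logged one. The sketch below works on an already-received status, header mapping, and body, so it does not depend on any particular HTTP library. The `interpret_response` helper is hypothetical, the comma-separated handling of multiple categories is an assumption, and the header names are looked up exactly as written above (a real client should treat them case-insensitively).

```python
import json


def interpret_response(status: int, headers: dict[str, str], body: str) -> dict:
    """Summarize a response using the X-Risk-* headers."""
    blocked = headers.get("X-Risk-Blocked", "false").lower() == "true"
    # Assumed: multiple categories, if present, are comma-separated.
    raw_categories = headers.get("X-Risk-Categories", "")
    categories = [c.strip() for c in raw_categories.split(",") if c.strip()]
    confidence = headers.get("X-Risk-Confidence")
    summary = {
        "blocked": blocked,
        "categories": categories,
        "confidence": float(confidence) if confidence is not None else None,
        "error_code": None,
    }
    if status == 403 and blocked:
        # Blocked responses carry a JSON error body instead of model output.
        summary["error_code"] = json.loads(body)["error"]["code"]
    return summary
```

For the blocked example above, this yields `error_code` equal to `"pii_detected"`. For the logged example, `blocked` is `False` and the body is passed through untouched.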
Best Practices
1. Always Protect High-Sensitivity Data
Credit cards and SSNs should always be blocked:

    {
      "output_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/us_social_security_number", "threshold": "L1", "action": "block" }
      ]
    }

2. Log Before Blocking (Low-Confidence)
For lower-confidence detections, start with logging:

    {
      "output_detectors": [
        { "detector_type": "pii/name", "threshold": "L3", "action": "log" }
      ]
    }

Review logged content to tune your policy.
3. Combine with Custom Detectors
Add patterns for domain-specific sensitive data:

    {
      "custom_detectors": [
        {
          "name": "employee_id",
          "pattern": "EMP-\\d{6}",
          "action": "block",
          "description": "Internal employee IDs"
        }
      ]
    }

Compliance Considerations
DLP helps with:
| Regulation | Relevant Data Types |
|---|---|
| GDPR | All PII (names, emails, addresses) |
| CCPA | California resident PII |
| PCI-DSS | Credit card numbers |
| HIPAA | Health information (custom detectors) |
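Custom detectors, such as those the HIPAA row depends on, are defined by patterns, so it helps to check a pattern against sample text before adding it to a policy. The sketch below tests the `EMP-\d{6}` pattern from the custom detector example. It assumes Python-compatible regex semantics, which may differ from the engine Glitch uses, and `find_employee_ids` is a hypothetical helper.

```python
import re

# Pattern from the custom detector example. In JSON the backslash is escaped
# as "\\d", which decodes to the regex token \d.
EMPLOYEE_ID = re.compile(r"EMP-\d{6}")


def find_employee_ids(text: str) -> list[str]:
    """Return every substring of text that matches the employee ID pattern."""
    return EMPLOYEE_ID.findall(text)


found = find_employee_ids("Ticket opened by EMP-004217, escalated from EMP-12")
# found == ["EMP-004217"]; "EMP-12" has too few digits to match.
```

Because the pattern has no anchors or word boundaries, a longer string such as `EMP-1234567` also matches on its first six digits. Whether that is desirable depends on the ID format, so it is worth testing near-miss inputs as well as valid ones.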
Next Steps
- Allow & Deny Lists: Customize detection rules
- Custom Detectors: Add domain-specific patterns
- Malicious Links: Protect against harmful URLs