Data Leakage Prevention
Data Leakage Prevention (DLP) identifies and protects sensitive data such as personally identifiable information (PII), credentials, and proprietary content.
The Risk
LLMs can leak sensitive data in two ways:
1. Training Data Leakage
Models may memorize and regurgitate sensitive data from their training set.
2. Context Leakage
When processing documents or conversation history, models may expose sensitive information in their responses.
Glitch DLP detects sensitive data in both inputs (preventing submission) and outputs (preventing exposure).
PII Categories
| Detector Type | Description | Example |
|---|---|---|
| pii/email | Email addresses | user@example.com |
| pii/credit_card | Credit/debit card numbers | 4111-1111-1111-1111 |
| pii/ssn | Social Security Numbers | 123-45-6789 |
| pii/phone | Phone numbers | +1 (555) 123-4567 |
| pii/address | Physical addresses | 123 Main St, City, ST 12345 |
| pii/name | Personal names | John Smith |
Configuration
Basic PII Protection

    {
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/ssn", "threshold": "L1", "action": "block" }
      ]
    }

Full DLP Policy

    {
      "input_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/ssn", "threshold": "L1", "action": "block" }
      ],
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" },
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/ssn", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/phone", "threshold": "L2", "action": "flag" },
        { "detector_type": "pii/address", "threshold": "L3", "action": "flag" }
      ]
    }

Detection Examples
    Output: "Your card ending in 4532-8901-2345-6789 has been charged."
    Detection: pii/credit_card
    Confidence: 0.98
    Action: BLOCKED

Note: Credit card patterns have high confidence due to Luhn checksum validation.

    Output: "Contact john.doe@company.com for support."
    Detection: pii/email
    Confidence: 0.95
    Action: BLOCKED

    Output: "SSN: 123-45-6789"
    Detection: pii/ssn
    Confidence: 0.97
    Action: BLOCKED

    Output: "Call us at +1 (555) 123-4567"
    Detection: pii/phone
    Confidence: 0.85
    Action: FLAGGED (if configured)

Threshold Recommendations
| Detector | Recommended Level | Notes |
|---|---|---|
| pii/credit_card | L1 | High-precision pattern with checksum |
| pii/ssn | L1 | Strict pattern, low false positives |
| pii/email | L2 | Email-like patterns can be benign |
| pii/phone | L2-L3 | Many number patterns look like phones |
| pii/address | L3 | Addresses have high variance |
| pii/name | L3-L4 | Names are highly contextual |
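The checksum mentioned for pii/credit_card is the Luhn check named in the detection examples. The sketch below is a generic implementation of the public Luhn algorithm, not Glitch's detector code; the function name `luhn_valid` is made up for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as dashes and spaces.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```

A random 16-digit string passes this check only about one time in ten, which is one reason a checksum-backed pattern can tolerate a strict level.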
Input vs. Output Protection
Input Protection
Prevents users from submitting sensitive data:

    {
      "input_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" }
      ]
    }

Use cases:
- Prevent accidental PII submission
- Compliance with data handling policies
- Reduce liability from processing sensitive data
Output Protection
Catches sensitive data in LLM responses:

    {
      "output_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
      ]
    }

Use cases:
- Prevent training data leakage
- Protect against prompt injection data extraction
- Compliance with privacy regulations (GDPR, CCPA)
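Input and output detectors share the same entry shape, so a policy covering both directions can be assembled in code. This is a minimal sketch: the helper names `detector` and `build_policy` are hypothetical, and the sets of accepted thresholds and actions are assumed from the values that appear in this guide, not taken from a published schema.

```python
import json

# Assumption: only the values shown in this guide are valid.
KNOWN_THRESHOLDS = {"L1", "L2", "L3", "L4"}
KNOWN_ACTIONS = {"block", "flag"}


def detector(detector_type: str, threshold: str, action: str) -> dict:
    """Build one detector entry, rejecting values this guide does not show."""
    if threshold not in KNOWN_THRESHOLDS:
        raise ValueError(f"unknown threshold: {threshold}")
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return {"detector_type": detector_type, "threshold": threshold, "action": action}


def build_policy(input_detectors: list, output_detectors: list) -> dict:
    """Combine both detector lists into a single policy document."""
    return {"input_detectors": input_detectors, "output_detectors": output_detectors}


policy = build_policy(
    input_detectors=[detector("pii/credit_card", "L1", "block")],
    output_detectors=[
        detector("pii/credit_card", "L1", "block"),
        detector("pii/email", "L2", "block"),
    ],
)
print(json.dumps(policy, indent=2))
```

Validating entries before serializing catches typos such as a misspelled action before the policy is deployed.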
Allow Lists for Known Data
Sometimes legitimate data looks like PII. Use allow lists to exempt it:

    {
      "allow_list": {
        "entries": [
          "support@yourcompany.com",
          "sales@yourcompany.com"
        ],
        "match_type": "exact"
      },
      "output_detectors": [
        { "detector_type": "pii/email", "threshold": "L2", "action": "block" }
      ]
    }

This blocks all email addresses except the two company addresses listed in the allow list.
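The effect of an exact-match allow list can be sketched as a simple filter over detected values. This is an illustration only: it assumes "exact" means case-sensitive string equality, which the guide does not spell out, and the function name `is_allowed` is hypothetical.

```python
def is_allowed(value: str, allow_list: dict) -> bool:
    """Return True if `value` exactly matches an allow list entry."""
    if allow_list.get("match_type") != "exact":
        raise ValueError("this sketch only handles match_type 'exact'")
    return value in allow_list["entries"]


allow_list = {
    "entries": ["support@yourcompany.com", "sales@yourcompany.com"],
    "match_type": "exact",
}

# Hypothetical detections from one response.
detected = ["support@yourcompany.com", "john.doe@company.com"]
to_block = [email for email in detected if not is_allowed(email, allow_list)]
print(to_block)  # ['john.doe@company.com']
```

Under this assumption, a near miss such as Support@yourcompany.com would still be blocked, so list every variant you expect to appear.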
Actions and Responses
Blocked Response

    HTTP/1.1 403 Forbidden
    X-Risk-Blocked: true
    X-Risk-Categories: pii/credit_card
    X-Risk-Confidence: 0.98

    {
      "error": {
        "message": "Response blocked: sensitive data detected",
        "type": "data_leakage_prevention",
        "code": "pii_detected"
      }
    }

Flagged Response

    HTTP/1.1 200 OK
    X-Risk-Blocked: false
    X-Risk-Categories: pii/email
    X-Risk-Confidence: 0.85

Content is delivered but flagged for logging/review.
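A client can tell the two cases apart from the status code and the X-Risk-* headers. The sketch below works on a plain status and header mapping so it runs without a network call. The `RiskVerdict` type and `read_risk_headers` function are hypothetical, and treating X-Risk-Categories as a comma-separated list is an assumption, since the examples only show a single category.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class RiskVerdict:
    blocked: bool
    flagged: bool
    categories: List[str]
    confidence: Optional[float]


def read_risk_headers(status: int, headers: Mapping[str, str]) -> RiskVerdict:
    """Summarise the DLP outcome of a response from its status and headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    blocked = status == 403 and lowered.get("x-risk-blocked", "").lower() == "true"
    # Assumption: multiple categories, if present, are comma-separated.
    raw_categories = lowered.get("x-risk-categories", "")
    categories = [c.strip() for c in raw_categories.split(",") if c.strip()]
    raw_confidence = lowered.get("x-risk-confidence")
    confidence = float(raw_confidence) if raw_confidence else None
    # Flagged: content was delivered, but a risk category is still reported.
    flagged = not blocked and bool(categories)
    return RiskVerdict(blocked, flagged, categories, confidence)


blocked = read_risk_headers(403, {
    "X-Risk-Blocked": "true",
    "X-Risk-Categories": "pii/credit_card",
    "X-Risk-Confidence": "0.98",
})
flagged = read_risk_headers(200, {
    "X-Risk-Blocked": "false",
    "X-Risk-Categories": "pii/email",
    "X-Risk-Confidence": "0.85",
})
print(blocked)
print(flagged)
```

Handling the flagged case separately lets an application deliver the content while still writing the category and confidence to its own review log.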
Best Practices
1. Always Protect High-Sensitivity Data
Credit cards and SSNs should always be blocked:

    {
      "output_detectors": [
        { "detector_type": "pii/credit_card", "threshold": "L1", "action": "block" },
        { "detector_type": "pii/ssn", "threshold": "L1", "action": "block" }
      ]
    }

2. Flag Before Blocking (Low-Confidence)
For lower-confidence detections, start with flagging:

    {
      "output_detectors": [
        { "detector_type": "pii/name", "threshold": "L3", "action": "flag" }
      ]
    }

Review flagged content to tune your policy.
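One way to make that review concrete is to record a verdict for each flagged detection and compute a false-positive rate per detector. The sketch below uses a made-up review log; the record fields and the function name `false_positive_rates` are hypothetical and not part of any Glitch export format.

```python
from collections import defaultdict

# Hypothetical review log: each flagged detection plus a reviewer's verdict.
reviewed = [
    {"detector_type": "pii/name", "true_positive": True},
    {"detector_type": "pii/name", "true_positive": False},
    {"detector_type": "pii/name", "true_positive": False},
    {"detector_type": "pii/phone", "true_positive": True},
]


def false_positive_rates(events):
    """Return {detector_type: share of flagged events judged benign}."""
    totals = defaultdict(int)
    benign = defaultdict(int)
    for event in events:
        totals[event["detector_type"]] += 1
        if not event["true_positive"]:
            benign[event["detector_type"]] += 1
    return {name: benign[name] / totals[name] for name in totals}


rates = false_positive_rates(reviewed)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0%} false positives")
```

A detector with a consistently low rate over a meaningful sample is a candidate for moving from flag to block; one with a high rate may need a different threshold or an allow list.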
3. Combine with Custom Detectors
Add patterns for domain-specific sensitive data:

    {
      "custom_detectors": [
        {
          "name": "employee_id",
          "pattern": "EMP-\\d{6}",
          "action": "block",
          "description": "Internal employee IDs"
        }
      ]
    }

Compliance Considerations
DLP helps with:
| Regulation | Relevant Data Types |
|---|---|
| GDPR | All PII (names, emails, addresses) |
| CCPA | California resident PII |
| PCI-DSS | Credit card numbers |
| HIPAA | Health information (custom detectors) |
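Because HIPAA coverage depends on custom detectors, it is worth checking a pattern against sample text before relying on it. The sketch below reuses the employee_id definition from the custom detector example and assumes the pattern is an ordinary regular expression applied as an unanchored search; the guide does not state the exact matching semantics.

```python
import json
import re

# The raw string keeps the JSON escaping (\\d) exactly as it appears in the config.
custom_detector = json.loads(r'''
{
  "name": "employee_id",
  "pattern": "EMP-\\d{6}",
  "action": "block",
  "description": "Internal employee IDs"
}
''')

# After JSON decoding, the pattern is the regular expression EMP-\d{6}.
pattern = re.compile(custom_detector["pattern"])

samples = [
    "Badge EMP-123456 was issued.",
    "Ticket EMP-12 is open.",
    "No identifiers here.",
]
for text in samples:
    hit = pattern.search(text)
    print(f"{text!r}: {hit.group(0) if hit else None}")
```

Under the unanchored-search assumption, a longer run such as EMP-1234567 would still match on its first six digits, so add word boundaries to the pattern if that matters for your identifiers.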
Next Steps
- Allow & Deny Lists: Customize detection rules
- Custom Detectors: Add domain-specific patterns
- Malicious Links: Protect against harmful URLs