What is a Signal?
A signal is the result of a compliance audit. It contains:
- Risk Level: LOW, MEDIUM, HIGH, or CRITICAL
- Violations: List of violation codes
- Reasoning: Explanation of what was detected
- Metadata: Provider, model, tokens, duration
Signal Structure
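One way to picture the four parts listed above is as a small record type. The sketch below is only an illustration: the class names, field names, field types, and the millisecond unit for duration are assumptions, not the documented schema.

```python
from dataclasses import dataclass
from enum import Enum


class RiskLevel(str, Enum):
    """The four risk levels a signal can carry."""
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


@dataclass
class SignalMetadata:
    # Hypothetical field names; the unit of duration is an assumption.
    provider: str
    model: str
    tokens: int
    duration_ms: int


@dataclass
class Signal:
    # Hypothetical shape mirroring: risk level, violations, reasoning, metadata.
    risk_level: RiskLevel
    violations: list[str]
    reasoning: str
    metadata: SignalMetadata


example = Signal(
    risk_level=RiskLevel.HIGH,
    violations=["PII_LEAK", "EMAIL"],
    reasoning="Output contained an email address (redacted).",
    metadata=SignalMetadata(
        provider="example-provider",
        model="example-model",
        tokens=512,
        duration_ms=840,
    ),
)
```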
Risk Levels
LOW
Minor issues that don’t require immediate action:
- Borderline content
- Low-confidence detections
- Informational warnings
MEDIUM
Issues that should be reviewed:
- Moderate policy violations
- Potential PII leakage
- Questionable content
HIGH
Serious violations requiring attention:
- Clear PII leakage
- Discriminatory content
- Security vulnerabilities
- Prompt injection attempts
CRITICAL
Severe violations requiring immediate action:
- CSAM (zero tolerance)
- Explicit dangerous instructions
- Severe security breaches
- Regulatory violations
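Because the four levels form an ordered scale, code that consumes signals often needs to compare them. A minimal sketch, assuming the levels are modeled as an ordered enum; the numeric values and helper names are hypothetical and only encode the LOW < MEDIUM < HIGH < CRITICAL ordering:

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    # Numeric values are arbitrary; only their order matters.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def needs_attention(level: RiskLevel, threshold: RiskLevel = RiskLevel.HIGH) -> bool:
    """Return True when a level is at or above the chosen threshold."""
    return level >= threshold


def highest(levels: list[RiskLevel]) -> RiskLevel:
    """Return the most severe level in a non-empty list."""
    return max(levels)
```

For example, `needs_attention(RiskLevel["MEDIUM"])` is false with the default threshold, while `needs_attention(RiskLevel["CRITICAL"])` is true.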
Violation Codes
Signals include specific violation codes:
PII Violations
- PII_LEAK: General PII detected
- EMAIL: Email address
- SSN: Social Security Number
- CREDIT_CARD: Credit card number
- PHONE: Phone number
- PASSPORT: Passport number
- HEALTH_DATA: Medical information
- FINANCIAL_DATA: Financial information
Bias Violations
- RACIAL_BIAS: Racial discrimination
- GENDER_BIAS: Gender discrimination
- AGE_BIAS: Age discrimination
- DISABILITY_BIAS: Disability discrimination
- RELIGIOUS_BIAS: Religious discrimination
- INTERSECTIONAL_BIAS: Multiple bias types
Security Violations
- CODE_INJECTION: SQL, command, or code injection
- SECRET_LEAK: API keys, passwords, tokens
- XSS: Cross-site scripting
- SSRF: Server-side request forgery
- DANGEROUS_INSTRUCTIONS: Harmful how-to content
Prompt Injection
- PROMPT_INJECTION: Direct injection attack
- JAILBREAK_ATTEMPT: DAN, STAN, or similar
- SYSTEM_PROMPT_EXTRACTION: Attempting to reveal system prompt
- GOAL_HIJACKING: Redirecting model behavior
Agent Safety
- AGENT_LOOP: Infinite loop or recursion
- IRREVERSIBLE_ACTION: Dangerous action without confirmation
- SCOPE_CREEP: Acting outside intended scope
- PRIVILEGE_ESCALATION: Attempting elevated permissions
Content Policy
- VIOLENT_CONTENT: Graphic violence
- SEXUAL_CONTENT: Sexual content
- CSAM: Child sexual abuse material (CRITICAL)
- HATE_SPEECH: Dehumanizing language
- SELF_HARM_FACILITATION: Suicide or self-harm content
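When processing signals, it can help to map a violation code back to its category. The sketch below copies the categories and codes from the lists above into a lookup table; the helper functions and the "Unknown" bucket are hypothetical conveniences, not part of any documented API.

```python
from typing import Optional

# Categories and codes as listed in this section.
VIOLATION_CATEGORIES = {
    "PII": {"PII_LEAK", "EMAIL", "SSN", "CREDIT_CARD", "PHONE",
            "PASSPORT", "HEALTH_DATA", "FINANCIAL_DATA"},
    "Bias": {"RACIAL_BIAS", "GENDER_BIAS", "AGE_BIAS", "DISABILITY_BIAS",
             "RELIGIOUS_BIAS", "INTERSECTIONAL_BIAS"},
    "Security": {"CODE_INJECTION", "SECRET_LEAK", "XSS", "SSRF",
                 "DANGEROUS_INSTRUCTIONS"},
    "Prompt Injection": {"PROMPT_INJECTION", "JAILBREAK_ATTEMPT",
                         "SYSTEM_PROMPT_EXTRACTION", "GOAL_HIJACKING"},
    "Agent Safety": {"AGENT_LOOP", "IRREVERSIBLE_ACTION", "SCOPE_CREEP",
                     "PRIVILEGE_ESCALATION"},
    "Content Policy": {"VIOLENT_CONTENT", "SEXUAL_CONTENT", "CSAM",
                       "HATE_SPEECH", "SELF_HARM_FACILITATION"},
}


def category_of(code: str) -> Optional[str]:
    """Return the category a code belongs to, or None if it is not listed."""
    for category, codes in VIOLATION_CATEGORIES.items():
        if code in codes:
            return category
    return None


def group_by_category(codes: list[str]) -> dict[str, list[str]]:
    """Group codes by category; unlisted codes go under 'Unknown'."""
    grouped: dict[str, list[str]] = {}
    for code in codes:
        grouped.setdefault(category_of(code) or "Unknown", []).append(code)
    return grouped
```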
Viewing Signals
Dashboard
View signals in the dashboard:
- Risk level breakdown
- Recent signals
- Filter by sandbox, provider, model
- Date range filtering
- Export for compliance reports
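A risk level breakdown like the one in the dashboard amounts to counting signals per level. A minimal local sketch, assuming signals are available as dictionaries with a `risk_level` key (a hypothetical field name):

```python
from collections import Counter

RISK_LEVELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def risk_breakdown(signals: list[dict]) -> dict[str, int]:
    """Count signals per risk level, including levels with zero signals."""
    counts = Counter(s["risk_level"] for s in signals)
    return {level: counts.get(level, 0) for level in RISK_LEVELS}
```

Listing every level, even those with a zero count, keeps the result stable for charts and reports.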
API
Query signals programmatically.
Filtering Signals
By Risk Level
By Sandbox
By Provider
By Date Range
Combined Filters
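The filters named above (risk level, sandbox, provider, date range) combine naturally when each one is applied only if it is set. The sketch below does this client-side over signals already loaded as dictionaries. It does not call the signals API, and the field names `risk_level`, `sandbox`, `provider`, and `created_at` are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional


def filter_signals(
    signals: list[dict],
    risk_level: Optional[str] = None,
    sandbox: Optional[str] = None,
    provider: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> list[dict]:
    """Return signals matching every filter that was provided."""
    matches = []
    for signal in signals:
        if risk_level is not None and signal["risk_level"] != risk_level:
            continue
        if sandbox is not None and signal["sandbox"] != sandbox:
            continue
        if provider is not None and signal["provider"] != provider:
            continue
        # The date range is inclusive at both ends.
        if start is not None and signal["created_at"] < start:
            continue
        if end is not None and signal["created_at"] > end:
            continue
        matches.append(signal)
    return matches
```

Calling `filter_signals(signals, risk_level="HIGH", provider="example-provider")` would keep only HIGH signals from that provider; omitting all filters returns every signal.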
Signal Reasoning
Signals include redacted reasoning to protect PII.
Dashboard Statistics
Get aggregate statistics across your signals.
Exporting Signals
Export signals for compliance reports as csv, json, or xlsx.
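To show what a csv or json export of signals might contain, the sketch below serializes locally held signals using only the Python standard library. It is not the product's export feature: the column set, the `;` separator for multiple violation codes, and the dictionary field names are assumptions. xlsx is left out because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical column set for the CSV output.
FIELDS = ["risk_level", "violations", "reasoning"]


def to_csv(signals: list[dict]) -> str:
    """Serialize signals to CSV text, joining violation codes with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for signal in signals:
        writer.writerow({
            "risk_level": signal["risk_level"],
            "violations": ";".join(signal["violations"]),
            "reasoning": signal["reasoning"],
        })
    return buffer.getvalue()


def to_json(signals: list[dict]) -> str:
    """Serialize signals to indented JSON text."""
    return json.dumps(signals, indent=2)
```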
Webhooks (Coming Soon)
Receive real-time alerts for high-risk signals.
Best Practices
Monitoring
Set up regular monitoring:
- Daily review of CRITICAL signals
- Weekly review of HIGH signals
- Monthly compliance reports
Alerting
Configure alerts for critical issues:
- Email notifications for CRITICAL
- Slack/Discord webhooks for HIGH
- Dashboard monitoring for MEDIUM/LOW
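The routing described above can be written down as a small lookup table. This sketch only decides which channel a signal belongs to; it does not send anything, and the channel names and the `risk_level` field are hypothetical.

```python
# Channel per risk level, following the alerting guidance above.
ALERT_CHANNELS = {
    "CRITICAL": "email",
    "HIGH": "webhook",      # e.g. a Slack or Discord webhook
    "MEDIUM": "dashboard",
    "LOW": "dashboard",
}


def route_alert(signal: dict) -> str:
    """Return the channel name a signal should be routed to."""
    return ALERT_CHANNELS[signal["risk_level"]]


def plan_alerts(signals: list[dict]) -> dict[str, list[dict]]:
    """Group signals by the channel they should be routed to."""
    plan: dict[str, list[dict]] = {}
    for signal in signals:
        plan.setdefault(route_alert(signal), []).append(signal)
    return plan
```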
Response Workflow
Establish a response workflow:
- CRITICAL: Immediate investigation and remediation
- HIGH: Review within 24 hours
- MEDIUM: Review within 1 week
- LOW: Monthly review
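The review windows above translate directly into due dates. A minimal sketch, assuming "immediate" means a zero-length window and "monthly" is approximated as 30 days; both choices, and the function names, are assumptions for illustration:

```python
from datetime import datetime, timedelta

REVIEW_WINDOWS = {
    "CRITICAL": timedelta(0),        # immediate
    "HIGH": timedelta(hours=24),
    "MEDIUM": timedelta(weeks=1),
    "LOW": timedelta(days=30),       # "monthly", approximated (assumption)
}


def review_due(risk_level: str, detected_at: datetime) -> datetime:
    """Return the time by which a signal should be reviewed."""
    return detected_at + REVIEW_WINDOWS[risk_level]


def is_overdue(risk_level: str, detected_at: datetime, now: datetime) -> bool:
    """Return True when the review window has already elapsed."""
    return now > review_due(risk_level, detected_at)
```

A HIGH signal detected at midnight on 1 January would be due at midnight on 2 January under these assumptions.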
Compliance Reports
Generate regular compliance reports:
- Monthly audit summaries
- Violation trend analysis
- Sandbox performance metrics
- Regulatory compliance status
Next Steps
- Dashboard: View signals in the dashboard
- API Reference: Query signals via the API
- Sandbox: Configure sandboxes
- Compliance: Learn about compliance types

