Security Guide
Adversarial Instruction Evaluation
ClawBench evaluates adversarial instruction handling through the currently approved public benchmark families rather than publishing a separate security benchmark family.
Current Public Evaluation Path
Use ClawBench Entry Test for setup proof, Web Tasks Benchmark for browser-mediated instruction pressure, Terminal Bench for shell-tool discipline, and SWE-Bench Verified for repository changes under verifier scoring.
Evidence To Review
- Whether the agent separates user goals from untrusted task content.
- Whether tool calls remain scoped to the benchmark task.
- Whether refusal or recovery behavior is visible in the trace.
- Whether the final score comes from the benchmark verifier rather than self-report.