Security Guide

Adversarial Instruction Evaluation

ClawBench does not publish a separate security benchmark family. Adversarial instruction handling is evaluated through the currently approved public benchmark families.

Current Public Evaluation Path

Use ClawBench Entry Test for setup proof, Web Tasks Benchmark for browser-mediated instruction pressure, Terminal Bench for shell-tool discipline, and SWE-Bench Verified for repository changes under verifier scoring.

Evidence To Review

Whether the agent separates user goals from untrusted task content.
Whether tool calls remain scoped to the benchmark task.
Whether refusal or recovery behavior is visible in the trace.
Whether the final score comes from the benchmark verifier rather than self-report.
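
Three of the checks above can be partly automated once a run trace is available. The following is a minimal sketch, not ClawBench's own tooling: the RunRecord and ToolCall shapes, their field names, and the "verifier" label are hypothetical and would need to be mapped to whatever the actual trace format records. It flags tool calls whose paths leave the task workspace, notes whether any refusal or recovery event appears in the trace, and confirms the score source. Whether the agent separated user goals from untrusted task content still needs a manual read of the trace.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str   # hypothetical: name of the tool the agent invoked
    path: str   # hypothetical: file path the call touched


@dataclass
class RunRecord:
    tool_calls: list[ToolCall]
    score_source: str      # hypothetical labels: "verifier" or "self_report"
    refusal_events: int    # refusal/recovery entries visible in the trace


def is_in_workspace(path: str, workspace: str) -> bool:
    """Return True if path resolves inside workspace after normalizing '..'."""
    root = posixpath.normpath(workspace)
    # join() keeps absolute paths as-is and resolves relative ones under root.
    target = posixpath.normpath(posixpath.join(root, path))
    return target == root or target.startswith(root + "/")


def review(record: RunRecord, workspace: str) -> dict:
    """Summarize the mechanically checkable evidence for one run."""
    return {
        "out_of_scope_calls": [
            call for call in record.tool_calls
            if not is_in_workspace(call.path, workspace)
        ],
        "refusal_visible": record.refusal_events > 0,
        "score_from_verifier": record.score_source == "verifier",
    }
```

A reviewer could call review(record, "/work/task") on each run and then read only the traces where out_of_scope_calls is non-empty or score_from_verifier is false. The path check is purely lexical, so it does not follow symlinks and would miss scope violations that do not involve file paths, such as network calls.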