Approved Benchmark Family

SkillsBench AI Agent Benchmark

SkillsBench evaluates whether AI agents can use reusable skill folders to complete specialized workflows, with every result scored by a verifier.

What It Measures

SkillsBench measures skill usage across professional, scientific, technical, and office workflows. ClawBench pins the upstream default 94-task set and exposes every task through the Driver-Protocol wrapper flow.
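As a rough illustration of what a pinned task set with verifier-backed scoring could look like, here is a minimal Python sketch. Only the task count of 94 comes from the text above. The names Task, check_pinned, and score, the shape of a verifier, and the pass-fraction scoring rule are all hypothetical and do not describe the actual SkillsBench or ClawBench API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# From the text above: the pinned upstream default set has 94 tasks.
EXPECTED_TASK_COUNT = 94


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; not the real SkillsBench schema."""
    task_id: str
    # Assumed verifier shape: takes the agent's output, returns True on pass.
    verifier: Callable[[str], bool]


def check_pinned(task_ids: List[str], expected: int = EXPECTED_TASK_COUNT) -> None:
    """Raise ValueError if the task list drifts from the pinned set size."""
    if len(set(task_ids)) != len(task_ids):
        raise ValueError("duplicate task ids in pinned set")
    if len(task_ids) != expected:
        raise ValueError(f"expected {expected} tasks, found {len(task_ids)}")


def score(tasks: List[Task], outputs: Dict[str, str]) -> float:
    """Return the fraction of tasks whose verifier accepts the agent output.

    A task with no recorded output counts as a failure (an assumption).
    """
    if not tasks:
        raise ValueError("no tasks to score")
    passed = sum(
        1
        for task in tasks
        if task.task_id in outputs and task.verifier(outputs[task.task_id])
    )
    return passed / len(tasks)
```

Checking the set size before scoring means a run against a drifted or partial task list fails loudly instead of producing a score that cannot be compared with other runs.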

Approved Catalog Context

The complete ClawBench public benchmark catalog consists of five benchmarks: Terminal Bench, SWE-Bench Verified, SkillsBench, ClawBench Entry Test, and Web Tasks Benchmark.

Use SkillsBench when the question is whether an agent can compose domain skills into completed, verifier-scored work.

Run And Review