pinchbench
Run PinchBench benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting benchmark results to the leaderboard, or checking how well your OpenClaw setup handles calendar, email, research, coding, and multi-step workflows.
Install via CLI (Recommended)

clawhub install openclaw/skills/skills/olearycrew/pinchbench
PinchBench Benchmark Skill
PinchBench measures how well LLM models perform as the brain of an OpenClaw agent. Results are collected on a public leaderboard at pinchbench.com.
Prerequisites
- Python 3.10+
- uv package manager
- OpenClaw instance (this agent)
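A quick self-check for the Python prerequisite can be scripted; this is a minimal sketch (the version floor comes from the list above, the helper name is illustrative):

```python
import sys

# PinchBench requires Python 3.10+ (see Prerequisites above).
def python_ok(required=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= required

print("Python OK" if python_ok() else "Python too old for PinchBench")
```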
Quick Start
cd <skill_directory>
# Run benchmark with a specific model
uv run benchmark.py --model anthropic/claude-sonnet-4
# Run only automated tasks (faster)
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite automated-only
# Run specific tasks
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite task_01_calendar,task_02_stock
# Skip uploading results
uv run benchmark.py --model anthropic/claude-sonnet-4 --no-upload
Available Tasks (23)
| Task | Category | Description |
|---|---|---|
| task_00_sanity | Basic | Verify agent works |
| task_01_calendar | Productivity | Calendar event creation |
| task_02_stock | Research | Stock price lookup |
| task_03_blog | Writing | Blog post creation |
| task_04_weather | Coding | Weather script |
| task_05_summary | Analysis | Document summarization |
| task_06_events | Research | Conference research |
| task_07_email | Writing | Email drafting |
| task_08_memory | Memory | Context retrieval |
| task_09_files | Files | File structure creation |
| task_10_workflow | Integration | Multi-step API workflow |
| task_11_clawdhub | Skills | ClawHub interaction |
| task_12_skill_search | Skills | Skill discovery |
| task_13_image_gen | Creative | Image generation |
| task_14_humanizer | Writing | Text humanization |
| task_15_daily_summary | Productivity | Daily digest |
| task_16_email_triage | | Inbox triage |
| task_17_email_search | | Email search |
| task_18_market_research | Research | Market analysis |
| task_19_spreadsheet_summary | Analysis | Spreadsheet analysis |
| task_20_eli5_pdf_summary | Analysis | PDF simplification |
| task_21_openclaw_comprehension | Knowledge | OpenClaw docs comprehension |
| task_22_second_brain | Memory | Knowledge management |
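The `--suite` option accepts `all`, `automated-only`, or a comma-separated list of the task IDs above. A hedged sketch of how such a selector could be resolved (the `automated` set and function name are illustrative, not the skill's actual internals):

```python
def resolve_suite(suite, all_tasks, automated):
    """Turn a --suite value into a concrete list of task IDs."""
    if suite == "all":
        return list(all_tasks)
    if suite == "automated-only":
        return [t for t in all_tasks if t in automated]
    # Otherwise treat the value as comma-separated task IDs.
    requested = [t.strip() for t in suite.split(",") if t.strip()]
    unknown = [t for t in requested if t not in all_tasks]
    if unknown:
        raise ValueError(f"Unknown task IDs: {unknown}")
    return requested

tasks = ["task_00_sanity", "task_01_calendar", "task_02_stock"]
print(resolve_suite("task_01_calendar,task_02_stock", tasks, set()))
```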
Command Line Options
| Option | Description |
|---|---|
| --model | Model identifier (e.g., anthropic/claude-sonnet-4) |
| --suite | all, automated-only, or comma-separated task IDs |
| --output-dir | Results directory (default: results/) |
| --timeout-multiplier | Scale task timeouts for slower models |
| --runs | Number of runs per task for averaging |
| --no-upload | Skip uploading to leaderboard |
| --register | Request new API token for submissions |
| --upload FILE | Upload previous results JSON |
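With `--runs`, each task executes multiple times and the scores are averaged. A minimal sketch of that aggregation (the per-task score dicts are an assumed shape, not the skill's actual results schema):

```python
from statistics import mean

def average_runs(runs):
    """Average per-task scores across repeated runs.

    `runs` is a list of {task_id: score} dicts, one dict per run.
    """
    task_ids = runs[0].keys()
    return {t: mean(run[t] for run in runs) for t in task_ids}

runs = [
    {"task_00_sanity": 1.0, "task_01_calendar": 0.5},
    {"task_00_sanity": 1.0, "task_01_calendar": 1.0},
]
print(average_runs(runs))  # averaged score per task
```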
Token Registration
To submit results to the leaderboard:
# Register for an API token (one-time)
uv run benchmark.py --register
# Run benchmark (auto-uploads with token)
uv run benchmark.py --model anthropic/claude-sonnet-4
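Putting the flags together: results upload only happens when a registered token exists and `--no-upload` was not passed. A purely illustrative sketch of that decision (the real benchmark.py may implement it differently):

```python
def should_upload(token, no_upload):
    """Upload to the leaderboard only with a registered token and without --no-upload."""
    return bool(token) and not no_upload

print(should_upload("my-api-token", False))  # hypothetical token value
```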
Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-olearycrew-pinchbench": {
"enabled": true,
"auto_update": true
}
}
}

Safety Note

ClawKit audits metadata but not runtime behavior. Use with caution.