Official, Verified, developer tools, Safety 4/5

skillbench

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.

Why use this skill?

Track, benchmark, and optimize your OpenClaw agent skills. Improve your AI performance with automated testing, trend analysis, and version tracking.

What This Skill Does

The skillbench skill acts as the central observability and performance-tracking engine for your OpenClaw AI agents. It provides a structured lifecycle for managing agent capabilities, allowing you to track specific skill versions, benchmark execution performance, and analyze improvements over time. By integrating tightly with tasktime for duration logging and ClawVault for secure state storage, skillbench turns standard AI interactions into measurable data points. It enables developers and power users to move beyond 'it works' to 'it works X% faster and with Y% fewer errors,' creating a self-improving feedback loop for agentic workflows.

Installation

To begin tracking your agent's performance, run the following command in your terminal:

clawhub install openclaw/skills/skills/g9pedro/skillbench

Alternatively, install it globally through npm (requires Node.js):

npm install -g @versatly/skillbench

Once installed, make sure your environment is linked to ClawHub so the skill can synchronize and reach the registry.

Use Cases

  1. Regression Testing: Identify whether a newer version of an agent skill performs worse than a previously stable version by comparing latency and success rates.
  2. Performance Optimization: Use the improve command to receive data-driven suggestions on which skills to refactor or prompt differently, based on recorded error types.
  3. Multi-Agent Benchmarking: In setups with multiple agents, use the leaderboard functionality to determine which agent configuration yields the highest throughput for specific task types like PR generation or data extraction.
  4. CI/CD for Agents: Integrate the test command into your deployment pipelines to ensure new skill iterations pass smoke tests before being deployed to production agents.
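The regression-testing use case comes down to comparing two sets of recorded runs. The sketch below is a minimal illustration in Python, not skillbench's own implementation: the `Run` record, the `summarize` and `is_regression` helpers, and the 10% slowdown and 5-point success-rate tolerances are all hypothetical, chosen only to show the shape of the comparison.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Run:
    """One recorded execution of a skill (hypothetical record shape)."""
    version: str
    duration_s: float
    success: bool


def summarize(runs: list[Run], version: str) -> tuple[float, float]:
    """Return (mean duration in seconds, success rate) for one version."""
    selected = [r for r in runs if r.version == version]
    if not selected:
        raise ValueError(f"no runs recorded for version {version}")
    success_rate = sum(r.success for r in selected) / len(selected)
    return mean(r.duration_s for r in selected), success_rate


def is_regression(
    runs: list[Run],
    baseline: str,
    candidate: str,
    max_slowdown: float = 0.10,      # assumed tolerance: 10% slower
    max_success_drop: float = 0.05,  # assumed tolerance: 5 points less reliable
) -> bool:
    """True if the candidate is notably slower or less reliable than the baseline."""
    base_duration, base_success = summarize(runs, baseline)
    cand_duration, cand_success = summarize(runs, candidate)
    slower = cand_duration > base_duration * (1 + max_slowdown)
    less_reliable = cand_success < base_success - max_success_drop
    return slower or less_reliable
```

Calling `is_regression(runs, "1.1.0", "1.2.0")` on your own recorded runs gives a single yes/no answer that a pipeline or a reviewer can act on. Appropriate tolerances depend on how noisy the skill's durations are.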

Example Prompts

  1. "OpenClaw, run skillbench score github and tell me if the latest 1.2.0 update shows an improvement in speed compared to 1.1.0."
  2. "I'm experiencing intermittent failures; please run skillbench improve tasktime to see if there's a pattern in the error logs and provide a fix plan."
  3. "Generate a performance dashboard for all my active skills and export it as a markdown file so I can review the trends from the last 30 days."

Tips & Limitations

  • Proactive Monitoring: Run skillbench watch --interval 300 to keep periodic health checks on your agent's critical paths. This helps surface problems such as authentication failures before they affect production work.
  • Data Integrity: Ensure tasktime is correctly configured before running record commands; otherwise durations fall back to manually entered values or are recorded as null.
  • Limitations: The skill currently relies on standard exit codes and tasktime integration. Complex interactions that are not task-based might require manual recording. Keep a version history so that rollback remains possible if an improvement cycle fails.
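Because the skill relies on standard exit codes, a deployment pipeline can gate on the result of a smoke-test command. The following is a minimal sketch under stated assumptions: `gate` is a hypothetical helper, the 300-second timeout is arbitrary, and the exact arguments accepted by the test command are not documented here, so the command is passed in by the caller rather than hard-coded.

```python
import subprocess


def gate(command: list[str], timeout_s: float = 300.0) -> bool:
    """Run a smoke-test command and return True only if it exits with code 0."""
    if not command:
        return False
    try:
        result = subprocess.run(command, timeout=timeout_s, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a hung command counts as a failed gate.
        return False
    return result.returncode == 0


# Hypothetical usage in a pipeline step (arguments are an assumption):
#   ok = gate(["skillbench", "test", "<skill>"])
#   raise SystemExit(0 if ok else 1)
```

Treating a missing binary or a timeout as a failure keeps the gate conservative: a deployment proceeds only when the command demonstrably succeeded.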

Metadata

Author: @g9pedro
Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-g9pedro-skillbench": {
      "enabled": true,
      "auto_update": true
    }
  }
}
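If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. As an alternative, the entry can be merged in programmatically. This is only a sketch: `enable_plugin` is a hypothetical helper, and it assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above.

```python
import json
from pathlib import Path


def enable_plugin(config_path: Path, plugin_id: str, auto_update: bool = True) -> dict:
    """Add or update one entry under "plugins", leaving all other keys untouched."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}  # start a fresh config if the file does not exist yet
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call, using the plugin id from the snippet above:
#   enable_plugin(Path("clawhub.json"), "official-g9pedro-skillbench")
```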

Tags (AI)

#observability #benchmarking #agentic-workflow #performance-tracking #automation

Safety Score: 4/5

Flags: file-write, file-read, code-execution