Official | Verified | developer tools | Safety 4/5

skillbench

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.

Why use this skill?

Track, benchmark, and optimize your OpenClaw agent skills. Improve agent performance with automated testing, trend analysis, and version tracking.

What This Skill Does

The skillbench skill acts as the central observability and performance-tracking engine for your OpenClaw AI agents. It provides a structured lifecycle for managing agent capabilities, allowing you to track specific skill versions, benchmark execution performance, and analyze improvements over time. By integrating tightly with tasktime for duration logging and ClawVault for secure state storage, skillbench turns standard AI interactions into measurable data points. It enables developers and power users to move beyond 'it works' to 'it works X% faster and with Y% fewer errors,' creating a self-improving feedback loop for agentic workflows.

Installation

To begin tracking your agent's performance, run the following command in your terminal:

clawhub install openclaw/skills/skills/g9pedro/skillbench

Alternatively, for a global node-based installation:

npm install -g @versatly/skillbench

Once installed, ensure your environment is linked to ClawHub for seamless synchronization and registry access.

Use Cases

Regression Testing: Detect whether a newer version of an agent skill performs worse than a previously stable version by comparing latency and success rates.
Performance Optimization: Use the improve command to receive data-driven suggestions on which skills to refactor or prompt differently, based on recorded error types.
Multi-Agent Benchmarking: In setups with multiple agents, use the leaderboard functionality to determine which agent configuration yields the highest throughput for specific task types like PR generation or data extraction.
CI/CD for Agents: Integrate the test command into your deployment pipelines to ensure new skill iterations pass smoke tests before being deployed to production agents.
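
The regression-testing use case above comes down to comparing latency and success rate between two versions. The following is a minimal sketch of that comparison in plain Python. It does not use skillbench's own API or storage format; the `Run` record shape, the `summarize` and `is_regression` helpers, and the 10% slowdown threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One recorded skill execution (hypothetical record shape)."""
    version: str
    duration_s: float
    success: bool


def summarize(runs, version):
    """Return (mean duration, success rate) for one version's runs."""
    selected = [r for r in runs if r.version == version]
    if not selected:
        raise ValueError(f"no runs recorded for version {version}")
    avg = mean(r.duration_s for r in selected)
    rate = sum(r.success for r in selected) / len(selected)
    return avg, rate


def is_regression(runs, old, new, max_slowdown=0.10):
    """Flag `new` if it is more than `max_slowdown` slower or succeeds less often."""
    old_avg, old_rate = summarize(runs, old)
    new_avg, new_rate = summarize(runs, new)
    slower = new_avg > old_avg * (1 + max_slowdown)
    less_reliable = new_rate < old_rate
    return slower or less_reliable


# Illustrative data only, not real benchmark results.
runs = [
    Run("1.1.0", 2.0, True),
    Run("1.1.0", 2.0, True),
    Run("1.2.0", 3.0, True),
    Run("1.2.0", 3.0, False),
]
print(is_regression(runs, "1.1.0", "1.2.0"))  # True
```

A tolerance on the slowdown keeps ordinary timing noise from being reported as a regression; the right threshold depends on how variable the task is.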

Example Prompts

"OpenClaw, run skillbench score github and tell me if the latest 1.2.0 update shows an improvement in speed compared to 1.1.0."
"I'm experiencing intermittent failures; please run skillbench improve tasktime to see if there's a pattern in the error logs and provide a fix plan."
"Generate a performance dashboard for all my active skills and export it as a markdown file so I can review the trends from the last 30 days."

Tips & Limitations

Proactive Monitoring: Run skillbench watch --interval 300 to keep recurring health checks on your agent's critical paths. This can surface problems such as auth failures before they affect production work.
Data Integrity: Ensure tasktime is correctly configured before running record commands; otherwise, durations will default to manual inputs or null.
Limitations: The skill currently relies on standard exit codes and tasktime integration, so complex interactions that are not task-based might require manual recording. Always maintain a version history so that rollback remains possible when an improvement cycle fails.
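
To make the exit-code limitation concrete, here is a rough sketch of what exit-code-based measurement looks like in general: run a command, time it, and treat exit code 0 as success. This is not skillbench's implementation; `run_and_measure` and the shape of the returned record are hypothetical. Anything that does not map onto a process exit code falls outside this model, which is why such interactions may need manual recording.

```python
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run `cmd`, returning a dict with duration and exit-code-based success."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True)
    duration = time.perf_counter() - start
    return {
        "command": cmd,
        "exit_code": completed.returncode,
        "success": completed.returncode == 0,  # standard convention: 0 means success
        "duration_s": round(duration, 3),
    }


# Two stand-in commands: one exits cleanly, one exits with status 3.
ok = run_and_measure([sys.executable, "-c", "pass"])
failed = run_and_measure([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok["success"], failed["exit_code"])  # True 3
```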

Metadata

Author: @g9pedro

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-g9pedro-skillbench": {
      "enabled": true,
      "auto_update": true
    }
  }
}
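
If clawhub.json already lists other plugins, pasting the snippet verbatim would overwrite them, so the entry has to be merged into the existing "plugins" object instead. The sketch below shows one way to do that merge in Python. The `enable_plugin` helper and the `some-other-plugin` entry are hypothetical, and the code assumes the file has the same top-level "plugins" layout as the snippet above.

```python
import json


def enable_plugin(config, plugin_id, auto_update=True):
    """Return a copy of `config` with `plugin_id` enabled, keeping other plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
updated = enable_plugin(existing, "official-g9pedro-skillbench")
print(json.dumps(updated, indent=2))
```

Returning a new dictionary rather than mutating the input makes it easy to compare the before and after states prior to writing the file back.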

Tags (AI)

#observability #benchmarking #agentic-workflow #performance-tracking #automation

Safety Score: 4/5

Flags: file-write, file-read, code-execution
