skillbench
Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.
Why use this skill?
Track, benchmark, and optimize your OpenClaw agent skills. Improve your AI performance with automated testing, trend analysis, and version tracking.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/g9pedro/skillbench
What This Skill Does
The skillbench skill acts as the central observability and performance-tracking engine for your OpenClaw AI agents. It provides a structured lifecycle for managing agent capabilities, allowing you to track specific skill versions, benchmark execution performance, and analyze improvements over time. By integrating tightly with tasktime for duration logging and ClawVault for secure state storage, skillbench turns standard AI interactions into measurable data points. It enables developers and power users to move beyond 'it works' to 'it works X% faster and with Y% fewer errors,' creating a self-improving feedback loop for agentic workflows.
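To make the "X% faster and Y% fewer errors" idea concrete, here is a minimal sketch of how such figures could be derived from recorded runs. The `Run` record, `summarize`, and `percent_change` are hypothetical names invented for illustration; they are not skillbench's actual schema or API, and the sample data is made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One recorded skill execution (hypothetical shape, not skillbench's schema)."""
    version: str
    duration_s: float
    ok: bool


def summarize(runs, version):
    """Return (mean duration, error rate) for one version's runs."""
    selected = [r for r in runs if r.version == version]
    if not selected:
        raise ValueError(f"no runs recorded for version {version}")
    avg = mean(r.duration_s for r in selected)
    error_rate = sum(not r.ok for r in selected) / len(selected)
    return avg, error_rate


def percent_change(old, new):
    """Relative reduction from old to new, in percent (positive means new is lower)."""
    if old == 0:
        return 0.0
    return (old - new) / old * 100


# Made-up sample data for two versions of the same skill.
runs = [
    Run("1.1.0", 10.0, True),
    Run("1.1.0", 10.0, False),
    Run("1.1.0", 10.0, False),
    Run("1.1.0", 10.0, True),
    Run("1.2.0", 8.0, True),
    Run("1.2.0", 8.0, True),
    Run("1.2.0", 8.0, False),
    Run("1.2.0", 8.0, True),
]

old_avg, old_err = summarize(runs, "1.1.0")
new_avg, new_err = summarize(runs, "1.2.0")
print(f"{percent_change(old_avg, new_avg):.0f}% faster")
print(f"{percent_change(old_err, new_err):.0f}% fewer errors")
```

With this sample data, mean duration drops from 10 s to 8 s and the error rate from 0.5 to 0.25, which is the kind of comparison the paragraph above describes.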
Installation
To begin tracking your agent's performance, run the following command in your terminal:
clawhub install openclaw/skills/skills/g9pedro/skillbench
Alternatively, for a global node-based installation:
npm install -g @versatly/skillbench
Once installed, ensure your environment is linked to ClawHub for seamless synchronization and registry access.
Use Cases
- Regression Testing: Identify if a newer version of an agent skill is performing worse than a previously stable version by comparing latency and success rates.
- Performance Optimization: Use the `improve` command to receive data-driven suggestions on which skill should be refactored or prompted differently, based on recorded error types.
- Multi-Agent Benchmarking: In setups with multiple agents, use the leaderboard functionality to determine which agent configuration yields the highest throughput for specific task types like PR generation or data extraction.
- CI/CD for Agents: Integrate the `test` command into your deployment pipelines to ensure new skill iterations pass smoke tests before being deployed to production agents.
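The regression-testing and CI/CD use cases both come down to comparing a candidate version against a stable baseline on latency and success rate. The following is a rough sketch of such a check, not skillbench's own logic: the function name `is_regression`, the dictionary keys, and the threshold defaults are all assumptions chosen for illustration.

```python
def is_regression(baseline, candidate, max_slowdown=0.10, max_success_drop=0.02):
    """Flag candidate as a regression against baseline.

    Both arguments are dicts with 'latency_s' (mean seconds) and
    'success_rate' (0.0 to 1.0). The thresholds are illustrative defaults.
    """
    slower = candidate["latency_s"] > baseline["latency_s"] * (1 + max_slowdown)
    less_reliable = candidate["success_rate"] < baseline["success_rate"] - max_success_drop
    return slower or less_reliable


# Made-up metrics for a stable version and two candidates.
stable = {"latency_s": 4.0, "success_rate": 0.98}
faster_but_flaky = {"latency_s": 3.0, "success_rate": 0.90}
slightly_slower = {"latency_s": 4.2, "success_rate": 0.98}

print(is_regression(stable, faster_but_flaky))  # success rate fell by more than 0.02
print(is_regression(stable, slightly_slower))   # within the 10% slowdown tolerance
```

In a pipeline, the boolean result could be turned into a non-zero exit code so that a regressing skill version blocks the deployment step.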
Example Prompts
- "OpenClaw, run `skillbench score github` and tell me if the latest 1.2.0 update shows an improvement in speed compared to 1.1.0."
- "I'm experiencing intermittent failures; please run `skillbench improve tasktime` to see if there's a pattern in the error logs and provide a fix plan."
- "Generate a performance dashboard for all my active skills and export it as a markdown file so I can review the trends from the last 30 days."
Tips & Limitations
- Proactive Monitoring: Set up `skillbench watch --interval 300` to maintain constant health checks on your agent's critical paths. This catches auth issues before they impact business operations.
- Data Integrity: Ensure `tasktime` is correctly configured before running `record` commands; otherwise, durations will default to manual inputs or null.
- Limitations: The skill currently relies on standard exit codes and `tasktime` integration. Complex non-task-based interactions might require manual recording. Always maintain a version history to ensure that 'rollback' actions remain possible during failed improvement cycles.
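Since the tips above mention interval-based health checks and a reliance on standard exit codes, here is a conceptual sketch of what a periodic check of that kind might look like. It does not reproduce how `skillbench watch` is implemented: `check_once` and `watch` are hypothetical helpers, the checked commands are stand-in Python one-liners, and treating the interval as seconds is an assumption (the text does not state the unit).

```python
import subprocess
import sys
import time


def check_once(command, timeout_s=60):
    """Run command once; return (exit_code, duration_seconds).

    A timeout is reported as exit code None.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


def watch(command, interval_s, rounds):
    """Run check_once `rounds` times, sleeping interval_s between runs."""
    results = []
    for i in range(rounds):
        results.append(check_once(command))
        if i < rounds - 1:
            time.sleep(interval_s)
    return results


# Stand-in commands: a Python one-liner that succeeds and one that exits with 3.
ok_cmd = [sys.executable, "-c", "pass"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

for code, seconds in watch(ok_cmd, interval_s=0, rounds=2) + [check_once(bad_cmd)]:
    status = "healthy" if code == 0 else f"failed (exit {code})"
    print(f"{status} in {seconds:.2f}s")
```

A check based only on exit codes cannot see failures that still return 0, which is one reason complex interactions may need manual recording as noted above.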
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-g9pedro-skillbench": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-write, file-read, code-execution
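If a clawhub.json already exists with other plugins in it, pasting the fragment by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically. It assumes only the file layout shown in the fragment above (a top-level "plugins" object keyed by plugin id); `enable_plugin` is a hypothetical helper, not part of any ClawHub tooling, and "other-plugin" is a made-up entry used for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def enable_plugin(config_path, plugin_id, auto_update=True):
    """Add or update one plugin entry, leaving other keys in the file untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstrate on a throwaway file that already contains another plugin.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "clawhub.json"
    demo_path.write_text(
        '{"plugins": {"other-plugin": {"enabled": false}}}', encoding="utf-8"
    )
    updated = enable_plugin(demo_path, "official-g9pedro-skillbench")
    print(json.dumps(updated, indent=2))
```

The existing "other-plugin" entry is preserved and the skillbench entry is added alongside it.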
Related Skills
Clawvault
Skill by g9pedro
loom-workflow
AI-native workflow analyzer for Loom recordings. Breaks down recorded business processes into structured, automatable workflows. Use when:
- Analyzing Loom videos to understand workflows
- Extracting steps, tools, and decision points from screen recordings
- Generating Lobster workflow files from video walkthroughs
- Identifying ambiguities and human intervention points in processes
agent-autonomy-primitives
Build long-running autonomous agent loops using ClawVault primitives (tasks, projects, memory types, templates, heartbeats). Use when setting up agent autonomy, creating task-driven execution loops, customizing primitive schemas, wiring heartbeat-based work queues, or teaching an agent to manage its own backlog. Also use when adapting primitives to an existing agent setup or designing multi-agent collaboration through shared vaults.
pdauth
Dynamic OAuth for AI agents via Pipedream. Generate OAuth links for 2500+ APIs, let users authorize, then call MCP tools on their behalf.
linkedin-pipedream
Post to LinkedIn, comment, like, search organizations, and manage profiles via Pipedream OAuth integration.