autoresearch
Autonomously optimize any OpenClaw skill by running it repeatedly, scoring outputs against binary evals, mutating the prompt, and keeping improvements. Based on Karpathy's autoresearch methodology. Use when: optimize this skill, improve this skill, run autoresearch on, make this skill better, self-improve skill, benchmark skill, eval my skill, run evals on.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alannjaf/karpathy-autoresearch
Triggers
Use when: optimize this skill, improve this skill, run autoresearch on, make this skill better, self-improve skill, benchmark skill, eval my skill, run evals on.
Description
Autonomous prompt/strategy optimization using Karpathy's autoresearch pattern. Mutate → evaluate → keep improvements. Works on anything with a measurable score: trading strategies, content scripts, thumbnails, ad copy, email subjects.
How It Works
1. BASELINE: Score the current version.
2. MUTATE: Change one thing.
3. EVALUATE: Run the scoring function on the changed version.
4. DECIDE: If the score is better, keep the change; otherwise revert it.
Then loop back to step 2.
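The loop above is a greedy hill climb: one change per round, and a change survives only if it beats the best score so far. Below is a minimal Python sketch, assuming you supply two hypothetical callbacks, `mutate` (returns a copy of the text with one change) and `evaluate` (returns a score, higher is better). In the skill itself the agent performs the mutation by editing the file; the toy target-string example exists only to make the sketch runnable.

```python
import random
from typing import Callable


def autoresearch(
    baseline: str,
    mutate: Callable[[str], str],
    evaluate: Callable[[str], float],
    iterations: int = 20,
) -> tuple[str, float]:
    """Greedy keep-or-revert loop. mutate/evaluate are hypothetical callbacks."""
    best = baseline
    best_score = evaluate(best)            # 1. BASELINE
    for _ in range(iterations):
        candidate = mutate(best)           # 2. MUTATE: one change to the current best
        score = evaluate(candidate)        # 3. EVALUATE
        if score > best_score:             # 4. DECIDE: keep strict improvements only
            best, best_score = candidate, score
        # otherwise the candidate is dropped, i.e. reverted
    return best, best_score


# Toy usage (hypothetical): evolve "aaaa" toward "keep", one letter at a time.
TARGET = "keep"
rng = random.Random(0)


def toy_evaluate(text: str) -> float:
    # Number of positions that already match TARGET.
    return float(sum(a == b for a, b in zip(text, TARGET)))


def toy_mutate(text: str) -> str:
    # Replace one random position with one random lowercase letter.
    i = rng.randrange(len(text))
    return text[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + text[i + 1:]


best, score = autoresearch("aaaa", toy_mutate, toy_evaluate, iterations=500)
print(best, score)
```

Using a strict `>` means a tie counts as no improvement, so the candidate is reverted and the current best stays in place.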
Instructions
Step 1: Identify the Mutable File
The mutable file is the thing you're optimizing. It can be:
- A SKILL.md prompt/instructions
- A trading strategy config (thresholds, parameters)
- A content template (YouTube script format, ad copy structure)
- Any text file where changes produce measurable differences
Create or identify this file. Example:
my-skill/
├── SKILL.md            ← this is your mutable file
└── eval/
    ├── test_cases.json
    └── score.py
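On disk, "keep or revert" can be as simple as snapshotting the mutable file before writing a candidate and restoring the snapshot when the score does not improve. A sketch, assuming `score_fn` is a hypothetical callable that scores the file at a given path (for example, a thin wrapper around your own `eval/score.py`); the helper name `try_mutation` and the toy "shorter is better" scorer are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Callable


def try_mutation(
    path: Path,
    new_text: str,
    score_fn: Callable[[Path], float],
    best_score: float,
) -> float:
    """Apply new_text to the mutable file; keep it only if it beats best_score.

    Returns the best score after the attempt. score_fn is a hypothetical scorer.
    """
    snapshot = path.read_text()        # current best version
    path.write_text(new_text)          # apply the candidate in place
    try:
        score = score_fn(path)
    except Exception:
        path.write_text(snapshot)      # never leave a crashing candidate on disk
        raise
    if score > best_score:
        return score                   # keep: candidate stays in the file
    path.write_text(snapshot)          # revert: restore the snapshot
    return best_score


# Toy usage (hypothetical scorer): shorter SKILL.md text scores higher.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("Be concise.")

    def shortness(p: Path) -> float:
        return -float(len(p.read_text()))

    best = shortness(skill)
    best = try_mutation(skill, "Be brief and concise, always.", shortness, best)
    after_worse = skill.read_text()    # longer candidate was reverted
    best = try_mutation(skill, "Be brief.", shortness, best)
    after_better = skill.read_text()   # shorter candidate was kept

print(after_worse, "|", after_better, "|", best)
```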
Step 2: Create an Evaluation Function
Your eval function must:
- Take the current mutable file as input
- Run it against test cases
- Return a numeric score (higher = better)
The eval can be anything:
- LLM-as-judge: Send the output to an LLM and ask it to score it from 1 to 100
- Backtest: Run a strategy against historical data and measure its Sharpe ratio or returns
- A/B metrics: CTR, engagement, conversion rate
- Binary pass/fail: Count how many test cases pass out of N
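For the binary pass/fail option, the score is the number of passing checks divided by the number of checks run. A minimal sketch, assuming each binary eval is a yes/no function of one output; the two checks below (a 50-word limit and no leftover "TODO") are hypothetical examples, not evals that ship with this skill.

```python
from typing import Callable

# A binary eval answers yes/no about a single output.
Check = Callable[[str], bool]


def pass_rate(outputs: list[str], checks: list[Check]) -> float:
    """Fraction of (output, check) pairs that pass, in [0, 1]."""
    if not outputs or not checks:
        raise ValueError("need at least one output and one check")
    passed = sum(check(out) for out in outputs for check in checks)
    return passed / (len(outputs) * len(checks))


# Hypothetical checks and outputs, for illustration only.
checks: list[Check] = [
    lambda out: len(out.split()) <= 50,   # stays under 50 words
    lambda out: "TODO" not in out,        # no leftover placeholders
]
outputs = ["Short answer.", "TODO: fill this in"]
rate = pass_rate(outputs, checks)
print(rate)  # 0.75 (3 of 4 checks pass)
```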
Template eval function (customize for your domain):
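```python
# eval/score.py
import json
import sys


def run_and_score(current_version: str, case: dict) -> float:
    """Score one test case. Placeholder: replace with your domain logic,
    e.g. run the prompt and compare its output to the expected result."""
    raise NotImplementedError("implement run_and_score for your domain")


def evaluate(mutable_file_path: str, test_cases_path: str) -> float:
    """
    Score the current version of the mutable file.
    Returns the mean score across test cases; higher is better.
    """
    with open(mutable_file_path) as f:
        current_version = f.read()
    with open(test_cases_path) as f:
        test_cases = json.load(f)
    if not test_cases:
        raise ValueError(f"no test cases in {test_cases_path}")

    scores = []
    for case in test_cases:
        # YOUR SCORING LOGIC HERE (inside run_and_score)
        score = run_and_score(current_version, case)
        scores.append(score)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Usage: python eval/score.py SKILL.md eval/test_cases.json
    print(evaluate(sys.argv[1], sys.argv[2]))
```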
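In the template, `run_and_score` is where your domain logic goes. For the LLM-as-judge option, part of that logic is turning the judge's free-text reply into a number. A sketch of that parsing step only, assuming you asked the judge to answer with a single integer from 1 to 100; `parse_judge_score` is a hypothetical helper, and the call to the model itself is omitted because it depends on your client.

```python
import re


def parse_judge_score(reply: str) -> float:
    """Return the first integer in an LLM judge's reply as a 1-100 score.

    Assumes the judge was told to answer with one integer from 1 to 100.
    Raises ValueError rather than guessing when the reply is unusable.
    """
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    value = int(match.group())
    if not 1 <= value <= 100:
        raise ValueError(f"score {value} is outside 1-100")
    return float(value)


print(parse_judge_score("Score: 87"))  # 87.0
```

Raising instead of defaulting to a low score keeps a malformed judge reply from being mistaken for a worse prompt.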
Metadata
Paste this into your clawhub.json to enable this plugin:
{
"plugins": {
"official-alannjaf-karpathy-autoresearch": {
"enabled": true,
"auto_update": true
}
}
}