autoresearch-loop
Apply Karpathy's autoresearch methodology to iteratively improve anything measurable — Claude skills, n8n workflows, system prompts, business processes, or any artifact with a clear quality metric. Inspired by github.com/karpathy/autoresearch (56k stars). The loop: propose a change → test it → measure against the target metric → keep if better, discard if not → repeat until a stopping condition is met. Trigger this skill when the user explicitly requests an iterative improvement loop, e.g.: "improve this skill automatically", "iterate on this workflow", "run autoresearch on", "run experiments on this", "optimize this automatically", "set up an improvement loop", or "run the autoresearch method".
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/autosolutionsai-didac/as-autoresearch-loop
Autoresearch Loop Skill
Karpathy's autoresearch methodology applied to improving Claude skills, n8n workflows, system prompts, and business processes.
Core idea: Define what "better" means. Lock everything except the artifact being improved. Propose a change → test → measure → keep or discard → repeat until a stopping condition is met.
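The core idea above can be sketched as a simple keep-if-better loop. This is a minimal illustration, not the skill's actual implementation: the names `autoresearch_loop`, `propose`, `evaluate`, `max_experiments`, and `target` are hypothetical, and the sketch assumes a single metric where higher is better.

```python
import random
from typing import Callable, List, Tuple


def autoresearch_loop(
    artifact: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    max_experiments: int = 20,
    target: float = 1.0,
) -> Tuple[str, float, List[Tuple[int, float, bool]]]:
    """Keep-if-better loop. Assumes higher scores are better (an assumption)."""
    best, best_score = artifact, evaluate(artifact)  # baseline measurement
    history: List[Tuple[int, float, bool]] = []
    for i in range(1, max_experiments + 1):
        if best_score >= target:       # stopping condition: target reached
            break
        candidate = propose(best)      # propose a change to the current best
        score = evaluate(candidate)    # test it and measure the metric
        kept = score > best_score      # keep only strict improvements
        if kept:
            best, best_score = candidate, score
        history.append((i, score, kept))
    return best, best_score, history


# Toy stand-ins (hypothetical): the artifact is a string, the metric is the
# fraction of required keywords it contains, a proposal appends one word.
REQUIRED = ["metric", "eval", "budget"]


def toy_evaluate(text: str) -> float:
    return sum(word in text for word in REQUIRED) / len(REQUIRED)


def toy_propose(text: str) -> str:
    return text + " " + random.choice(REQUIRED + ["noise"])


if __name__ == "__main__":
    random.seed(0)
    best, score, history = autoresearch_loop(
        "draft", toy_propose, toy_evaluate, max_experiments=50
    )
    print(f"final score {score:.2f} after {len(history)} experiments")
```

In a real campaign, `propose` would be the agent editing the artifact and `evaluate` would run the fixed eval set; the experiment cap and the target score are the two stopping conditions in this sketch.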
When NOT to use this loop:
- You can't define a single measurable metric (e.g. "improve my writing style" — too subjective)
- The artifact is too large to evaluate cheaply in a fixed budget
- There's no fixed eval set (or you can't create one) — without a stable yardstick, you're just guessing
- You need to improve two interdependent artifacts simultaneously — do them sequentially instead
- The artifact is a one-time document (a single client proposal, a one-off report) — the loop is for artifacts that will be reused and improved over time. A one-time deliverable has no future eval value; just write it well directly
If you can't answer "what number tells me if this experiment worked?", stop and define that first.
The methodology is format-agnostic: The loop works for any artifact type — code, prompts, documents, design systems, API configurations, process specs — as long as you can define an artifact, a metric, and a repeatable eval. For novel artifact types not covered by the examples below: walk through the setup phase (artifact → metric → eval → budget) and creatively define each. A Figma component library's metric could be a checklist pass rate (accessibility, consistency, coverage); its eval could be test scenarios ("render a data table", "create a form with validation states") scored against that checklist. Start with a small eval (5–10 test cases) to validate the metric produces meaningful signal before committing to a full campaign.
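A checklist pass rate like the one described above might be computed roughly as follows. The checklist items, the sample outputs, and the `pass_rate` helper are all hypothetical placeholders; a real checklist would encode the actual accessibility, consistency, and coverage rules for the artifact.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

# Hypothetical checklist: each check inspects one eval output.
CHECKLIST: Dict[str, Check] = {
    "has_label": lambda out: "label" in out,
    "has_error_state": lambda out: "error" in out,
    "under_200_chars": lambda out: len(out) < 200,
}


def pass_rate(outputs: List[str], checklist: Dict[str, Check]) -> float:
    """Fraction of (output, check) pairs that pass, in the range [0, 1]."""
    total = len(outputs) * len(checklist)
    if total == 0:
        raise ValueError("need at least one output and one check")
    passed = sum(check(out) for out in outputs for check in checklist.values())
    return passed / total


if __name__ == "__main__":
    # Two made-up outputs standing in for results of the test scenarios.
    outputs = [
        "form with label and error state",
        "table with label",
    ]
    print(f"pass rate: {pass_rate(outputs, CHECKLIST):.2f}")
```

Running this on 5 to 10 scenarios first shows whether the score actually moves when the artifact changes, which is the signal check recommended above.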
Setup Phase
Before the loop starts, establish these five things with the user:
1. The Artifact (What You're Improving)
The single file, document, workflow, or process being iteratively modified. Think of this as train.py in Karpathy's repo — the one thing the agent edits.
Examples:
- A SKILL.md file
- An n8n workflow JSON
- A system prompt
- An SOP document
- A business process description
Fixed files: Identify what must NOT change — the evaluation criteria, input test cases, external integrations. These are your prepare.py.
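One way to enforce the "fixed files must not change" rule is to fingerprint them before the loop and verify the fingerprints after each experiment. The sketch below is an assumption about how that could be done, not part of the skill itself; `fingerprint`, `changed_files`, and the file name `eval_cases.json` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def fingerprint(paths: Iterable[Path]) -> Dict[str, str]:
    """Map each fixed file to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def changed_files(before: Dict[str, str], paths: Iterable[Path]) -> List[str]:
    """Return the fixed files whose contents differ from the snapshot."""
    after = fingerprint(paths)
    return sorted(name for name, digest in before.items() if after.get(name) != digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        eval_cases = Path(tmp) / "eval_cases.json"  # hypothetical fixed file
        eval_cases.write_text("[1, 2, 3]")
        snapshot = fingerprint([eval_cases])        # taken once, before the loop

        eval_cases.write_text("[1, 2]")             # simulate an accidental edit
        print(changed_files(snapshot, [eval_cases]))  # lists the modified file
```

If `changed_files` returns anything after an experiment, that experiment's score was measured against a moved yardstick and should be discarded.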
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-autosolutionsai-didac-as-autoresearch-loop": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Related Skills
agent-memory-setup
Set up the full OpenClaw agent memory system with 3-tier memory (HOT/WARM/COLD), daily logs, semantic search (QMD), and lossless context management (Lossless Claw). Use when onboarding a new agent, setting up memory for a fresh OpenClaw instance, or when asked to install the memory system on a new agent. Triggers on "set up memory", "install memory system", "onboard new agent memory", "memory setup", "agent onboarding", "configure agent memory", "add memory to my agent", "how do I set up memory", "initialize memory", "memory system for OpenClaw".
agent-memory-setup-v2
Create a 3-tier memory directory structure (HOT/WARM/COLD) for OpenClaw agents and configure the built-in memory-core plugin to use Google Gemini Embeddings 2 (gemini-embedding-2-preview) for semantic memory search. Creates memory/ directories and stub files only — no code execution or external API calls from the setup script. After setup, the agent's memory_search tool uses Gemini's cloud embedding API to index memory files. Requires a free Google Gemini API key. Use when setting up a new agent's memory system or asked about semantic memory search. Triggers on "set up memory", "memory setup", "agent memory", "gemini memory", "semantic search memory", "onboard new agent".
gamma
Create presentations, documents, social posts, and web pages via the Gamma.app API. Use when asked to create a presentation, pitch deck, slide deck, document, social media carousel, or webpage using Gamma. Also use when asked to generate slides, export to PDF/PPTX, or create content from a Gamma template. Triggers on "create a presentation", "make a deck", "gamma", "slides", "pitch deck", "create a document in gamma".
deep-research
Conduct deep multi-phase research using parallel subagents and iterative search. Use for deep research requests, comprehensive analysis, competitive intelligence, market research, or thorough investigation of complex topics.