
autoresearch-agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent

What This Skill Does

The Autoresearch Agent is an autonomous optimization system inspired by Andrej Karpathy's research-loop methodology. It treats code, text, or configuration optimization as an iterative experiment process. Given a target file, a quantifiable metric, and a shell command that reports that metric, the agent creates a closed-loop system: it performs an edit, runs the evaluation command, and compares the new metric against the best result so far. If the metric improves, it keeps the change with a git commit; if the metric regresses or the code crashes, it discards the change via git reset. This process continues indefinitely, allowing you to iterate on complex tasks like API latency reduction, bundle size minimization, or prompt engineering while you focus on other work.

Installation

You can integrate this agent into your workflow using the OpenClaw hub CLI:

clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent

Once installed, initialize your first experiment by running python scripts/setup_experiment.py and choosing your scope. Project-level scopes are recommended for team-shared benchmarks, while user-level scopes are better for personal utility tasks.

Use Cases

  • Software Performance: Automatically tune loops, concurrency, or algorithms to reduce latency or CPU usage.
  • Content Marketing: Optimize article headlines or marketing copy by hooking into an A/B test listener or an LLM-based sentiment evaluator.
  • Prompt Engineering: Iteratively adjust system prompts to improve the output quality of other LLMs against a set of validation test cases.
  • Cost Optimization: Reduce cloud resource consumption by tuning configuration files to use smaller machine instances while maintaining service health.
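For the prompt-engineering case, the evaluation command only needs to print a number the agent can compare. A hypothetical scorer (not part of the skill) might compute the pass rate of model outputs against expected answers for the validation cases:

```python
def pass_rate(outputs: list[str], expected: list[str]) -> float:
    """Fraction of validation cases whose output matches the expected answer.

    Printing this value as the script's last line makes it usable as the
    agent's evaluation command.
    """
    hits = sum(o.strip() == e.strip() for o, e in zip(outputs, expected))
    return hits / len(expected)


if __name__ == "__main__":
    # In practice, `outputs` would come from running the LLM with the
    # current system prompt over the validation inputs.
    print(pass_rate(["cat", "dog"], ["cat", "bird"]))
```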

Example Prompts

  1. "I need to speed up the process_images function in src/utils.py. Please set up an autoresearch loop to minimize execution time. I have a benchmark script at bench.py."
  2. "My model's classification accuracy on the validation set is currently 82%. Can you start an autoresearch loop to tweak the system_prompt.txt file and see if we can push that above 85%?"
  3. "Run an experiment overnight to reduce our bundle.js size by changing the build configuration settings. Use npm run check-size as the evaluation command."

Tips & Limitations

  • Evaluation Precision: The quality of the outcome is strictly bound to the quality of your evaluation command. If your metric is noisy, the agent may accidentally keep suboptimal changes.
  • Git Hygiene: Ensure your target file is inside a clean git repository. The agent relies heavily on git reset to revert failed experiments.
  • Cost Awareness: Running autonomous loops continuously can incur costs if your evaluation commands trigger cloud API calls or compute-intensive workloads. Set realistic loop intervals to manage these costs effectively.
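One way to blunt a noisy metric (a hypothetical helper, not something the skill provides) is to wrap the benchmark so the evaluation command runs it several times and reports the median, which is less sensitive to outliers than a single measurement:

```python
import statistics
import subprocess


def stable_metric(eval_cmd: str, runs: int = 5) -> float:
    """Run the evaluation command `runs` times and return the median metric.

    Assumes the command prints the metric as the last line of its stdout.
    """
    samples = []
    for _ in range(runs):
        out = subprocess.run(eval_cmd, shell=True, capture_output=True,
                             text=True, check=True)
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)
```

Note that this multiplies the cost of each iteration by `runs`, so it trades directly against the cost-awareness point above.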

Metadata

Stars: 3809
Views: 0
Updated: 2026-04-05
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-alirezarezvani-autoresearch-agent": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#optimization #automation #benchmarking #experimentation #developer-productivity
Safety Score: 2/5

Flags: file-write, file-read, code-execution

Related Skills

chro-advisor

People leadership for scaling companies. Hiring strategy, compensation design, org structure, culture, and retention. Use when building hiring plans, designing comp frameworks, restructuring teams, managing performance, building culture, or when user mentions CHRO, HR, people strategy, talent, headcount, compensation, org design, retention, or performance management.

alirezarezvani 3809

change-management

Framework for rolling out organizational changes without chaos. Covers the ADKAR model adapted for startups, communication templates, resistance patterns, and change fatigue management. Handles process changes, org restructures, strategy pivots, and culture changes. Use when announcing a reorg, switching tools, pivoting strategy, killing a product, changing leadership, or when user mentions change management, change rollout, managing resistance, org change, reorg, or pivot communication.

alirezarezvani 3809

ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.

alirezarezvani 3809

copywriting

When the user wants to write, rewrite, or improve marketing copy for any page — including homepage, landing pages, pricing pages, feature pages, about pages, or product pages. Also use when the user says "write copy for," "improve this copy," "rewrite this page," "marketing copy," "headline help," or "CTA copy." For email copy, see email-sequence. For popup copy, see popup-cro.

alirezarezvani 3809

pr-review-expert

PR Review Expert

alirezarezvani 3809