
autoresearch-agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent

What This Skill Does

The Autoresearch Agent is an autonomous optimization system inspired by Andrej Karpathy's research-loop methodology. It treats code, text, or configuration optimization as an iterative experiment process. Given a target file, a quantifiable metric, and a shell command that reports that metric, the agent creates a closed-loop system: it performs an edit, runs the evaluation command, and compares the new metric against the best result so far. If the metric improves, it keeps the change with a git commit; if the metric regresses or the code crashes, it discards the change via git reset. This process continues indefinitely, allowing you to iterate on complex tasks like API latency reduction, bundle size minimization, or prompt engineering while you focus on other work.

Installation

You can integrate this agent into your workflow using the OpenClaw hub CLI:

clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent

Once installed, initialize your first experiment by running python scripts/setup_experiment.py and choosing your scope. Project-level scopes are recommended for team-shared benchmarks, while user-level scopes are better for personal utility tasks.

Use Cases

  • Software Performance: Automatically tune loops, concurrency, or algorithms to reduce latency or CPU usage.
  • Content Marketing: Optimize article headlines or marketing copy by hooking into an A/B test listener or an LLM-based sentiment evaluator.
  • Prompt Engineering: Iteratively adjust system prompts to improve the output quality of other LLMs against a set of validation test cases.
  • Cost Optimization: Reduce cloud resource consumption by tuning configuration files to use smaller machine instances while maintaining service health.
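For the prompt-engineering case, the evaluation command only needs to print a number the agent can compare. A hypothetical scorer (not part of the skill) might compute the pass rate of model outputs against expected answers for the validation cases:

```python
def pass_rate(outputs: list[str], expected: list[str]) -> float:
    """Fraction of validation cases whose output matches the expected answer.

    Printing this value as the script's last line makes it usable as the
    agent's evaluation command.
    """
    hits = sum(o.strip() == e.strip() for o, e in zip(outputs, expected))
    return hits / len(expected)


if __name__ == "__main__":
    # In practice, `outputs` would come from running the LLM with the
    # current system prompt over the validation inputs.
    print(pass_rate(["cat", "dog"], ["cat", "bird"]))
```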

Example Prompts

  1. "I need to speed up the process_images function in src/utils.py. Please set up an autoresearch loop to minimize execution time. I have a benchmark script at bench.py."
  2. "My model's classification accuracy on the validation set is currently 82%. Can you start an autoresearch loop to tweak the system_prompt.txt file and see if we can push that above 85%?"
  3. "Run an experiment overnight to reduce our bundle.js size by changing the build configuration settings. Use npm run check-size as the evaluation command."

Tips & Limitations

  • Evaluation Precision: The quality of the outcome is strictly bound to the quality of your evaluation command. If your metric is noisy, the agent may accidentally keep suboptimal changes.
  • Git Hygiene: Ensure your target file is inside a clean git repository. The agent relies heavily on git reset to revert failed experiments.
  • Cost Awareness: Running autonomous loops continuously can incur costs if your evaluation commands trigger cloud API calls or compute-intensive workloads. Set realistic loop intervals to manage these costs effectively.
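One way to blunt a noisy metric (a hypothetical helper, not something the skill provides) is to wrap the benchmark so the evaluation command runs it several times and reports the median, which is less sensitive to outliers than a single measurement:

```python
import statistics
import subprocess


def stable_metric(eval_cmd: str, runs: int = 5) -> float:
    """Run the evaluation command `runs` times and return the median metric.

    Assumes the command prints the metric as the last line of its stdout.
    """
    samples = []
    for _ in range(runs):
        out = subprocess.run(eval_cmd, shell=True, capture_output=True,
                             text=True, check=True)
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)
```

Note that this multiplies the cost of each iteration by `runs`, so it trades directly against the cost-awareness point above.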

Metadata

Stars: 3809
Views: 0
Updated: 2026-04-05
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-alirezarezvani-autoresearch-agent": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#optimization #automation #benchmarking #experimentation #developer-productivity
Safety Score: 2/5

Flags: file-write, file-read, code-execution

Related Skills

chro-advisor

People leadership for scaling companies. Hiring strategy, compensation design, org structure, culture, and retention. Use when building hiring plans, designing comp frameworks, restructuring teams, managing performance, building culture, or when user mentions CHRO, HR, people strategy, talent, headcount, compensation, org design, retention, or performance management.

alirezarezvani 3809

change-management

Framework for rolling out organizational changes without chaos. Covers the ADKAR model adapted for startups, communication templates, resistance patterns, and change fatigue management. Handles process changes, org restructures, strategy pivots, and culture changes. Use when announcing a reorg, switching tools, pivoting strategy, killing a product, changing leadership, or when user mentions change management, change rollout, managing resistance, org change, reorg, or pivot communication.

alirezarezvani 3809

ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.

alirezarezvani 3809

copywriting

When the user wants to write, rewrite, or improve marketing copy for any page — including homepage, landing pages, pricing pages, feature pages, about pages, or product pages. Also use when the user says "write copy for," "improve this copy," "rewrite this page," "marketing copy," "headline help," or "CTA copy." For email copy, see email-sequence. For popup copy, see popup-cro.

alirezarezvani 3809

pr-review-expert

PR Review Expert

alirezarezvani 3809