autoresearch-agent
Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent
What This Skill Does
The Autoresearch Agent is an autonomous optimization system inspired by Andrej Karpathy's research-loop methodology. It treats code, text, or configuration optimization as an iterative experiment process. By combining a target file, a quantifiable metric, and a shell command that reports that metric, the agent creates a closed-loop system: it performs an edit, runs the evaluation command, and uses git to compare the result. If the metric improves, it keeps the change; if the metric regresses or the code crashes, it discards the change via git reset. This process continues indefinitely, allowing you to iterate on complex tasks like API latency reduction, bundle size minimization, or prompt engineering while you focus on other work.
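The keep-or-discard decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the helper names (`is_improvement`, `run_metric`, `step`, `git_keep`, `git_discard`) are hypothetical, and the sketch assumes that the evaluation command prints the metric as the last line of stdout and that a hard reset is an acceptable way to discard a failed edit.

```python
import subprocess
from typing import Callable, Optional


def is_improvement(best: Optional[float], candidate: Optional[float],
                   lower_is_better: bool = True) -> bool:
    """Return True if `candidate` beats `best`. None means the run crashed."""
    if candidate is None:
        return False          # a crash or unparsable output is never kept
    if best is None:
        return True           # the first successful run becomes the baseline
    return candidate < best if lower_is_better else candidate > best


def run_metric(eval_cmd: str, timeout: float = 600.0) -> Optional[float]:
    """Run the evaluation command and parse the last stdout line as a float.

    Assumption: the command reports its metric as the final line of stdout.
    """
    try:
        proc = subprocess.run(eval_cmd, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    try:
        return float(proc.stdout.strip().splitlines()[-1])
    except (IndexError, ValueError):
        return None


def git_keep(message: str) -> None:
    """Commit all changes to tracked files (keeps the experiment)."""
    subprocess.run(["git", "commit", "-am", message], check=True)


def git_discard() -> None:
    """Throw away uncommitted changes (discards the experiment)."""
    subprocess.run(["git", "reset", "--hard", "HEAD"], check=True)


def step(best: Optional[float],
         propose_edit: Callable[[], None],
         evaluate: Callable[[], Optional[float]],
         keep: Callable[[], None],
         discard: Callable[[], None],
         lower_is_better: bool = True) -> Optional[float]:
    """Run one iteration of the loop and return the new best metric."""
    propose_edit()
    candidate = evaluate()
    if is_improvement(best, candidate, lower_is_better):
        keep()
        return candidate
    discard()
    return best
```

Passing the edit, evaluation, and git actions in as callables keeps the decision logic separate from its side effects, so it can be exercised without touching a real repository. In a real loop, `evaluate` might be `lambda: run_metric("python bench.py")` and `discard` might be `git_discard`.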
Installation
You can integrate this agent into your workflow using the OpenClaw hub CLI:
clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent
Once installed, initialize your first experiment by running python scripts/setup_experiment.py and choosing your scope. Project-level scopes are recommended for team-shared benchmarks, while user-level scopes are better for personal utility tasks.
Use Cases
- Software Performance: Automatically tune loops, concurrency, or algorithms to reduce latency or CPU usage.
- Content Marketing: Optimize article headlines or marketing copy by hooking into an A/B test listener or an LLM-based sentiment evaluator.
- Prompt Engineering: Iteratively adjust system prompts to improve the output quality of other LLMs against a set of validation test cases.
- Cost Optimization: Reduce cloud resource consumption by tuning configuration files to use smaller machine instances while maintaining service health.
Example Prompts
- "I need to speed up the process_images function in src/utils.py. Please set up an autoresearch loop to minimize execution time. I have a benchmark script at bench.py."
- "My model's classification accuracy on the validation set is currently 82%. Can you start an autoresearch loop to tweak the system_prompt.txt file and see if we can push that above 85%?"
- "Run an experiment overnight to reduce our bundle.js size by changing the build configuration settings. Use npm run check-size as the evaluation command."
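Each of these prompts depends on an evaluation command that reports a single number. As a rough sketch of what a benchmark script such as the bench.py mentioned in the first prompt might look like, the example below times a placeholder workload and prints the result. The `workload` function and the output format (one float on stdout) are assumptions made for illustration; the source does not specify the format the skill expects.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def workload() -> int:
    # Hypothetical placeholder standing in for the real function under test.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    # Print only the metric so the calling loop can parse it as a float.
    print(f"{time_call(workload):.6f}")
```

Taking the minimum over several repeats is one common way to reduce timing noise from other processes. Any stable summary would serve, as long as the script emits the same kind of number on every run.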
Tips & Limitations
- Evaluation Precision: The quality of the outcome is bounded by the quality of your evaluation command. If your metric is noisy, the agent may keep changes that only appear better because of measurement variance.
- Git Hygiene: Ensure your target file is inside a clean git repository. The agent relies heavily on git reset to revert failed experiments.
- Cost Awareness: Running autonomous loops continuously can incur costs if your evaluation commands trigger cloud API calls or compute-intensive workloads. Set realistic loop intervals to manage these costs effectively.
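One way to guard against a noisy metric is to evaluate several times, take the median, and accept a change only when it beats the current best by a margin. The sketch below illustrates that idea. The function names and the 2% default threshold are illustrative assumptions, not settings provided by the skill.

```python
import statistics
from typing import Callable


def robust_metric(measure: Callable[[], float], runs: int = 5) -> float:
    """Call `measure` several times and return the median of the samples."""
    samples = [measure() for _ in range(runs)]
    return statistics.median(samples)


def clearly_better(best: float, candidate: float,
                   min_rel_gain: float = 0.02,
                   lower_is_better: bool = True) -> bool:
    """Accept only if the gain exceeds `min_rel_gain` relative to `best`."""
    gain = (best - candidate) if lower_is_better else (candidate - best)
    return gain > abs(best) * min_rel_gain
```

The median ignores an occasional outlier run, and the relative threshold keeps the loop from committing changes whose apparent gain is within run-to-run variation. Both cost extra evaluations per iteration, which matters if each evaluation is expensive.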
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-alirezarezvani-autoresearch-agent": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution