autoresearch-agent
Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent
What This Skill Does
The Autoresearch Agent is an autonomous optimization system inspired by Andrej Karpathy's research-loop methodology. It treats code, text, or configuration optimization as an iterative experimentation process. Given a target file, a quantifiable metric, and a shell command that reports that metric, the agent forms a closed loop: it makes an edit, runs the evaluation command, and compares the new metric against the best so far. If the metric improves, it commits the change; if the metric regresses or the code crashes, it discards the change via git reset. This process continues indefinitely, allowing you to iterate on complex tasks such as API latency reduction, bundle size minimization, or prompt engineering while you focus on other work.
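The keep/discard decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function names, the convention that the evaluation command prints its metric as the last line of stdout, and the "lower is better" default are all assumptions for the sake of the example.

```python
import subprocess
from typing import Optional

def run_eval(cmd: str) -> Optional[float]:
    """Run the evaluation command and parse its last stdout line as the metric.

    Returns None if the command crashes or prints no parseable number,
    which the loop treats as a failed experiment.
    """
    try:
        out = subprocess.run(cmd, shell=True, capture_output=True,
                             text=True, check=True)
        return float(out.stdout.strip().splitlines()[-1])
    except (subprocess.CalledProcessError, ValueError, IndexError):
        return None

def experiment_step(best: float, score: Optional[float],
                    minimize: bool = True) -> bool:
    """Decide whether an edit should be kept (git commit) or discarded (git reset)."""
    if score is None:          # crash or unparseable output: always discard
        return False
    return score < best if minimize else score > best
```

In the real loop, a `True` result would trigger `git commit` and update `best`, while `False` would trigger `git reset --hard` before the next edit attempt.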
Installation
You can integrate this agent into your workflow using the OpenClaw hub CLI:
clawhub install openclaw/skills/skills/alirezarezvani/autoresearch-agent
Once installed, initialize your first experiment by running python scripts/setup_experiment.py and choosing your scope. Project-level scopes are recommended for team-shared benchmarks, while user-level scopes are better for personal utility tasks.
Use Cases
- Software Performance: Automatically tune loops, concurrency, or algorithms to reduce latency or CPU usage.
- Content Marketing: Optimize article headlines or marketing copy by hooking into an A/B test listener or an LLM-based sentiment evaluator.
- Prompt Engineering: Iteratively adjust system prompts to improve the output quality of other LLMs against a set of validation test cases.
- Cost Optimization: Reduce cloud resource consumption by tuning configuration files to use smaller machine instances while maintaining service health.
Example Prompts
- "I need to speed up the process_images function in src/utils.py. Please set up an autoresearch loop to minimize execution time. I have a benchmark script at bench.py."
- "My model's classification accuracy on the validation set is currently 82%. Can you start an autoresearch loop to tweak the system_prompt.txt file and see if we can push that above 85%?"
- "Run an experiment overnight to reduce our bundle.js size by changing the build configuration settings. Use npm run check-size as the evaluation command."
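An evaluation command only needs to print a single number for the agent to parse. A hypothetical bench.py for the first prompt above might look like this; the `process_images` stand-in and the batch size are invented for illustration, and a real script would import the actual target function instead.

```python
# bench.py -- hypothetical evaluation command for the autoresearch loop.
# Prints one number (mean wall-clock seconds per run) to stdout.
import time

def process_images(batch):
    """Stand-in for the real target; replace with `from src.utils import process_images`."""
    return [x * 2 for x in batch]

def benchmark(runs: int = 5) -> float:
    """Time several runs of the target and return the mean duration in seconds."""
    batch = list(range(1000))
    start = time.perf_counter()
    for _ in range(runs):
        process_images(batch)
    return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    print(f"{benchmark():.6f}")   # the agent reads this line as the metric
```

Because the script prints only the metric, it plugs directly into the loop as the evaluation command, with "minimize" as the optimization direction.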
Tips & Limitations
- Evaluation Precision: The quality of the outcome is strictly bound to the quality of your evaluation command. If your metric is noisy, the agent may accidentally keep suboptimal changes.
- Git Hygiene: Ensure your target file is inside a clean git repository. The agent relies heavily on git reset to revert failed experiments.
- Cost Awareness: Running autonomous loops continuously can incur costs if your evaluation commands trigger cloud API calls or compute-intensive workloads. Set realistic loop intervals to manage these costs effectively.
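One common mitigation for a noisy metric is to evaluate several times and aggregate before comparing, so a lucky single run does not get committed. A small helper along these lines (the name and default sample count are illustrative) could wrap any evaluation function:

```python
import statistics
from typing import Callable

def stable_metric(run_once: Callable[[], float], samples: int = 5) -> float:
    """Reduce evaluation noise by reporting the median of several runs.

    The median is robust to a single outlier run, unlike the mean.
    """
    return statistics.median(run_once() for _ in range(samples))
```

The trade-off is cost: each experiment step now runs the evaluation `samples` times, which matters if the command triggers paid API calls.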
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-alirezarezvani-autoresearch-agent": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: file-write, file-read, code-execution
Related Skills
chro-advisor
People leadership for scaling companies. Hiring strategy, compensation design, org structure, culture, and retention. Use when building hiring plans, designing comp frameworks, restructuring teams, managing performance, building culture, or when user mentions CHRO, HR, people strategy, talent, headcount, compensation, org design, retention, or performance management.
change-management
Framework for rolling out organizational changes without chaos. Covers the ADKAR model adapted for startups, communication templates, resistance patterns, and change fatigue management. Handles process changes, org restructures, strategy pivots, and culture changes. Use when announcing a reorg, switching tools, pivoting strategy, killing a product, changing leadership, or when user mentions change management, change rollout, managing resistance, org change, reorg, or pivot communication.
ab-test-setup
When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.
copywriting
When the user wants to write, rewrite, or improve marketing copy for any page — including homepage, landing pages, pricing pages, feature pages, about pages, or product pages. Also use when the user says "write copy for," "improve this copy," "rewrite this page," "marketing copy," "headline help," or "CTA copy." For email copy, see email-sequence. For popup copy, see popup-cro.
pr-review-expert
PR Review Expert