autoresearch
Autonomous AI research skill for running automated neural network experiments. This skill should be used when the user wants to set up autonomous AI research experiments, run automated neural network training, conduct autonomous machine learning research, or let AI agents experiment with model architectures and hyperparameters. Based on Andrej Karpathy's autoresearch project, this skill enables AI agents to autonomously modify training code, run experiments, evaluate results, and iteratively improve models. Use when: (1) Setting up autonomous research experiments, (2) Running automated neural network training, (3) Conducting AI-driven research optimization, (4) Experimenting with model architectures and hyperparameters, (5) Implementing autonomous research loops, or (6) When the user mentions "autonomous research", "AI experiments", "automated training", "neural network optimization", or "autoresearch".
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/baiyunrei2025/autoresearch-karpathy
Autoresearch Skill
This skill enables autonomous AI research experiments based on Andrej Karpathy's autoresearch project. It allows AI agents to autonomously modify neural network training code, run experiments, evaluate results, and iteratively improve models.
Core Concept
The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously. The agent modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You can leave it running overnight and wake up to a log of experiments and (hopefully) a better model.
Key Files
The project has three core files:
- prepare.py: Fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
- train.py: The single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. This file is edited and iterated on by the agent.
- program.md: Baseline instructions for the agent. This file is edited and iterated on by the human.
Requirements
- Single NVIDIA GPU (tested on H100)
- Python 3.10+
- uv package manager
Quick Start Workflow
Phase 1: Initial Setup
- Clone the repository (if not already done):
  git clone https://github.com/karpathy/autoresearch.git
  cd autoresearch
- Install dependencies:
  uv sync
- Prepare data (one-time setup):
  uv run prepare.py
Phase 2: Experiment Setup
- Agree on a run tag (e.g., based on the date, like mar20)
- Create a new branch:
  git checkout -b autoresearch/<tag>
- Initialize the results file:
  echo -e "commit\tval_bpb\tmemory_gb\tstatus\tdescription" > results.tsv
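The results file is plain tab-separated text, so later rows can be appended from a script as well as by hand. The following is a minimal sketch, assuming the five columns from the header above; append_result is a hypothetical helper, not part of the autoresearch repository, and the number formatting is an arbitrary choice.

```python
import csv
from pathlib import Path

# Column order copied from the header line written in Phase 2.
COLUMNS = ["commit", "val_bpb", "memory_gb", "status", "description"]


def append_result(path, commit, val_bpb, memory_gb, status, description):
    """Append one experiment row to a tab-separated results file.

    Hypothetical helper for illustration; writes the header first
    if the file is missing or empty.
    """
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if write_header:
            writer.writerow(COLUMNS)
        # Fixed decimal places are an assumption, chosen for readable diffs.
        writer.writerow(
            [commit, f"{val_bpb:.6f}", f"{memory_gb:.1f}", status, description]
        )
```

A call such as append_result("results.tsv", commit_hash, val_bpb, memory_gb, "keep", "short description") would add one line per experiment, keeping the file easy to inspect with standard text tools.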
Phase 3: Autonomous Experimentation Loop
The agent follows this loop indefinitely:
LOOP FOREVER:
1. Look at current git state
2. Modify train.py with experimental idea
3. git commit
4. Run experiment: uv run train.py > run.log 2>&1
5. Extract results: grep "^val_bpb:\|^peak_vram_mb:" run.log
6. If crash → analyze logs and fix or mark as crash
7. Record results in results.tsv
8. If improved → keep commit
9. If not improved → git reset to the last kept commit
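Steps 5 and 6 of the loop can also be done in Python instead of grep. This is a minimal sketch, assuming each metric appears on its own line as "name: number", which is what the grep pattern suggests; the exact number format in run.log is an assumption, and parse_metrics and is_crash are hypothetical names.

```python
import re

# Mirrors the grep pattern from step 5: lines that start with
# "val_bpb:" or "peak_vram_mb:". The value format is an assumption.
METRIC_RE = re.compile(r"^(val_bpb|peak_vram_mb):[ \t]*(\S+)", re.MULTILINE)


def parse_metrics(log_text):
    """Return a dict of the metrics found in the log text."""
    metrics = {}
    for name, raw in METRIC_RE.findall(log_text):
        try:
            metrics[name] = float(raw)  # a later occurrence overwrites an earlier one
        except ValueError:
            continue  # ignore values that are not numeric
    return metrics


def is_crash(metrics):
    """Treat a run that produced no val_bpb line as a crash (step 6)."""
    return "val_bpb" not in metrics
```

Reading run.log as text and passing it to parse_metrics gives the values needed for step 7; an empty result is the signal to inspect the log for a traceback.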
Key Metrics
- val_bpb (validation bits per byte): lower is better; independent of vocabulary size
- Training time: fixed 5-minute budget per experiment
- Peak VRAM: peak GPU memory, logged as peak_vram_mb and recorded in results.tsv as memory_gb
- Status: keep, discard, or crash
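The status value follows directly from val_bpb. The sketch below shows one way to derive it, assuming that only a strict improvement over the best kept val_bpb counts as "improved" and that 1 GB = 1024 MB when converting the logged figure; both are assumptions, and decide_status and mb_to_gb are hypothetical names.

```python
import math


def mb_to_gb(peak_vram_mb):
    """Convert the logged MB value for the memory_gb column (assumes 1 GB = 1024 MB)."""
    return peak_vram_mb / 1024.0


def decide_status(val_bpb, best_val_bpb):
    """Return 'crash', 'keep', or 'discard' for one experiment.

    val_bpb is None when the run produced no metric; best_val_bpb is None
    before any run has been kept. Lower val_bpb is better, and ties are
    discarded (an assumption about what "improved" means).
    """
    if val_bpb is None or math.isnan(val_bpb):
        return "crash"
    if best_val_bpb is None or val_bpb < best_val_bpb:
        return "keep"
    return "discard"
```

Because every experiment gets the same 5-minute budget, comparing val_bpb alone is enough here; a stricter rule could also weigh memory use, but that is outside what the loop above describes.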
Constraints
What the agent CAN do:
- Modify train.py (architecture, optimizer, hyperparameters, training loop, etc.)
- Experiment with different model configurations
- Run training experiments autonomously
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-baiyunrei2025-autoresearch-karpathy": {
      "enabled": true,
      "auto_update": true
    }
  }
}