wandb
Monitor and analyze Weights & Biases training runs. Use when checking training status, detecting failures, analyzing loss curves, comparing runs, or monitoring experiments. Triggers on "wandb", "training runs", "how's training", "did my run finish", "any failures", "check experiments", "loss curve", "gradient norm", "compare runs".
Why use this skill?
Automate your machine learning monitoring. Use OpenClaw to analyze W&B loss curves, compare experiments, and detect training failures across your projects.
What This Skill Does
The Weights & Biases (W&B) skill for OpenClaw is a monitoring and analysis utility for machine learning experiments. It reads run data through the W&B API and turns it into actionable summaries, so the agent can assess training health, compare hyperparameter settings, and detect crashed or stalled runs without anyone navigating the dashboard by hand. Whether you are managing long-running training jobs, debugging sudden gradient explosions, or running A/B comparisons between model architectures, the skill gives the agent a structured interface to the W&B API.
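As a rough illustration of the kind of query involved, the sketch below lists a project's runs through the wandb public API (`wandb.Api().runs("entity/project")`) and tallies them by state. It is an assumption that the skill's scripts work this way. The helper names `fetch_runs`, `tally_states`, and `problem_runs` are hypothetical, and the set of "problem" states is an assumed choice. `wandb` is imported lazily so that the pure helpers can be used without it.

```python
from collections import Counter
from typing import Iterable


def fetch_runs(project_path: str) -> list:
    """Return all runs in "entity/project" via the wandb public API.

    Assumes the wandb package is installed and credentials are configured.
    """
    import wandb  # imported lazily: only this helper needs the library

    api = wandb.Api()
    return list(api.runs(project_path))


def tally_states(runs: Iterable) -> dict:
    """Count runs by their W&B state, e.g. "running", "finished", "crashed"."""
    return dict(Counter(run.state for run in runs))


def problem_runs(runs: Iterable) -> list:
    """Names of runs whose state suggests the job did not finish cleanly."""
    # Assumed set of non-success states; adjust to what your runs report.
    bad_states = {"crashed", "failed", "killed"}
    return [run.name for run in runs if run.state in bad_states]
```

Usage: `runs = fetch_runs("my-entity/nlp-research")`, then inspect `tally_states(runs)` and `problem_runs(runs)`. The entity name is a placeholder.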
Installation
To install this skill, use the following command within your OpenClaw environment:
clawhub install openclaw/skills/skills/chrisvoncsefalvay/wandb-monitor
Ensure that you have completed the initial authentication by running wandb login in your terminal or by setting the WANDB_API_KEY environment variable in your workspace configuration.
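Before the agent starts querying, a quick preflight check can confirm that credentials exist. The sketch below looks for `WANDB_API_KEY` first and then for an `api.wandb.ai` entry in a netrc file, using only Python's standard `netrc` module. It assumes that `wandb login` stores the key in `~/.netrc`, which has been its traditional behavior, but this may differ across wandb versions. The function name `wandb_credentials_available` is hypothetical.

```python
import netrc
import os
from typing import Mapping, Optional


def wandb_credentials_available(
    env: Optional[Mapping[str, str]] = None,
    netrc_path: Optional[str] = None,
) -> bool:
    """Best-effort check that W&B credentials are configured.

    Checks the WANDB_API_KEY environment variable first, then an
    api.wandb.ai entry in a netrc file (assumed to be where `wandb login`
    writes the key). netrc_path=None means the default ~/.netrc.
    """
    env = os.environ if env is None else env
    if env.get("WANDB_API_KEY"):
        return True
    try:
        auth = netrc.netrc(netrc_path).authenticators("api.wandb.ai")
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed netrc file: no usable credentials.
        return False
    # authenticators() returns (login, account, password) or None.
    return auth is not None and bool(auth[2])
```

This only checks that a key is present. It does not validate the key against the W&B servers.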
Use Cases
- Proactive Maintenance: Receive automated morning briefings using watch_runs.py to identify stalled jobs or crashed experiments before they impact your compute budget.
- Deep Diagnostic Analysis: Perform granular health checks using characterize_run.py to inspect loss curves, gradient norms, and configuration overrides when a model underperforms.
- Experimental Optimization: Utilize compare_runs.py to perform side-by-side evaluations of different training configurations, allowing the agent to isolate variables contributing to performance gains or losses.
- Automated Reporting: Incorporate machine-readable JSON output into your CI/CD pipelines to trigger alerts or auto-scale resources based on the health verdict of a run.
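For the Automated Reporting case, a CI step usually needs the health verdict reduced to a process exit code. The exact JSON schema the skill emits is not documented here, so this sketch assumes a hypothetical shape: one object, or a list of objects, each carrying a `"verdict"` field with the labels `"healthy"`, `"warning"`, or `"critical"`. The field name, the labels, and the function name `exit_code_for_report` are all assumptions.

```python
import json

# Hypothetical verdict labels and exit codes; the skill's actual JSON field
# names and values may differ, so check its real output before relying on this.
EXIT_CODES = {"healthy": 0, "warning": 1, "critical": 2}


def exit_code_for_report(report_text: str) -> int:
    """Map a JSON health report (one object or a list) to a CI exit code.

    Returns the worst code across all reports. A missing or unrecognized
    verdict is treated as critical, so that surprises fail the pipeline.
    """
    data = json.loads(report_text)
    reports = data if isinstance(data, list) else [data]
    worst = 0
    for report in reports:
        verdict = str(report.get("verdict", "")).lower()
        worst = max(worst, EXIT_CODES.get(verdict, EXIT_CODES["critical"]))
    return worst
```

A wrapper script could end with `sys.exit(exit_code_for_report(text))` so that any non-zero result fails the job. Treating unknown verdicts as critical is a deliberate fail-closed choice.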
Example Prompts
- "Check the status of my recent experiments in the 'nlp-research' project; any failures or stalls?"
- "Compare the training runs 'gpt-v1-baseline' and 'gpt-v1-experimental'. Which one has a better loss curve and fewer gradient spikes?"
- "Is the 'training-run-445' still healthy? Analyze the gradient norm and tell me if it looks like it's exploding."
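A question like the second prompt comes down to two numbers per run: how low the loss ends up, and how often the gradient norm crosses the critical cutoff of 10 described under Tips & Limitations. The sketch below works on plain lists, assuming the histories have already been pulled (for example with the wandb public API's `run.scan_history(keys=[...])`). The names `RunStats`, `summarize`, and `better_run`, along with the 50-step tail window, are hypothetical choices, not the skill's own logic.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

GRAD_NORM_CRITICAL = 10.0  # same "exploding" cutoff the skill documents


@dataclass
class RunStats:
    name: str
    tail_loss: float  # mean loss over the last `tail` logged values
    grad_spikes: int  # logged gradient norms above GRAD_NORM_CRITICAL


def summarize(name: str, losses: Sequence[float],
              grad_norms: Sequence[float], tail: int = 50) -> RunStats:
    """Reduce one run's logged history to two comparable numbers."""
    if not losses:
        raise ValueError(f"run {name!r} has no logged loss values")
    spikes = sum(1 for g in grad_norms if g > GRAD_NORM_CRITICAL)
    return RunStats(name, mean(losses[-tail:]), spikes)


def better_run(a: RunStats, b: RunStats) -> str:
    """Prefer the lower tail loss; break ties with fewer gradient spikes."""
    return min(a, b, key=lambda s: (s.tail_loss, s.grad_spikes)).name
```

Averaging the tail smooths step-to-step noise. However, this comparison is only fair if both runs have trained for a similar number of steps, so align histories by step first when they differ.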
Tips & Limitations
- Threshold Awareness: The tool uses hard-coded health thresholds. A gradient norm above 10 is flagged as critical (exploding), and a heartbeat older than 30 minutes triggers a stall warning. Keep these in mind when interpreting agent responses.
- Metric Mapping: The skill automatically handles common naming variations for metrics (e.g., train/loss vs. loss). If your custom logger uses highly unconventional names, you may need to map them explicitly.
- Compute Efficiency: Avoid calling characterize_run.py repeatedly in tight loops, which can trigger rate limiting on the W&B API. For high-volume monitoring, prefer watch_runs.py and parse its summarized output.
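The two thresholds and the metric-name handling described above can be sketched as follows. The cutoffs (gradient norm above 10, heartbeat older than 30 minutes) come from the skill's stated limits. The alias table, the function names `resolve_metric` and `health_verdict`, and the precedence order (exploding checked before stalled) are hypothetical. Timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

STALL_AFTER = timedelta(minutes=30)  # stall threshold stated by the skill
GRAD_NORM_CRITICAL = 10.0            # "exploding" threshold stated by the skill

# Hypothetical alias table: candidate keys tried in order for each metric.
METRIC_ALIASES = {
    "loss": ["train/loss", "loss", "train_loss"],
    "grad_norm": ["train/grad_norm", "grad_norm", "gradient_norm"],
}


def resolve_metric(summary: Mapping[str, float], metric: str) -> Optional[float]:
    """Return the value under the first matching alias of `metric`, else None."""
    for key in METRIC_ALIASES.get(metric, [metric]):
        if key in summary:
            return summary[key]
    return None


def health_verdict(summary: Mapping[str, float], last_heartbeat: datetime,
                   now: Optional[datetime] = None) -> str:
    """Classify a run as "critical", "stalled", or "healthy".

    Both datetimes must be timezone-aware; mixing naive and aware
    values raises TypeError on subtraction.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    grad = resolve_metric(summary, "grad_norm")
    if grad is not None and grad > GRAD_NORM_CRITICAL:
        return "critical"
    if now - last_heartbeat > STALL_AFTER:
        return "stalled"
    return "healthy"
```

If your logger uses its own names, extending `METRIC_ALIASES` is the explicit mapping step the Metric Mapping tip refers to.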
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chrisvoncsefalvay-wandb-monitor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, external-api