What This Skill Does

The Weights & Biases (W&B) skill for OpenClaw monitors and analyzes machine learning experiments in real time. It turns raw W&B telemetry into actionable summaries, so the agent can assess training health, compare hyperparameter settings, and detect infrastructure failures without manual dashboard navigation. Whether you are managing long-running training jobs, debugging sudden gradient explosions, or running A/B tests between model architectures, the skill provides a structured interface to the W&B API.

Installation

To install this skill, use the following command within your OpenClaw environment:

clawhub install openclaw/skills/skills/chrisvoncsefalvay/wandb-monitor

Ensure that you have completed the initial authentication by running wandb login in your terminal or by setting the WANDB_API_KEY environment variable in your workspace configuration.
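As a rough pre-flight check, the sketch below looks for a W&B API key in either of the two places mentioned above. It assumes that wandb login stores the key in ~/.netrc under the host api.wandb.ai (the location may differ on your platform), and the function name wandb_credentials_present is hypothetical, not part of the skill.

```python
import netrc
import os
from pathlib import Path


def wandb_credentials_present(env=None, netrc_path=None):
    """Return True if a W&B API key appears to be configured.

    Checks the WANDB_API_KEY environment variable first, then a netrc
    entry for api.wandb.ai (assumed location of the `wandb login` key).
    """
    env = os.environ if env is None else env
    if env.get("WANDB_API_KEY"):
        return True

    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators("api.wandb.ai")
    except (netrc.NetrcParseError, OSError):
        # An unreadable or malformed netrc is treated as "no credentials".
        return False
    return entry is not None
```

Calling wandb_credentials_present() with no arguments inspects the current environment and home directory; a False result suggests running wandb login or setting WANDB_API_KEY before invoking the skill.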

Use Cases

Proactive Maintenance: Receive automated morning briefings using watch_runs.py to identify stalled jobs or crashed experiments before they impact your compute budget.
Deep Diagnostic Analysis: Perform granular health checks using characterize_run.py to inspect loss curves, gradient norms, and configuration overrides when a model underperforms.
Experimental Optimization: Use compare_runs.py to evaluate training configurations side by side, so the agent can isolate the variables that contribute to performance gains or losses.
Automated Reporting: Incorporate machine-readable JSON output into your CI/CD pipelines to trigger alerts or auto-scale resources based on the health verdict of a run.
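The automated reporting use case could be wired into a pipeline roughly as follows. This is a minimal sketch only: the source does not document the JSON schema, so the "verdict" field and the labels "healthy", "warning", and "critical" are hypothetical placeholders to be replaced with whatever the scripts actually emit.

```python
import json

# Hypothetical verdict labels; the skill's real JSON schema may differ.
EXIT_CODES = {"healthy": 0, "warning": 0, "critical": 1}


def exit_code_for(report_json):
    """Map a JSON health report to a CI exit code.

    Unknown or missing verdicts fail closed (exit code 1) so that a
    schema change cannot silently pass a broken run.
    """
    report = json.loads(report_json)
    verdict = str(report.get("verdict", "")).lower()
    return EXIT_CODES.get(verdict, 1)
```

In a CI step, the script's JSON output would be passed to exit_code_for and the result handed to sys.exit, so that a critical verdict fails the job.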

Example Prompts

"Check the status of my recent experiments in the 'nlp-research' project; any failures or stalls?"
"Compare the training run 'gpt-v1-baseline' and 'gpt-v1-experimental'. Which one has a better loss curve and fewer gradient spikes?"
"Is 'training-run-445' still healthy? Analyze the gradient norm and tell me whether it looks like it is exploding."

Tips & Limitations

Threshold Awareness: The tool uses hard-coded health thresholds. Gradient norms above 10 are flagged as critical (exploding), and a heartbeat older than 30 minutes triggers a stall warning. Keep both values in mind when interpreting agent responses, since a model that normally trains with large gradient norms will be flagged regardless.
Metric Mapping: The skill automatically handles common naming variations for metrics (e.g., train/loss vs loss). If your custom logger uses highly unconventional naming, you may need to map these explicitly.
Compute Efficiency: Avoid excessive calls to characterize_run.py within tight loops to prevent rate-limiting on the W&B API. For high-volume monitoring, prioritize watch_runs.py and parse the summarized output.
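The threshold and metric-mapping behavior described above can be illustrated with a short sketch. The two threshold values come from the text; everything else (the alias tuple, the flag names, and both function names) is an assumption for illustration and does not reflect the skill's actual source code.

```python
from datetime import datetime, timedelta, timezone

GRAD_NORM_CRITICAL = 10.0            # from the text: above 10 is critical
STALL_AFTER = timedelta(minutes=30)  # from the text: heartbeat older than 30 min

# Hypothetical alias list; the skill's real naming variations are not documented here.
LOSS_ALIASES = ("loss", "train/loss", "train_loss")


def first_present(summary, aliases):
    """Return the value of the first alias found in a run summary, else None."""
    for key in aliases:
        if key in summary:
            return summary[key]
    return None


def health_flags(grad_norm, last_heartbeat, now=None):
    """Return a list of flag names for one run, using strict comparisons."""
    now = now or datetime.now(timezone.utc)
    flags = []
    if grad_norm is not None and grad_norm > GRAD_NORM_CRITICAL:
        flags.append("exploding_gradients")
    if now - last_heartbeat > STALL_AFTER:
        flags.append("stalled")
    return flags
```

If a custom logger uses names outside the alias list, extending the tuple (or passing a different one to first_present) is one way to express the explicit mapping mentioned in the Metric Mapping tip.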
