Official Verified developer tools Safety 4/5

experiment-tracker

Manages ML experiment tracking with MLflow, Weights & Biases, or SpecWeave's built-in tracking. Activates for "track experiments", "MLflow", "wandb", "experiment logging", "compare experiments", "hyperparameter tracking". Automatically configures tracking tools to log to SpecWeave increment folders, ensuring all experiments are documented and reproducible. Integrates with SpecWeave's living docs for persistent experiment knowledge.

Why use this skill?

Manage and automate ML experiment tracking with MLflow, Weights & Biases, and SpecWeave. Ensure reproducible results with structured logging.

What This Skill Does

The experiment-tracker skill wraps MLflow, Weights & Biases (W&B), or SpecWeave's native tracking engine so that every experiment is logged, versioned, and tied directly to a SpecWeave increment, from hyperparameters through to the resulting artifacts. It reduces 'knowledge drift' by recording the context behind each model iteration in living, persistent documentation, so team members can trace why a model performed the way it did months after the initial training.

Installation

To add this skill to your environment, use the OpenClaw CLI (clawhub). Make sure you have write permission for the project directory before running the following command:

clawhub install openclaw/skills/skills/anton-abyzov/sw-experiment-tracker

Once installed, the agent will automatically detect the presence of MLflow or W&B in your dependency tree and provide a unified interface to control them, or fall back to native logging if no external tools are found.
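
The detection-and-fallback behavior described above can be pictured with a minimal sketch. This is not the skill's actual implementation: the function name detect_tracking_backend and the "builtin" label are hypothetical, and the sketch only checks whether a package is importable rather than inspecting a dependency tree.

```python
import importlib.util


def detect_tracking_backend(preferred=("mlflow", "wandb")):
    """Return the first importable package in `preferred`, else "builtin".

    Hypothetical helper: the names and the "builtin" fallback label are
    assumptions made for illustration, not part of the skill's API.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return "builtin"


if __name__ == "__main__":
    print("Tracking backend:", detect_tracking_backend())
```

Checking with find_spec avoids importing the package, so the probe stays cheap even when a heavy library is installed.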

Use Cases

Model Reproducibility: Ensure that every model checkpoint can be recreated by mapping code commits to specific hyperparameter configurations.
Collaborative Research: Maintain a central repository of "decision logs" that explain why specific algorithms or features were selected, preventing redundant experimentation.
Hyperparameter Tuning: Automatically track parameter sweeps and compare metrics like accuracy, precision, and F1-score across different iterations using the built-in comparison engine.
Knowledge Transfer: When a team member leaves a project, the living documentation associated with the increment folder provides a complete historical narrative of the research journey.
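
The reproducibility use case, mapping a code commit to a hyperparameter configuration, can be sketched as follows. The layout (an experiments/ subfolder with a run.json file per run) and the helper names are assumptions for illustration; the skill's real on-disk format may differ.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit():
    """Return the current git commit hash, or None outside a git repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def write_run_record(increment_dir, run_name, params, metrics):
    """Write params, metrics, and commit to <increment_dir>/experiments/<run_name>/run.json.

    The directory layout is hypothetical, chosen only for this sketch.
    """
    run_dir = Path(increment_dir) / "experiments" / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "run": run_name,
        "commit": current_commit(),
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
    }
    path = run_dir / "run.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

Storing the commit hash next to the parameters is what lets a later reader check out the same code and rerun with the same configuration.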

Example Prompts

"Track the current experiment with MLflow and log the accuracy metric after the final epoch."
"Compare the experiments in this increment and generate a summary report of the best-performing model."
"Configure Weights & Biases for my latest training run and save the metadata to the current SpecWeave increment."

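A comparison request like the second prompt above comes down to ranking runs by a chosen metric. The sketch below shows one way to do that over plain dictionaries; the record shape (name plus a metrics mapping) and the function names are hypothetical and do not describe the skill's built-in comparison engine.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value for `metric`.

    Runs that did not log the metric are ignored.
    """
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r["metrics"][metric])


def summarize(runs, metric, higher_is_better=True):
    """Return one summary line per run, best first."""
    scored = [r for r in runs if metric in r["metrics"]]
    ranked = sorted(
        scored,
        key=lambda r: r["metrics"][metric],
        reverse=higher_is_better,
    )
    return [f"{r['name']}: {metric}={r['metrics'][metric]:.4f}" for r in ranked]


if __name__ == "__main__":
    runs = [
        {"name": "baseline", "metrics": {"f1": 0.84}},
        {"name": "wider-net", "metrics": {"f1": 0.91}},
    ]
    print("\n".join(summarize(runs, "f1")))
```

For loss-style metrics, pass higher_is_better=False so that the smallest value ranks first.
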
Tips & Limitations

Directory Hygiene: Always ensure your increment folders are properly initialized before running experiments; the tool relies on the SpecWeave directory structure to maintain traceability.
External APIs: If you use W&B or MLflow as a remote backend, set the required environment variables (such as WANDB_API_KEY) securely, because this skill connects local code to cloud-based tracking servers.
Limitations: The skill is optimized for structured ML workflows; frameworks that do not integrate cleanly with typical logger callbacks may require custom configuration.
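
As a small illustration of the environment-variable tip, a pre-flight check can report which variables are missing before a run starts. WANDB_API_KEY comes from the text above; MLFLOW_TRACKING_URI is the variable MLflow reads for a remote tracking server. The mapping and the function name are assumptions for this sketch, and a real setup may need additional variables.

```python
import os

# Assumed mapping from backend to the environment variables a remote
# setup typically needs; adjust it to match your own deployment.
REQUIRED_ENV = {
    "wandb": ["WANDB_API_KEY"],
    "mlflow": ["MLFLOW_TRACKING_URI"],
}


def missing_env(backend, environ=None):
    """Return the required variables that are unset or empty for `backend`."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(backend, []) if not env.get(name)]


if __name__ == "__main__":
    for backend in REQUIRED_ENV:
        missing = missing_env(backend)
        if missing:
            print(f"{backend}: missing {', '.join(missing)}")
```

The check only reports variable names and never prints their values, which keeps API keys out of logs.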

Metadata

Author: @anton-abyzov

Stars: 1054

Updated: 2026-02-16

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-anton-abyzov-sw-experiment-tracker": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#machine-learning #mlops #experiment-tracking #reproducibility #data-science

Safety Score: 4/5

Flags: file-write, file-read, external-api
