Official Verified developer tools Safety 2/5

chaos-lab

Multi-agent framework for exploring AI alignment through conflicting optimization targets. Spawn Gemini agents with engineered chaos and observe emergent behavior.

Why use this skill?

Explore AI alignment and emergent behavior with Chaos Lab. Run conflicting Gemini agents in a safe sandbox to test optimization strategies.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/jbbottoms/chaos-lab

What This Skill Does

Chaos Lab is a multi-agent research framework for studying AI alignment through controlled conflict. It deploys three specialized Gemini agents, the Gremlin, the Goblin, and the Gopher, with fundamentally incompatible optimization targets, simulating real-world scenarios where AI objectives collide. Within a sandbox environment, the Gremlin optimizes for efficiency, the Goblin flags security threats, and the Gopher preserves data indefinitely. From these interactions you can observe emergent behaviors, the logical justifications agents give for destructive actions, and the effect of model capability (Flash vs. Pro) on decision-making complexity.
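As a rough illustration of how incompatible targets collide, the toy sketch below models each agent as a simple approval rule. It is not the skill's implementation: the real agents are Gemini model calls, and the `Action`, `Agent`, and `vetoes` names and the specific rules are hypothetical stand-ins chosen only to mirror the three roles described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    verb: str  # e.g. "delete", "compress", "archive"
    path: str


@dataclass(frozen=True)
class Agent:
    name: str
    approves: Callable[[Action], bool]


# Hypothetical stand-ins for the three optimization targets.
AGENTS = [
    # Efficiency: only approves actions that shrink disk usage.
    Agent("Gremlin", lambda a: a.verb in {"delete", "compress"}),
    # Security: treats anything touching a shell script as a threat.
    Agent("Goblin", lambda a: not a.path.endswith(".sh")),
    # Preservation: never approves deleting data.
    Agent("Gopher", lambda a: a.verb != "delete"),
]


def vetoes(action: Action) -> list[str]:
    """Return the names of agents that reject the action (empty = consensus)."""
    return [agent.name for agent in AGENTS if not agent.approves(action)]
```

With these toy rules, `vetoes(Action("delete", "cache.tmp"))` names the Gopher, while `vetoes(Action("compress", "data.csv"))` returns an empty list, so only a narrow set of actions passes all three agents.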

Installation

To install Chaos Lab, make sure the OpenClaw environment is set up, then run the following command in your terminal:

clawhub install openclaw/skills/skills/jbbottoms/chaos-lab

After installation, create a .env file at ~/.config/chaos-lab/.env containing your GEMINI_API_KEY, and restrict its permissions so that only your user can read it (chmod 600). To run the experiment scripts in the repository, you also need Python 3 and the requests library installed.
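The .env setup described above can be checked before running any experiment. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines and a POSIX file system; the helper names (`parse_env`, `permissions_are_private`, `load_api_key`) are hypothetical and not part of the skill.

```python
import stat
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def permissions_are_private(path: Path) -> bool:
    """True when group and others have no access bits set (e.g. mode 600)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def load_api_key(env_path: Path) -> str:
    """Return GEMINI_API_KEY, refusing files that other users can access."""
    if not permissions_are_private(env_path):
        raise PermissionError(f"{env_path} is accessible to others; run chmod 600")
    key = parse_env(env_path.read_text()).get("GEMINI_API_KEY", "")
    if not key:
        raise KeyError("GEMINI_API_KEY is missing from the .env file")
    return key
```

A typical call would be `load_api_key(Path.home() / ".config" / "chaos-lab" / ".env")`, which raises rather than silently continuing when the file is too permissive or the key is absent.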

Use Cases

Chaos Lab is designed for AI researchers, safety engineers, and developers interested in multi-agent systems. Use it to stress-test your own system prompts, analyze how larger models justify counter-productive actions, and evaluate the stability of agents under conflicting instructions. It is also an excellent teaching tool for demonstrating how 'intelligent' models do not necessarily reach consensus but may instead become more efficient at pursuing divergent, destructive goals.

Example Prompts

"OpenClaw, run the trio experiment using the gemini-2.0-flash model and save the output to my research folder."
"Compare the logs from the duo experiment and summarize the specific justifications provided by the Gremlin for deleting the system configuration files."
"Initialize a custom agent with a 'Minimalist' profile that focuses on whitespace removal and run it against the existing Gopher agent."

Tips & Limitations

Safety Note: This tool is designed for a sandboxed environment. Do not run these experiments on critical production file systems, as the Gremlin agent is explicitly designed to delete and compress files.
Intelligence Scaling: 'Pro' models often generate more extreme justifications for their actions than 'Flash' models; treat these justifications as data about model behavior, not as objectively sound reasoning.
Gridlock: In the trio configuration, expect frequent system gridlock; this is an intended feature to demonstrate how conflicting priorities lead to stalled progress in complex autonomous systems.
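Because the Gremlin is designed to delete and compress files, one way to act on the safety note above is to confirm that every target path resolves inside a dedicated sandbox directory before an agent may touch it. The sketch below is an assumption about how such a guard could look, not something the skill is known to ship; `is_inside_sandbox` is a hypothetical helper.

```python
from pathlib import Path


def is_inside_sandbox(target: str, sandbox_root: str) -> bool:
    """Return True only if target resolves to a path strictly inside sandbox_root.

    Relative targets are interpreted relative to the sandbox root. Symlinks
    and ".." segments are resolved first, so they cannot escape the sandbox.
    """
    root = Path(sandbox_root).resolve()
    resolved = (root / target).resolve()
    # The root itself is excluded so an agent cannot remove the sandbox.
    return root in resolved.parents
```

A destructive action would then be skipped whenever the check returns False, for example for "../outside.txt" or an absolute path such as "/etc/passwd".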

Metadata

Author: @jbbottoms

Stars: 1947

Updated: 2026-03-04

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-jbbottoms-chaos-lab": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Related Skills

calling-agent-squad

Activate a multi-agent team (the Squad) to manage complex projects, business tasks, or development workflows. The squad includes a Manager, Architect, Coder, Reviewer, and Observer. Use when the user wants to "call a squad", "start a project", or "deploy squad" with specialized roles and quality control loops.

arbiger 4473

source-trace-builder

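Build a citation index and original-source mapping for analytical drafts, distinguishing primary from secondary sources. Use for sources, citations, research workflows; do not use for fabricating literature citations or replacing formal reference-management software.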

52yuanchangxing 4473

verify-before-done

Prevent premature completion claims, repeated same-pattern retries, and weak handoffs. Use this skill to improve verification, strategy switching, and blocked-task reporting without changing personality or tone.

aviclaw 4473

evidence-gap-mapper

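Locate places in reports, proposals, or presentations where a conclusion is stated up front without sufficient evidence, and prioritize which supporting evidence to add. Use for evidence, gap-analysis, research workflows; do not use for fabricating data to support conclusions or ignoring high-risk assumptions.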