ClawKit Reliability Toolkit
Official Verified developer tools Safety 3/5

devops-ops-bot

Server health monitoring with alerts and auto-recovery. Checks CPU, memory, disk, and uptime against configurable thresholds. Sends Slack/Discord alerts and can automatically restart services when a check reaches the critical state.

Why use this skill?

Monitor server CPU, memory, and disk usage with the devops-ops-bot. Get real-time alerts and trigger automated service restarts for your infrastructure.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/gruted/devops-ops-bot

What This Skill Does

The devops-ops-bot is a lightweight command-line interface (CLI) tool for proactive server health monitoring. As an OpenClaw AI agent skill, it lets the agent monitor the vital signs of your infrastructure: CPU load, memory utilization, disk usage, and system uptime. Each check is compared against configurable thresholds and reported as one of three states: 'ok', 'warn', or 'crit'.
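
To make the three-state model concrete, here is a minimal Python sketch of threshold classification. It is not the bot's actual implementation: the `Threshold` class, the `classify` and `overall` helpers, and every threshold value and reading below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float  # percentage at which the state becomes "warn"
    crit: float  # percentage at which the state becomes "crit"


def classify(value: float, threshold: Threshold) -> str:
    """Map a usage percentage to 'ok', 'warn', or 'crit'."""
    if value >= threshold.crit:
        return "crit"
    if value >= threshold.warn:
        return "warn"
    return "ok"


def overall(states: list[str]) -> str:
    """The worst individual state decides the overall status."""
    order = {"ok": 0, "warn": 1, "crit": 2}
    return max(states, key=order.__getitem__, default="ok")


# Hypothetical thresholds and readings; the bot's real defaults may differ.
THRESHOLDS = {
    "cpu": Threshold(warn=80, crit=95),
    "mem": Threshold(warn=80, crit=95),
    "disk": Threshold(warn=85, crit=95),
}
readings = {"cpu": 42.0, "mem": 88.5, "disk": 97.1}

states = {name: classify(value, THRESHOLDS[name]) for name, value in readings.items()}
print(states, overall(list(states.values())))
```

Taking the worst state as the overall status is one possible design choice; it means a single critical metric is enough to trigger the critical path.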

Beyond monitoring, the bot supports automated incident response. It can send alerts to Slack or Discord through webhook integration, so your team is notified when a system threshold is breached. It also supports auto-recovery workflows: the agent can trigger a custom command (such as a service restart via systemctl) when a critical state is detected. Its JSON output can be fed into existing log aggregation pipelines.
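
For readers who have not worked with chat webhooks, the following Python sketch shows roughly what a webhook alert involves. It is an illustration and not the bot's own code: `build_payload` and `send_alert` are hypothetical helper names. The only facts relied on are that Slack incoming webhooks read the message from a `text` field and Discord webhooks read it from a `content` field.

```python
import json
import urllib.request


def build_payload(webhook_url: str, message: str) -> dict:
    """Pick the JSON field the target service expects."""
    # Discord webhooks read the message from "content";
    # Slack incoming webhooks read it from "text".
    if "discord.com/api/webhooks" in webhook_url:
        return {"content": message}
    return {"text": message}


def send_alert(webhook_url: str, message: str, timeout: float = 5.0) -> int:
    """POST the alert and return the HTTP status code.

    urlopen raises on network failures and on HTTP error statuses,
    so a bad webhook URL surfaces as an exception instead of passing silently.
    """
    body = json.dumps(build_payload(webhook_url, message)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical user agent; some services reject urllib's default one.
            "User-Agent": "health-alert-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_alert(url, "disk usage is crit on web-01")` with a real webhook URL would post the message; the host name in that message is made up.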

Installation

To integrate this skill into your environment via the OpenClaw ecosystem, execute the following command in your terminal: clawhub install openclaw/skills/skills/gruted/devops-ops-bot

Alternatively, you can install it globally via npm using npm install -g @gruted/devops-ops-bot or by using the provided one-liner installation script. Docker images are also available under ghcr.io/gruted/devops-ops-bot:latest for ephemeral or containerized monitoring tasks.

Use Cases

  • Automated Service Recovery: Automatically restart an Nginx or database service when CPU or memory consumption spikes past a critical threshold, reducing manual intervention.
  • Performance Trending: Use the JSON output feature to feed system stats into a centralized dashboard, helping you identify slow performance degradations over time.
  • Alert Fatigue Management: Configure custom warnings to receive low-priority notifications for memory spikes, while reserving critical alerts for total system failure or service outages.
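
As a rough illustration of the performance-trending use case, the Python sketch below reads one JSON record per line and compares recent samples with older ones. The record layout is an assumption: the field names `ts` and `disk_percent`, and all sample values, are placeholders and not the bot's documented output schema, so adjust them to whatever the tool actually emits.

```python
import json
from statistics import mean


def load_metric(lines, field):
    """Collect one numeric field from JSON-lines input, skipping blank lines."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if field in record:
            values.append(float(record[field]))
    return values


def trend(values, window=3):
    """Mean of the newest `window` samples minus the mean of the oldest `window`."""
    if len(values) < 2 * window:
        return 0.0  # not enough data to compare two full windows
    return mean(values[-window:]) - mean(values[:window])


# Hypothetical records; field names and values are invented for this sketch.
sample = [
    '{"ts": "2026-03-09T00:00:00Z", "disk_percent": 70}',
    '{"ts": "2026-03-09T00:05:00Z", "disk_percent": 71}',
    '{"ts": "2026-03-09T00:10:00Z", "disk_percent": 72}',
    '{"ts": "2026-03-09T00:15:00Z", "disk_percent": 76}',
    '{"ts": "2026-03-09T00:20:00Z", "disk_percent": 77}',
    '{"ts": "2026-03-09T00:25:00Z", "disk_percent": 78}',
]

disk = load_metric(sample, "disk_percent")
print(f"disk usage changed by {trend(disk):+.1f} points across the sample")
```

A positive result indicates that usage is climbing; a dashboard would typically plot the raw series as well.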

Example Prompts

  1. "OpenClaw, run a health check on my local server and report the status. If any metric is critical, restart the nginx service immediately."
  2. "Check the current CPU, memory, and disk usage and send an alert to my Slack webhook if memory usage exceeds 85%."
  3. "Set up a cron job to perform a health check every 5 minutes and output the data in JSON format so I can track the performance logs."

Tips & Limitations

  • Security: Since this skill can execute shell commands (e.g., via --restart-cmd), run the OpenClaw agent as a user that has only the permissions those commands need, so that a mistaken command cannot have system-wide impact.
  • Alerting: Always verify your webhook URLs. An incorrect URL will cause notifications to fail silently.
  • Threshold Tuning: Start with the default thresholds before tightening them. Overly aggressive thresholds can cause flapping, where a service is restarted unnecessarily during minor, transient spikes.
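
One common way to reduce the flapping described above is to require several consecutive critical readings and to enforce a cooldown between restarts. The Python sketch below shows that idea under stated assumptions: `RestartGuard` and its parameters are hypothetical and are not options of devops-ops-bot.

```python
class RestartGuard:
    """Allow a restart only after repeated 'crit' readings and outside a cooldown."""

    def __init__(self, required_crit: int = 3, cooldown_s: float = 300.0):
        self.required_crit = required_crit  # consecutive crit readings needed
        self.cooldown_s = cooldown_s        # minimum seconds between restarts
        self._streak = 0
        self._last_restart = None

    def should_restart(self, state: str, now: float) -> bool:
        if state != "crit":
            self._streak = 0  # any non-critical reading breaks the streak
            return False
        self._streak += 1
        if self._streak < self.required_crit:
            return False  # likely a transient spike, keep waiting
        if self._last_restart is not None and now - self._last_restart < self.cooldown_s:
            return False  # restarted too recently
        self._last_restart = now
        self._streak = 0
        return True


# Hypothetical readings taken every 60 seconds.
guard = RestartGuard(required_crit=3, cooldown_s=300.0)
history = ["crit", "crit", "ok", "crit", "crit", "crit"]
decisions = [guard.should_restart(state, now=i * 60.0) for i, state in enumerate(history)]
print(decisions)
```

In this run only the last reading triggers a restart, because the earlier pair of critical readings was interrupted by an 'ok'.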

Metadata

Author: @gruted
Stars: 2387
Views: 3
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-gruted-devops-ops-bot": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#devops #monitoring #automation #infrastructure #system-admin
Safety Score: 3/5

Flags: network-access, external-api, code-execution