Official Verified developer tools Safety 4/5

observability-engineer

Observability architect - OpenTelemetry-first, Prometheus+Grafana stack, SLIs/SLOs, alert fatigue prevention. Use for metrics, logs, traces setup.

Why use this skill?

Master your monitoring stack with the Observability Engineer skill. Implement OpenTelemetry, Prometheus, and Grafana to build robust SLIs, SLOs, and incident alerts.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer

What This Skill Does

The Observability Engineer skill is an architectural assistant for designing and implementing full-stack observability. It follows an 'OpenTelemetry-first' philosophy so that your telemetry data stays vendor-agnostic, covers the core pillars of observability (metrics, logs, and traces), and integrates with the Prometheus and Grafana ecosystem.

Beyond simple setup, the skill helps define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) so that system performance is measured against business requirements. A key focus is alert fatigue prevention: the skill provides best practices for alert suppression, severity escalation, and noise reduction in your notification pipelines.
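To make the SLI/SLO idea concrete, here is a minimal sketch of the usual error-budget arithmetic. The function names and the 30-day window are illustrative assumptions, not something the skill itself defines.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the rate the SLO allows.

    A value of 1.0 means the budget is being spent exactly on schedule;
    higher values mean it will run out before the window ends.
    """
    if total_events == 0:
        return 0.0
    observed_error_ratio = bad_events / total_events
    return observed_error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of budget.
    print(round(error_budget_minutes(0.999, 30), 1))
    # 50 failed requests out of 10,000 against the same target.
    print(round(burn_rate(50, 10_000, 0.999), 1))
```

A burn rate well above 1.0 over a short window is a common trigger for a paging alert, while a value only slightly above 1.0 is usually better suited to a ticket.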

Installation

To integrate this skill into your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer

Use Cases

  • Infrastructure Monitoring: Designing comprehensive Prometheus exporters for Kubernetes and cloud-native services.
  • Performance Tuning: Using OpenTelemetry traces to identify latency bottlenecks in microservices architectures.
  • Incident Management: Configuring Grafana dashboards and Alertmanager rules to reduce MTTR (Mean Time To Recovery).
  • Reliability Engineering: Formalizing SLIs/SLOs to track and maintain system availability.
  • Log Management: Structured logging strategies that keep high-cardinality data searchable and cost-effective.
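As an illustration of the structured-logging use case, the sketch below emits one JSON object per log line using only the Python standard library. The `context` key passed through `extra=` and the `build_logger` helper are hypothetical conventions chosen for this example.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload.update(context)
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream=sys.stderr) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("auth")
    log.info("login ok", extra={"context": {"user_id": "u-123", "trace_id": "abc"}})
```

Keeping identifiers such as `user_id` in named fields rather than in the message text is what lets a log backend index and query them; whether a given field is worth indexing depends on the backend's cost model.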

Example Prompts

  1. "Analyze my current microservices architecture and suggest an OpenTelemetry implementation strategy to capture distributed traces without impacting performance."
  2. "Help me define SLIs and SLOs for our user authentication service, and generate a Prometheus alerting rule configuration for latency breaches."
  3. "My Grafana dashboards are cluttered and producing too many alerts. Can you help me audit my current alerting stack and suggest ways to reduce noise?"
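For the noise-reduction prompt, one common technique is to group alerts by a few labels and suppress repeats within an interval. The sketch below is loosely modeled on the idea behind Alertmanager's `group_by` and `repeat_interval` settings; it is not Alertmanager's implementation, and the `AlertSuppressor` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertSuppressor:
    """Allow at most one notification per alert group per repeat interval."""

    group_by: tuple
    repeat_interval_s: float
    _last_sent: dict = field(default_factory=dict)

    def should_notify(self, labels: dict, now_s: float) -> bool:
        # Alerts that share the same values for the group_by labels
        # are treated as one group, whatever their other labels are.
        key = tuple(labels.get(name, "") for name in self.group_by)
        last = self._last_sent.get(key)
        if last is not None and now_s - last < self.repeat_interval_s:
            return False  # the same group notified recently, so suppress
        self._last_sent[key] = now_s
        return True


if __name__ == "__main__":
    suppressor = AlertSuppressor(group_by=("alertname", "service"), repeat_interval_s=300)
    first = {"alertname": "HighLatency", "service": "auth", "pod": "auth-1"}
    second = {"alertname": "HighLatency", "service": "auth", "pod": "auth-2"}
    print(suppressor.should_notify(first, now_s=0))    # first alert in the group
    print(suppressor.should_notify(second, now_s=60))  # same group, within the interval
```

Grouping on `alertname` and `service` while ignoring `pod` means a fleet-wide latency problem produces one notification rather than one per pod.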

Tips & Limitations

  • Chunking Rule: Observability stacks are inherently complex. To ensure high-quality output, this skill follows a strict modular delivery policy. Ask for components individually: Metrics first, then Dashboards, Alerting, Tracing, and finally Logs. Attempting to generate an entire stack at once will trigger a warning to break your request into manageable chunks.
  • Compliance: If you handle PII, make sure your log-scrubbing configuration is robust enough to keep you GDPR/SOC 2 compliant.
  • Resource Usage: Be mindful of the overhead of deep-packet or high-cardinality tracing; always scope what you collect, for example by sampling or by limiting the attributes you record.
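The compliance tip can be sketched as a logging filter that redacts email addresses before a record reaches any handler. The regular expression and the `[REDACTED_EMAIL]` placeholder are assumptions for illustration; a real deployment would need patterns for every PII type it handles and would likely scrub at the collector as well.

```python
import logging
import re

# Deliberately simple pattern; it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace anything that looks like an email address."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so that values passed as args are scrubbed too,
        # then clear args so the message is not formatted a second time.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("signup")
    logger.addFilter(RedactingFilter())
    logger.info("new account for %s", "alice@example.com")
```

Attaching the filter to the logger rather than to a single handler means every destination receives the scrubbed message.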

Metadata

Stars: 1100
Views: 0
Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-anton-abyzov-sw-observability-engineer": {
      "enabled": true,
      "auto_update": true
    }
  }
}
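If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically. It assumes only the structure shown above (a top-level `plugins` object with `enabled` and `auto_update` keys); the `enable_plugin` helper and the file location are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-anton-abyzov-sw-observability-engineer"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update the plugin entry while keeping other settings intact."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Assumes clawhub.json lives in the current directory.
    enable_plugin(Path("clawhub.json"))
```

Existing plugin entries and any unrelated top-level keys are left as they were; only the two keys shown in the snippet are set.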

Tags (AI)

#observability #devops #telemetry #monitoring #reliability

Safety Score: 4/5

Flags: code-execution, file-read, file-write