Official Verified developer tools Safety 4/5

observability-engineer

Observability architect - OpenTelemetry-first, Prometheus+Grafana stack, SLIs/SLOs, alert fatigue prevention. Use for metrics, logs, traces setup.

Why use this skill?

Master your monitoring stack with the Observability Engineer skill. Implement OpenTelemetry, Prometheus, and Grafana to build robust SLIs, SLOs, and incident alerts.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer


What This Skill Does

The Observability Engineer skill is an enterprise-grade architectural assistant for designing and implementing full-stack observability solutions. It follows an 'OpenTelemetry-first' philosophy so that your telemetry data stays vendor-agnostic and scalable. The skill covers the three core pillars of observability (metrics, logs, and traces) and integrates with the Prometheus and Grafana ecosystem. Beyond basic setup, it helps you define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) so that system performance is measured against business requirements. A key focus is alert fatigue prevention: the skill provides best practices for alert suppression, severity escalation, and noise reduction in your notification pipelines.
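To make the SLI/SLO relationship concrete, here is a minimal Python sketch of the arithmetic behind an availability SLO. It is an illustration only, not output from the skill: the function name `error_budget` and the request counts are hypothetical, and it assumes a simple request-based SLI (good requests divided by total requests) over a single window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLO over one window.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of good requests.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (above 1.0 means the SLO is breached).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


# Example: a 99.9% target with 400 failures out of one million requests.
summary = error_budget(0.999, 1_000_000, 400)
print(summary)
```

With these example numbers the budget is roughly 1,000 failed requests, so about 40% of it has been spent. Tracking the consumed fraction rather than raw error counts is one common way to decide when an SLO deserves an alert.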

Installation

To integrate this skill into your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer

Use Cases

Infrastructure Monitoring: Designing comprehensive Prometheus exporters for Kubernetes and cloud-native services.
Performance Tuning: Using OpenTelemetry traces to identify latency bottlenecks in microservices architectures.
Incident Management: Configuring Grafana dashboards and Alertmanager rules to reduce MTTR (Mean Time To Recovery).
Reliability Engineering: Formalizing SLIs/SLOs to track and maintain system availability.
Log Management: Designing structured logging strategies so that high-cardinality data stays searchable and cost-effective.
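As an illustration of the structured logging use case, the sketch below emits one JSON object per log line using only the Python standard library. The class `JsonFormatter`, the helper `build_logger`, and the field names `service`, `trace_id`, and `user_id` are assumptions made for this example; they are not defined by the skill or by OpenTelemetry.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical example)."""

    # Context fields we choose to export; the names are illustrative only.
    CONTEXT_FIELDS = ("service", "trace_id", "user_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False     # keep output out of the root logger
    return logger


logger = build_logger("auth-service", sys.stdout)
logger.info("login succeeded", extra={"service": "auth", "trace_id": "abc123"})
```

Keeping high-cardinality values such as `trace_id` in named fields, rather than interpolated into the message text, is what lets a log backend index and filter on them.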

Example Prompts

"Analyze my current microservices architecture and suggest an OpenTelemetry implementation strategy to capture distributed traces without impacting performance."
"Help me define SLIs and SLOs for our user authentication service, and generate a Prometheus alerting rule configuration for latency breaches."
"My Grafana dashboards are cluttered and producing too many alerts. Can you help me audit my current alerting stack and suggest ways to reduce noise?"
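One noise-reduction idea that prompts like these tend to surface is requiring a condition to persist before an alert fires, similar in spirit to the `for` clause in a Prometheus alerting rule. The Python sketch below models that behavior in a simplified form: it counts consecutive evaluations instead of using durations, and the function name `firing_states` and the sample values are hypothetical.

```python
def firing_states(samples, threshold, for_count):
    """Return, per evaluation, whether the alert would be firing.

    The alert fires only once the value has exceeded `threshold` for
    `for_count` consecutive evaluations; any sample at or below the
    threshold resets the counter. Simplified model, not Prometheus code.
    """
    if for_count < 1:
        raise ValueError("for_count must be at least 1")
    consecutive = 0
    states = []
    for value in samples:
        consecutive = consecutive + 1 if value > threshold else 0
        states.append(consecutive >= for_count)
    return states


# Hypothetical p95 latency samples in seconds, one per evaluation interval.
latencies = [0.2, 0.6, 0.3, 0.6, 0.7, 0.8, 0.1]
print(firing_states(latencies, threshold=0.5, for_count=3))
```

In this run the single spike at the second sample never fires; only the sustained breach near the end does, which is the kind of flapping suppression that cuts alert volume.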

Tips & Limitations

Chunking Rule: Observability stacks are inherently complex. To ensure high-quality output, this skill follows a strict modular delivery policy. Ask for components individually: Metrics first, then Dashboards, Alerting, Tracing, and finally Logs. Attempting to generate an entire stack at once will trigger a warning to break your request into manageable chunks.
Compliance: If you handle PII, make sure your log scrubbing configuration is robust enough to keep you GDPR and SOC 2 compliant.
Resource Usage: Be mindful of the overhead of deep-packet or high-cardinality tracing; always scope what you collect.
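For the compliance tip, the following sketch shows one way log scrubbing can work in Python: a `logging.Filter` that redacts email addresses before a record is emitted. The names `EMAIL_RE`, `scrub`, and `ScrubFilter` are hypothetical, the pattern is deliberately simple, and covering a single PII type is nowhere near sufficient for GDPR or SOC 2 on its own.

```python
import logging
import re

# Deliberately simple email pattern for illustration; real PII detection
# needs broader coverage (phone numbers, tokens, national IDs, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class ScrubFilter(logging.Filter):
    """Redact emails from the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as %-style args are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("scrub-demo")
logger.addFilter(ScrubFilter())
logger.warning("password reset requested for %s", "jane.doe@example.com")
```

Attaching the filter to the logger, as here, scrubs records logged through that logger; attaching it to a handler instead would scrub everything that handler emits, including records propagated from child loggers.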


Metadata

Author: @anton-abyzov

Stars: 1100

Updated: 2026-02-17


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-anton-abyzov-sw-observability-engineer": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#observability #devops #telemetry #monitoring #reliability

Safety Score: 4/5

Flags: code-execution, file-read, file-write
