observability-engineer
Observability architect - OpenTelemetry-first, Prometheus+Grafana stack, SLIs/SLOs, alert fatigue prevention. Use for metrics, logs, traces setup.
Why use this skill?
Master your monitoring stack with the Observability Engineer skill. Implement OpenTelemetry, Prometheus, and Grafana to build robust SLIs, SLOs, and incident alerts.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer
What This Skill Does
The Observability Engineer skill is an architectural assistant for designing and implementing full-stack observability. It follows an 'OpenTelemetry-first' approach so that your telemetry data stays vendor-agnostic. It covers the three telemetry signals (metrics, logs, and traces) and integrates with the Prometheus and Grafana ecosystem. Beyond initial setup, the skill helps define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) so that measured system performance can be checked against business requirements. It also emphasizes alert fatigue prevention, with best practices for alert suppression, severity escalation, and noise reduction in your notification pipelines.
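As a rough illustration of how an SLI relates to an SLO, the sketch below computes an availability SLI from request counts and reports how much of the error budget has been spent. The `evaluate_slo` helper, the `SloReport` type, and the sample counts are hypothetical and are not produced by the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float              # observed fraction of good requests
    error_budget: float     # allowed fraction of bad requests (1 - target)
    budget_consumed: float  # share of the budget spent; 1.0 means exhausted


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare an availability SLI against an SLO target (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    sli = good / total
    error_budget = 1.0 - target
    bad_fraction = (total - good) / total
    return SloReport(sli, error_budget, bad_fraction / error_budget)


# Hypothetical numbers: 50 failed requests out of 100,000 against a 99.9% SLO.
report = evaluate_slo(good=99_950, total=100_000, target=0.999)
print(f"SLI={report.sli:.4%}, budget consumed={report.budget_consumed:.0%}")
```

Expressing the result as a share of the budget, rather than as raw availability, makes it easier to decide whether a given error level deserves attention.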
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/anton-abyzov/sw-observability-engineer
Use Cases
- Infrastructure Monitoring: Designing comprehensive Prometheus exporters for Kubernetes and cloud-native services.
- Performance Tuning: Using OpenTelemetry traces to identify latency bottlenecks in microservices architectures.
- Incident Management: Configuring Grafana dashboards and Alertmanager rules to reduce MTTR (Mean Time To Recovery).
- Reliability Engineering: Formalizing SLIs/SLOs to track and maintain system availability.
- Log Management: Defining structured logging strategies so that high-cardinality data stays searchable and cost-effective.
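To make the structured logging use case concrete, here is a minimal sketch using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `build_logger` helper, and the `fields` convention for passing context are assumptions made for illustration; the skill may recommend a different library or schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Context passed via extra={"fields": {...}} (a convention of this sketch).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = build_logger("checkout", buffer)
# High-cardinality values go into fields, not into the message text.
log.info("payment accepted", extra={"fields": {"order_id": "o-123", "latency_ms": 42}})
print(buffer.getvalue(), end="")
```

Keeping identifiers such as `order_id` in separate fields lets a log backend index or drop them selectively, which is where most of the search and cost control comes from.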
Example Prompts
- "Analyze my current microservices architecture and suggest an OpenTelemetry implementation strategy to capture distributed traces without impacting performance."
- "Help me define SLIs and SLOs for our user authentication service, and generate a Prometheus alerting rule configuration for latency breaches."
- "My Grafana dashboards are cluttered and producing too many alerts. Can you help me audit my current alerting stack and suggest ways to reduce noise?"
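The second and third prompts both concern alerting on an SLO without adding noise. One commonly cited approach is multi-window burn-rate alerting, described in Google's SRE Workbook: page only when the error budget is burning fast over both a long and a short window. The Python sketch below shows the decision logic under stated assumptions. The function names and sample error ratios are hypothetical, and 14.4 is the workbook's example threshold for a 1-hour window against a 30-day SLO, not a value this skill prescribes.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than budgeted the service is failing.

    A burn rate of 1.0 spends the error budget exactly over the SLO period.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,
) -> bool:
    """Page only if both windows exceed the burn-rate threshold."""
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold


# Hypothetical error ratios for a 99.9% SLO.
print(should_page(0.02, 0.03, slo_target=0.999))   # both windows burning fast
print(should_page(0.02, 0.001, slo_target=0.999))  # short window has recovered
```

Requiring the short window to agree with the long one stops an alert from continuing to fire after the problem has already subsided, which is one source of alert noise.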
Tips & Limitations
- Chunking Rule: Observability stacks are inherently complex. To ensure high-quality output, this skill follows a strict modular delivery policy. Ask for components individually: Metrics first, then Dashboards, Alerting, Tracing, and finally Logs. Attempting to generate an entire stack at once will trigger a warning to break your request into manageable chunks.
- Compliance: If you handle PII, make sure your log scrubbing configuration is robust enough to support GDPR and SOC 2 compliance.
- Resource Usage: Be mindful of the overhead associated with deep-packet or high-cardinality tracing; always scope your collection strategies.
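As an illustration of the log scrubbing mentioned under Compliance, the sketch below redacts email addresses before a record is written, using a standard-library `logging.Filter`. The `RedactEmails` class and the regular expression are assumptions for this example. The pattern covers only simple email addresses, so it is a starting point and not a claim of GDPR or SOC 2 compliance.

```python
import io
import logging
import re

# Deliberately simple pattern; real scrubbing usually needs more PII types.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmails(logging.Filter):
    """Replace email addresses in the formatted message (hypothetical filter)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so addresses passed as arguments are scrubbed too.
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactEmails())

logger = logging.getLogger("auth")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers = [handler]

logger.info("login failed for %s from %s", "alice@example.com", "10.0.0.5")
print(stream.getvalue(), end="")
```

Attaching the filter to the handler rather than to the logger means it also applies to records that reach this handler from child loggers.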
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-anton-abyzov-sw-observability-engineer": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: code-execution, file-read, file-write
Related Skills
network-engineer
Cloud network architect for VPC design, service mesh, zero-trust networking, load balancers, and CDN optimization. Use for network troubleshooting or connectivity issues.
jira-multi-project-mapper
Expert in mapping SpecWeave specs to multiple JIRA projects with intelligent project detection and cross-project coordination. Use when syncing to multiple JIRA projects (project-per-team, component-based), or managing bidirectional sync across team boundaries.
helm-chart-scaffolding
Design, organize, and manage Helm charts for templating and packaging Kubernetes applications with reusable configurations. Use when creating Helm charts, packaging Kubernetes applications, or implementing templated deployments.
performance-optimization
React Native performance with Hermes V1, FlashList, expo-image v2, concurrent rendering. Use for slow app, memory leaks, or FPS issues.
release-strategy-advisor
Release strategy advisor - detects brownfield patterns (tags, CI/CD, changelogs), recommends versioning strategy based on architecture. Creates release-strategy.md.