Official Verified developer tools Safety 5/5

slo-implementation

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.

Why use this skill?

Learn to define reliable service metrics, calculate error budgets, and automate SRE practices with the slo-implementation skill for OpenClaw agents.

What This Skill Does

The slo-implementation skill is an SRE (Site Reliability Engineering) co-pilot for OpenClaw agents. It gives engineering teams a standard framework for defining, tracking, and enforcing service reliability, turning abstract reliability goals into concrete monitoring configuration. With it you can define Service Level Indicators (SLIs), set Service Level Objectives (SLOs), and derive the error budget (1 minus the SLO target), so that release velocity stays balanced against service stability. It also helps you write Prometheus-compatible queries for availability, latency, and durability.
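
As a rough illustration of how an SLI, an SLO, and an error budget relate, here is a minimal Python sketch. It is not taken from the skill itself: the names `SloReport` and `evaluate_slo` and the request counts are hypothetical, and it assumes a simple ratio SLI (good events divided by total events).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float              # measured fraction of good events
    error_budget: float     # allowed fraction of bad events (1 - SLO target)
    budget_consumed: float  # share of the budget already spent (1.0 = all of it)


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compare a ratio SLI against an SLO target (hypothetical helper)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good_events / total_events
    error_budget = 1.0 - slo_target
    # Observed bad-event fraction relative to the allowed bad-event fraction.
    budget_consumed = (1.0 - sli) / error_budget
    return SloReport(sli, error_budget, budget_consumed)


# Made-up counts: 500 failed requests out of one million, against a 99.9% SLO.
report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%} budget consumed={report.budget_consumed:.0%}")
```

In this example the service is still inside its SLO, but about half of the allowed failures for the period have already occurred.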

Installation

To integrate this skill into your environment, use the OpenClaw command-line interface or the integrated package manager. Run the following command:

clawhub install openclaw/skills/skills/anton-abyzov/sw-slo-implementation

Ensure that you have appropriate permissions to apply configuration files if you intend to push these definitions to your monitoring stack directly.

Use Cases

Service Reliability Audits: Use this skill to evaluate your existing infrastructure and determine if your current monitoring coverage is sufficient to meet business SLAs.
Error Budget Management: Calculate how much error budget remains in a rolling 28-day window and when its exhaustion should trigger a feature freeze.
Alerting Strategy: Move from alerts on raw infrastructure thresholds to alerts on error budget consumption, with thresholds chosen to reflect the user experience.
Cross-Team Communication: Standardize terminology across product and engineering teams using the provided SLI/SLO/SLA hierarchy for improved alignment.
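
Budget-based alerting is often expressed as a burn rate: how fast errors arrive relative to the rate that would spend the budget exactly over the SLO period. The sketch below is one possible way to express that idea, not the skill's own implementation. The function names, the threshold of 10, and the two-window check are illustrative assumptions; only the 28-day period comes from the use case above.

```python
SLO_PERIOD_HOURS = 28 * 24  # rolling 28-day window from the use case above


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the allowed error ratio.

    1.0 means the budget would be spent exactly at the end of the period.
    """
    return error_ratio / (1.0 - slo_target)


def budget_spent_fraction(rate: float, window_hours: float,
                          period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Fraction of the whole budget spent if `rate` is sustained for `window_hours`."""
    return rate * window_hours / period_hours


def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float, threshold: float) -> bool:
    """Page only if both windows burn faster than `threshold` (assumed policy).

    The long window shows the burn is significant; the short window shows
    it is still happening right now.
    """
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)


# Hypothetical readings: 2% errors over 5 minutes, 1.5% over the last hour.
print(should_page(0.02, 0.015, slo_target=0.999, threshold=10.0))
# The same hour-long burn, expressed as a share of the 28-day budget.
print(f"{budget_spent_fraction(burn_rate(0.015, 0.999), window_hours=1):.2%}")
```

Requiring both windows to exceed the threshold is a design choice that trades a little detection delay for fewer pages on short spikes.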

Example Prompts

"Analyze my current API traffic and help me construct a 99.9% availability SLI using this Prometheus expression."
"Generate a YAML error budget policy that triggers a Slack alert when we have consumed 80% of our monthly error budget for the checkout service."
"Calculate the monthly downtime allowed for a 99.99% availability target and explain how this impacts our deployment strategy for the next quarter."

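The arithmetic behind the last two prompts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not output from the skill: `allowed_downtime` and `exceeds_alert_threshold` are hypothetical names, a "month" is taken to be 30 days, and downtime is treated as the only way the budget is spent.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float, period: timedelta) -> timedelta:
    """Total downtime a time-based availability target permits over `period`."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return period * (1.0 - slo_target)


def exceeds_alert_threshold(downtime_so_far: timedelta, budget: timedelta,
                            threshold: float = 0.8) -> bool:
    """True once the consumed share of the budget reaches `threshold` (80% here)."""
    return downtime_so_far / budget >= threshold


month = timedelta(days=30)  # assumption: one month = 30 days
budget = allowed_downtime(0.9999, month)
print(f"99.99% over 30 days allows about {budget.total_seconds() / 60:.2f} minutes")

# Hypothetical incident log: 3.5 minutes of downtime so far this month.
print(exceeds_alert_threshold(timedelta(minutes=3, seconds=30), budget))
```

A budget of only a few minutes per month leaves little room for risky deployments, which is why a target this strict usually shapes rollout and rollback practices.
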
Tips & Limitations

Start Simple: Don't try to track too many SLOs initially. Start with high-impact services (e.g., login, checkout) before expanding to background workers.
Context Matters: Remember that an SLI is a measurement, not a goal. Your SLO should reflect the user experience, not just the technical feasibility.
Query Complexity: While this skill provides excellent templates, always validate your PromQL expressions against your actual metrics data to ensure labels and metric names match your specific instrumentation.
Rolling Windows: A 28-day rolling window is a sensible default for most web services because it always spans the same number of each weekday, but high-velocity environments may need shorter windows for tighter feedback loops.
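
One way to keep metric names, labels, and the window explicit when validating a query is to build the PromQL string from parameters. The sketch below is a hypothetical helper, not one of the skill's templates. It assumes a counter named `http_requests_total` with `job` and `code` labels, and it counts any non-5xx response as good; check all three against your own instrumentation.

```python
def availability_sli_query(metric: str, job: str, window: str = "28d") -> str:
    """Build a ratio-style availability SLI expression in PromQL.

    Assumes `metric` is a request counter with `job` and `code` labels,
    and that any response outside the 5xx range counts as good.
    """
    good = f'sum(rate({metric}{{job="{job}",code!~"5.."}}[{window}]))'
    total = f'sum(rate({metric}{{job="{job}"}}[{window}]))'
    return f"{good} / {total}"


# Hypothetical metric and job names; replace them with your own.
print(availability_sli_query("http_requests_total", "checkout"))
print(availability_sli_query("http_requests_total", "checkout", window="7d"))
```

Because the window is a parameter, the same helper covers both the 28-day default and shorter windows for faster-moving services.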

Metadata

Author: @anton-abyzov

Stars: 1100

Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-anton-abyzov-sw-slo-implementation": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#sre #reliability #monitoring #prometheus #devops