sre
SRE expert for incident response, production troubleshooting, root cause analysis, post-mortems, and runbooks. Use for outages, performance issues, or SEV incidents.
Why use this skill?
Improve your production stability with the SRE skill for OpenClaw, which provides expert-level guidance for incident response, root cause analysis, and professional post-mortem generation.
What This Skill Does
The SRE skill transforms your OpenClaw agent into a high-level Site Reliability Engineering expert. Designed to handle the pressures of production environments, this skill provides structured support for incident response, real-time performance troubleshooting, root cause analysis (RCA), and the creation of formal post-mortem reports. Whether you are dealing with a critical SEV-1 outage or investigating subtle latency spikes in a distributed microservices architecture, the SRE agent helps you maintain system stability by guiding you through industry-standard methodologies.
Installation
To integrate the SRE expert into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/anton-abyzov/sw-sre
Make sure you are running the latest version of OpenClaw so that the skill's features remain compatible with your installation.
Use Cases
- Incident Response: Immediate guidance during system outages or performance degradation, providing step-by-step triage protocols.
- Root Cause Analysis: Parsing logs and metrics to identify the underlying source of failure, moving beyond superficial symptoms to architectural weaknesses.
- Post-Mortem Documentation: Drafting professional, comprehensive incident reports that fulfill organizational compliance and learning requirements.
- Runbook Generation: Creating automated or manual operation guides to ensure repeatable resolution for recurring system issues.
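To make the Root Cause Analysis use case more concrete, here is a minimal sketch of one common first step in log triage: comparing the 5xx error rate in equal windows before and after a deployment. It is an illustration only, not how the skill works internally. The log format (`<ISO-8601 timestamp> <status> <path>`), the helper names `parse_line` and `error_rate`, and the sample data are all hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_line(line: str) -> tuple[datetime, int, str]:
    """Split a line of the assumed form '<ISO-8601 timestamp> <status> <path>'."""
    ts, status, path = line.split(maxsplit=2)
    return datetime.fromisoformat(ts), int(status), path


def error_rate(lines: list[str], start: datetime, end: datetime) -> float:
    """Fraction of requests in [start, end) that returned a 5xx status."""
    total = errors = 0
    for line in lines:
        ts, status, _path = parse_line(line)
        if start <= ts < end:
            total += 1
            if 500 <= status <= 599:
                errors += 1
    return errors / total if total else 0.0


# Made-up sample log lines for a hypothetical payment endpoint.
LOGS = [
    "2024-05-01T12:00:10 200 /pay",
    "2024-05-01T12:02:00 200 /pay",
    "2024-05-01T12:04:30 502 /pay",
    "2024-05-01T12:06:00 503 /pay",
    "2024-05-01T12:07:15 200 /pay",
    "2024-05-01T12:08:40 500 /pay",
]
deploy = datetime.fromisoformat("2024-05-01T12:05:00")
window = timedelta(minutes=5)

before = error_rate(LOGS, deploy - window, deploy)
after = error_rate(LOGS, deploy, deploy + window)
print(f"5xx rate before deploy: {before:.2f}, after: {after:.2f}")
# prints "5xx rate before deploy: 0.33, after: 0.67"
```

A jump like this only points at the deployment as a suspect; a real RCA would still need to confirm the mechanism from traces, metrics, and the change diff.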
Example Prompts
- "We are seeing a 500-series error spike on the payment gateway following the latest deployment; please help me triage the logs to find the root cause."
- "Draft a post-mortem document for the database outage we experienced yesterday, focusing on the mitigation timeline and preventive measures for connection pooling."
- "Create a standard operating procedure (SOP) runbook for clearing stale cache clusters in our Redis instance during peak traffic hours."
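The post-mortem prompt above asks for a mitigation timeline. Such timelines are usually summarized as a few durations between key milestones. The sketch below computes them from timestamps. The milestone names (`started`, `detected`, `mitigated`, `resolved`), the helper `minutes_between`, and the timestamps are hypothetical. Measuring mitigation from detection rather than from incident start is also an assumption, since teams define these intervals differently.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical milestones for an illustrative incident (not real data).
timeline = {
    "started": "2024-05-01T12:05:00",
    "detected": "2024-05-01T12:11:00",
    "mitigated": "2024-05-01T12:32:00",
    "resolved": "2024-05-01T13:20:00",
}


def minutes_between(events: dict[str, str], a: str, b: str) -> float:
    """Elapsed minutes from milestone `a` to milestone `b`."""
    delta = datetime.fromisoformat(events[b]) - datetime.fromisoformat(events[a])
    return delta.total_seconds() / 60


summary = {
    "time_to_detect": minutes_between(timeline, "started", "detected"),
    "detect_to_mitigate": minutes_between(timeline, "detected", "mitigated"),
    "total_duration": minutes_between(timeline, "started", "resolved"),
}
for name, minutes in summary.items():
    print(f"{name}: {minutes:.0f} min")
```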
Tips & Limitations
When dealing with large datasets or extensive, multi-layered incidents, the SRE agent uses an incremental generation strategy. If your report exceeds 1000 lines, the agent pauses and asks which phase (Triage, RCA, Mitigation, or Prevention) you want to proceed with next. This keeps long, complex generation tasks stable. Always verify the agent's output against your production environment's specific topology, since supplying context about your private infrastructure remains your responsibility.
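The phased behaviour described above can be pictured as a simple line budget. The sketch below is a hypothetical illustration, not the skill's actual implementation. It takes estimated line counts per phase (invented here), emits phases in order until the next one would push the total past 1000 lines, and returns the rest so the user can choose what to generate next. The phase names and the 1000-line threshold come from the description; `plan_report` and the estimates are assumptions.

```python
from __future__ import annotations

# Phase order and line limit taken from the description above.
PHASES = ["Triage", "RCA", "Mitigation", "Prevention"]
LINE_LIMIT = 1000


def plan_report(
    phase_lengths: dict[str, int], limit: int = LINE_LIMIT
) -> tuple[list[str], list[str]]:
    """Return (phases to generate now, phases deferred until the user picks one)."""
    emitted: list[str] = []
    total = 0
    for i, phase in enumerate(PHASES):
        length = phase_lengths[phase]
        if total + length > limit:
            # Pause here: the remaining phases are offered back to the user.
            return emitted, PHASES[i:]
        emitted.append(phase)
        total += length
    return emitted, []


# Hypothetical estimated line counts per phase for a large incident.
estimates = {"Triage": 300, "RCA": 550, "Mitigation": 400, "Prevention": 200}
now, deferred = plan_report(estimates)
print("Generate now:", now)
print("Ask user to choose next:", deferred)
```

With these estimates, Triage and RCA fit within 850 lines. Adding Mitigation would reach 1250, so Mitigation and Prevention are deferred.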
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-anton-abyzov-sw-sre": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, code-execution
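If you want to confirm that the plugin entry is well-formed after pasting it, a small check can help. This is a sketch under assumptions: it treats clawhub.json as plain JSON with the `plugins` object from the snippet above, and the helper `plugin_enabled` is hypothetical and not part of the ClawHub CLI. The snippet is embedded as a string so the example runs on its own.

```python
import json

# The configuration snippet from this page, embedded for a self-contained run.
CONFIG_TEXT = """
{
  "plugins": {
    "official-anton-abyzov-sw-sre": {
      "enabled": true,
      "auto_update": true
    }
  }
}
"""


def plugin_enabled(config_text: str, plugin_id: str) -> bool:
    """Return True if `plugin_id` is listed under "plugins" with "enabled": true."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on malformed JSON
    entry = config.get("plugins", {}).get(plugin_id, {})
    return entry.get("enabled") is True


print(plugin_enabled(CONFIG_TEXT, "official-anton-abyzov-sw-sre"))  # True
print(plugin_enabled(CONFIG_TEXT, "some-other-plugin"))  # False
```

To check a real file, you could pass `open("clawhub.json").read()` instead. Where clawhub.json lives depends on your setup, so that path is an assumption.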
Related Skills
network-engineer
Cloud network architect for VPC design, service mesh, zero-trust networking, load balancers, and CDN optimization. Use for network troubleshooting or connectivity issues.
jira-multi-project-mapper
Expert in mapping SpecWeave specs to multiple JIRA projects with intelligent project detection and cross-project coordination. Use when syncing to multiple JIRA projects (project-per-team, component-based), or managing bidirectional sync across team boundaries.
helm-chart-scaffolding
Design, organize, and manage Helm charts for templating and packaging Kubernetes applications with reusable configurations. Use when creating Helm charts, packaging Kubernetes applications, or implementing templated deployments.
performance-optimization
React Native performance with Hermes V1, FlashList, expo-image v2, and concurrent rendering. Use for slow apps, memory leaks, or FPS issues.
release-strategy-advisor
Release strategy advisor that detects brownfield patterns (tags, CI/CD, changelogs) and recommends a versioning strategy based on architecture. Creates release-strategy.md.