github-actions-recovery-latency-audit
Measure GitHub Actions failure recovery latency and unresolved incident age by workflow group.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/daniellummis/github-actions-recovery-latency-audit
What This Skill Does
The github-actions-recovery-latency-audit skill is a diagnostic tool that gives visibility into your CI/CD health. Unlike basic pass/fail metrics, it measures the duration of "incident" windows: a window opens when a run fails and stays open until a later successful run is recorded. Results are aggregated by repository, workflow, branch, and event, which helps teams distinguish transient network issues from systemic regressions. By processing GitHub Actions run JSON exports, the skill calculates recovery latency (time to recovery) and tracks the age of unresolved incidents. It assigns each group a severity level (ok, warn, critical) based on user-defined thresholds. For teams that enforce CI stability gates, it can be configured to exit with an error code when critical thresholds are breached, so that long-standing blockers are addressed before further changes are merged.
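The incident-window idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the skill's actual source: the function names are hypothetical, only the conclusion and createdAt fields are used, and only "failure" and "success" conclusions are treated as opening or closing an incident. The skill's own rules for other conclusions or timestamps may differ.

```python
from datetime import datetime, timezone

def parse_ts(value):
    # GitHub timestamps look like 2024-05-01T12:00:00Z
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def incident_windows(runs, now):
    """Return (recovery latencies in hours, open incident age in hours or None)
    for the runs of a single workflow group. Hypothetical helper."""
    ordered = sorted(runs, key=lambda r: parse_ts(r["createdAt"]))
    latencies = []
    opened_at = None  # timestamp of the first failure in the current incident
    for run in ordered:
        ts = parse_ts(run["createdAt"])
        if run["conclusion"] == "failure":
            if opened_at is None:
                opened_at = ts
        elif run["conclusion"] == "success" and opened_at is not None:
            latencies.append((ts - opened_at).total_seconds() / 3600)
            opened_at = None
    open_age = None
    if opened_at is not None:
        open_age = (now - opened_at).total_seconds() / 3600
    return latencies, open_age

runs = [
    {"conclusion": "failure", "createdAt": "2024-05-01T00:00:00Z"},
    {"conclusion": "failure", "createdAt": "2024-05-01T02:00:00Z"},
    {"conclusion": "success", "createdAt": "2024-05-01T06:00:00Z"},
    {"conclusion": "failure", "createdAt": "2024-05-01T10:00:00Z"},
]
now = datetime(2024, 5, 1, 13, 0, tzinfo=timezone.utc)
latencies, open_age = incident_windows(runs, now)
print(latencies, open_age)  # [6.0] 3.0
```

In this sample, the first incident runs from the first failure to the next success (6 hours), and the final failure is still unresolved at the chosen "now" (3 hours old).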
Installation
To integrate this skill into your environment, use the OpenClaw command-line interface. Run the following command in your terminal:
clawhub install openclaw/skills/skills/daniellummis/github-actions-recovery-latency-audit
Ensure that you have collected your run artifacts as JSON files prior to execution, as the skill operates on local filesystem exports. Refer to the skill's source repository at openclaw/skills for additional configuration details or to review the source scripts.
Use Cases
- DevOps Stability Monitoring: Identify which core services or repositories have high recovery latency, signaling a need for better test isolation or flaky-test remediation.
- CI/CD Gatekeeping: Use the FAIL_ON_CRITICAL flag in your pipeline to stop deployments when any critical incident remains open for more than the permitted hours, ensuring a "green-first" deployment policy.
- Incident Management Post-mortems: Analyze historical recovery patterns to provide quantitative data for engineering reviews regarding pipeline reliability.
- Alert Fatigue Reduction: Use the TOP_N setting to focus developer attention on the most problematic workflows rather than overwhelming the team with every minor transient failure.
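To make the gatekeeping and top-N use cases concrete, here is a minimal sketch of threshold-based severity scoring combined with a fail-on-critical exit code. The parameter names (warn_hours, critical_hours, top_n, fail_on_critical) and the ">=" comparison are assumptions for illustration; the skill's real configuration names and boundary rules are not given in this listing.

```python
def classify(hours, warn_hours, critical_hours):
    # Assumed rule: a value at or above a threshold reaches that severity.
    if hours >= critical_hours:
        return "critical"
    if hours >= warn_hours:
        return "warn"
    return "ok"

def gate(groups, warn_hours, critical_hours, top_n, fail_on_critical):
    """Rank groups by open incident age, report the worst top_n,
    and return a non-zero exit code if any group is critical."""
    ranked = sorted(groups, key=lambda g: g["open_age_hours"], reverse=True)
    scored = [
        dict(g, severity=classify(g["open_age_hours"], warn_hours, critical_hours))
        for g in ranked
    ]
    has_critical = any(g["severity"] == "critical" for g in scored)
    exit_code = 1 if (fail_on_critical and has_critical) else 0
    return scored[:top_n], exit_code

groups = [
    {"workflow": "lint", "open_age_hours": 1.0},
    {"workflow": "deploy", "open_age_hours": 40.0},
    {"workflow": "test", "open_age_hours": 20.0},
]
report, exit_code = gate(groups, warn_hours=12, critical_hours=36,
                         top_n=2, fail_on_critical=True)
for row in report:
    print(row["workflow"], row["severity"])
print("exit code:", exit_code)
```

Note that the exit code here is computed over all groups, not just the reported top N, so trimming the report does not hide a critical incident from the gate. Whether the skill behaves the same way is an assumption.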
Example Prompts
- "Analyze my GitHub Actions runs in artifacts/ and tell me which workflows have a recovery latency higher than 18 hours."
- "Run the recovery audit on my local fixtures using the critical thresholds of 36 hours for open incidents, and provide the output in JSON format."
- "Check if there are any critical CI failures that have been unresolved for more than 12 hours across the 'main' branch of all repositories."
Tips & Limitations
- Data Quality: The accuracy of the audit is entirely dependent on the quality of your JSON run exports. Always ensure the gh run view command includes necessary fields like conclusion, createdAt, and workflowName.
- Deterministic Testing: Use the NOW_ISO input when debugging or testing your reporting logic to ensure consistent results across repeated executions.
- Scalability: While the tool is efficient, ensure your RUN_GLOB is scoped appropriately to avoid processing excessive amounts of stale history, which may impact execution time.
- Network Scope: This skill does not automatically pull data from GitHub; you must provide the local files, which is a design choice to maintain security and avoid rate-limiting issues.
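Two of the tips above, checking exported fields and pinning the clock, can be sketched as small Python helpers. Both function names are hypothetical, and the required-field list is limited to the three fields named in this listing; the skill may expect more.

```python
from datetime import datetime, timezone

# Fields this listing names as necessary in each exported run.
REQUIRED_FIELDS = ("conclusion", "createdAt", "workflowName")

def missing_fields(run):
    """Return the required keys that are absent from one exported run record."""
    return [field for field in REQUIRED_FIELDS if field not in run]

def resolve_now(now_iso=None):
    """Use a pinned ISO timestamp when given, otherwise the current UTC time.
    Pinning the clock keeps 'unresolved incident age' stable across reruns."""
    if now_iso:
        return datetime.fromisoformat(now_iso.replace("Z", "+00:00"))
    return datetime.now(timezone.utc)

run = {"conclusion": "failure", "createdAt": "2024-05-01T10:00:00Z"}
print(missing_fields(run))
print(resolve_now("2024-05-01T12:00:00Z").isoformat())
```

With a pinned timestamp, repeated executions over the same fixtures produce the same ages and therefore the same severities, which is what makes the report testable.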
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-daniellummis-github-actions-recovery-latency-audit": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, code-execution
Related Skills
github-actions-cache-hardening-audit
Audit GitHub Actions workflow cache usage for poisoning, keying, and secret-path risks.
render-env-guard
Preflight-check Render service environment variables before deploys; catches missing keys and placeholder/template values that commonly break production rollouts.
github-actions-trigger-health-audit
Audit GitHub Actions run health by trigger event and workflow so flaky or noisy automation sources are easy to prioritize.
github-actions-cancel-waste-audit
Audit cancelled and timed-out GitHub Actions runs from JSON exports to surface wasted CI minutes and noisy workflows.
github-actions-run-gap-audit
Detect GitHub Actions workflow groups that stopped running on their normal cadence using median run intervals and current inactivity gap.