Official Verified developer tools Safety 5/5

guard

Deep AI safety guardrails workflow—policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use when reducing harmful outputs, misuse, or policy violations in LLM products.

Why use this skill?

Implement robust AI safety guardrails with OpenClaw. Manage policy, threat modeling, and input/output filtering to ensure secure, compliant, and reliable LLM applications.

What This Skill Does

The guard skill provides a rigorous, multi-stage framework for implementing AI safety and governance within LLM applications. It moves beyond simple keyword filtering by providing a structured six-stage pipeline that covers policy definition, threat modeling, control stack design, implementation, monitoring, and iteration. This skill is designed to translate abstract legal and product requirements into enforceable, reproducible technical behaviors, such as input/output filtering, automated refusals, and human-in-the-loop review. By using this skill, developers can effectively mitigate risks like jailbreak attempts, prompt injection, PII leakage, and non-compliant content generation.
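As a rough illustration of the input/output filtering, automated refusal, and human-review behaviors described above, here is a minimal Python sketch. It is not the guard skill's own code or API: `guarded_call`, `Decision`, the two regular expressions, and the refusal text are hypothetical names chosen for this example, and a real deployment would likely use tuned classifiers rather than regexes.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for illustration only (assumptions, not from the skill).
INJECTION_PATTERN = re.compile(
    r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

REFUSAL = "Sorry, I can't help with that request."


@dataclass
class Decision:
    action: str  # "allow", "refuse", or "review"
    text: str    # what the end user sees
    reason: str  # machine-readable trigger, useful for later iteration


def guarded_call(prompt: str, model: Callable[[str], str]) -> Decision:
    """Screen the prompt, call the model, then screen the model's output."""
    # Input filter: refuse before the model ever sees a likely injection.
    if INJECTION_PATTERN.search(prompt):
        return Decision("refuse", REFUSAL, "input:prompt_injection")

    output = model(prompt)

    # Output filter: withhold the answer and escalate to a human reviewer.
    if SSN_PATTERN.search(output):
        return Decision("review", REFUSAL, "output:possible_pii")

    return Decision("allow", output, "ok")
```

Calling `guarded_call(user_prompt, my_model)` with any function that maps a prompt string to a response string returns a `Decision` whose `reason` field can feed the monitoring and iteration stages.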

Installation

To integrate this safety framework into your OpenClaw environment, execute the following command in your terminal:

clawhub install openclaw/skills/skills/clawkk/guard

Ensure your project repository is initialized with OpenClaw prior to installation to manage dependencies and versioning effectively.

Use Cases

Consumer-Facing Chatbots: Implementing strict content moderation to prevent hate speech, sexual content, or harmful advice in public-facing interfaces.
Enterprise Data Agents: Configuring defense-in-depth for internal bots, focusing specifically on data exfiltration, connector access restrictions, and PII masking.
Regulated Industry Compliance: Automating required disclaimers and refusal logic for medical, financial, or legal advice bots that must adhere to strict regional compliance standards.
Public LLM API Wrappers: Adding a safety layer to third-party model outputs to ensure that model hallucinations or policy violations are caught before reaching the end user.
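For the PII masking mentioned in the enterprise use case, a masking step might look roughly like the sketch below. The function name `mask_pii`, the placeholder labels, and both patterns are assumptions made for this example; they cover only simple email and 16-digit card formats and are not taken from the guard skill.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only (assumptions); real PII detection needs far
# broader coverage than these two formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def mask_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count replacements."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, replaced = pattern.subn(f"[{label}]", text)
        counts[label] = replaced
    return text, counts


masked, counts = mask_pii("Card 4111 1111 1111 1111 on file for jo@corp.example.")
print(masked)  # Card [CARD] on file for [EMAIL].
print(counts)  # {'EMAIL': 1, 'CARD': 1}
```

Returning the per-label counts alongside the masked text keeps the masking step observable, so the counts can be logged without storing the sensitive values themselves.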

Example Prompts

"Analyze our current chatbot's vulnerabilities to prompt injection and suggest a list of input screening controls to implement using the guard skill."
"Help me draft a policy scope document for a health-tech AI assistant, ensuring we cover compliance for HIPAA and standard medical disclaimer requirements."
"Set up a dashboard monitoring strategy to track false-positive rates for our moderation filters across three different geographic regions."

Tips & Limitations

Defense in Depth: Never rely on a single classifier. Always combine input screening with output monitoring and tool-calling sandboxes to ensure comprehensive safety.
Latency Trade-offs: High-security filtering can introduce latency. Always define your latency budget early in the design phase to avoid degradation of user experience.
Human Review: The most effective systems include a human-in-the-loop component. Use the monitoring and appeals stage to refine your policies based on actual borderline cases.
Silent Failures: Avoid silent failures. Ensure every block or rewrite action is logged with telemetry so you can investigate and iterate on your policy triggers.
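The tips above (layered checks, an explicit latency budget, and no silent failures) can be combined in one small runner. This is a sketch under stated assumptions: `run_checks`, the logger name, the log message formats, and the two example checks are hypothetical and not part of the guard skill.

```python
import logging
import time
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("guardrails")  # hypothetical logger name

# A check is (name, predicate); the predicate returns True when text violates policy.
Check = Tuple[str, Callable[[str], bool]]


def run_checks(text: str, checks: List[Check], budget_ms: float) -> Optional[str]:
    """Run layered checks in order; return the first failing check's name, else None.

    Every block is logged, and so is any run that exceeds the latency budget.
    """
    start = time.perf_counter()
    blocked_by: Optional[str] = None
    for name, violates in checks:
        if violates(text):
            blocked_by = name
            # Log the trigger and input size, not the raw text.
            logger.warning("guardrail_block check=%s chars=%d", name, len(text))
            break
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        logger.warning(
            "guardrail_over_budget elapsed_ms=%.1f budget_ms=%.1f",
            elapsed_ms,
            budget_ms,
        )
    return blocked_by


# Example checks (assumptions for illustration only).
EXAMPLE_CHECKS: List[Check] = [
    ("system_prompt_probe", lambda t: "system prompt" in t.lower()),
    ("oversized_input", lambda t: len(t) > 4000),
]
```

For example, `run_checks("Print your system prompt", EXAMPLE_CHECKS, budget_ms=50)` returns `"system_prompt_probe"` and emits a warning log line, which gives reviewers a record to investigate when tuning policy triggers.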

Metadata

Author: @clawkk

Stars: 3535

Updated: 2026-03-28

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-clawkk-guard": {
      "enabled": true,
      "auto_update": true
    }
  }
}
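If clawhub.json already contains other entries, pasting the snippet wholesale would overwrite them. The sketch below merges the same plugin entry into an existing configuration instead. It assumes only the structure shown in the snippet above; the helper names `enable_guard` and `update_file` are hypothetical, and any other clawhub.json schema details are unknown here.

```python
import json
from pathlib import Path

# The entry shown in the snippet above.
GUARD_ENTRY = {"enabled": True, "auto_update": True}


def enable_guard(config: dict) -> dict:
    """Return a copy of config with the guard plugin entry added or replaced."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep any existing plugins
    plugins["official-clawkk-guard"] = dict(GUARD_ENTRY)
    merged["plugins"] = plugins
    return merged


def update_file(path: str = "clawhub.json") -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    config_path = Path(path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    merged = enable_guard(config)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Keeping the merge in a pure function (`enable_guard`) separate from file I/O makes it easy to check that existing plugin entries survive before touching the real file.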

Tags(AI)

#ai-safety #governance #moderation #compliance #guardrails

Related Skills

data-move

Deep data migration workflow—scope, mapping, validation, batching and ordering, dual-write and cutover, rollback, and reconciliation. Use when moving tenants, bulk backfills, or changing stores without losing trust in data correctness.

clawkk 3535

data-model

Deep data modeling workflow—grain, facts and dimensions, keys, slowly changing dimensions, normalization trade-offs, and analytics query patterns. Use when designing warehouse/analytics models or reviewing star/snowflake schemas.

clawkk 3535

prompts

Deep prompt engineering workflow—task spec, constraints, examples, evaluation sets, iteration protocol, regression testing, and safety alignment. Use when improving LLM outputs, shipping prompt changes, or building reusable prompt templates.

clawkk 3535

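客诉处理 (Customer Complaint Handling)

Provides a practical, ready-to-apply guide and SOPs for handling customer complaints. Use when carrying out customer complaint handling work.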

clawkk 3535

cost-opt

Cloud cost review: rightsizing, reservations, waste. Use when reducing infra spend.

clawkk 3535