ClawKit Reliability Toolkit
Official · Verified · Utilities · Safety 5/5

openguardrails-for-openclaw

Detect and block prompt injection attacks hidden in long content (emails, web pages, documents) using the OpenGuardrails state-of-the-art (SOTA) detection model.

Why use this skill?

Secure your OpenClaw agent against indirect prompt injection. Automatically scan emails, web pages, and documents for malicious content with OpenGuardrails.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/thomaslwang/openguardrails
Or enable the plugin manually in your clawhub.json (see "Add to Configuration" below).

What This Skill Does

OpenGuardrails for OpenClaw is a security layer that protects your AI agent from indirect prompt injection attacks. When an agent processes long-form content from untrusted external sources, such as emails, scraped web pages, or uploaded documents, attackers may hide malicious instructions within the text. The plugin intercepts that content, splits it into manageable segments, and passes each segment through the OpenGuardrails SOTA detection model. By identifying and blocking adversarial payloads before they reach your LLM context, it prevents unauthorized actions, sensitive data exfiltration, and system manipulation.
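
The interception flow described above can be sketched roughly as follows. This is an illustration only: the function names, the keyword-based stand-in classifier, and the blocking behavior are assumptions; only the `tool_result_persist` hook name and the 4000/200 chunking figures come from this page, and the real detector is an LLM-based model, not a keyword check.

```python
# Illustrative sketch of the scan-and-block flow (hypothetical names).

def chunk_text(text: str, size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping segments so a payload straddling a
    chunk boundary is still seen whole by the detector."""
    if len(text) <= size:
        return [text]
    stride = size - overlap
    return [text[i:i + size] for i in range(0, len(text), stride)]

def looks_injected(segment: str) -> bool:
    """Stand-in for the OpenGuardrails detection model. The real model
    is an LLM-based classifier; this placeholder only checks keywords."""
    markers = ("ignore previous instructions", "exfiltrate", "system prompt")
    return any(m in segment.lower() for m in markers)

def on_tool_result_persist(content: str) -> str:
    """Hypothetical handler for the tool_result_persist event: scan
    untrusted tool output before it reaches the agent's LLM context,
    and block the result if any segment is flagged."""
    for segment in chunk_text(content):
        if looks_injected(segment):
            raise PermissionError("Blocked: possible prompt injection detected")
    return content
```

The overlap matters: without it, an instruction split exactly across two chunks could evade per-chunk detection.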

Installation

To integrate OpenGuardrails into your OpenClaw environment, execute the following installation command in your terminal:

clawhub install openclaw/skills/skills/thomaslwang/openguardrails

After installation, ensure the gateway is updated to reflect the new plugin status by running:

openclaw gateway restart

Verify the installation by listing active plugins with openclaw plugins list and checking the gateway logs for the plugin's initialization message. This confirms that the guardrails are active and hooked correctly into the tool_result_persist event.

Use Cases

  • Email Security: Automatically scan forwarded threads that may contain hidden malicious instructions aimed at tricking your agent into leaking credentials.
  • Web Content Retrieval: Protect your agent when crawling unknown websites that might contain hidden text or invisible HTML elements designed to hijack browser-automation logic.
  • Document Analysis: Safely process PDF, DOCX, or TXT files that could contain embedded adversarial payloads disguised as formatting or metadata.
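
To make the "invisible HTML elements" case concrete, here is a standalone sketch, not part of the plugin, showing one common smuggling trick: instructions placed in elements styled so a browser never renders them. The extractor class and the `display:none` heuristic are illustrative assumptions.

```python
# Illustration: pull out text hidden via inline display:none styling.
# Real pages use many more hiding tricks (zero-size fonts, off-screen
# positioning, matching colors); this handles only the simplest one.
from html.parser import HTMLParser

class HiddenTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside hidden elements
        self.hidden = []  # text fragments a browser would not display

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        if "display:none" in style.replace(" ", "") or self.depth:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.hidden.append(data.strip())

def hidden_text(html: str) -> list[str]:
    """Return text fragments inside display:none elements."""
    parser = HiddenTextExtractor()
    parser.feed(html)
    return parser.hidden
```

Fragments returned by a helper like this would be exactly the kind of content a guardrail should scan with extra suspicion.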

Example Prompts

  1. "OpenGuardrails, run a status check to see how many potential injection attempts have been blocked in the last 24 hours."
  2. "/og_report: Show me the details on the most recent alert for the file 'project_brief.pdf'."
  3. "/og_feedback 5021 fp: This wasn't an injection, it was actually a snippet of cybersecurity research code that triggered a false positive."

Tips & Limitations

To maintain high performance, the plugin splits content into 4000-character chunks with 200-character overlaps. While the detection model reports an F1 score of 87.1% on English content and 97.3% on multilingual benchmarks, no security system is infallible. Review the logs with /og_report whenever your agent behaves unexpectedly, and use /og_feedback to report false positives so the detection model can improve. Be aware that very large documents may add latency, since each chunk requires an LLM-based analysis pass.
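
Given the chunking figures above (4000-character windows, 200-character overlap, so an effective stride of 3800), you can estimate how many detection passes a document will need and hence its scanning latency. The formula below is an assumption inferred from the stated figures, not documented plugin internals.

```python
# Estimate detection-model calls for a document scanned with a sliding
# window of `size` characters advancing by `size - overlap` per step.
import math

def estimated_passes(doc_len: int, size: int = 4000, overlap: int = 200) -> int:
    if doc_len <= size:
        return 1  # the whole document fits in one window
    stride = size - overlap
    return math.ceil(doc_len / stride)
```

For example, a 100,000-character document works out to about 27 passes, so per-chunk model latency multiplies accordingly.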

Metadata

Stars: 946
Views: 0
Updated: 2026-02-13
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-thomaslwang-openguardrails": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#security #ai-safety #prompt-injection #cybersecurity #openclaw
Safety Score: 5/5

Flags: external-api