openguardrails-for-openclaw
Detect and block prompt injection attacks hidden in long content (emails, web pages, documents) using OpenGuardrails SOTA detection
Why use this skill?
Secure your OpenClaw agent against indirect prompt injection. Automatically scan emails and documents for malicious content with OpenGuardrails.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/thomaslwang/openguardrails
What This Skill Does
OpenGuardrails for OpenClaw is a security layer designed to protect your AI agent from indirect prompt injection attacks. When an agent processes long-form content from untrusted external sources—such as emails, scraped web pages, or uploaded documents—attackers may hide malicious instructions within the text. This plugin intercepts that content, splits it into manageable segments, and passes each segment through the OpenGuardrails SOTA detection model. By identifying and blocking adversarial payloads before they reach your LLM context, it prevents unauthorized actions, sensitive data exfiltration, and potential system manipulation.
Installation
To integrate OpenGuardrails into your OpenClaw environment, execute the following installation command in your terminal:
clawhub install openclaw/skills/skills/thomaslwang/openguardrails
After installation, ensure the gateway is updated to reflect the new plugin status by running:
openclaw gateway restart
Verify the installation by listing your active plugins via openclaw plugins list and checking the gateway logs for the plugin's initialization message. This confirms that the guardrails are active and hooked correctly into the tool_result_persist event.
Use Cases
- Email Security: Automatically scan forwarded threads that may contain hidden malicious instructions aimed at tricking your agent into leaking credentials.
- Web Content Retrieval: Protect your agent when crawling unknown websites that might contain hidden text or invisible HTML elements designed to hijack browser-automation logic.
- Document Analysis: Safely process PDF, Docx, or TXT files that could contain embedded adversarial payloads disguised as formatting or metadata.
Example Prompts
- "OpenGuardrails, run a status check to see how many potential injection attempts have been blocked in the last 24 hours."
- "/og_report: Show me the details on the most recent alert for the file 'project_brief.pdf'."
- "/og_feedback 5021 fp: This wasn't an injection, it was actually a snippet of cybersecurity research code that triggered a false positive."
Tips & Limitations
To maintain high performance, the plugin splits content into 4000-character chunks with 200-character overlaps. While detection achieves an F1 score of 87.1% on English content and 97.3% on multilingual content, no security system is infallible. Always review the logs using /og_report if your agent behaves unexpectedly. If you encounter a false positive, use the /og_feedback command to help improve the detection model. Be aware that processing very large documents may introduce slight latency, as each chunk requires an LLM-based analysis pass.
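The chunking strategy described above can be sketched as a simple sliding window. A minimal sketch follows; the 4000-character chunk size and 200-character overlap come from the text, but the function name and structure are illustrative assumptions, not the plugin's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (illustrative sketch, not the plugin's code)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # stride of 3800 chars, so consecutive chunks share 200
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be scanned by the detection model independently; the overlap means an injection payload straddling a chunk boundary is still seen whole by at least one scanning pass.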
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-thomaslwang-openguardrails": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: external-api
Related Skills
flaw0
Security and vulnerability scanner for OpenClaw code, plugins, skills, and Node.js dependencies. Powered by OpenClaw AI models.
moltguard
Detect and block prompt injection attacks hidden in long content (emails, web pages, documents) using the MoltGuard API
skill-scanner
Scan installed OpenClaw skills for malicious code patterns including ClickFix social engineering, reverse shell (RAT), and data exfiltration. Uses OG-Text model for agentic detection.