name: skill-sanitizer description: "First open-source AI sanitizer with local semantic detection. 7 layers + code block awareness + LLM intent analysis. Catches prompt injection, reverse shells, memory tampering, encoding evasion, trust abuse. 85% fewer false positives in v2.1. Zero cloud — your prompts stay on your machine." user-invocable: true metadata: openclaw: emoji: "🧤" homepage: "https://github.com/cyberxuan-XBX/skill-sanitizer"

Skill Sanitizer

The first open-source AI sanitizer with local semantic detection.

Commercial AI security tools exist — they all require sending your prompts to their cloud. Your antivirus shouldn't need antivirus.

This sanitizer scans any SKILL.md content before it reaches your LLM. 7 detection layers + optional LLM semantic judgment. Zero dependencies. Zero cloud calls. Your data never leaves your machine.

Why You Need This

SKILL.md files are prompts written for AI to execute
Attackers hide ignore previous instructions in "helpful" skills
Base64-encoded reverse shells look like normal text
Names like safe-defender can contain eval(user_input)
Your agent doesn't know it's being attacked — it just obeys

The 7 Layers

Layer	What It Catches	Severity
1. Kill-String	Known platform-level credential patterns (API keys, tokens)	CRITICAL
2. Prompt Injection	`ignore previous instructions`, role hijacking, system prompt override	HIGH-CRITICAL
3. Suspicious Bash	`rm -rf /`, reverse shells, pipe-to-shell, cron modification	MEDIUM-CRITICAL
4. Memory Tampering	Attempts to write to MEMORY.md, SOUL.md, CLAUDE.md, .env files	CRITICAL
5. Context Pollution	Attack patterns disguised as "examples" or "test cases"	MEDIUM-HIGH
6. Trust Abuse	Skill named `safe-` or `secure-` but contains `eval()`, `rm -rf`, `chmod 777`	HIGH
7. Encoding Evasion	Unicode homoglyphs, base64-encoded payloads, synonym-based instruction override	HIGH

Usage

In Python

from skill_sanitizer import sanitize_skill

# Before feeding any skill content to your LLM:
result = sanitize_skill(skill_content, "skill-name")

if result["risk_level"] in ("HIGH", "CRITICAL"):
    print(f"BLOCKED: {result['risk_level']} (score={result['risk_score']})")
    for f in result["findings"]:
        print(f"  [{f['severity']}] {f.get('pattern', f.get('layer', '?'))}")
else:
    # Safe to process
    clean_content = result["content"]
    # feed clean_content to your LLM...

In Claude Code (as a pre-check)

# Before installing or inspecting any skill:
python3 {baseDir}/skill_sanitizer.py scan "skill-name" < skill_content.md

CLI

# Scan a file
python3 skill_sanitizer.py scan skill-name < SKILL.md

# Run built-in test suite (10 attack vectors)
python3 skill_sanitizer.py test

# Show stats
python3 skill_sanitizer.py stats

Skill Sanitizer

Install via CLI (Recommended)

Skill Sanitizer

Why You Need This

The 7 Layers

Usage

In Python

In Claude Code (as a pre-check)

CLI

Risk Levels

Metadata