obliteratus-abliteration
One-click model liberation toolkit for removing refusal behaviors from LLMs via surgical abliteration techniques
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/adisinghstudent/obliteratus-abliteration
OBLITERATUS — LLM Abliteration Toolkit
Skill by ara.so — Daily 2026 Skills collection.
OBLITERATUS is an open-source toolkit for identifying and surgically removing refusal behaviors from large language models using mechanistic interpretability techniques (abliteration). It locates refusal directions in a model's hidden states via SVD/PCA and projects them out of the weights, with the goal of preserving core language capabilities. It ships with a Gradio UI, a CLI, a Python API, and a Colab notebook.
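The "find a direction, then project it out" idea described above can be sketched with plain linear algebra. The block below is a minimal illustration on synthetic data, not OBLITERATUS's implementation: the function names, the difference-of-means estimator, and the planted direction are all assumptions made for the example.

```python
import numpy as np

def mean_difference_direction(acts_a, acts_b):
    """Unit vector along the difference of mean activations of two prompt sets."""
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)

def top_singular_direction(acts_a, acts_b):
    """Alternative estimate: leading right-singular vector of paired differences."""
    diffs = acts_a - acts_b
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # sign is arbitrary

def project_out(weight, direction):
    """Remove `direction` from the output space of `weight` ([d_out, d_in])."""
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r) @ weight

# Synthetic stand-ins for hidden states (hypothetical data, not a real model).
rng = np.random.default_rng(0)
d = 16
planted = np.zeros(d)
planted[3] = 1.0  # the direction we hide in set A
acts_b = rng.normal(size=(200, d))
acts_a = rng.normal(size=(200, d)) + 4.0 * planted

r = mean_difference_direction(acts_a, acts_b)
v = top_singular_direction(acts_a, acts_b)

W = rng.normal(size=(d, d))
W_clean = project_out(W, r)

# After projection, the layer can no longer write anything along r.
print(np.abs(r @ W_clean).max())
```

Both estimators should recover the planted axis here; on real activations they can disagree, which is one reason a toolkit might offer several methods.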
Installation
# Core install
pip install obliteratus
# With Gradio UI support
pip install "obliteratus[spaces]"
# With all optional analysis modules
pip install "obliteratus[full]"
# From source (latest)
git clone https://github.com/elder-plinius/OBLITERATUS
cd OBLITERATUS
pip install -e ".[full]"
Requirements:
- Python 3.10+
- PyTorch 2.1+ with CUDA (recommended) or CPU
- transformers, accelerate, gradio>=5.29.0
- HuggingFace account + token for gated models
export HF_TOKEN=your_hf_token_here
huggingface-cli login
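Before a long run, it may help to check the requirements above up front. The following is a small preflight sketch using only the standard library; the `preflight` helper and its messages are hypothetical and not part of OBLITERATUS. It checks for presence only, so the minimum versions listed above (PyTorch 2.1+, gradio>=5.29.0) are not verified.

```python
import importlib.util
import os
import sys

# Package names taken from the requirements list above.
REQUIRED_PACKAGES = ("torch", "transformers", "accelerate", "gradio")

def preflight(env=None, version=None, packages=REQUIRED_PACKAGES):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # A token cached by `huggingface-cli login` is not detected here;
    # this only looks at the environment variable.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for gated models unless logged in)")
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```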
CLI — Key Commands
# Basic obliteration (default method)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct
# Advanced method (whitened SVD + bias projection + iterative refinement)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method advanced
# Analysis-informed pipeline (auto-configures from geometry analysis)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method informed
# Specify output directory and push to Hub
obliteratus obliterate mistralai/Mistral-7B-Instruct-v0.3 \
--method advanced \
--output ./my-liberated-model \
--push-to-hub your-username/mistral-7b-liberated
# LoRA-based reversible ablation (non-destructive)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct \
--method lora \
--lora-rank 1
# Strength sweep — find the capability/compliance tradeoff
obliteratus sweep meta-llama/Llama-3.1-8B-Instruct \
--strengths 0.2,0.4,0.6,0.8,1.0
# Run analysis modules only (no modification)
obliteratus analyze meta-llama/Llama-3.1-8B-Instruct \
--modules concept_cone,alignment_imprint,universality
# Benchmark: compare methods on a model
obliteratus benchmark meta-llama/Llama-3.1-8B-Instruct \
--methods basic,advanced,informed
# Launch local Gradio UI
obliteratus ui
obliteratus ui --port 8080 --share
obliteratus ui --no-telemetry
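The source does not define what a "strength" value in the sweep command means. One plausible reading, assumed here purely for illustration, is a scale factor on the projection: strength 1.0 removes the direction entirely and smaller values remove a fraction of it. The helper names below are hypothetical.

```python
import numpy as np

def parse_strengths(text):
    """Parse a comma-separated list such as '0.2,0.4,0.6,0.8,1.0'."""
    return [float(s) for s in text.split(",")]

def scaled_projection(weight, direction, strength):
    """Remove a `strength` fraction of `direction` from the output space of `weight`."""
    r = direction / np.linalg.norm(direction)
    return weight - strength * np.outer(r, r) @ weight

# Synthetic weight matrix and direction (hypothetical data).
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
r = rng.normal(size=8)
r /= np.linalg.norm(r)

baseline = np.linalg.norm(r @ W)
remaining = {
    s: float(np.linalg.norm(r @ scaled_projection(W, r, s)) / baseline)
    for s in parse_strengths("0.2,0.4,0.6,0.8,1.0")
}
for s, frac in remaining.items():
    print(f"strength={s:.1f} remaining component={frac:.2f}")
```

Under this assumption the remaining component along the direction is `1 - strength`, which is what makes a sweep a natural way to look for a tradeoff point.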
Python API
Basic obliteration
from obliteratus import Obliterator
# Initialize with a HuggingFace model ID or local path
obl = Obliterator("meta-llama/Llama-3.1-8B-Instruct")
# Run the full pipeline: SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH
result = obl.obliterate(method="advanced")
print(result.perplexity_delta) # capability preservation metric
print(result.refusal_rate_delta) # refusal reduction
print(result.output_path) # where the model was saved
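The source does not say how `perplexity_delta` is computed. A common convention, assumed here, is the perplexity after modification minus the perplexity before, where perplexity is the exponential of the mean per-token negative log-likelihood. The functions and numbers below are illustrative only.

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def perplexity_delta(before_nlls, after_nlls):
    """Positive values mean the modified model fits the text worse."""
    return perplexity(after_nlls) - perplexity(before_nlls)

# Made-up per-token losses on the same evaluation text.
before = [2.0, 2.0, 2.0]
after = [2.1, 2.1, 2.1]
delta = perplexity_delta(before, after)
print(f"perplexity before={perplexity(before):.3f} delta={delta:.3f}")
```

A delta near zero would suggest general language modeling ability was largely preserved on that text; it says nothing about behavior on other inputs.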
Step-by-step pipeline
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-adisinghstudent-obliteratus-abliteration": {
      "enabled": true,
      "auto_update": true
    }
  }
}