obliteratus-abliteration

One-click model liberation toolkit for removing refusal behaviors from LLMs via surgical abliteration techniques

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/adisinghstudent/obliteratus-abliteration

OBLITERATUS — LLM Abliteration Toolkit

Skill by ara.so — Daily 2026 Skills collection.

OBLITERATUS is an open-source toolkit for identifying and surgically removing refusal behaviors from large language models using mechanistic interpretability techniques (abliteration). It locates refusal directions in a model's hidden states via SVD/PCA and projects them out of the weights, with the aim of preserving core language capabilities. It ships with a Gradio UI, a CLI, a Python API, and a Colab notebook.


Installation

# Core install
pip install obliteratus

# With Gradio UI support
pip install "obliteratus[spaces]"

# With all optional analysis modules
pip install "obliteratus[full]"

# From source (latest)
git clone https://github.com/elder-plinius/OBLITERATUS
cd OBLITERATUS
pip install -e ".[full]"

Requirements:

  • Python 3.10+
  • PyTorch 2.1+ with CUDA (recommended) or CPU
  • transformers, accelerate, gradio>=5.29.0
  • Hugging Face account + token for gated models

# Authenticate for gated models
export HF_TOKEN=your_hf_token_here
huggingface-cli login
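
Before requesting a gated model, it can help to confirm that the token is actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `hf_token_available` is hypothetical and is not part of OBLITERATUS or of any Hugging Face package.

```python
import os


def hf_token_available(env=None):
    """Return True if a non-empty HF_TOKEN exists in the given environment mapping.

    Hypothetical helper for illustration; defaults to the real process environment.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if hf_token_available():
        print("HF_TOKEN is set")
    else:
        print("HF_TOKEN is missing; export it before requesting gated models")
```

This only checks that the variable is present and non-blank. It does not verify that the token is valid or that the account has access to a given model.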

CLI — Key Commands

# Basic obliteration (default method)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct

# Advanced method (whitened SVD + bias projection + iterative refinement)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method advanced

# Analysis-informed pipeline (auto-configures from geometry analysis)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method informed

# Specify output directory and push to Hub
obliteratus obliterate mistralai/Mistral-7B-Instruct-v0.3 \
  --method advanced \
  --output ./my-liberated-model \
  --push-to-hub your-username/mistral-7b-liberated

# LoRA-based reversible ablation (non-destructive)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct \
  --method lora \
  --lora-rank 1

# Strength sweep — find the capability/compliance tradeoff
obliteratus sweep meta-llama/Llama-3.1-8B-Instruct \
  --strengths 0.2,0.4,0.6,0.8,1.0

# Run analysis modules only (no modification)
obliteratus analyze meta-llama/Llama-3.1-8B-Instruct \
  --modules concept_cone,alignment_imprint,universality

# Benchmark: compare methods on a model
obliteratus benchmark meta-llama/Llama-3.1-8B-Instruct \
  --methods basic,advanced,informed

# Launch local Gradio UI
obliteratus ui
obliteratus ui --port 8080 --share
obliteratus ui --no-telemetry

Python API

Basic obliteration

from obliteratus import Obliterator

# Initialize with a HuggingFace model ID or local path
obl = Obliterator("meta-llama/Llama-3.1-8B-Instruct")

# Run the full pipeline: SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH
result = obl.obliterate(method="advanced")

print(result.perplexity_delta)    # capability preservation metric
print(result.refusal_rate_delta)  # refusal reduction
print(result.output_path)         # where the model was saved
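
To make the capability metric concrete, here is a minimal sketch of how a perplexity difference can be computed from per-token log-probabilities. It assumes that a perplexity delta means "perplexity after minus perplexity before"; the source does not state how `result.perplexity_delta` is defined, and the function names and sample values below are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity is exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def perplexity_delta(before_log_probs, after_log_probs):
    """Assumed definition: perplexity of the modified model minus the original."""
    return perplexity(after_log_probs) - perplexity(before_log_probs)


# Hypothetical log-probabilities for the same four tokens under two models.
before = [math.log(0.5)] * 4   # original model assigns p = 0.5 to each token
after = [math.log(0.25)] * 4   # modified model assigns p = 0.25 to each token

print(perplexity(before), perplexity(after), perplexity_delta(before, after))
```

Under this assumed definition, a delta near zero suggests that language modeling quality on the evaluation text is largely unchanged, while a large positive delta suggests degradation.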

Step-by-step pipeline

Metadata

Stars: 3809
Views: 0
Updated: 2026-04-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-adisinghstudent-obliteratus-abliteration": {
      "enabled": true,
      "auto_update": true
    }
  }
}
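
A quick way to catch typos before saving the file is to parse the snippet and list the plugins it enables. This is a sketch using Python's standard `json` module. The helper `enabled_plugins` is hypothetical, and the check only reflects the structure shown above, not how clawhub itself reads the file.

```python
import json

# The configuration snippet from above, embedded as text for the check.
CONFIG_TEXT = """
{
  "plugins": {
    "official-adisinghstudent-obliteratus-abliteration": {
      "enabled": true,
      "auto_update": true
    }
  }
}
"""


def enabled_plugins(config_text):
    """Return the sorted names of plugins whose "enabled" flag is true."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on malformed JSON
    plugins = config.get("plugins", {})
    return sorted(name for name, opts in plugins.items() if opts.get("enabled") is True)


print(enabled_plugins(CONFIG_TEXT))
```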
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.