
audio-speaker-tools

Speaker separation, voice comparison, and audio processing tools. Use when working with multi-speaker audio, voice cloning, or speaker verification tasks including: (1) separating speakers from audio files via Demucs and pyannote diarization, (2) comparing voice samples for speaker verification or voice clone quality assessment using Resemblyzer, (3) extracting audio segments, (4) preparing samples for ElevenLabs voice cloning, or (5) validating speaker diarization results.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/cmfinlan/audio-speaker-tools

Audio Speaker Tools

Tools for speaker separation, voice comparison, and audio processing using Demucs, pyannote, and Resemblyzer.

Overview

This skill provides three main workflows:

  1. Speaker separation - Extract per-speaker audio from multi-speaker recordings
  2. Voice comparison - Measure speaker similarity between two audio files
  3. Audio processing - Segment extraction and voice isolation

Prerequisites

Setup Virtual Environment

Run once to create the venv and install dependencies:

bash scripts/setup_venv.sh

Default venv location: ./.venv

Requirements:

  • Python 3.9+
  • ffmpeg (on macOS: brew install ffmpeg)
  • HuggingFace token (set as env var HF_TOKEN)
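
A quick preflight check can catch a missing requirement before a long diarization run. The following is a minimal sketch, not part of the skill itself; the function name `check_prerequisites` is hypothetical, and it only tests the three requirements listed above.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: it only checks the requirements listed above.
    """
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN environment variable is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running it with no output means all three checks passed.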

Scripts

1. Speaker Separation: diarize_and_slice_mps.py

Separate speakers from multi-speaker audio:

# Basic usage
HF_TOKEN=<your-hf-token> \
  /path/to/venv/bin/python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir /path/to/output \
  --prefix MyShow

# With speaker constraints
HF_TOKEN=$TOKEN python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir ./out \
  --min-speakers 2 \
  --max-speakers 5 \
  --pad-ms 100

Process:

  1. Converts input to 16kHz mono WAV
  2. Runs Demucs vocal/background separation (optional, for cleaner input)
  3. Runs pyannote speaker diarization (MPS-accelerated)
  4. Extracts concatenated per-speaker WAV files
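
Step 1 of the process above can be approximated with a plain ffmpeg call. This is a sketch of one way to do the conversion, assuming ffmpeg is on PATH; the helper names are hypothetical and the script's actual implementation may differ.

```python
import subprocess


def to_16k_mono_cmd(src, dst):
    """Build an ffmpeg command that resamples src to 16 kHz mono.

    -ac 1 selects one audio channel, -ar 16000 sets the sample rate,
    and -y overwrites dst if it already exists.
    """
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(to_16k_mono_cmd(src, dst), check=True)
```

For example, `convert("audio.mp3", "audio_16k.wav")` writes a WAV file suitable as diarization input.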

Output:

  • <prefix>_speaker1.wav, <prefix>_speaker2.wav, etc. (one per detected speaker)
  • diarization.rttm (time-stamped speaker segments)
  • segments.jsonl (JSON segments metadata)
  • meta.json (pipeline info and speaker index)
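
The RTTM file is a plain-text format in which each `SPEAKER` line carries a turn onset, a turn duration, and a speaker label. As a sketch of how diarization.rttm might be inspected, the hypothetical helper below totals speaking time per speaker. It assumes standard RTTM field order; the schemas of segments.jsonl and meta.json are not documented here, so they are not parsed.

```python
from collections import defaultdict


def speaker_durations(rttm_text):
    """Sum turn durations (seconds) per speaker label in RTTM text.

    Standard RTTM fields: type, file, channel, onset, duration,
    ortho, stype, speaker, confidence, slat.
    """
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        totals[fields[7]] += float(fields[4])
    return dict(totals)


if __name__ == "__main__":
    with open("diarization.rttm") as handle:
        for speaker, seconds in sorted(speaker_durations(handle.read()).items()):
            print(f"{speaker}: {seconds:.1f}s")
```

A speaker with only a few seconds of total audio is often a sign of over-segmentation and may be worth checking against the --min-speakers and --max-speakers constraints.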

Important:

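  • Always pass the HF token via the HF_TOKEN env var, never as a CLI arg (CLI args can leak into shell history and process listings)
  • MPS first, CPU fallback: the script prefers the Metal (MPS) GPU backend and falls back to CPU if it is unavailable
  • Default output directory: ./separated/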
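
The "MPS first, CPU fallback" behavior can be illustrated with a short device-selection sketch. This is an assumption about how such a preference is typically written with PyTorch, not the script's actual code; `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "mps" when PyTorch reports Metal support, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so nothing to accelerate
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("using device:", pick_device())
```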

2. Voice Comparison: compare_voices.py

Measure similarity between two voice samples using Resemblyzer:

# Basic comparison
python scripts/compare_voices.py \
  --audio1 sample1.wav \
  --audio2 sample2.wav

# JSON output
python scripts/compare_voices.py \
  --audio1 reference.wav \
  --audio2 clone.wav \
  --threshold 0.85 \
  --json

# Exit code: 0 if the comparison passes the threshold, 1 if it fails
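
Because the result is reported through the exit code, the comparison can gate an automated pipeline. The sketch below shows one way to do that from Python; `compare_cmd` and `command_passes` are hypothetical helpers, and only the flags shown above are used.

```python
import subprocess
import sys


def compare_cmd(reference, candidate, threshold=0.85):
    """Build the compare_voices.py invocation using the flags shown above."""
    return [
        sys.executable, "scripts/compare_voices.py",
        "--audio1", reference,
        "--audio2", candidate,
        "--threshold", str(threshold),
    ]


def command_passes(cmd):
    """Run a command and report whether it exited with code 0."""
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    ok = command_passes(compare_cmd("reference.wav", "clone.wav"))
    print("clone accepted" if ok else "clone rejected")
```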

Scores:

  • < 0.75 = Different speakers
  • 0.75-0.84 = Likely same speaker
  • 0.85+ = Excellent match (ideal for voice cloning validation)
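
The score bands above translate directly into a small lookup. In this sketch the function name `interpret_score` is hypothetical, and scores between 0.84 and 0.85, which the list leaves unassigned, are assumed to fall in the middle band.

```python
def interpret_score(score):
    """Map a similarity score to the bands listed above."""
    if score >= 0.85:
        return "excellent match"
    if score >= 0.75:
        # Assumption: the unlisted 0.84-0.85 range belongs here.
        return "likely same speaker"
    return "different speakers"
```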

Use cases:

  • Voice clone quality assessment (compare clone vs. original)
  • Speaker verification (authenticate speaker identity)
  • Validate speaker separation (confirm separated speakers are distinct)
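
For the separation-validation use case, the idea is that every pair of separated speakers should score below the "different speakers" cutoff. The sketch below works on speaker embedding vectors and uses cosine similarity, which is how Resemblyzer embeddings are commonly compared; whether compare_voices.py computes its score exactly this way is an assumption. The function names and the toy two-dimensional vectors in the usage note are illustrative only.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def confusable_pairs(embeddings, threshold=0.75):
    """Return speaker pairs whose similarity is at or above threshold.

    embeddings maps a speaker name to its embedding vector. An empty
    result suggests the separated speakers are distinct.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold
    ]
```

With toy vectors, `confusable_pairs({"s1": [1, 0], "s2": [0.9, 0.1], "s3": [0, 1]})` flags only the `("s1", "s2")` pair, which would suggest that one real speaker was split into two outputs.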

See: references/scoring-guide.md for detailed interpretation

3. Audio Trimming

Use ffmpeg directly for segment extraction:

# Extract 10-second segment starting at 5 seconds
ffmpeg -i input.mp3 -ss 5 -t 10 -c copy output.mp3

# Extract vocals only with Demucs (before diarization)
demucs --two-stems vocals --out ./separated input.mp3
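
When many segments need extracting, the ffmpeg command above can be generated rather than typed. This is a minimal sketch with a hypothetical helper name; it mirrors the flags of the command above and assumes start and duration are given in seconds.

```python
def trim_cmd(src, dst, start_s, duration_s):
    """Build an ffmpeg stream-copy trim command.

    -c copy avoids re-encoding, so the cut lands on the nearest
    frame boundary rather than the exact sample.
    """
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    return [
        "ffmpeg", "-i", src,
        "-ss", f"{start_s:.3f}",
        "-t", f"{duration_s:.3f}",
        "-c", "copy", dst,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`.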

Workflows

Metadata

Author: @cmfinlan
Stars: 3453
Views: 0
Updated: 2026-03-26
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-cmfinlan-audio-speaker-tools": {
      "enabled": true,
      "auto_update": true
    }
  }
}
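
If clawhub.json already contains other plugins, merging the entry programmatically avoids overwriting them. The following is a sketch under the assumption that the file is plain JSON with a top-level "plugins" object, as in the snippet above; `enable_plugin` is a hypothetical helper, not a ClawKit API.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-cmfinlan-audio-speaker-tools"


def enable_plugin(config_path):
    """Add or update the plugin entry, keeping existing settings intact."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    enable_plugin("clawhub.json")
```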
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.