voice-note-to-midi
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/danbennettuk/voice-note-to-midi
Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline that:
- Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
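The key detection step can be illustrated with a minimal sketch. It is not the skill's actual code: it assumes the detected notes arrive as (MIDI pitch, duration) pairs, builds a duration-weighted pitch class histogram, and correlates it against the published Krumhansl-Kessler major and minor profiles for all 12 tonics. The names `detect_key` and `pearson` are hypothetical, and the duration weighting is an assumption.

```python
import math

# Krumhansl-Kessler key profiles; index 0 is the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def detect_key(notes):
    """Return (tonic_name, mode, score) for notes given as (midi_pitch, duration)."""
    # Duration-weighted pitch class histogram (an assumption of this sketch).
    histogram = [0.0] * 12
    for pitch, duration in notes:
        histogram[pitch % 12] += duration

    best = None
    for tonic in range(12):
        # Rotate so that index 0 lines up with the candidate tonic.
        rotated = histogram[tonic:] + histogram[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            score = pearson(rotated, profile)
            if best is None or score > best[0]:
                best = (score, NAMES[tonic], mode)
    return best[1], best[2], best[0]
```

Calling `detect_key` on a list such as `[(60, 2.0), (62, 1.0), (64, 1.0)]` returns the best-matching tonic, the mode, and the correlation score, which can serve as a rough confidence value.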
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
                   ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
                   ↓
MIDI Output (Standard MIDI File)
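Two of the Step 4 operations, timing grid snap and legato note merging, can be sketched as follows. This is an illustration under stated assumptions rather than the hum2midi implementation: notes are (start, end, pitch) tuples in seconds, the grid is derived from a tempo and a subdivision count, and the melody is treated as monophonic when merging. The function names and the `bpm`, `divisions_per_beat`, and `max_gap` parameters are hypothetical.

```python
import math


def snap_to_grid(notes, bpm=120.0, divisions_per_beat=4):
    """Snap note starts and ends to the nearest grid line.

    notes: list of (start_seconds, end_seconds, midi_pitch).
    With the defaults the grid is a sixteenth note at 120 BPM (0.125 s).
    """
    grid = 60.0 / bpm / divisions_per_beat
    snapped = []
    for start, end, pitch in notes:
        # floor(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
        q_start = math.floor(start / grid + 0.5) * grid
        q_end = math.floor(end / grid + 0.5) * grid
        if q_end <= q_start:
            # Keep very short notes alive by giving them one grid step.
            q_end = q_start + grid
        snapped.append((q_start, q_end, pitch))
    return snapped


def merge_legato(notes, max_gap=0.0):
    """Merge consecutive notes of the same pitch separated by at most max_gap seconds.

    Only the immediately preceding note is considered, so this assumes a
    monophonic line.
    """
    merged = []
    for start, end, pitch in sorted(notes):
        if merged:
            prev_start, prev_end, prev_pitch = merged[-1]
            if pitch == prev_pitch and start - prev_end <= max_gap:
                merged[-1] = (prev_start, max(prev_end, end), pitch)
                continue
        merged.append((start, end, pitch))
    return merged
```

Running `merge_legato(snap_to_grid(notes))` first aligns the timing and then joins repeated pitches that end up touching on the grid, which is the order the diagram above lists them in.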
Setup
Prerequisites
- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
Installation
Quick Install (Recommended):
cd /path/to/voice-note-to-midi
./setup.sh
This automated script will:
- Check Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH
Manual Install:
If you prefer manual setup:
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
- Add to your PATH (optional):
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
Verify Installation
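One way to check the environment is a small Python script run inside the activated venv-bp environment. The sketch below is an assumption, not a command shipped with the skill: it only tests whether the packages from the manual install step can be found under their usual import names and whether an ffmpeg executable is on the PATH. The helper names are hypothetical.

```python
import importlib.util
import shutil

# Import names for the packages installed with pip in the manual setup.
REQUIRED_MODULES = ["basic_pitch", "librosa", "soundfile", "mido", "music21"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing Python packages:", ", ".join(missing) if missing else "none")
    print("FFmpeg found:", ffmpeg_available())
```

`find_spec` locates a module without importing it, so the check stays fast even for heavy dependencies.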
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-danbennettuk-voice-note-to-midi": {
      "enabled": true,
      "auto_update": true
    }
  }
}