voice-note-to-midi
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
Why use this skill?
Transform your humming, voice memos, and melodic audio into quantized MIDI files for your DAW using ML-powered pitch detection and intelligent post-processing.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/danbennettuk/voice-note-to-midi
What This Skill Does
The voice-note-to-midi skill is an audio processing pipeline that turns vocal recordings, humming, or other melodic audio into structured MIDI data, using Spotify's Basic Pitch machine-learning model for pitch detection. The pipeline runs in these stages:
- Harmonic-percussive source separation (HPSS): separates sustained, pitched (harmonic) content from transient percussive elements such as clicks and taps, which also suppresses some background noise, so the pitch detector sees a cleaner melodic signal.
- Pitch detection: the Basic Pitch model estimates note pitches, onsets, and durations from the separated audio.
- Key detection: a Krumhansl-Kessler based key-finding step estimates the key of the melody.
- Quantization: note timings are snapped to a musical grid.
- Cleanup: octave pruning, legato note merging, and velocity normalization tidy up the raw transcription.
The result is a MIDI file that is musically usable and ready to drag and drop into your Digital Audio Workstation (DAW).
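The post-processing stages can be illustrated with a minimal, standard-library Python sketch. It is not the skill's code: the Note type, the function names, the stage order, and every threshold (a 16th-note grid at 120 BPM, a 50 ms legato gap, a one-octave window around the median pitch, a peak velocity of 100) are hypothetical. Octave pruning is interpreted here as dropping notes that sit more than an octave from the melody's median pitch, a typical signature of the octave errors pitch trackers make; the real skill may correct such notes instead of removing them.

```python
from dataclasses import dataclass, replace
from statistics import median


@dataclass(frozen=True)
class Note:
    """Hypothetical note representation (not the skill's actual type)."""
    pitch: int      # MIDI note number, 60 = middle C
    start: float    # onset in seconds
    end: float      # offset in seconds
    velocity: int   # MIDI velocity, 1-127


def prune_octave_outliers(notes, max_distance=12):
    """Assumed interpretation: drop notes more than max_distance semitones from the median pitch."""
    if not notes:
        return []
    center = median(n.pitch for n in notes)
    return [n for n in notes if abs(n.pitch - center) <= max_distance]


def quantize(notes, bpm=120.0, division=4):
    """Snap onsets and offsets to a grid; division=4 gives 16th notes when a beat is a quarter note."""
    step = 60.0 / bpm / division  # seconds per grid step
    out = []
    for n in notes:
        start = round(n.start / step) * step
        end = round(n.end / step) * step
        if end <= start:  # keep every note at least one grid step long
            end = start + step
        out.append(replace(n, start=start, end=end))
    return out


def merge_legato(notes, max_gap=0.05):
    """Merge consecutive same-pitch notes separated by at most max_gap seconds."""
    merged = []
    for n in sorted(notes, key=lambda note: note.start):
        prev = merged[-1] if merged else None
        if prev and prev.pitch == n.pitch and n.start - prev.end <= max_gap:
            merged[-1] = replace(prev, end=max(prev.end, n.end),
                                 velocity=max(prev.velocity, n.velocity))
        else:
            merged.append(n)
    return merged


def normalize_velocity(notes, target_max=100):
    """Scale velocities so the loudest note hits target_max, clamped to the MIDI range 1-127."""
    if not notes:
        return []
    peak = max(n.velocity for n in notes)
    scale = target_max / peak
    return [replace(n, velocity=max(1, min(127, round(n.velocity * scale)))) for n in notes]


def clean_up(notes, bpm=120.0, division=4):
    """Hypothetical stage order: prune, quantize, merge, normalize."""
    notes = prune_octave_outliers(notes)
    notes = quantize(notes, bpm, division)
    notes = merge_legato(notes)
    return normalize_velocity(notes)


if __name__ == "__main__":
    raw = [Note(60, 0.02, 0.24, 80), Note(60, 0.26, 0.49, 90),
           Note(84, 0.50, 0.60, 70), Note(62, 0.51, 0.74, 60)]
    for note in clean_up(raw):
        print(note)
```

Applied to the short transcription in the `__main__` block, which contains a stray note two octaves up, the sketch drops the outlier, snaps the remaining notes to 16th-note boundaries, and merges the two repeated middle Cs into one held note.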
Installation
Before installing, make sure Python 3.11+ and FFmpeg are available on your system. The easiest way to get started is the OpenClaw skill manager: clawhub install openclaw/skills/skills/danbennettuk/voice-note-to-midi. To install manually instead, clone the source repository into ~/melody-pipeline, create a virtual environment, and install the required Python dependencies, including basic-pitch, librosa, and music21. Adding that directory to your PATH then lets you run the pipeline directly from your terminal or through the OpenClaw agent.
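A short preflight script can confirm these prerequisites before you install or run the pipeline. This is a hypothetical helper, not part of the skill: it checks the interpreter version, looks for an ffmpeg executable on PATH, and checks whether the basic_pitch, librosa, and music21 modules can be found (basic_pitch is the import name of the basic-pitch package). The which and find_spec parameters exist only so the checks can be faked in tests.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 11)
# Import names of the dependencies listed above (basic-pitch is imported as basic_pitch).
REQUIRED_MODULES = ("basic_pitch", "librosa", "music21")


def check_prerequisites(python_version=None, which=shutil.which,
                        find_spec=importlib.util.find_spec):
    """Hypothetical preflight check; returns a list of problems (empty means ready)."""
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {version[0]}.{version[1]}"
        )
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for module in REQUIRED_MODULES:
        # find_spec returns None for a top-level module that is not installed.
        if find_spec(module) is None:
            problems.append(f"Python module '{module}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("ready" if not issues else "\n".join(issues))
```

Run it inside the virtual environment, since the module checks only see packages installed for the active interpreter.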
Use Cases
- Songwriting: Capture melodic ideas on the go and turn them into MIDI files you can keep with your projects.
- Transcribing: Quickly convert a recorded vocal melody into notation or MIDI for analysis.
- Workflow Acceleration: Cut down on manual MIDI programming by 'singing' your synth lines, basslines, or vocal leads directly into your DAW.
- Musical Ideation: Experiment with voice-led compositions that can be re-synthesized using virtual instruments.
Example Prompts
- "Convert my voice memo 'idea_01.mp3' into a MIDI file and quantize it to 16th notes."
- "Process the recording of me humming this bassline and output a MIDI file named 'new_bassline.mid'."
- "Take the latest audio file from my recordings folder, extract the melody, and snap it to C Major."
Tips & Limitations
For best results, record in a quiet environment with minimal background noise. HPSS separation helps, but extremely noisy recordings can still introduce artifacts. Sing with a steady rhythm for the best quantization results. The skill is optimized for monophonic melodies; voice notes with several simultaneous voices or chords will be transcribed less accurately during pitch detection.
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-danbennettuk-voice-note-to-midi": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-write, file-read, code-execution
Related Skills
play-guitar-fretboard
Master the Guitar Fretboard - quickly jump to guitar fretboard learning resource websites
podcast-agent
Search articles on any topic, generate a two-host dialogue script, and synthesize podcast audio via TTS. Turn long reads into listenable content.
ym-mediatoolkit
Streaming video processing toolkit - compression, cover extraction, and audio conversion without downloading the full video
ressemble
Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.
audio-transcriber
Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration