voice-note-to-midi

Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/danbennettuk/voice-note-to-midi

What This Skill Does

The voice-note-to-midi skill is an audio processing pipeline that converts raw vocal recordings, humming, or other melodic audio into structured MIDI data. It first applies harmonic-percussive source separation (HPSS) to isolate the melodic content from background noise and percussive transients. Spotify's 'Basic Pitch' machine learning model then performs pitch detection on the isolated melody. Post-processing follows: Krumhansl-Kessler based key detection, automatic quantization that snaps notes to a musical grid, octave pruning, legato note merging, and velocity normalization. The goal is a MIDI file that is both accurate and musical, ready to drag and drop into your Digital Audio Workstation (DAW).
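To make the key detection step more concrete, here is a minimal sketch of Krumhansl-Kessler style key estimation. It is not the skill's actual code: the function names (`pearson`, `estimate_key`) and the `(midi_pitch, duration)` note format are assumptions made for illustration. The idea is to build a duration-weighted pitch-class histogram and correlate it against the 24 rotated major and minor profiles.

```python
# Krumhansl-Kessler key profiles: weights for the 12 pitch classes,
# listed relative to the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def estimate_key(notes):
    """Estimate the key of a melody.

    notes: iterable of (midi_pitch, duration) pairs (a hypothetical format).
    Returns (tonic_name, mode), e.g. ("C", "major").
    """
    # Duration-weighted pitch-class histogram.
    histogram = [0.0] * 12
    for pitch, duration in notes:
        histogram[pitch % 12] += duration

    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its tonic weight lines up with this candidate tonic.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(histogram, rotated)
            if best is None or score > best[0]:
                best = (score, PITCH_NAMES[tonic], mode)
    return best[1], best[2]
```

Calling `estimate_key` on an ascending C major scale (MIDI pitches 60 through 72, equal durations) yields `("C", "major")`. Short or ambiguous melodies can easily land on a closely related key, so a real pipeline would likely want a confidence threshold or a manual override.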

Installation

Ensure your system has Python 3.11+ and FFmpeg installed. The easiest way to get started is the OpenClaw skill manager: clawhub install openclaw/skills/skills/danbennettuk/voice-note-to-midi. Alternatively, install manually: clone the source repository into ~/melody-pipeline, create a virtual environment, and install the required Python dependencies, including basic-pitch, librosa, and music21. Once installed, add the directory to your PATH so you can invoke the pipeline directly from your terminal or via the OpenClaw agent.
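Before a manual install, it can help to confirm the two stated prerequisites. The sketch below is an illustration only; `check_prerequisites` is a hypothetical helper, not part of the skill. It checks the Python version and looks for an `ffmpeg` executable on PATH using the standard library.

```python
import shutil
import sys


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    version_info and which are injectable so the check can be tested
    without depending on the machine it runs on.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # The listing states Python 3.11+ as the minimum version.
    if tuple(version_info[:2]) < (3, 11):
        problems.append(
            f"Python 3.11+ required, found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Running `check_prerequisites()` with no arguments inspects the current interpreter and PATH, and any returned messages point at what to fix before installing the dependencies.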

Use Cases

Songwriting: Capture fleeting melodic ideas while on the go and turn them into MIDI to keep as project files.
Transcribing: Quickly convert a recorded vocal melody into notation or MIDI for analysis.
Workflow Acceleration: Eliminate the manual labor of MIDI programming by 'singing' your synth lines, basslines, or vocal leads directly into your DAW.
Musical Ideation: Experiment with voice-led compositions that can be re-synthesized using virtual instruments.

Example Prompts

"Convert my voice memo 'idea_01.mp3' into a MIDI file and quantize it to 16th notes."
"Process the recording of me humming this bassline and output a MIDI file named 'new_bassline.mid'."
"Take the latest audio file from my recordings folder, extract the melody, and snap it to C Major."
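The prompts above mention quantizing to 16th notes and snapping to C Major. As a rough illustration of what those two operations mean, and not the skill's own implementation, the sketch below uses hypothetical helpers `quantize_time` and `snap_to_scale`. It assumes a known tempo in BPM and a quarter-note beat, neither of which the listing specifies.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic


def quantize_time(seconds, bpm, steps_per_beat=4):
    """Snap a time in seconds to the nearest grid line.

    With a quarter-note beat (an assumption), steps_per_beat=4 gives a
    16th-note grid.
    """
    step = 60.0 / bpm / steps_per_beat
    return round(seconds / step) * step


def snap_to_scale(pitch, tonic=0, scale=MAJOR_SCALE):
    """Move a MIDI pitch to the nearest pitch in the given scale.

    tonic is a pitch class (0 = C). Ties between the note below and the
    note above resolve downward, which is an arbitrary choice here.
    """
    for offset in (0, -1, 1, -2, 2):
        candidate = pitch + offset
        if (candidate - tonic) % 12 in scale:
            return candidate
    return pitch  # not reached for scales with gaps of at most 4 semitones
```

At 120 BPM the 16th-note grid is 0.125 seconds wide, so a note starting at 0.27 seconds moves to 0.25 seconds. In C major, `snap_to_scale(61)` (C#) returns 60, and `snap_to_scale(64)` (E) is already in the scale and stays at 64.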

Tips & Limitations

For best results, record in a quiet environment. HPSS reduces background noise, but very noisy recordings may still introduce artifacts. Sing or hum in steady time, since quantization works best on rhythmically consistent input. The skill is optimized for monophonic melodies; polyphonic or chordal recordings are likely to be transcribed less accurately during the pitch detection phase.

Metadata

Author: @danbennettuk

Stars: 3376

Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-danbennettuk-voice-note-to-midi": {
      "enabled": true,
      "auto_update": true
    }
  }
}
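If clawhub.json already contains other plugins, pasting the snippet by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically. It assumes only the structure shown above (a top-level "plugins" object keyed by plugin ID); the helpers `enable_plugin` and `update_config_file` are hypothetical and not part of clawhub.

```python
import json
from pathlib import Path

# Plugin ID copied from the configuration snippet above.
PLUGIN_ID = "official-danbennettuk-voice-note-to-midi"


def enable_plugin(config, plugin_id=PLUGIN_ID, auto_update=True):
    """Return a copy of config with the plugin entry added or updated.

    Existing plugins and any extra keys on the entry are preserved.
    """
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    updated["plugins"] = plugins
    return updated


def update_config_file(path):
    """Read clawhub.json at path (or start empty), enable the plugin, write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file("clawhub.json")` from the directory that holds the file would apply the change in place; the location of that file is not stated in the listing, so the path here is a placeholder.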

Related Skills

play-guitar-fretboard

Master the guitar fretboard: jump quickly to guitar fretboard learning resource sites.

applesun8799 4473

podcast-agent

Search articles on any topic, generate a two-host dialogue script, and synthesize podcast audio via TTS. Turn long reads into listenable content.

besty0121 4473

ym-mediatoolkit

Streaming video processing toolkit: compression, cover extraction, and audio conversion without downloading the full video.

370299455cx-web 4473

ressemble

Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.

adriano-vr 4473

audio-transcriber

Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration

bingze00000 4473