Official Verified media Safety 4/5

sergei-mikhailov-stt

Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice message to text.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bzsega/sergei-mikhailov-stt

What This Skill Does

The sergei-mikhailov-stt skill is a speech-to-text integration for the OpenClaw ecosystem. It takes incoming voice messages from connected messengers and transcribes them into readable text. It uses the Yandex SpeechKit API by default, and its extensible architecture allows other providers to be added as modules. The skill manages the full lifecycle of an audio file: local verification, format validation, format conversion via ffmpeg, and finally transcription.
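
The first stages of that lifecycle can be pictured with a minimal Python sketch. It is not the skill's actual code: the function name, the set of accepted extensions (taken from the formats named under Tips & Limitations), and the choice of mono OGG/Opus as the conversion target are all assumptions made for illustration. The function only builds the ffmpeg command and does not run it.

```python
from pathlib import Path

# Assumption: the formats named on this page (OGG, WAV, MP3).
SUPPORTED_EXTENSIONS = {".ogg", ".wav", ".mp3"}


def build_conversion_command(source: str, target: str) -> list[str]:
    """Check an input file locally and return an ffmpeg command for it.

    Raises FileNotFoundError if the file is missing and ValueError if the
    extension is not in SUPPORTED_EXTENSIONS. Nothing is executed here.
    """
    src = Path(source)
    if not src.is_file():
        raise FileNotFoundError(f"audio file not found: {src}")
    if src.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {src.suffix or '(none)'}")
    return [
        "ffmpeg",
        "-y",               # overwrite the target if it already exists
        "-i", str(src),     # input file
        "-ac", "1",         # downmix to a single channel
        "-c:a", "libopus",  # encode the audio with the Opus codec
        str(target),
    ]
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed. Building the command separately from running it keeps the validation step easy to test.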

Installation

To get started, make sure the python3-venv package is installed on your system. Run 'clawhub install sergei-mikhailov-stt' to download the skill. Once it is installed, go to the skill's workspace directory (~/.openclaw/workspace/skills/sergei-mikhailov-stt) and run 'bash setup.sh'. The script creates a virtual environment, installs the Python dependencies, and prepares the configuration files. When setup finishes, run 'bash check.sh'. This diagnostic script checks that your environment, permissions, and API keys are configured correctly before live use.
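
To give a feel for the kind of checks a diagnostic step can make, here is a small Python sketch. It does not reproduce check.sh, whose contents are not shown on this page. The environment variable name YANDEX_API_KEY is hypothetical, and the two checks (ffmpeg on PATH, an API key present) are assumptions chosen to match the description above.

```python
import os
import shutil

# Hypothetical variable name, used only for illustration. The skill's real
# configuration keys are defined by its own .env and config files.
API_KEY_VAR = "YANDEX_API_KEY"


def find_problems(env=None, which=shutil.which):
    """Return a list of readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH")
    if not env.get(API_KEY_VAR):
        problems.append(f"{API_KEY_VAR} is not set")
    return problems
```

Calling find_problems() with no arguments inspects the current process environment. Passing a dictionary and a lookup function instead makes the checks repeatable without touching the real system.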

Use Cases

This skill suits users who frequently receive voice notes but prefer text for accessibility and searchability. Typical uses include settings where playing audio aloud would be disruptive, archiving voice conversations as searchable text, and summarizing long voice messages with downstream AI analysis. It turns asynchronous audio messages into text that can be stored and managed within OpenClaw.

Example Prompts

  1. "Transcribe the voice message I just received in my last Telegram chat."
  2. "Convert the audio file located at /home/user/.openclaw/media/inbound/note_001.ogg to text."
  3. "Show me the text content of the latest voice memo from my professional channel."

Tips & Limitations

Keep audio files under the 1MB limit of the Yandex SpeechKit v1 sync API. The skill supports multiple formats such as OGG, WAV, and MP3, and clear, high-quality recordings give the best recognition confidence. Never share your API keys, and make any configuration changes manually in the protected .env or config files. If recognition quality drops, check that your ffmpeg installation is up to date, since the skill relies on it to pre-process non-standard audio files. Finally, the skill uses absolute paths for execution to comply with OpenClaw's approval prompts.
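
The size limit is easy to check before a file is submitted. The following Python sketch is illustrative and not part of the skill: the function name is hypothetical, and reading "1MB" as 1024 * 1024 bytes is an assumption, so consult the provider's documentation for the exact figure.

```python
import os

# Assumption: "1MB" is read as 1024 * 1024 bytes.
SYNC_LIMIT_BYTES = 1024 * 1024


def fits_sync_limit(path: str, limit: int = SYNC_LIMIT_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

A file that fails this check would need to be shortened or re-encoded at a lower bitrate before it can go through the sync API.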

Metadata

Author: @bzsega
Stars: 4097
Views: 0
Updated: 2026-04-14

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bzsega-sergei-mikhailov-stt": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags(AI)

#speech-to-text #audio-transcription #yandex #messenger #voice-processing
Safety Score: 4/5

Flags: file-read, external-api, code-execution