sergei-mikhailov-stt
Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice message to text.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bzsega/sergei-mikhailov-stt
What This Skill Does
The sergei-mikhailov-stt skill is a speech-to-text integration for the OpenClaw ecosystem. It intercepts incoming voice messages from connected messengers and transcribes them into readable text. It uses the Yandex SpeechKit API by default, and its extensible architecture allows other providers to be added. The skill manages the full lifecycle of an audio file: local verification, format validation, conversion via ffmpeg, and finally transcription.
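The lifecycle described above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: the function names, the choice of mono OGG/Opus as the conversion target, and the rule "convert anything that is not already .ogg" are all assumptions made for the example. The sketch only plans the steps and builds the ffmpeg argument list; it does not run ffmpeg or call any API.

```python
from pathlib import Path

# Assumption: the formats named in this listing (OGG, WAV, MP3).
SUPPORTED_SUFFIXES = {".ogg", ".wav", ".mp3"}


def build_ffmpeg_command(source: Path, target: Path) -> list[str]:
    """Return an ffmpeg argument list that re-encodes source as mono OGG/Opus."""
    return [
        "ffmpeg",
        "-y",               # overwrite the target if it already exists
        "-i", str(source),  # input file
        "-ac", "1",         # downmix to a single audio channel
        "-c:a", "libopus",  # encode the audio stream with Opus
        str(target),
    ]


def plan_transcription(audio_path: str) -> list[str]:
    """List the lifecycle steps for one file without executing any of them."""
    source = Path(audio_path)
    suffix = source.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")

    steps = [f"verify that {source} exists"]
    if suffix != ".ogg":
        # Hypothetical rule: only non-OGG inputs are converted.
        target = source.with_suffix(".ogg")
        steps.append(" ".join(build_ffmpeg_command(source, target)))
        source = target
    steps.append(f"send {source} to the speech-to-text provider")
    return steps
```

Calling `plan_transcription("/home/user/.openclaw/media/inbound/note_001.ogg")` yields two steps (verify, send), while a `.wav` or `.mp3` input gains an ffmpeg conversion step in between.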
Installation
Before you start, make sure the python3-venv package is installed on your system. Run 'clawhub install sergei-mikhailov-stt' to download the skill, then change to the skill directory (~/.openclaw/workspace/skills/sergei-mikhailov-stt) and run 'bash setup.sh'. The script creates a virtual environment, installs the Python dependencies, and prepares the configuration files. When setup finishes, run 'bash check.sh'. This diagnostic script verifies that your environment, permissions, and API keys are configured correctly before live use.
Use Cases
This skill suits users who frequently receive voice notes but prefer text for accessibility and searchability. It is useful in settings where listening to audio is disruptive, for archiving voice conversations as searchable text, and for quickly summarizing long voice messages with downstream AI analysis. In each case it turns asynchronous audio messages into text that can be managed as structured data within OpenClaw.
Example Prompts
- "Transcribe the voice message I just received in my last Telegram chat."
- "Convert the audio file located at /home/user/.openclaw/media/inbound/note_001.ogg to text."
- "Show me the text content of the latest voice memo from my professional channel."
Tips & Limitations
Keep audio files under the 1 MB limit of the Yandex SpeechKit v1 sync API. The skill supports multiple formats, including OGG, WAV, and MP3, but clear, high-quality recordings give the best recognition results. Never share your API keys; make configuration changes manually in the protected .env or config files. If recognition quality drops, check that your ffmpeg installation is up to date, since the skill relies on it to pre-process non-standard audio files. Finally, the skill uses absolute paths for execution to comply with OpenClaw's approval prompts.
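The constraints above (size limit, supported formats, absolute paths) lend themselves to a quick pre-flight check before a file is submitted. The following is a minimal sketch, not part of the skill: the function name is hypothetical, and it assumes that "1 MB" means 1024 * 1024 bytes and that the format can be judged from the file extension alone.

```python
import os

# Assumption: "1 MB" is interpreted as 1024 * 1024 bytes.
MAX_SYNC_BYTES = 1024 * 1024
# Assumption: the formats named in this listing, judged by extension only.
SUPPORTED_SUFFIXES = (".ogg", ".wav", ".mp3")


def preflight_problems(path: str, size_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not path.lower().endswith(SUPPORTED_SUFFIXES):
        problems.append("unsupported format")
    if size_bytes > MAX_SYNC_BYTES:
        problems.append("file exceeds 1 MB")
    return problems
```

In practice the size would come from `os.path.getsize(path)`; it is passed in as an argument here so the check can be exercised without touching the file system.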
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bzsega-sergei-mikhailov-stt": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, external-api, code-execution
Related Skills
smtools-image-generation
Generate images from text prompts using AI models via OpenRouter, Kie.ai, or YandexART. Use when the user asks to generate, create, draw, or illustrate an image.
sergei-mikhailov-tg-channel-reader
Read posts and comments from Telegram channels via MTProto (Pyrogram or Telethon). Fetch recent messages and discussion replies from public or private channels by time window.