video-stt
Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants to know what is being said, transcribing YouTube videos, podcasts, or any video with audio.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/damiencronw/video-stt
Video STT Skill
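Extract the audio from a video URL and convert it to text (Speech-to-Text).
Requirements
- yt-dlp - downloads the video/audio
- ffmpeg - extracts the audio
- Python - run inside a uv virtual environment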
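Before running the skill, it can help to confirm that the required tools are on PATH. The following is a minimal sketch, not part of the skill itself; the `missing_tools` helper is hypothetical, and `uv` is included on the assumption that the uv virtual environment is driven from the command line.

```python
import shutil

# Tool names taken from the requirements list. "uv" is an assumption: the
# skill says Python runs in a uv virtual environment, so the CLI is checked too.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "uv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.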
Quick Start
# Change to the scripts directory
cd ~/.openclaw/workspace/skills/video-stt/scripts
# Run a transcription
bash stt.sh "VIDEO_URL"
Usage
# Basic usage
bash stt.sh "https://youtube.com/watch?v=xxx"
# Write the transcript to a specific file
bash stt.sh "https://youtube.com/watch?v=xxx" -o output.txt
# Use a local Whisper model
bash stt.sh "https://youtube.com/watch?v=xxx" --local
# Use a cloud API
bash stt.sh "https://youtube.com/watch?v=xxx" --api openai
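If the script needs to be called from Python, the command line can be assembled from the flags that appear in this document's usage and example commands (`-o`, `--local`, `--api`, `--model`, `--format`). This is a rough sketch: `build_stt_command` and `run_stt` are hypothetical helpers, and the assumption that `stt.sh` prints the transcript to stdout is not confirmed by the source.

```python
import subprocess


def build_stt_command(url, output=None, local=False, api=None,
                      model=None, fmt=None, script="stt.sh"):
    """Assemble the argv for stt.sh using only flags shown in this document."""
    cmd = ["bash", script, url]
    if output:
        cmd += ["-o", output]
    if local:
        cmd.append("--local")
    if api:
        cmd += ["--api", api]
    if model:
        cmd += ["--model", model]
    if fmt:
        cmd += ["--format", fmt]
    return cmd


def run_stt(url, **options):
    """Run stt.sh and return its stdout (assumed here to be the transcript)."""
    result = subprocess.run(build_stt_command(url, **options),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with URLs that contain `&` or `?`.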
Supported Models
Local (free)
- tiny - fastest, modest quality
- base - balanced
- small - better
- medium - very good
- large - best (needs more memory)
Cloud APIs
- OpenAI Whisper API
- Azure Speech
- Google Speech
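For the local models, a transcription call might look like the sketch below. It assumes the `openai-whisper` package is installed (it provides the `whisper` module) and that ffmpeg is available; `transcribe_local` is a hypothetical helper and not necessarily how `stt.sh` does it.

```python
# Local model names as listed above, smallest to largest.
LOCAL_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe_local(audio_path, model_name="base"):
    """Transcribe an audio file with a local Whisper model and return the text."""
    if model_name not in LOCAL_MODELS:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {LOCAL_MODELS}"
        )
    # Imported lazily: the module comes from openai-whisper and is slow to load.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return result["text"].strip()
```

Validating the model name before the import keeps a typo from triggering a slow import or a download attempt.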
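Output Formats
Plain text is the default. Other formats are optional:
- .txt - plain text
- .srt - SRT subtitles
- .vtt - WebVTT subtitles
- .json - JSON with timestamps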
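To illustrate how timestamped output relates to the subtitle formats, here is a small sketch that renders segments as SRT. The segment shape (dicts with `start`, `end` in seconds, and `text`) mirrors what Whisper returns under `segments`, but whether the skill's `.json` output uses the same keys is an assumption.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT cue blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cue blocks are separated by one blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello "},
        {"start": 2.5, "end": 4.0, "text": "world"},
    ]
    print(segments_to_srt(demo))
```

WebVTT is close to this layout: the file starts with a `WEBVTT` header line and timestamps use `.` instead of `,` before the milliseconds.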
Environment Variables
# OpenAI (if you use the cloud API)
export OPENAI_API_KEY="sk-xxx"
# Or use SiliconFlow (cheaper)
export SILICONFLOW_API_KEY="xxx"
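A script that supports both providers has to decide which key to use. The sketch below reads the two variables named above; the `pick_api_key` helper, the provider labels, and the order of precedence (OpenAI first) are all assumptions rather than documented behavior.

```python
import os

# Checked in this order. The precedence is an assumption, not documented behavior.
KEY_VARIABLES = (
    ("openai", "OPENAI_API_KEY"),
    ("siliconflow", "SILICONFLOW_API_KEY"),
)


def pick_api_key(env=None):
    """Return (provider, key) for the first non-empty variable, or None."""
    env = os.environ if env is None else env
    for provider, variable in KEY_VARIABLES:
        key = env.get(variable, "").strip()
        if key:
            return provider, key
    return None


if __name__ == "__main__":
    choice = pick_api_key()
    if choice is None:
        print("No API key set; only local transcription is possible.")
    else:
        print("Using provider:", choice[0])
```

Returning `None` instead of raising lets the caller decide whether to fall back to a local model.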
Examples
# Transcribe a YouTube video
bash stt.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Choose a model
bash stt.sh "https://youtube.com/watch?v=xxx" --model medium
# Save as SRT
bash stt.sh "https://youtube.com/watch?v=xxx" --format srt
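Python Dependencies
Manage the Python environment with uv:
# Create a virtual environment
uv venv
# Install dependencies (OpenAI Whisper is published on PyPI as openai-whisper)
uv pip install yt-dlp openai-whisper ffmpeg-python
# Run
uv run python stt.py "VIDEO_URL"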
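The audio-extraction step that precedes transcription could be sketched as follows. This is an illustration under assumptions, not the contents of `stt.py`: the helper names and file names are hypothetical, and it calls the yt-dlp and ffmpeg command-line tools directly instead of going through the Python bindings.

```python
import subprocess


def download_audio_command(url, basename="audio"):
    """yt-dlp argv: download the media, keep only the audio, convert it to WAV."""
    return ["yt-dlp", "-x", "--audio-format", "wav",
            "-o", f"{basename}.%(ext)s", url]


def resample_command(src, dst="audio_16k.wav"):
    """ffmpeg argv: rewrite src as 16 kHz mono, overwriting dst if it exists."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def extract_audio(url, basename="audio"):
    """Download and resample the audio; return the path of the final WAV file."""
    subprocess.run(download_audio_command(url, basename), check=True)
    resampled = f"{basename}_16k.wav"
    subprocess.run(resample_command(f"{basename}.wav", resampled), check=True)
    return resampled
```

The resampling step is optional for local Whisper, which converts its input to 16 kHz mono itself, but a smaller file can be useful when uploading to a cloud API.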
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-damiencronw-video-stt": {
"enabled": true,
"auto_update": true
}
}
}