open-autoglm-phone-agent
Expert skill for Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/adisinghstudent/open-autoglm-phone-agent
What This Skill Does
The open-autoglm-phone-agent skill lets OpenClaw drive a phone as an autonomous agent. Using the AutoGLM vision-language model, it interprets the current screen content and executes multi-step actions on Android, HarmonyOS NEXT, and iOS devices. Unlike automation scripts that rely on static UI coordinates, the agent reads the interface as a human would, identifying buttons, text fields, and navigation elements dynamically. It translates high-level natural language instructions into device-level inputs, which lets it carry out tasks such as searching, browsing, and app interaction in real time.
Installation
To get started, make sure your environment is set up:
- Ensure your machine has Python 3.10+ and the required device bridge tooling (ADB for Android, HDC for HarmonyOS, or WebDriverAgent for iOS).
- Execute the installation command in your terminal:
clawhub install openclaw/skills/skills/adisinghstudent/open-autoglm-phone-agent
- Verify your device connection by running adb devices or hdc list targets to ensure the agent can communicate with your target hardware.
- Configure your preferred model endpoint. We recommend the BigModel (ZhipuAI) API for a quick start, or you can deploy locally using vLLM if you have significant GPU resources (24GB+ VRAM suggested for the 9B model).
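The checks in the steps above can be scripted. The following is a minimal sketch for the Android path only, assuming adb is on your PATH; the function names parse_adb_devices and preflight are illustrative and not part of the skill or of Open-AutoGLM. It relies on the standard adb devices output, where each attached device appears as a serial followed by a state such as device or unauthorized.

```python
import shutil
import subprocess
import sys


def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines, and adb daemon messages ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def preflight() -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("adb") is None:
        problems.append("adb not found on PATH")
        return problems
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, timeout=30
    )
    # Only devices in the "device" state are authorized and usable.
    ready = [s for s, state in parse_adb_devices(result.stdout) if state == "device"]
    if not ready:
        problems.append("no authorized Android device attached")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```

A device listed as unauthorized usually means the USB debugging prompt on the phone has not been accepted yet. A similar check for HarmonyOS would wrap hdc list targets, whose output format is not covered here.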
Use Cases
- Automated Testing: Run regression suites on mobile apps without writing custom scripts for every UI change.
- Daily Workflow Automation: Automatically open specific apps, retrieve data, or check statuses while you are away from your phone.
- Accessibility Support: Assist users by executing complex navigation tasks via voice or text input.
- Data Extraction: Scrape structured information from mobile-only applications that lack public APIs.
Example Prompts
- "Open Meituan, search for the nearest Italian restaurant, and take a screenshot of the top result."
- "Check my recent WhatsApp notifications and summarize any messages from 'Mom'."
- "Open Spotify, find a jazz playlist, and set the volume to 50%."
Tips & Limitations
- Precision: While highly effective, the agent may struggle with ultra-fast animations or highly cluttered screens. Give the model a moment to process the UI frames.
- Device State: Ensure your device is unlocked and the screen is active. The agent performs best when the device is connected via high-speed USB cables.
- Privacy: As this agent captures screenshots to perform tasks, be mindful of sensitive information visible on your screen during operations.
- Updates: Always keep the AutoGLM framework updated to receive the latest perception improvements and supported action schemas.
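One way to act on the Precision tip is to wait until the screen stops changing before handing a task to the agent. The sketch below is an illustration, not part of the skill: wait_for_stable_screen is a hypothetical helper that polls screenshots and returns once two consecutive captures are byte-identical. It assumes an Android device reachable through adb and uses adb exec-out screencap -p to grab PNG bytes.

```python
import hashlib
import subprocess
import time
from typing import Callable


def adb_screenshot() -> bytes:
    """Capture the current screen as PNG bytes via adb."""
    return subprocess.run(
        ["adb", "exec-out", "screencap", "-p"],
        capture_output=True, check=True, timeout=30,
    ).stdout


def wait_for_stable_screen(
    capture: Callable[[], bytes] = adb_screenshot,
    interval: float = 0.5,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once two consecutive captures are identical, else False on timeout."""
    deadline = time.monotonic() + timeout
    previous = hashlib.sha256(capture()).digest()
    while time.monotonic() < deadline:
        sleep(interval)
        current = hashlib.sha256(capture()).digest()
        if current == previous:
            return True
        previous = current
    return False
```

This comparison is deliberately naive: a ticking clock in the status bar or a looping animation will keep the frames different, so treat a False result as "proceed with caution" rather than a hard failure.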
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-adisinghstudent-open-autoglm-phone-agent": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, external-api, code-execution
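If your clawhub.json already contains other settings, pasting the snippet over the file would discard them. The following is a small sketch of merging the plugin entry into an existing configuration instead. The helper names enable_plugin and update_file are hypothetical, and the file location is an assumption you should adjust; only the key layout comes from the snippet above.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-adisinghstudent-open-autoglm-phone-agent"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin enabled, leaving other keys intact."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    entry = merged.setdefault("plugins", {}).setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


def update_file(path: Path) -> None:
    """Read clawhub.json at path (if present), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

For example, update_file(Path("clawhub.json")) applies the change to a file in the current directory while preserving any other plugins already listed.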