Official Verified system Safety 3/5

open-autoglm-phone-agent

Expert skill for Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/adisinghstudent/open-autoglm-phone-agent

What This Skill Does

The open-autoglm-phone-agent skill for OpenClaw lets an AI agent operate a phone on your behalf. Using the AutoGLM vision-language model, it interprets the current screen and executes multi-step actions on Android, HarmonyOS NEXT, and iOS devices. Unlike automation scripts that rely on fixed UI coordinates, the agent works from what is visible on screen, identifying buttons, text fields, and navigation elements dynamically. It translates high-level natural-language instructions into device-level inputs, so tasks such as searching, browsing, and interacting with apps can be carried out step by step on a live device.
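The description above implies a perceive-decide-act loop: capture the screen, ask the model for the next action, execute it, and repeat. The following is a minimal sketch of that pattern only, not the framework's actual code. The names run_agent, capture, decide, and execute, and the action dictionary shape with a "type" key, are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical action shape: {"type": "tap", "x": 10, "y": 20} or {"type": "finish"}.
Action = Dict[str, object]


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Run a screenshot -> model -> action loop until "finish" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                       # perceive: grab the current screen
        action = decide(task, screenshot, history)   # decide: model proposes the next step
        history.append(action)
        if action.get("type") == "finish":           # the model signals the task is done
            break
        execute(action)                              # act: send the input to the device
    return history
```

Passing the three steps in as callables keeps the loop independent of any particular device bridge or model endpoint, and the max_steps cap prevents a confused model from looping indefinitely.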

Installation

To get started, make sure your environment is set up:

  1. Ensure your machine has Python 3.10+ and the required device bridge drivers (ADB for Android, HDC for HarmonyOS, or WebDriverAgent for iOS).
  2. Execute the installation command in your terminal: clawhub install openclaw/skills/skills/adisinghstudent/open-autoglm-phone-agent.
  3. Verify your device connection by running adb devices or hdc list targets to ensure the agent can communicate with your target hardware.
  4. Configure your preferred model endpoint. We recommend the BigModel (ZhipuAI) API for a quick start, or you can deploy locally using vLLM if you have significant GPU resources (24GB+ VRAM suggested for the 9B model).
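The device check in step 3 can also be scripted. The sketch below parses the text printed by adb devices and lists the serials that are ready for use; the helper names parse_adb_devices, ready_devices, and list_connected are illustrative and not part of the skill. It assumes the usual output layout of a header line followed by one serial and state per line, and it does not cover hdc or iOS.

```python
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state from `adb devices` output."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status lines ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(output: str) -> List[str]:
    """Return serials in the 'device' state (authorized and online)."""
    return [serial for serial, state in parse_adb_devices(output).items() if state == "device"]


def list_connected() -> List[str]:
    """Run `adb devices` (requires adb on PATH) and return the ready serials."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return ready_devices(result.stdout)
```

A device listed as unauthorized or offline is connected but not usable yet; in that case, accept the USB debugging prompt on the phone or reconnect the cable before starting the agent.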

Use Cases

  • Automated Testing: Run regression suites on mobile apps without writing custom scripts for every UI change.
  • Daily Workflow Automation: Automatically open specific apps, retrieve data, or check statuses while you are away from your phone.
  • Accessibility Support: Assist users by executing complex navigation tasks via voice or text input.
  • Data Extraction: Scrape structured information from mobile-only applications that lack public APIs.

Example Prompts

  1. "Open Meituan, search for the nearest Italian restaurant, and take a screenshot of the top result."
  2. "Check my recent WhatsApp notifications and summarize any messages from 'Mom'."
  3. "Open Spotify, find a jazz playlist, and set the volume to 50%."

Tips & Limitations

  • Precision: While highly effective, the agent may struggle with ultra-fast animations or highly cluttered screens. Give the model a moment to process the UI frames.
  • Device State: Ensure your device is unlocked and the screen is active. The agent performs best when the device is connected via high-speed USB cables.
  • Privacy: As this agent captures screenshots to perform tasks, be mindful of sensitive information visible on your screen during operations.
  • Updates: Always keep the AutoGLM framework updated to receive the latest perception improvements and supported action schemas.

Metadata

Stars: 3809
Views: 1
Updated: 2026-04-05
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-adisinghstudent-open-autoglm-phone-agent": {
      "enabled": true,
      "auto_update": true
    }
  }
}
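If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically. The function names enable_plugin and update_config_file are hypothetical, and it assumes the file is plain JSON with a top-level "plugins" object as shown above.

```python
import json
from pathlib import Path
from typing import Union

PLUGIN_KEY = "official-adisinghstudent-open-autoglm-phone-agent"


def enable_plugin(config: dict, key: str = PLUGIN_KEY) -> dict:
    """Return a copy of config with the plugin entry added; other plugins are kept."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[key] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: Union[str, Path] = "clawhub.json") -> None:
    """Read the config file if it exists, add the plugin entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Calling update_config_file() from the directory that holds clawhub.json adds the entry and leaves existing plugin settings in place.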

Tags (AI)

#mobile #automation #vision #agent #ui
Safety Score: 3/5

Flags: network-access, external-api, code-execution