Official Verified browser automation Safety 2/5

gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/am-will/gemini-computer-use

What This Skill Does

The gemini-computer-use skill connects Google's multimodal Gemini Computer Use model to a Playwright-controlled browser. Instead of writing a custom script for each website, you give the agent natural language instructions; it looks at the page, interprets its contents, and performs actions such as clicking, typing, and navigating. It runs as an iterative loop: the agent sends a screenshot to the model, reads the function_call the model proposes, executes that action in a live Chromium-based browser session, and returns the result as a function_response before the next turn.
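The loop described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: FunctionCall, ModelTurn, run_agent_loop, and the three injected callables are hypothetical names, and the model and browser are passed in as plain functions so the control flow can be read without any SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FunctionCall:
    """Hypothetical stand-in for a UI action proposed by the model."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ModelTurn:
    """Hypothetical stand-in for one model reply: an action or final text."""
    function_call: Optional[FunctionCall] = None
    text: str = ""


def run_agent_loop(
    goal: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes, list], ModelTurn],
    execute_action: Callable[[FunctionCall], dict],
    turn_limit: int = 8,
) -> str:
    """Run screenshot -> function_call -> action -> function_response turns."""
    history: list = []  # function_response records fed back to the model
    for _ in range(turn_limit):
        screenshot = take_screenshot()
        turn = ask_model(goal, screenshot, history)
        if turn.function_call is None:
            # No action proposed: treat the text reply as the final answer.
            return turn.text
        result = execute_action(turn.function_call)
        history.append({"name": turn.function_call.name, "response": result})
    return "stopped: turn limit reached"
```

In a real agent, take_screenshot and execute_action would wrap a Playwright page and ask_model would wrap the Gemini API call; the explicit turn_limit is what keeps a confused model from looping forever.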

Installation

To integrate this skill into your OpenClaw environment, execute the command: clawhub install openclaw/skills/skills/am-will/gemini-computer-use.

After installation, configure your environment: copy env.example to env.sh, fill in your API credentials, and source the file. Then install the dependencies with pip install google-genai playwright, followed by playwright install chromium, which downloads the Chromium binary that the browser controller drives.
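Before the first run, it can help to confirm that the setup steps above took effect. The sketch below is a hypothetical preflight helper, not part of the skill. It assumes the credential is exposed as GEMINI_API_KEY; the skill's env.example may use a different variable name, so adjust it to match.

```python
import importlib.util
import os


def module_available(name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing.
        return False


def preflight(
    env_var: str = "GEMINI_API_KEY",  # assumption: check env.example for the real name
    modules: tuple = ("google.genai", "playwright"),
) -> list:
    """Collect human-readable setup problems; an empty list means ready."""
    problems = []
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    for module in modules:
        if not module_available(module):
            problems.append(f"Python module {module} is not installed")
    return problems
```

Calling preflight() after sourcing env.sh should return an empty list. It does not check whether the Chromium binary has been downloaded; playwright install chromium still has to be run separately.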

Use Cases

This skill is ideal for complex web automation tasks that require human-like decision-making. Common use cases include:

  • Automated data extraction from dynamic, non-API-friendly websites.
  • Testing user interface flows across different browser environments (Chrome/Edge/Chromium).
  • Automating repetitive administrative tasks like filling out multi-step web forms.
  • Researching information across multiple pages where the structure is unpredictable or requires visual interpretation.

Example Prompts

  1. "Go to the company portal at example.com, log in with the saved credentials, and export the Q3 financial report table to a CSV file."
  2. "Search for the latest research papers on climate change, visit the top three results, and summarize the key findings from each."
  3. "Navigate to the registration page, fill out the form using the provided mock user data, and verify that the submission succeeds."

Tips & Limitations

  • Safety First: Always use the safety_decision features to confirm sensitive actions before they run. Because the model can click and type on your behalf, run it in a sandboxed browser profile to prevent accidental data modification or exposure.
  • Browser Control: You can toggle between different browser channels using environment variables. For specific enterprise environments, leverage COMPUTER_USE_BROWSER_EXECUTABLE to point to a specific, hardened browser build.
  • Turn Limits: Complex tasks should be scoped carefully. Set appropriate --turn-limit values (e.g., 6-10) to prevent the agent from getting stuck in an infinite loop if it fails to resolve a UI challenge.
  • Viewport Management: Stick to the recommended 1440x900 resolution. Deviating significantly may confuse the model’s vision-based coordinate system.
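Two of the tips above, safety confirmation and viewport size, can be illustrated together. The sketch below is hypothetical and rests on two assumptions worth verifying against the Gemini Computer Use documentation: that the model reports coordinates on a normalized 0-999 grid that must be scaled to the viewport, and that a risky action carries a safety_decision argument whose decision field equals "require_confirmation". The helper names are invented for illustration.

```python
from typing import Callable

VIEWPORT_WIDTH = 1440   # recommended viewport from the tips above
VIEWPORT_HEIGHT = 900
GRID = 1000  # assumption: model coordinates are normalized to 0-999


def to_pixels(x: int, y: int,
              width: int = VIEWPORT_WIDTH,
              height: int = VIEWPORT_HEIGHT) -> tuple:
    """Scale normalized model coordinates to viewport pixels."""
    return int(x / GRID * width), int(y / GRID * height)


def needs_confirmation(args: dict) -> bool:
    """Assumed shape: args['safety_decision']['decision']."""
    decision = args.get("safety_decision") or {}
    return decision.get("decision") == "require_confirmation"


def approve_action(args: dict, confirm: Callable[[str], bool]) -> bool:
    """Return True if the action may run; ask the user when flagged."""
    if not needs_confirmation(args):
        return True
    explanation = args["safety_decision"].get("explanation", "no explanation given")
    return confirm(explanation)
```

A terminal agent might pass something like `lambda msg: input(f"{msg} Proceed? [y/N] ").lower() == "y"` as confirm. The scaling step also shows why the viewport matters: if the browser window does not match the size the screenshot was taken at, scaled clicks land in the wrong place.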

Metadata

Author: @am-will
Stars: 3840
Views: 0
Updated: 2026-04-06

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-am-will-gemini-computer-use": {
      "enabled": true,
      "auto_update": true
    }
  }
}
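If clawhub.json already lists other plugins, pasting the fragment by hand risks overwriting them. The sketch below merges the entry instead. It is a hypothetical helper rather than a ClawHub feature, and it assumes clawhub.json is plain JSON with a top-level "plugins" object, as the fragment above suggests.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-am-will-gemini-computer-use"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update one plugin entry, leaving other settings untouched."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, enable_plugin(Path("clawhub.json")) run from the project directory creates the file if it is missing and otherwise preserves any existing entries.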

Tags(AI)

#browser-automation #gemini #playwright #agent-loop #web-scraping

Safety Score: 2/5

Flags: network-access, code-execution, external-api