gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/am-will/gemini-computer-use

What This Skill Does
The gemini-computer-use skill drives automated browser interactions with Google's Gemini 2.5 Computer Use model via Playwright. Instead of writing a custom script for every website, the agent visually perceives a webpage, interprets its contents, and executes actions such as clicks, typing, and navigation from your natural-language instructions. It runs an iterative loop: the agent sends a screenshot to the model, receives a proposed function_call, performs the requested action in a live Chromium-based browser session, and returns a function_response for the next turn.
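The action-dispatch step of that loop can be sketched as follows. This is a simplified illustration, not the skill's actual implementation: the function-call names (click_at, type_text_at, navigate), the argument shapes, and the 0-999 normalized coordinate grid are assumptions based on Google's published Computer Use API, and the real agent supports more action types.

```python
# Hedged sketch: execute one model-proposed action with Playwright.
# Assumption: the model emits x/y coordinates normalized to a 0-999
# grid, which must be scaled to real viewport pixels before clicking.

VIEWPORT = (1440, 900)  # the resolution this skill recommends

def denormalize(x: int, y: int, viewport: tuple = VIEWPORT) -> tuple:
    """Map a 0-999 model coordinate onto viewport pixels."""
    w, h = viewport
    return int(x / 1000 * w), int(y / 1000 * h)

def dispatch(page, name: str, args: dict) -> dict:
    """Run one proposed function_call against a Playwright page and
    return a payload for the next function_response turn."""
    if name == "click_at":
        page.mouse.click(*denormalize(args["x"], args["y"]))
    elif name == "type_text_at":
        page.mouse.click(*denormalize(args["x"], args["y"]))
        page.keyboard.type(args["text"])
    elif name == "navigate":
        page.goto(args["url"])
    else:
        return {"error": f"unsupported action: {name}"}
    return {"url": page.url}
```

After each dispatched action, the loop takes a fresh screenshot and sends it back to the model together with this payload, repeating until the model signals completion or the turn limit is hit.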
Installation
To integrate this skill into your OpenClaw environment, execute the command: clawhub install openclaw/skills/skills/am-will/gemini-computer-use.
Following installation, configure your environment by copying the env.example file to env.sh, populating it with your API credentials, and sourcing it. You will also need to install the Python dependencies with pip install google-genai playwright, then run playwright install chromium. The final step downloads the Chromium binary that the browser controller drives.
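Taken together, the setup steps above look like this. The API key variable name is an assumption; use whatever name env.example actually specifies:

```shell
clawhub install openclaw/skills/skills/am-will/gemini-computer-use
cp env.example env.sh          # fill in your API key (e.g. GEMINI_API_KEY)
source env.sh
pip install google-genai playwright
playwright install chromium    # downloads the Chromium binary Playwright drives
```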
Use Cases
This skill is ideal for complex web automation tasks that require human-like decision-making. Common use cases include:
- Automated data extraction from dynamic, non-API-friendly websites.
- Testing user interface flows across different browser environments (Chrome/Edge/Chromium).
- Automating repetitive administrative tasks like filling out multi-step web forms.
- Researching information across multiple pages where the structure is unpredictable or requires visual interpretation.
Example Prompts
- "Go to the company portal at example.com, login with the saved credentials, and export the Q3 financial report table to a CSV."
- "Search for the latest research papers on climate change, visit the top three results, and summarize the key findings from each."
- "Navigate to the registration page, fill out the form using the provided mock user data, and verify if the submission succeeds."
Tips & Limitations
- Safety First: Always use the safety_decision features for sensitive actions. Because the model has the power to click and type, run it in a sandboxed browser profile to prevent accidental data modification or exposure.
- Browser Control: You can toggle between browser channels (Chrome/Edge/Chromium) using environment variables. For locked-down enterprise environments, set COMPUTER_USE_BROWSER_EXECUTABLE to point to a specific, hardened browser build.
- Turn Limits: Scope complex tasks carefully. Set an appropriate --turn-limit value (e.g., 6-10) so the agent cannot loop indefinitely on a UI challenge it fails to resolve.
- Viewport Management: Stick to the recommended 1440x900 resolution. Deviating significantly may confuse the model's vision-based coordinate system.
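The safety confirmation mentioned above can be wired in roughly like this. The payload shape (a safety_decision object with decision and explanation fields) is a hypothetical illustration, not the exact schema the API returns:

```python
def confirm_if_risky(candidate: dict, auto_approve: bool = False) -> bool:
    """Return True if the proposed action may proceed. Pauses for human
    input when the model attaches a safety decision that requires
    confirmation. The 'safety_decision' payload shape is hypothetical."""
    safety = candidate.get("safety_decision")
    if safety and safety.get("decision") == "require_confirmation":
        if auto_approve:  # only sensible inside a sandboxed profile
            return True
        prompt = safety.get("explanation", "a potentially risky action")
        answer = input(f"Model proposes: {prompt}. Proceed? [y/N] ")
        return answer.strip().lower() == "y"
    return True
```

Calling this gate before dispatching each action keeps a human in the loop for destructive steps (purchases, deletions, credential entry) while letting routine clicks through unattended.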
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-am-will-gemini-computer-use": {
"enabled": true,
"auto_update": true
}
}
}

Tags: AI
Flags: network-access, code-execution, external-api
Related Skills
get-you-some-britches
Use this skill any time I start complaining about my love life, or, if I indicate I need to find some pants.
morning-email-rollup
Daily morning rollup of important emails and calendar events at 8am with AI-generated summaries
remotion-best-practices
Best practices for Remotion - Video creation in React
context7
Fetch up-to-date library documentation via Context7 API. Use PROACTIVELY when: (1) Working with ANY external library (React, Next.js, Supabase, etc.) (2) User asks about library APIs, patterns, or best practices (3) Implementing features that rely on third-party packages (4) Debugging library-specific issues (5) Need current documentation beyond training data cutoff Always prefer this over guessing library APIs or using outdated knowledge.
openai-docs-skill
Query the OpenAI developer documentation via the OpenAI Docs MCP server using CLI (curl/jq). Use whenever a task involves the OpenAI API (Responses, Chat Completions, Realtime, etc.), OpenAI SDKs, ChatGPT Apps SDK, Codex, MCP integrations, endpoint schemas, parameters, limits, or migrations and you need up-to-date official guidance.