gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/am-will/gemini-computer-use

What This Skill Does
The gemini-computer-use skill drives automated browser interactions with Google's Gemini 2.5 Computer Use model via Playwright. Instead of writing a custom script for every website, the agent visually perceives a webpage, interprets its contents, and executes actions such as clicks, typing, and navigation from your natural-language instructions. It runs an iterative loop: the agent sends a screenshot to the model, receives a proposed function_call, performs the requested action in a live Chromium-based browser session, and returns a function_response for the next turn.
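The action-dispatch step of that loop can be sketched as follows. This is a simplified illustration, not the skill's actual implementation: the function-call names (click_at, type_text_at, navigate), the argument shapes, and the 0-999 normalized coordinate grid are assumptions based on Google's published Computer Use API, and the real agent supports more action types.

```python
# Hedged sketch: execute one model-proposed action with Playwright.
# Assumption: the model emits x/y coordinates normalized to a 0-999
# grid, which must be scaled to real viewport pixels before clicking.

VIEWPORT = (1440, 900)  # the resolution this skill recommends

def denormalize(x: int, y: int, viewport: tuple = VIEWPORT) -> tuple:
    """Map a 0-999 model coordinate onto viewport pixels."""
    w, h = viewport
    return int(x / 1000 * w), int(y / 1000 * h)

def dispatch(page, name: str, args: dict) -> dict:
    """Run one proposed function_call against a Playwright page and
    return a payload for the next function_response turn."""
    if name == "click_at":
        page.mouse.click(*denormalize(args["x"], args["y"]))
    elif name == "type_text_at":
        page.mouse.click(*denormalize(args["x"], args["y"]))
        page.keyboard.type(args["text"])
    elif name == "navigate":
        page.goto(args["url"])
    else:
        return {"error": f"unsupported action: {name}"}
    return {"url": page.url}
```

After each dispatched action, the loop takes a fresh screenshot and sends it back to the model together with this payload, repeating until the model signals completion or the turn limit is hit.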
Installation
To integrate this skill into your OpenClaw environment, execute the command: clawhub install openclaw/skills/skills/am-will/gemini-computer-use.
Following installation, configure your environment by copying the env.example file to env.sh, populating it with your API credentials, and sourcing it. You will also need to install the Python dependencies with pip install google-genai playwright, then run playwright install chromium. The final step downloads the Chromium binary that the browser controller drives.
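Taken together, the setup steps above look like this. The API key variable name is an assumption; use whatever name env.example actually specifies:

```shell
clawhub install openclaw/skills/skills/am-will/gemini-computer-use
cp env.example env.sh          # fill in your API key (e.g. GEMINI_API_KEY)
source env.sh
pip install google-genai playwright
playwright install chromium    # downloads the Chromium binary Playwright drives
```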
Use Cases
This skill is ideal for complex web automation tasks that require human-like decision-making. Common use cases include:
- Automated data extraction from dynamic, non-API-friendly websites.
- Testing user interface flows across different browser environments (Chrome/Edge/Chromium).
- Automating repetitive administrative tasks like filling out multi-step web forms.
- Researching information across multiple pages where the structure is unpredictable or requires visual interpretation.
Example Prompts
- "Go to the company portal at example.com, login with the saved credentials, and export the Q3 financial report table to a CSV."
- "Search for the latest research papers on climate change, visit the top three results, and summarize the key findings from each."
- "Navigate to the registration page, fill out the form using the provided mock user data, and verify if the submission succeeds."
Tips & Limitations
- Safety First: Always use the safety_decision features for sensitive actions. Because the model has the power to click and type, run it in a sandboxed browser profile to prevent accidental data modification or exposure.
- Browser Control: You can toggle between browser channels (Chrome/Edge/Chromium) using environment variables. For locked-down enterprise environments, set COMPUTER_USE_BROWSER_EXECUTABLE to point to a specific, hardened browser build.
- Turn Limits: Scope complex tasks carefully. Set an appropriate --turn-limit value (e.g., 6-10) so the agent cannot loop indefinitely on a UI challenge it fails to resolve.
- Viewport Management: Stick to the recommended 1440x900 resolution. Deviating significantly may confuse the model's vision-based coordinate system.
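The safety confirmation mentioned above can be wired in roughly like this. The payload shape (a safety_decision object with decision and explanation fields) is a hypothetical illustration, not the exact schema the API returns:

```python
def confirm_if_risky(candidate: dict, auto_approve: bool = False) -> bool:
    """Return True if the proposed action may proceed. Pauses for human
    input when the model attaches a safety decision that requires
    confirmation. The 'safety_decision' payload shape is hypothetical."""
    safety = candidate.get("safety_decision")
    if safety and safety.get("decision") == "require_confirmation":
        if auto_approve:  # only sensible inside a sandboxed profile
            return True
        prompt = safety.get("explanation", "a potentially risky action")
        answer = input(f"Model proposes: {prompt}. Proceed? [y/N] ")
        return answer.strip().lower() == "y"
    return True
```

Calling this gate before dispatching each action keeps a human in the loop for destructive steps (purchases, deletions, credential entry) while letting routine clicks through unattended.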
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-am-will-gemini-computer-use": {
"enabled": true,
"auto_update": true
}
}
}

Tags: AI
Flags: network-access, code-execution, external-api
Related Skills
get-you-some-britches
Use this skill any time I start complaining about my love life, or, if I indicate I need to find some pants.
morning-email-rollup
Daily morning rollup of important emails and calendar events at 8am with AI-generated summaries
remotion-best-practices
Best practices for Remotion - Video creation in React
context7
Fetch up-to-date library documentation via Context7 API. Use PROACTIVELY when: (1) Working with ANY external library (React, Next.js, Supabase, etc.) (2) User asks about library APIs, patterns, or best practices (3) Implementing features that rely on third-party packages (4) Debugging library-specific issues (5) Need current documentation beyond training data cutoff Always prefer this over guessing library APIs or using outdated knowledge.
openai-docs-skill
Query the OpenAI developer documentation via the OpenAI Docs MCP server using CLI (curl/jq). Use whenever a task involves the OpenAI API (Responses, Chat Completions, Realtime, etc.), OpenAI SDKs, ChatGPT Apps SDK, Codex, MCP integrations, endpoint schemas, parameters, limits, or migrations and you need up-to-date official guidance.