ClawKit Reliability Toolkit
Official | Verified | AI Models | Safety: 4/5

image-model-evaluation

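Evaluates the performance of image generation models. Runs comprehensive text-to-image and image-to-image tests against a specified model, covering varied parameters, varied prompts, character consistency, and other test items, then generates a detailed HTML test report. Use this skill when the user wants to test, evaluate, or compare the output of image models.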

Why use this skill?

Use the OpenClaw image-model-evaluation skill to comprehensively test and benchmark image generation models, with detailed HTML reports covering text-to-image (T2I), image-to-image (I2I), and character-consistency tests.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/hexiaochun/image-model-evaluation

What This Skill Does

The image-model-evaluation skill is a testing suite for assessing the output quality, consistency, and reliability of image generation models. It provides a structured framework for evaluating how well a model handles text-to-image (T2I) and image-to-image (I2I) tasks. It runs systematic tests, from stylistic variations and aspect ratios to multi-variable character consistency, and compiles the results into an easy-to-read HTML report. Developers and content creators can use this empirical data to choose a model for a project and to confirm that it delivers the expected output quality and preserves characters reliably across diverse conditions.
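One way to picture the structured framework described above is a matrix of test cases, crossed over task type, style, and aspect ratio, whose results are rendered into an HTML table. The sketch below is a hypothetical illustration, not the skill's actual implementation: the TestCase fields, the style and aspect-ratio values, and the build_matrix and render_report helpers are all assumptions chosen for clarity.

```python
import html
from dataclasses import dataclass
from itertools import product

# Hypothetical test dimensions; the skill's real scenario list is not published here.
TASKS = ["t2i", "i2i"]
STYLES = ["photorealistic", "anime", "oil painting"]
ASPECT_RATIOS = ["1:1", "16:9", "9:16"]


@dataclass
class TestCase:
    task: str
    style: str
    aspect_ratio: str
    prompt: str  # I2I cases would also carry a reference image; omitted for brevity


def build_matrix(subject: str) -> list[TestCase]:
    """Cross every task with every style and aspect ratio."""
    return [
        TestCase(task, style, ratio, f"{subject}, {style} style")
        for task, style, ratio in product(TASKS, STYLES, ASPECT_RATIOS)
    ]


def render_report(model: str, cases: list[TestCase], results: dict[int, str]) -> str:
    """Render one HTML table row per case; results maps a case index to a status string."""
    header = "".join(f"<th>{h}</th>" for h in ["Task", "Style", "Aspect ratio", "Prompt", "Status"])
    rows = []
    for i, case in enumerate(cases):
        cells = [case.task, case.style, case.aspect_ratio, case.prompt, results.get(i, "not run")]
        # Escape every cell so prompts containing <, > or & cannot break the markup.
        rows.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    return (
        f"<html><body><h1>{html.escape(model)} evaluation</h1>"
        f"<table><tr>{header}</tr>{''.join(rows)}</table></body></html>"
    )


if __name__ == "__main__":
    cases = build_matrix("a red fox in a snowy forest")
    print(len(cases), "test cases")
    print(render_report("example-model", cases, {0: "ok", 1: "timeout"})[:200])
```

Crossing the dimensions with itertools.product keeps every combination explicit, so a gap in coverage shows up as a "not run" row in the report rather than silently disappearing.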

Installation

To integrate this skill into your OpenClaw agent, use the following installation command in your terminal: clawhub install openclaw/skills/skills/hexiaochun/image-model-evaluation

Use Cases

  • Model Selection: Compare different models like Stable Diffusion, Flux, or custom enterprise models to decide which fits your production pipeline.
  • Consistency Audit: Verify that character features (facial structure, hair, body proportions) are preserved when generating images across different poses, lighting, and environments.
  • Quality Assurance: Automated regression testing to ensure that model updates or fine-tuning haven't negatively impacted specific stylistic or structural capabilities.
  • API Benchmarking: Analyze generation speed and success rates to estimate production costs and latency.
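For the API Benchmarking use case, the headline numbers reduce to a success rate, latency statistics, and a cost estimate. The sketch below shows one way to compute them. RunResult, summarize, and price_per_image are hypothetical names; the p95 uses the nearest-rank method; and the cost line assumes every request is billed whether or not it succeeds, which varies by provider.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    scenario: str
    succeeded: bool
    latency_s: float  # wall-clock seconds for the generation request


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results: list[RunResult], price_per_image: float) -> dict[str, float]:
    """Success rate, latency stats over successful runs, and estimated cost.

    Assumption: every request is billed, successful or not; adjust for your provider.
    """
    ok = [r.latency_s for r in results if r.succeeded]
    return {
        "success_rate": len(ok) / len(results),
        "mean_latency_s": mean(ok) if ok else float("nan"),
        "p95_latency_s": percentile(ok, 95) if ok else float("nan"),
        "estimated_cost": len(results) * price_per_image,
    }


if __name__ == "__main__":
    runs = [
        RunResult("t2i-portrait", True, 2.0),
        RunResult("t2i-landscape", True, 4.0),
        RunResult("i2i-style", True, 6.0),
        RunResult("i2i-pose", False, 120.0),  # timed out
    ]
    print(summarize(runs, price_per_image=0.04))
```

Latency is computed over successful runs only, so a timed-out request does not distort the mean; failures are reflected in the success rate instead.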

Example Prompts

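  1. "Help me test the jimeng-4.5 model. I want to see how it performs on text-to-image."
  2. "Run a full evaluation and show me how Stable Diffusion XL performs in the character consistency tests."
  3. "Run a quick test on mi-journey-v6 to check how accurately it handles conversion between realistic and anime styles."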

Tips & Limitations

  • Concurrency: To maintain stability, the skill caps parallel API requests at 4. Let a run finish before starting another one; overlapping runs can hit rate limits.
  • Timeouts: Each task times out after 120 seconds. If your model responds slowly, consider adjusting your network settings or checking the model's availability.
  • Cost Awareness: Always review the cost estimate provided by the agent before confirming the test execution. Full testing involves 31 distinct scenarios and may incur significant API usage costs.
  • Error Recovery: If a specific test fails because of a transient network error, you can retry just the failed tests instead of restarting the entire batch.
  • Data Privacy: Ensure that any input images used for testing do not contain sensitive personal or proprietary information, as these are processed through the model evaluation pipeline.
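The concurrency cap, per-task timeout, and selective retry described in these tips fit together naturally with asyncio. The sketch below is an assumption about how such a runner could look, not the skill's internals: the generate callable, the Results mapping, and the helper names are hypothetical, and ConnectionError stands in for whatever transient errors a real API client raises.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

MAX_CONCURRENCY = 4     # documented cap on parallel API requests
TASK_TIMEOUT_S = 120.0  # documented per-task timeout, in seconds

# Hypothetical model call: takes a scenario id and returns a result such as an image URL.
Generate = Callable[[str], Awaitable[str]]
Results = Dict[str, Optional[str]]  # scenario id -> result, or None if it failed


async def run_one(generate: Generate, scenario: str,
                  sem: asyncio.Semaphore) -> Tuple[str, Optional[str]]:
    """Run one scenario under the concurrency cap; None marks a failure."""
    async with sem:  # at most MAX_CONCURRENCY scenarios are in flight at once
        try:
            # The timeout starts after the semaphore is acquired, so queueing time is not counted.
            result = await asyncio.wait_for(generate(scenario), timeout=TASK_TIMEOUT_S)
            return scenario, result
        except (asyncio.TimeoutError, ConnectionError):
            return scenario, None


async def run_batch(generate: Generate, scenarios: List[str]) -> Results:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    pairs = await asyncio.gather(*(run_one(generate, s, sem) for s in scenarios))
    return dict(pairs)


async def retry_failed(generate: Generate, results: Results) -> Results:
    """Re-run only the failed scenarios, keeping earlier successes as they are."""
    failed = [s for s, r in results.items() if r is None]
    retried = await run_batch(generate, failed)
    return {**results, **retried}


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    async def fake_generate(scenario: str) -> str:
        # Stand-in for a real API call; "i2i-pose" fails once with a transient error.
        attempts[scenario] = attempts.get(scenario, 0) + 1
        await asyncio.sleep(0.01)
        if scenario == "i2i-pose" and attempts[scenario] == 1:
            raise ConnectionError("transient network error")
        return f"image-for-{scenario}"

    async def main() -> None:
        results = await run_batch(fake_generate, ["t2i-portrait", "i2i-pose", "i2i-style"])
        print("first pass:", results)
        print("after retry:", await retry_failed(fake_generate, results))

    asyncio.run(main())
```

Recording failures as None rather than raising lets one bad request leave the rest of the batch intact, which is what makes the selective retry possible.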

Metadata

Stars: 2387
Views: 0
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-hexiaochun-image-model-evaluation": {
      "enabled": true,
      "auto_update": true
    }
  }
}
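If you would rather script this change than paste it by hand, the sketch below merges the entry into an existing clawhub.json. It assumes the file is plain JSON with the top-level plugins object shown above; enable_plugin is a hypothetical helper, not part of the clawhub CLI.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-hexiaochun-image-model-evaluation"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update the plugin entry, preserving other plugins and top-level keys."""
    # Start from an empty config if the file does not exist yet.
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Keep any extra settings already on the entry, then force the two documented flags.
    plugins[plugin_id] = {**plugins.get(plugin_id, {}), "enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_plugin(Path("clawhub.json")) from the project directory adds the entry without touching other plugins, and creates the file if it is missing.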

Tags (AI)

#image-generation #model-evaluation #quality-assurance #computer-vision
Safety Score: 4/5

Flags: file-write, file-read, external-api