Official Verified

prompt_design_tuning_best_practice

Collaboratively design, evaluate, and iterate on a target prompt, then recommend a final launch candidate, under a “human-gated, agent-executed” workflow.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/abysscat-yj/prompt-design-tuning

Prompt Design & Tuning Best Practices

The goal of this Skill is not to casually “chat about prompts,” but to turn prompt tuning into an executable, reviewable, and cost-controlled engineering workflow.

The Agent handles most of the execution work.
Humans are responsible only for validating direction, approving high-cost loops, and signing off on the final launch candidate.


When to Use

Use this Skill when the user needs to:

  • design a target prompt from scratch or optimize an existing one
  • design a separate evaluation / judge prompt
  • compare the performance of multiple models on an evaluation set
  • work with an existing API curl, SDK integration, or request protocol
  • run controlled prompt iterations under a limited budget
  • turn prompt tuning into a reusable workflow instead of a one-off chat exercise

Working Modes

1. Design-Only Mode

Use this mode when:

  • there is no runnable environment yet
  • no evaluation resources are available yet
  • real model calls cannot be executed for now

In this mode, the Agent should produce:

  • task definition
  • target prompt draft
  • judge prompt draft
  • evaluation plan
  • script skeletons
  • manual execution guidance

2. Execution Mode

Use this mode when:

  • a runnable environment already exists
  • the model invocation method has been provided
  • the evaluation set, resource limits, and candidate models have been provided

In this mode, the Agent should continue with:

  • batch generation
  • automatic evaluation
  • result analysis
  • prompt iteration
  • final candidate recommendation
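
The choice between the two modes can be sketched as a simple readiness check. This is a minimal sketch, not part of the Skill itself: `Readiness`, `choose_mode`, and `missing_inputs` are hypothetical names, and treating Execution Mode as requiring every prerequisite at once is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Readiness:
    """Hypothetical summary of what the user has provided so far."""
    runnable_environment: bool
    invocation_method: bool   # e.g. an API curl or SDK integration
    evaluation_set: bool
    resource_limits: bool
    candidate_models: bool


def choose_mode(r: Readiness) -> str:
    """Return "execution" only if every prerequisite is present (assumed rule)."""
    ready = all([
        r.runnable_environment,
        r.invocation_method,
        r.evaluation_set,
        r.resource_limits,
        r.candidate_models,
    ])
    return "execution" if ready else "design-only"


def missing_inputs(r: Readiness) -> list[str]:
    """List the prerequisites that still block Execution Mode."""
    return [name for name, present in vars(r).items() if not present]
```

Reporting the missing inputs alongside the chosen mode gives the human a concrete list of what to supply before real model calls can start.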

Core Principles

The following rules are non-negotiable by default:

  1. The target prompt and the judge prompt must be separated.
    Do not silently modify both in the same comparison round and then mix their gains together.

  2. The task definition (task spec) must be frozen before any large-scale evaluation begins.

  3. Every round of prompt optimization must have a clear optimization hypothesis.
    No random “this sentence feels off, let’s tweak it” behavior.

  4. An experiment log must be maintained, including at least:

    • version number
    • summary of changes in the current round
    • optimization hypothesis
    • evaluation results
    • cost information
    • conclusion

  5. Any high-cost evaluation loop must be approved by a human beforehand.

  6. The final launch candidate must be reviewed by a human.
    A high machine-evaluation score does not automatically mean it is ready for launch.

  7. If the input information is incomplete, low-risk assumptions may be made, but they must be stated explicitly.
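
Rules 1, 3, 4, and 5 lend themselves to a small log record plus a few guard checks. The following is a minimal sketch under stated assumptions: the fields mirror the log items listed in rule 4, but `ExperimentRound`, `validate_round`, `needs_human_approval`, `append_to_log`, and the cost threshold are all hypothetical, since the Skill does not prescribe a schema or a budget unit.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRound:
    """One entry in the experiment log (hypothetical schema)."""
    version: str
    change_summary: str
    hypothesis: str
    target_prompt_changed: bool
    judge_prompt_changed: bool
    results: dict = field(default_factory=dict)
    cost: float = 0.0          # unit is an assumption (e.g. currency or tokens)
    conclusion: str = ""


def validate_round(r: ExperimentRound) -> None:
    """Reject rounds that break rule 1 (separation) or rule 3 (hypothesis)."""
    if r.target_prompt_changed and r.judge_prompt_changed:
        raise ValueError(
            "target and judge prompts changed in the same round; "
            "their gains cannot be told apart"
        )
    if not r.hypothesis.strip():
        raise ValueError("every round needs an explicit optimization hypothesis")


def needs_human_approval(estimated_cost: float, approval_threshold: float) -> bool:
    """Rule 5: loops above the agreed threshold wait for a human decision."""
    return estimated_cost > approval_threshold


def append_to_log(log: list[dict], r: ExperimentRound) -> None:
    """Validate the round, then store it as a plain dict for later review."""
    validate_round(r)
    log.append(asdict(r))
```

Storing each round as a plain dict keeps the log easy to serialize and review. What counts as "high-cost" is left to the human, so the threshold is passed in as a parameter.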


Recommended Inputs to Collect

The Agent should gather or infer the following whenever possible:

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-abysscat-yj-prompt-design-tuning": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.