prompt_design_tuning_best_practice
Collaboratively design, evaluate, iterate on, and recommend a final launch candidate for a target prompt under a “human-gated, agent-executed” workflow.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/abysscat-yj/prompt-design-tuning
Prompt Design & Tuning Best Practices
The goal of this Skill is not to casually “chat about prompts,” but to turn prompt tuning into an executable, reviewable, and cost-controlled engineering workflow.
The Agent handles most of the execution work.
Humans are responsible only for validating direction, approving high-cost loops, and signing off on the final launch candidate.
When to Use
Use this Skill when the user needs to:
- design or optimize a target prompt from scratch
- design a separate evaluation / judge prompt
- compare the performance of multiple models on an evaluation set
- work with an existing API curl, SDK integration, or request protocol
- run controlled prompt iterations under a limited budget
- turn prompt tuning into a reusable workflow instead of a one-off chat exercise
Working Modes
1. Design-Only Mode
Use this mode when:
- there is no runnable environment yet
- no evaluation resources are available yet
- real model calls cannot be executed for now
In this mode, the Agent should produce:
- task definition
- target prompt draft
- judge prompt draft
- evaluation plan
- script skeletons
- manual execution guidance
2. Execution Mode
Use this mode when:
- a runnable environment already exists
- the model invocation method has been provided
- the evaluation set, resource limits, and candidate models have been provided
In this mode, the Agent should continue with:
- batch generation
- automatic evaluation
- result analysis
- prompt iteration
- final candidate recommendation
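The choice between the two modes can be sketched as a simple precondition check. This is a minimal illustration, not part of the Skill itself: the `Context` fields and the `choose_mode` function are hypothetical names, and treating Design-Only Mode as the fallback whenever any Execution Mode precondition is missing is an assumption.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Mode(Enum):
    DESIGN_ONLY = "design-only"
    EXECUTION = "execution"


@dataclass
class Context:
    # Hypothetical flags mirroring the Execution Mode preconditions;
    # the Skill does not define a schema for them.
    has_runnable_env: bool = False
    has_invocation_method: bool = False
    has_eval_set: bool = False
    has_resource_limits: bool = False
    has_candidate_models: bool = False


def choose_mode(ctx: Context) -> Mode:
    """Pick Execution Mode only when every precondition holds.

    Assumption: anything short of a fully specified setup falls back
    to Design-Only Mode.
    """
    ready = all(getattr(ctx, f.name) for f in fields(ctx))
    return Mode.EXECUTION if ready else Mode.DESIGN_ONLY


def missing_inputs(ctx: Context) -> list[str]:
    """List the preconditions that still block Execution Mode."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]
```

In Design-Only Mode, the output of `missing_inputs` could feed the manual execution guidance, since it names what the human still has to provide.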
Core Principles
The following rules are non-negotiable by default:
- The target prompt and the judge prompt must be separated.
  Do not silently modify both in the same comparison round and then mix their gains together.
- Before large-scale evaluation, the task definition (task spec) must be frozen first.
- Every round of prompt optimization must have a clear optimization hypothesis.
  No random “this sentence feels off, let’s tweak it” behavior.
- An experiment log must be maintained, including at least:
  - version number
  - summary of changes in the current round
  - optimization hypothesis
  - evaluation results
  - cost information
  - conclusion
- Any high-cost evaluation loop must be approved by a human beforehand.
- The final launch candidate must be reviewed by a human.
  A high machine-evaluation score does not automatically mean it is ready for launch.
- If the input information is incomplete, low-risk assumptions may be made, but they must be stated explicitly.
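As one possible illustration of the experiment log and the human approval gate, the sketch below stores one record per round and refuses to start a costly loop without sign-off. All names (`ExperimentRecord`, `append_record`, `may_run`) are hypothetical, and the numeric cost threshold is an assumption: the Skill does not define what counts as "high-cost".

```python
from dataclasses import dataclass, asdict


@dataclass
class ExperimentRecord:
    # One entry per optimization round; the fields follow the minimum
    # log contents listed above. Types are illustrative assumptions.
    version: str
    change_summary: str
    hypothesis: str
    eval_results: dict
    cost: dict
    conclusion: str


def append_record(log: list, record: ExperimentRecord) -> None:
    """Append a round to the log, rejecting rounds without a hypothesis."""
    if not record.hypothesis.strip():
        raise ValueError("every round needs an explicit optimization hypothesis")
    log.append(asdict(record))


def may_run(estimated_cost: float, approval_threshold: float,
            human_approved: bool) -> bool:
    """Gate an evaluation loop on cost.

    Assumption: a loop is "high-cost" when its estimated cost reaches
    a threshold agreed with the human reviewer beforehand.
    """
    return estimated_cost < approval_threshold or human_approved
```

A usage example with made-up values:

```python
log = []
append_record(log, ExperimentRecord(
    version="v2",
    change_summary="Moved output format rules to the end of the prompt",
    hypothesis="Format errors drop when the rules appear last",
    eval_results={},   # filled in after the evaluation run
    cost={},           # filled in after the evaluation run
    conclusion="pending",
))
print(may_run(estimated_cost=50.0, approval_threshold=20.0, human_approved=False))
```

Keeping the hypothesis check inside `append_record` makes the "no random tweaks" rule mechanical rather than a matter of discipline.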
Recommended Inputs to Collect
The Agent should gather or infer the following whenever possible:
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-abysscat-yj-prompt-design-tuning": {
      "enabled": true,
      "auto_update": true
    }
  }
}