peer-review
Multi-model peer review layer using local LLMs via Ollama to catch errors in cloud model output. Fan out critiques to 2-3 local models, aggregate flags, synthesize consensus.

Use when: validating trade analyses, reviewing agent output quality, testing local model accuracy, checking any high-stakes Claude output before publishing or acting on it.

Don't use when: simple fact-checking (just search the web), tasks that don't benefit from multi-model consensus, time-critical decisions where 60s latency is unacceptable, reviewing trivial or low-stakes content.

Negative examples:
- "Check if this date is correct" → No. Just web search it.
- "Review my grocery list" → No. Not worth multi-model inference.
- "I need this answer in 5 seconds" → No. Peer review adds 30-60s latency.

Edge cases:
- Short text (<50 words) → Models may not find meaningful issues. Consider skipping.
- Highly technical domain → Local models may lack domain knowledge. Weight flags lower.
- Creative writing → Factual review doesn't apply well. Use only for logical consistency.
Why use this skill?
Enhance agent reliability with the peer-review skill. Use local Ollama LLMs to catch factual and logical errors in cloud model outputs through automated consensus.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/staybased/peer-review
What This Skill Does
The peer-review skill provides a multi-model consensus layer designed to validate outputs generated by cloud models like Claude. By leveraging local Ollama-hosted models (Mistral 7B, TinyLlama 1.1B, and Llama 3.1 8B), this skill acts as an automated, skeptical editor. It performs a parallel "fan-out" operation in which all three local models analyze the source text independently, each looking for factual, logical, and stylistic errors. The results are aggregated into a high-confidence consensus so that potential hallucinations or logic gaps are caught before they reach a human or enter a production pipeline.
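As a rough illustration of that flow, here is a minimal sketch assuming Ollama's default local HTTP API on port 11434; the model tags, review prompt, and exact-match quorum are illustrative assumptions, not the skill's actual internals:

import requests
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"
REVIEWERS = ["mistral:7b", "tinyllama:1.1b", "llama3.1:8b"]  # assumed tags

PROMPT = (
    "You are a skeptical reviewer. List any factual, logical, or stylistic "
    "errors in the text below, one per line, or reply NONE.\n\n{text}"
)

def critique(model: str, text: str) -> list[str]:
    # One blocking (non-streaming) generation call per local reviewer.
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": PROMPT.format(text=text),
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    answer = resp.json()["response"].strip()
    return [] if answer.upper() == "NONE" else answer.splitlines()

def peer_review(text: str, quorum: int = 2) -> list[str]:
    # Fan out to all reviewers in parallel, then keep only flags
    # raised by at least `quorum` models as the consensus.
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        per_model = list(pool.map(lambda m: critique(m, text), REVIEWERS))
    counts: dict[str, int] = {}
    for flags in per_model:
        for flag in set(flags):
            counts[flag] = counts.get(flag, 0) + 1
    return [flag for flag, n in counts.items() if n >= quorum]

In practice, flags from different models rarely match verbatim, so a real aggregation step would cluster semantically similar flags; the exact-match counting here only illustrates the quorum idea.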
Installation
To add this capability to your OpenClaw agent, use the CLI:
clawhub install openclaw/skills/skills/staybased/peer-review
Ensure you have Ollama running locally with the necessary models (Drift/Mistral 7B, Pip/TinyLlama 1.1B, and Lume/Llama 3.1 8B) installed prior to execution, as the skill relies on local inference for performance and privacy.
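A typical setup, assuming the standard Ollama registry tags for these models (your local tags may differ):

ollama pull mistral:7b
ollama pull tinyllama:1.1b
ollama pull llama3.1:8b
ollama list

The last command confirms all three models are available before invoking the skill.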
Use Cases
- Validated Trade Analysis: Use this to double-check financial reasoning or market analysis produced by cloud models, specifically looking for unsupported conclusions.
- Technical Documentation Review: Use it to ensure that generated code documentation, API references, or architecture write-ups are logically consistent and accurate.
- High-Stakes Decision Support: Apply this to critical agent outputs that require verification against provided datasets, reducing the risk of relying on a single, potentially biased model response.
Example Prompts
- "Perform a peer review on the market report I just generated; verify the logic behind the Q3 growth predictions."
- "Review the following technical summary for any logical fallacies or unsupported technical claims before I send this to the engineering team."
- "Critique the draft strategy document: check for factual accuracy regarding the Q1 revenue data and flag any overconfident conclusions."
Tips & Limitations
- Latency Trade-off: This skill introduces 30-60 seconds of latency due to the multi-model inference. Do not use for real-time customer service interactions.
- Domain Specificity: Local models may struggle with highly specialized scientific or legal jargon compared to large cloud models. Weight flags from the smallest model (TinyLlama 1.1B) lower than those from the 7B-8B models, as in the weighting sketch after this list.
- Input Constraints: For texts under 50 words, the feedback loop is often redundant; save your compute resources for dense, complex analytical pieces where errors are more likely to be masked by eloquent phrasing.
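Building on the fan-out sketch above, here is a minimal way per-model weighting could be applied; the weights and threshold are illustrative assumptions, not the skill's actual values:

# Illustrative per-model trust weights; smaller models count for less.
WEIGHTS = {"mistral:7b": 1.0, "tinyllama:1.1b": 0.5, "llama3.1:8b": 1.0}

def weighted_consensus(flags_by_model: dict[str, list[str]],
                       threshold: float = 1.5) -> list[str]:
    # Sum each flag's model weights; keep flags whose total clears
    # the threshold (e.g. both larger models, or one larger model
    # plus TinyLlama).
    scores: dict[str, float] = {}
    for model, flags in flags_by_model.items():
        for flag in set(flags):
            scores[flag] = scores.get(flag, 0.0) + WEIGHTS[model]
    return [f for f, s in scores.items() if s >= threshold]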
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-staybased-peer-review": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-read, code-execution