Official Verified

protein-phylogeny

Comprehensive protein family phylogenetic analysis workflow with quality control, conservation analysis, coevolution network analysis, and publication-ready visualization. Use when: (1) analyzing protein family evolution, (2) building phylogenetic trees from sequences, (3) identifying conserved/coevolved residues, (4) generating publication-quality figures and reports, (5) quality-controlling sequence datasets, or (6) performing systematic evolutionary analysis of enzyme families, protein superfamilies, or any homologous protein groups.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/billwanttobetop/protein-phylogeny

Protein Family Phylogenetic Analysis

Complete workflow for protein family evolutionary analysis: quality control → conservation → coevolution → phylogeny → publication report.

Quick Start

Input: FASTA file with protein sequences (any family, any size)
Output: Publication-ready report with phylogenetic tree, conservation analysis, coevolution networks, and high-quality figures

Typical workflow:

# 1. Quality control (removes low-quality sequences)
bash scripts/01_quality_control.sh input.fasta output_dir/

# 2. Conservation analysis
bash scripts/02_conservation.sh output_dir/qc/final.fasta output_dir/

# 3. Coevolution analysis
bash scripts/03_coevolution.sh output_dir/qc/final.fasta output_dir/

# 4. Phylogenetic tree
bash scripts/04_phylogeny.sh output_dir/qc/final.fasta output_dir/

# 5. Generate figures
bash scripts/05_visualize.sh output_dir/

# 6. Create report
bash scripts/06_report.sh output_dir/ "Family Name"
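The six commands above can be chained so that a failure in one stage stops the run before later stages read missing files. The function below is a minimal sketch, not part of the skill. It assumes three things:

- The stage scripts live in `scripts/`, which can be overridden with a fourth argument.
- Each script exits non-zero on failure.
- Stage 1 writes `qc/final.fasta` as documented.

The function name `run_protein_phylogeny` is hypothetical.

```shell
# Hypothetical driver: run the six documented stages in order and stop at the
# first failure. Assumes each stage script signals failure with a non-zero exit.
run_protein_phylogeny() {
  local input="$1"
  local outdir="${2%/}"            # accept "output_dir/" as well as "output_dir"
  local family="$3"
  local scripts="${4:-scripts}"    # assumption: scripts live in ./scripts
  local aln="$outdir/qc/final.fasta"
  local stage

  bash "$scripts/01_quality_control.sh" "$input" "$outdir/" || return 1
  # Stages 2-4 all read the QC alignment, so make sure it exists and is non-empty.
  if [ ! -s "$aln" ]; then
    echo "error: QC did not produce $aln" >&2
    return 1
  fi
  for stage in 02_conservation 03_coevolution 04_phylogeny; do
    bash "$scripts/$stage.sh" "$aln" "$outdir/" || return 1
  done
  bash "$scripts/05_visualize.sh" "$outdir/" || return 1
  bash "$scripts/06_report.sh" "$outdir/" "$family" || return 1
}
```

Usage: `run_protein_phylogeny input.fasta output_dir/ "Family Name"`. Checking for `qc/final.fasta` right after stage 1 matters because stages 2-4 all take that file as input.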

Workflow Overview

Stage 1: Quality Control (references/01-quality-control.md)

Purpose: Filter raw sequences into a high-quality, non-redundant dataset

Steps:

  1. Literature validation (remove predicted sequences)
  2. Length filtering (remove fragments/fusions)
  3. CD-HIT redundancy removal (90% identity)
  4. Complexity check (remove low-complexity regions)
  5. Motif validation (confirm family membership)
  6. MAFFT alignment (high accuracy mode)
  7. trimAl trimming (automatic strategy)
  8. Final validation (gap ratio, coverage)
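Steps 3, 6 and 7 map onto standard command-line tools. The sketch below shows one plausible way to call them; it is not taken from `01_quality_control.sh`, whose exact settings are not shown here. Points to note:

- CD-HIT uses `-c` for the identity threshold and `-n 5` for the word size, which suits thresholds of 0.7-1.0.
- MAFFT L-INS-i (`--localpair --maxiterate 1000`) is assumed to be the "high accuracy mode".
- trimAl's `-automated1` is assumed to be the "automatic strategy".
- The function name and the intermediate files `nr.fasta` and `aligned.fasta` are hypothetical.
- Steps 1, 2, 4, 5 and 8 are not covered.

```shell
# Hypothetical helper for QC steps 3, 6 and 7 only.
qc_core_tools() {
  local input="$1" out="${2%/}" identity="${3:-0.9}"
  mkdir -p "$out/qc" || return 1
  # Step 3: redundancy removal; word size 5 is valid for identity 0.7-1.0.
  cd-hit -i "$input" -o "$out/qc/nr.fasta" -c "$identity" -n 5 || return 1
  # Step 6: MAFFT L-INS-i (assumed to be the high accuracy mode).
  mafft --localpair --maxiterate 1000 "$out/qc/nr.fasta" > "$out/qc/aligned.fasta" || return 1
  # Step 7: trimAl automated column selection.
  trimal -in "$out/qc/aligned.fasta" -out "$out/qc/final.fasta" -automated1 || return 1
}
```

To lower the clustering threshold within the documented 70-95% range, pass it as the third argument, for example `qc_core_tools input.fasta output_dir/ 0.8`.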

Key parameters:

  • CD-HIT threshold: 90% (adjustable 70-95%)
  • Length range: mean ± 2 SD
  • Gap threshold: < 30% per position
  • Motif coverage: > 50%
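The "mean ± 2 SD" length rule (step 2) can be applied directly to an unaligned FASTA file. This is a minimal sketch that assumes the population standard deviation; the skill may use the sample SD instead, which widens the window slightly. The function name `length_filter` is hypothetical.

```shell
# Hypothetical length filter: keep sequences whose length is within mean +/- k*SD.
length_filter() {
  local fasta="$1" k="${2:-2}"
  awk -v k="$k" '
    /^>/ { n++; hdr[n] = $0; next }
    n > 0 { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }   # join wrapped sequence lines
    END {
      if (n == 0) exit 1
      for (i = 1; i <= n; i++) { len[i] = length(seq[i]); sum += len[i] }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (len[i] - mean) ^ 2
      sd = sqrt(ss / n)                                  # population SD (assumption)
      for (i = 1; i <= n; i++)
        if (len[i] >= mean - k * sd && len[i] <= mean + k * sd)
          print hdr[i] "\n" seq[i]
    }' "$fasta"
}
```

One caveat applies to small inputs. With n sequences, a single outlier can sit at most (n-1)/√n standard deviations from the mean. That means a 2 SD cut cannot remove anything until there are at least six sequences.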

Output: qc/final.fasta (high-quality aligned sequences)
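The final validation step checks the aligned output against the "< 30% gaps per position" threshold, and this can be sketched in awk. The sketch makes two assumptions:

- `-` is the only gap character.
- A column fails once its gap fraction reaches 0.30.

The function prints each offending column with its gap fraction and exits with status 1 if any column fails. The function name `gap_check` is hypothetical.

```shell
# Hypothetical per-column gap check for an aligned FASTA file.
gap_check() {
  local aln="$1" max="${2:-0.30}"
  awk -v max="$max" '
    /^>/ { n++; next }
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
    END {
      if (n == 0) exit 2
      L = length(seq[1])
      for (i = 2; i <= n; i++)
        if (length(seq[i]) != L) { print "error: unequal sequence lengths" > "/dev/stderr"; exit 2 }
      bad = 0
      for (c = 1; c <= L; c++) {
        g = 0
        for (i = 1; i <= n; i++) if (substr(seq[i], c, 1) == "-") g++
        if (g / n >= max) { printf "%d\t%.2f\n", c, g / n; bad++ }   # column, gap fraction
      }
      if (bad > 0) exit 1
    }' "$aln"
}
```

Example: `gap_check output_dir/qc/final.fasta || echo "some columns exceed the gap threshold"`.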

Stage 2: Conservation Analysis (references/02-conservation.md)

Purpose: Identify functionally important conserved residues

Method: Normalized Shannon entropy per alignment column (H_norm, from 0 = fully conserved to 1 = maximally variable)

  • H_norm < 0.3: Highly conserved
  • H_norm 0.3-0.6: Moderately conserved
  • H_norm > 0.6: Variable
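One common way to compute H_norm is H = -Σ p_a ln p_a over the residues in a column, divided by ln 20 so that 20 equally frequent amino acids give 1. The sketch below makes three assumptions:

- Gaps (`-` or `.`) are excluded from the counts.
- The maximum possible entropy is taken as ln 20.
- No sequence weighting is applied.

The skill's own normalization is not specified here. The function name `column_entropy` is hypothetical. The output has one line per column: position, H_norm, and class under the thresholds above.

```shell
# Hypothetical per-column normalized Shannon entropy with the documented classes.
column_entropy() {
  awk '
    /^>/ { n++; next }
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] toupper($0) }
    END {
      L = length(seq[1])
      for (c = 1; c <= L; c++) {
        split("", cnt); tot = 0
        for (i = 1; i <= n; i++) {
          a = substr(seq[i], c, 1)
          if (a == "-" || a == ".") continue        # gaps excluded (assumption)
          cnt[a]++; tot++
        }
        if (tot == 0) { printf "%d\tNA\tall_gap\n", c; continue }
        h = 0
        for (a in cnt) { p = cnt[a] / tot; h -= p * log(p) }
        hn = h / log(20)                             # normalize by ln(20)
        cls = (hn < 0.3) ? "highly_conserved" : ((hn <= 0.6) ? "moderate" : "variable")
        printf "%d\t%.3f\t%s\n", c, hn, cls
      }
    }' "$1"
}
```

For a column in which four sequences carry four different residues, H = ln 4 and H_norm = ln 4 / ln 20 ≈ 0.463, which falls in the "moderately conserved" band.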

Output:

  • Conserved positions list
  • Conservation landscape plot
  • Gap vs conservation scatter plot

Stage 3: Coevolution Analysis (references/03-coevolution.md)

Purpose: Identify residue pairs that evolve together

Method: Normalized Mutual Information (NMI)

  • Corrects for phylogenetic bias
  • Identifies structural/functional coupling
  • Builds coevolution network
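For two columns i and j, mutual information is MI = H_i + H_j - H_ij. The sketch below normalizes it as NMI = MI / √(H_i·H_j), which is one of several common choices; the skill's exact normalization is not stated. Other assumptions:

- Sequences with a gap at either column are skipped.
- Invariant columns are skipped, since they carry no MI.

This plain NMI does not itself apply any phylogenetic correction such as sequence weighting or APC. The function name `pairwise_nmi` is hypothetical.

```shell
# Hypothetical all-pairs NMI between alignment columns (natural log entropies).
pairwise_nmi() {
  awk '
    # Shannon entropy of a count table whose counts sum to tot.
    function ent(arr, tot,    k, p, h) {
      h = 0
      for (k in arr) { p = arr[k] / tot; h -= p * log(p) }
      return h
    }
    /^>/ { n++; next }
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] toupper($0) }
    END {
      L = length(seq[1])
      for (i = 1; i < L; i++) {
        for (j = i + 1; j <= L; j++) {
          split("", ci); split("", cj); split("", cij); tot = 0
          for (s = 1; s <= n; s++) {
            a = substr(seq[s], i, 1); b = substr(seq[s], j, 1)
            if (a == "-" || b == "-") continue       # skip sequences gapped at either column
            ci[a]++; cj[b]++; cij[a SUBSEP b]++; tot++
          }
          if (tot == 0) continue
          hi = ent(ci, tot); hj = ent(cj, tot)
          if (hi == 0 || hj == 0) continue           # an invariant column carries no MI
          mi = hi + hj - ent(cij, tot)
          if (mi < 0) mi = 0                         # clamp floating-point round-off
          printf "%d\t%d\t%.3f\n", i, j, mi / sqrt(hi * hj)
        }
      }
    }' "$1"
}
```

The loop is O(L² · N) for L columns and N sequences. That is fine for a sketch but slow for long alignments.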

Output:

  • Coevolved position pairs (MI scores)
  • Network graph (hub identification)
  • Hub residue heatmap
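Hub identification can be read off the pair list as node degree. Count how many retained pairs each position takes part in, then rank the positions. This sketch assumes tab- or space-separated `i j score` lines and a score cutoff of 0.5; both the cutoff and the function name `hub_degrees` are hypothetical.

```shell
# Hypothetical hub ranking: degree of each position among pairs with score >= min.
hub_degrees() {
  local pairs="$1" min="${2:-0.5}"     # cutoff is an assumption
  awk -v min="$min" '$3 >= min { deg[$1]++; deg[$2]++ }
    END { for (p in deg) printf "%s\t%d\n", p, deg[p] }' "$pairs" |
    sort -k2,2nr -k1,1n                  # highest degree first, ties by position
}
```

The output has one line per position, giving the position and its degree. The top entries are the candidate hub residues.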

Stage 4: Phylogenetic Analysis (references/04-phylogeny.md)

Purpose: Reconstruct evolutionary relationships

Method: IQ-TREE maximum likelihood

  • Automatic model selection (ModelFinder)
  • UFBoot2 ultrafast bootstrap (1000 replicates)
  • UFBoot convergence check (bootstrap correlation coefficient > 0.99 required)
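In IQ-TREE 2 these settings correspond to `-m MFP` (ModelFinder, then tree search), `-B 1000` (UFBoot) and `--prefix` for output names. The sketch below is not taken from `04_phylogeny.sh`. It makes three assumptions:

- The binary is `iqtree2`; override it with the `IQTREE` variable for other installs.
- IQ-TREE writes a warning containing "did not converge" to the `.log` file when UFBoot misses its correlation stopping rule. Check this against your version.
- Return code 2 marks a non-converged run; this convention is invented for illustration.

The function name `run_iqtree` is hypothetical.

```shell
# Hypothetical wrapper: ML tree with ModelFinder and 1000 UFBoot replicates.
run_iqtree() {
  local aln="$1" prefix="$2" bin="${IQTREE:-iqtree2}"
  # -m MFP: model selection + tree search; -B 1000: UFBoot; -T AUTO: thread count.
  "$bin" -s "$aln" -m MFP -B 1000 -T AUTO --prefix "$prefix" || return 1
  # Assumption: a non-converged UFBoot run leaves a "did not converge" warning in the log.
  if grep -q "did not converge" "$prefix.log"; then
    echo "warning: UFBoot did not converge; consider rerunning with more iterations" >&2
    return 2
  fi
}
```

Example: `run_iqtree output_dir/qc/final.fasta output_dir/phylo/family` writes `family.treefile`, `family.contree` and `family.iqtree` into that directory. The directory must already exist.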

Output:

  • Phylogenetic tree (.treefile)
  • Bootstrap consensus tree (.contree)
  • Model parameters (.iqtree)
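The chosen substitution model is usually the first thing a report cites from the `.iqtree` file. This one-liner assumes the file contains a line of the form `Best-fit model according to BIC: <model>`. That matches IQ-TREE 2 reports as far as I know, but check it against your output. The function name `best_model` is hypothetical.

```shell
# Hypothetical extractor for the ModelFinder result in an .iqtree report.
best_model() {
  # Print the text after "Best-fit model according to <criterion>:" (first match only).
  sed -n 's/^Best-fit model according to [A-Za-z]*: *//p' "$1" | head -n 1
}
```

Example: `best_model output_dir/phylo/family.iqtree`.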

Stage 5: Visualization (references/05-visualization.md)

Purpose: Generate publication-quality figures (300 DPI)

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-billwanttobetop-protein-phylogeny": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.