protein-phylogeny
Comprehensive protein family phylogenetic analysis workflow with quality control, conservation analysis, coevolution network analysis, and publication-ready visualization. Use when: (1) analyzing protein family evolution, (2) building phylogenetic trees from sequences, (3) identifying conserved/coevolved residues, (4) generating publication-quality figures and reports, (5) quality-controlling sequence datasets, or (6) performing systematic evolutionary analysis of enzyme families, protein superfamilies, or any homologous protein groups.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/billwanttobetop/protein-phylogeny
Protein Family Phylogenetic Analysis
Complete workflow for protein family evolutionary analysis: quality control → conservation → coevolution → phylogeny → publication report.
Quick Start
Input: FASTA file with protein sequences (any family, any size)
Output: Publication-ready report with phylogenetic tree, conservation analysis, coevolution networks, and high-quality figures
Typical workflow:
# 1. Quality control (removes low-quality sequences)
bash scripts/01_quality_control.sh input.fasta output_dir/
# 2. Conservation analysis
bash scripts/02_conservation.sh output_dir/qc/final.fasta output_dir/
# 3. Coevolution analysis
bash scripts/03_coevolution.sh output_dir/qc/final.fasta output_dir/
# 4. Phylogenetic tree
bash scripts/04_phylogeny.sh output_dir/qc/final.fasta output_dir/
# 5. Generate figures
bash scripts/05_visualize.sh output_dir/
# 6. Create report
bash scripts/06_report.sh output_dir/ "Family Name"
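The six commands above can also be driven from one place. The following is a minimal sketch, not part of the skill itself: `build_pipeline` and `run_pipeline` are hypothetical helper names, and the script paths and argument order are copied from the commands listed above.

```python
import subprocess


def build_pipeline(input_fasta, out_dir, family_name, scripts_dir="scripts"):
    """Return the six stage commands, in execution order, as argv lists."""
    out = str(out_dir).rstrip("/") + "/"
    final = out + "qc/final.fasta"  # QC output consumed by stages 2-4
    return [
        ["bash", f"{scripts_dir}/01_quality_control.sh", str(input_fasta), out],
        ["bash", f"{scripts_dir}/02_conservation.sh", final, out],
        ["bash", f"{scripts_dir}/03_coevolution.sh", final, out],
        ["bash", f"{scripts_dir}/04_phylogeny.sh", final, out],
        ["bash", f"{scripts_dir}/05_visualize.sh", out],
        ["bash", f"{scripts_dir}/06_report.sh", out, family_name],
    ]


def run_pipeline(commands):
    """Run each stage in order; check=True stops at the first failing stage."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_pipeline("input.fasta", "output_dir/", "Family Name"))` would reproduce the sequence above and abort as soon as one stage exits with a non-zero status.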
Workflow Overview
Stage 1: Quality Control (references/01-quality-control.md)
Purpose: Filter raw sequences to high-quality, non-redundant dataset
Steps:
- Literature validation (remove predicted sequences)
- Length filtering (remove fragments/fusions)
- CD-HIT redundancy removal (90% identity)
- Complexity check (remove low-complexity regions)
- Motif validation (confirm family membership)
- MAFFT alignment (high accuracy mode)
- trimAl trimming (automatic strategy)
- Final validation (gap ratio, coverage)
Key parameters:
- CD-HIT threshold: 90% (adjustable 70-95%)
- Length range: mean ± 2 SD
- Gap threshold: < 30% per position
- Motif coverage: > 50%
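The length and gap thresholds above can be illustrated with a short Python sketch. This is an assumption about how such filters might be written, not the code in `scripts/01_quality_control.sh`; the function names are hypothetical, and the use of the sample standard deviation is a choice made here.

```python
from statistics import mean, stdev


def length_filter(seqs, n_sd=2.0):
    """Keep sequences whose length lies within mean +/- n_sd * SD.

    `seqs` maps sequence name -> unaligned sequence string.
    Uses the sample standard deviation (an assumption); needs >= 2 sequences.
    """
    lengths = [len(s) for s in seqs.values()]
    mu, sd = mean(lengths), stdev(lengths)
    lo, hi = mu - n_sd * sd, mu + n_sd * sd
    return {name: s for name, s in seqs.items() if lo <= len(s) <= hi}


def gappy_columns(alignment, max_gap=0.30):
    """Return 0-based indices of alignment columns with gap fraction >= max_gap.

    `alignment` is a list of equal-length aligned sequences using '-' for gaps.
    """
    n = len(alignment)
    flagged = []
    for i, column in enumerate(zip(*alignment)):
        if column.count("-") / n >= max_gap:
            flagged.append(i)
    return flagged
```

Because the mean and SD are computed from the same data being filtered, a single extreme fusion protein in a very small dataset can inflate the SD enough to escape the cutoff; the filter behaves best on larger sets.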
Output: qc/final.fasta (high-quality aligned sequences)
Stage 2: Conservation Analysis (references/02-conservation.md)
Purpose: Identify functionally important conserved residues
Method: Normalized Shannon entropy per alignment column (H_norm, scaled to 0-1)
- H_norm < 0.3: Highly conserved
- H_norm 0.3-0.6: Moderately conserved
- H_norm > 0.6: Variable
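As an illustration of the thresholds above, here is a minimal Python sketch of a per-column normalized entropy. The source does not state how H_norm is normalized; dividing by log2(20) (the 20 standard amino acids) and ignoring gap characters are assumptions made here, and the function names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(column, alphabet_size=20):
    """Shannon entropy of one alignment column, divided by log2(alphabet_size).

    Gap characters ('-') are ignored (an assumption). Raises ValueError if the
    column contains only gaps.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    h = sum((k / n) * math.log2(n / k) for k in Counter(residues).values())
    return h / math.log2(alphabet_size)


def classify(h_norm):
    """Map H_norm onto the three conservation classes listed above."""
    if h_norm < 0.3:
        return "highly conserved"
    if h_norm <= 0.6:
        return "moderately conserved"
    return "variable"
```

Under this normalization an invariant column scores 0 and a column with all 20 residues at equal frequency scores 1.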
Output:
- Conserved positions list
- Conservation landscape plot
- Gap vs conservation scatter plot
Stage 3: Coevolution Analysis (references/03-coevolution.md)
Purpose: Identify residue pairs that evolve together
Method: Normalized Mutual Information (NMI)
- Corrects for phylogenetic bias
- Identifies structural/functional coupling
- Builds coevolution network
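A normalized mutual information score between two alignment columns can be sketched as follows. Several normalizations are in use; this sketch divides MI by the joint entropy H(X,Y), which is an assumption, since the source does not say which variant the skill applies. It also skips sequence pairs containing a gap and does not implement any phylogenetic correction. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    n = len(items)
    return sum((k / n) * math.log2(n / k) for k in Counter(items).values())


def nmi(col_i, col_j):
    """Mutual information of two columns divided by their joint entropy.

    Rows where either column has a gap ('-') are skipped (an assumption).
    Returns 0.0 when no rows remain or both columns are invariant.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    h_i = entropy([a for a, _ in pairs])
    h_j = entropy([b for _, b in pairs])
    h_ij = entropy(pairs)
    if h_ij == 0.0:
        return 0.0
    return (h_i + h_j - h_ij) / h_ij
```

With this normalization, two columns that vary in perfect lockstep score 1 and two statistically independent columns score 0.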
Output:
- Coevolved position pairs (NMI scores)
- Network graph (hub identification)
- Hub residue heatmap
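Hub identification in a coevolution network can be sketched as a degree count over the scored position pairs. The source does not define what counts as a hub, so both the score cutoff and the minimum degree are hypothetical parameters the caller must choose, and `find_hubs` is a name introduced here for illustration.

```python
from collections import defaultdict


def find_hubs(scored_pairs, score_cutoff, min_degree):
    """Return positions with at least `min_degree` edges scoring >= score_cutoff.

    `scored_pairs` is an iterable of (position_i, position_j, score) tuples.
    Results are sorted by descending degree, then by position.
    """
    degree = defaultdict(int)
    for i, j, score in scored_pairs:
        if score >= score_cutoff:
            degree[i] += 1
            degree[j] += 1
    hubs = [pos for pos, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda pos: (-degree[pos], pos))
```

Degree is only one possible centrality measure; a betweenness-based definition would need a graph library and is not shown.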
Stage 4: Phylogenetic Analysis (references/04-phylogeny.md)
Purpose: Reconstruct evolutionary relationships
Method: IQ-TREE maximum likelihood
- Automatic model selection (ModelFinder)
- UFBoot2 ultrafast bootstrap (1000 replicates)
- UFBoot convergence check (bootstrap correlation coefficient > 0.99 required)
Output:
- Phylogenetic tree (.treefile)
- Bootstrap consensus tree (.contree)
- Model parameters (.iqtree)
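Once a tree file exists, a quick sanity check is to summarize its branch support. The sketch below pulls support labels out of a Newick string with a regular expression and reports the fraction of internal nodes at or above a threshold. It assumes supports are written as plain numeric node labels directly after a closing parenthesis and that taxon names contain no parentheses; the default of 95 follows the commonly cited guidance for UFBoot values. The function names are hypothetical, and this is not a full Newick parser.

```python
import re


def support_values(newick):
    """Extract numeric internal-node labels that directly follow ')'."""
    return [float(x) for x in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def fraction_supported(newick, threshold=95.0):
    """Fraction of labeled internal nodes with support >= threshold.

    Returns 0.0 if the tree carries no support labels.
    """
    values = support_values(newick)
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)
```

For anything beyond a quick check, a dedicated tree library that parses Newick properly is the safer route.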
Stage 5: Visualization (references/05-visualization.md)
Purpose: Generate publication-quality figures (300 DPI)
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-billwanttobetop-protein-phylogeny": {
      "enabled": true,
      "auto_update": true
    }
  }
}