vision-tagger
Tag and annotate images using Apple Vision framework (macOS only). Detects faces, bodies, hands, text (OCR), barcodes, objects, scene labels, and saliency regions. Use for image analysis, photo tagging, posture monitoring, or any task requiring computer vision on images.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/sagarjhaa/vision-tagger
Vision Tagger
macOS-native image analysis using Apple's Vision framework. All processing is local — no cloud APIs, no API keys needed.
Requirements
- macOS 12+ (Monterey or later)
- Xcode Command Line Tools
- Python 3 with Pillow
Setup (one-time)
# Install Xcode CLI tools if needed
xcode-select --install
# Install Pillow
pip3 install Pillow
# Compile the Swift binary
cd scripts/
swiftc -O -o image_tagger image_tagger.swift
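Before compiling, it can help to confirm that the requirements listed above are present. The following is a minimal sketch and not part of the skill itself: the function name `check_prerequisites` is hypothetical, and it only checks that each tool can be found, not which version is installed.

```python
import importlib.util
import platform
import shutil


def check_prerequisites():
    """Report which of the documented requirements appear to be available."""
    return {
        # Apple's Vision framework only exists on macOS, reported as "Darwin"
        "macos": platform.system() == "Darwin",
        # swiftc ships with the Xcode Command Line Tools
        "swiftc": shutil.which("swiftc") is not None,
        # Pillow is imported under the package name "PIL"
        "pillow": importlib.util.find_spec("PIL") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```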
Usage
Analyze image → JSON
./scripts/image_tagger /path/to/photo.jpg
Output includes:
- faces: bounding boxes, roll/yaw/pitch, landmarks (eyes, nose, mouth)
- bodies: 18 skeleton joints with confidence scores
- hands: 21 joints per hand (left/right)
- text: OCR results with bounding boxes
- labels: scene classification (desk, outdoor, clothing, etc.)
- barcodes: QR codes, UPC, etc.
- saliency: attention and objectness regions
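As a quick way to see what a result contains, the keys above can be counted per category. This is a sketch that assumes the shape shown in this document's example output: every key holds a list, except `saliency`, which holds a dict of region lists. The helper name `summarize` and the sample values are illustrative only.

```python
def summarize(tags):
    """Count detections per category in an image_tagger result dict."""
    counts = {}
    for key in ("faces", "bodies", "hands", "text", "labels", "barcodes"):
        # Missing keys are treated as "nothing detected"
        counts[key] = len(tags.get(key, []))
    # Saliency is assumed to be a dict such as {"attentionBased": [...]}
    saliency = tags.get("saliency", {})
    counts["saliency"] = sum(len(regions) for regions in saliency.values())
    return counts


# Hand-written sample in the documented shape, not real tool output
sample = {
    "faces": [{"confidence": 0.99}],
    "labels": [{"label": "outdoor", "confidence": 0.88},
               {"label": "sky", "confidence": 0.75}],
    "saliency": {"attentionBased": [{"x": 0.2, "y": 0.1, "width": 0.6, "height": 0.8}]},
}
print(summarize(sample))
```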
Annotate image with boxes
python3 scripts/annotate_image.py photo.jpg output.jpg
Draws colored boxes:
- 🟢 Green: faces
- 🟠 Orange: body skeleton
- 🟣 Magenta: hands
- 🔵 Cyan: text regions
- 🟡 Yellow: rectangles/objects
- Scene labels at bottom
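Drawing a box means mapping the normalized coordinates in the JSON to pixels. Below is a sketch of that conversion, not the code in `annotate_image.py`. It assumes bounding boxes arrive in the Vision framework's native convention (values in 0..1 with the origin at the bottom-left corner). Whether `image_tagger` already flips the y-axis is not stated here, so the `flip_y` flag and the helper name `to_pixel_box` are assumptions.

```python
def to_pixel_box(bbox, width, height, flip_y=True):
    """Convert a normalized bbox dict to (left, top, right, bottom) pixels."""
    left = bbox["x"] * width
    right = (bbox["x"] + bbox["width"]) * width
    if flip_y:
        # Bottom-left origin: the box's top edge is measured from the image top
        top = (1.0 - bbox["y"] - bbox["height"]) * height
        bottom = (1.0 - bbox["y"]) * height
    else:
        # Top-left origin: use the values directly
        top = bbox["y"] * height
        bottom = (bbox["y"] + bbox["height"]) * height
    return tuple(round(v) for v in (left, top, right, bottom))


face = {"x": 0.3, "y": 0.4, "width": 0.15, "height": 0.2}
print(to_pixel_box(face, 1920, 1080))
```

The returned tuple uses the `(left, top, right, bottom)` order that Pillow's `ImageDraw.rectangle` accepts.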
Python integration
import json
import subprocess

def analyze(path):
    # Run the compiled Swift binary and capture its stdout as text
    r = subprocess.run(['./scripts/image_tagger', path], capture_output=True, text=True)
    # Skip anything printed before the JSON object starts
    return json.loads(r.stdout[r.stdout.find('{'):])

tags = analyze('photo.jpg')
print(tags['labels'])  # [{'label': 'desk', 'confidence': 0.85}, ...]
print(tags['faces'])   # [{'bbox': {...}, 'confidence': 0.99, 'yaw': 5.2}]
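The snippet above fails with an unhelpful JSON error if the binary prints nothing, and it breaks if any text follows the JSON. A slightly more defensive variant is sketched below. The names `extract_json` and `analyze_checked` are hypothetical, and the sketch assumes the binary exits with a non-zero status on failure, which this document does not confirm.

```python
import json
import subprocess


def extract_json(stdout):
    """Parse the first JSON object in stdout, ignoring text around it."""
    start = stdout.find("{")
    if start == -1:
        raise ValueError("no JSON object found in tagger output")
    # raw_decode stops at the end of the object, so trailing text is tolerated
    obj, _end = json.JSONDecoder().raw_decode(stdout, start)
    return obj


def analyze_checked(path, binary="./scripts/image_tagger"):
    """Run the tagger and raise a clear error if it reports failure."""
    r = subprocess.run([binary, path], capture_output=True, text=True)
    if r.returncode != 0:  # assumption: non-zero exit status means failure
        raise RuntimeError(f"image_tagger failed: {r.stderr.strip()}")
    return extract_json(r.stdout)


# Demonstration on a canned string, so no binary is needed to try it
print(extract_json('loading model...\n{"labels": []}\ndone'))
```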
Example JSON Output
{
  "dimensions": {"width": 1920, "height": 1080},
  "faces": [{"bbox": {"x": 0.3, "y": 0.4, "width": 0.15, "height": 0.2}, "confidence": 0.99, "roll": -2, "yaw": 5}],
  "bodies": [{"joints": {"head_joint": {"x": 0.5, "y": 0.7, "confidence": 0.9}, "left_shoulder": {...}}, "confidence": 1}],
  "hands": [{"chirality": "left", "joints": {"VNHLKWRI": {"x": 0.4, "y": 0.3, "confidence": 0.85}}}],
  "text": [{"text": "HELLO", "confidence": 0.95, "bbox": {...}}],
  "labels": [{"label": "outdoor", "confidence": 0.88}, {"label": "sky", "confidence": 0.75}],
  "saliency": {"attentionBased": [{"x": 0.2, "y": 0.1, "width": 0.6, "height": 0.8}]}
}
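For photo tagging, a common next step is to keep only the labels the classifier is reasonably sure about. The sketch below works on the `labels` array in the shape shown above. The helper name `top_labels` and the 0.8 threshold are illustrative choices, not values taken from the skill.

```python
def top_labels(tags, min_confidence=0.8):
    """Return label names at or above min_confidence, most confident first."""
    kept = [item for item in tags.get("labels", [])
            if item["confidence"] >= min_confidence]
    kept.sort(key=lambda item: item["confidence"], reverse=True)
    return [item["label"] for item in kept]


# The labels array from the example output above
tags = {"labels": [{"label": "outdoor", "confidence": 0.88},
                   {"label": "sky", "confidence": 0.75}]}
print(top_labels(tags))        # ['outdoor']
print(top_labels(tags, 0.5))   # ['outdoor', 'sky']
```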
Detection Capabilities
| Feature | Details |
|---|---|
| Faces | Bounding box, confidence, roll/yaw/pitch angles, 76-point landmarks |
| Bodies | 18 joints: head, neck, shoulders, elbows, wrists, hips, knees, ankles |
| Hands | 21 joints per hand, left/right chirality |
| Text (OCR) | Recognized text with confidence and bounding boxes |
| Labels | 1000+ scene/object categories (clothing, furniture, outdoor, etc.) |
| Barcodes | QR, UPC, EAN, Code128, PDF417, Aztec, DataMatrix |
| Saliency | Attention-based and objectness-based regions |
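The body joints can support simple posture checks, one of the uses mentioned in the description. The sketch below estimates how far the shoulder line deviates from horizontal. The `left_shoulder` key appears in the example output, but `right_shoulder` is an assumed name, and `shoulder_tilt_degrees` is a hypothetical helper. Normalized coordinates are scaled by the image dimensions first so the angle is not distorted on non-square images.

```python
import math


def shoulder_tilt_degrees(body, width, height):
    """Angle in degrees (0..90) between the shoulder line and horizontal."""
    joints = body["joints"]
    left = joints["left_shoulder"]
    right = joints["right_shoulder"]  # assumed joint name
    # Scale normalized coordinates to pixels before measuring the angle
    dx = (right["x"] - left["x"]) * width
    dy = (right["y"] - left["y"]) * height
    angle = abs(math.degrees(math.atan2(dy, dx)))
    # Fold into 0..90 so the result ignores which shoulder comes first
    return min(angle, 180.0 - angle)


# Hand-written sample joints, not real tool output
body = {"joints": {"left_shoulder": {"x": 0.25, "y": 0.5},
                   "right_shoulder": {"x": 0.75, "y": 0.5}}}
print(shoulder_tilt_degrees(body, 1920, 1080))
```

Because the result is an absolute angle, it does not depend on whether the y-axis points up or down.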
Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-sagarjhaa-vision-tagger": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note
ClawKit audits metadata but not runtime behavior. Use with caution.
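If `clawhub.json` already lists other plugins, pasting the configuration fragment over the file would discard them. The sketch below merges the entry into an existing config instead. The helper name `enable_plugin` and the `some-other-plugin` entry are hypothetical, and only the two fields shown in the fragment are assumed to exist.

```python
import json


def enable_plugin(config, plugin_id, auto_update=True):
    """Return a copy of config with plugin_id enabled, keeping other plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing config with one unrelated plugin
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
updated = enable_plugin(existing, "official-sagarjhaa-vision-tagger")
print(json.dumps(updated, indent=2))
```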