llmfit-advisor
Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alexsjones/llmfit
What This Skill Does
The llmfit-advisor is a diagnostic and recommendation agent for local Large Language Model (LLM) setups. It analyzes your system's CPU, RAM, and GPU/VRAM configuration and calculates a fit score for each candidate model. Instead of guessing whether a model will fit in memory or run at a usable speed, you get quantization recommendations (e.g., Q4_K_M or Q8_0) based on your hardware's actual constraints. Whether you run on unified memory such as Apple Silicon or on dedicated NVIDIA hardware, the skill helps you avoid both over-provisioning and under-utilizing your system.
Installation
To add this skill to your OpenClaw environment, use the provided clawhub command:
clawhub install openclaw/skills/skills/alexsjones/llmfit
Ensure that your OpenClaw runtime has the necessary permissions to probe system hardware so that the tool can perform its initial detection scan accurately. You can verify the installation by running llmfit --json system to confirm that the agent correctly identifies your GPU and memory modules.
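The verification step can also be scripted. The following is a minimal sketch, assuming llmfit --json system prints a single JSON object to stdout. Apart from unified_memory, which is mentioned under Tips & Limitations, the key names (total_ram_gb, gpu_name, gpu_vram_gb) and the optional top-level "system" wrapper are hypothetical placeholders; check them against the real output before relying on this.

```python
import json
import subprocess


def read_system_report(raw_json: str) -> dict:
    """Extract a few fields from the detection output.

    All key names are assumptions about the JSON layout, not a documented
    schema. Keys that are absent come back as None instead of raising.
    """
    report = json.loads(raw_json)
    # Hypothetical: the data may sit under a top-level "system" key.
    system = report.get("system", report)
    if not isinstance(system, dict):
        system = report
    wanted = ("total_ram_gb", "gpu_name", "gpu_vram_gb", "unified_memory")
    return {key: system.get(key) for key in wanted}


def detect_system() -> dict:
    """Run the detection command named in the text and summarize it."""
    completed = subprocess.run(
        ["llmfit", "--json", "system"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return read_system_report(completed.stdout)
```

Calling detect_system() returns a small dictionary. A value of None for the GPU fields suggests either that the key names differ from the guesses above or that the runtime lacks permission to probe the hardware.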
Use Cases
This skill is essential for users who want to run privacy-focused, offline AI workloads. It excels in:
- Hardware Optimization: Configuring Ollama or LM Studio environments to extract maximum performance from limited VRAM.
- Workload Specialization: Selecting the best quantized model variants specifically optimized for coding, creative writing, or logical reasoning tasks.
- Capacity Planning: Determining the feasibility of running high-parameter models (like 70B variants) versus medium-parameter models on existing hardware.
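The capacity-planning question comes down to simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is a back-of-the-envelope estimate, not the advisor's actual scoring method. The bits-per-weight figures are approximate community values for GGUF files, and the 1.5 GB overhead for context and runtime buffers is an assumed placeholder.

```python
# Approximate effective bits per weight for two GGUF quantization types.
# Rough figures for illustration; they are not produced by llmfit.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}


def weights_size_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bits / 8


def fits(params_billion: float, quant: str, memory_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in memory_gb."""
    return weights_size_gb(params_billion, quant) + overhead_gb <= memory_gb


for size in (8, 70):
    gb = weights_size_gb(size, "Q4_K_M")
    print(f"{size}B at Q4_K_M: ~{gb:.1f} GB, fits in 12 GB: "
          f"{fits(size, 'Q4_K_M', 12)}")
```

Under these assumptions an 8B model at Q4_K_M needs about 5 GB and fits comfortably in 12 GB, while a 70B model at the same quantization needs over 40 GB and does not.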
Example Prompts
- "I have 16GB of RAM and a 3060. Can I run Llama 3.1 8B, and what is the best quantization for coding?"
- "What are the top 3 reasoning-focused local models that fit into my current VRAM?"
- "My system is running slowly with my current models. Suggest a better configuration for my hardware setup."
Tips & Limitations
- Hardware Refresh: Always re-run the detection command if you add external GPU enclosures or upgrade your RAM, as the advisor relies on static hardware snapshots.
- Quantization Trade-offs: The advisor suggests the best fit for your hardware, but remember that lower-bit quantization (e.g., Q3) saves memory at the cost of some output quality compared to Q5 or Q6.
- Provider Support: This tool works best with Ollama and LM Studio. It does not control external cloud APIs, focusing strictly on local inference environments.
- Unified Memory: Users on Apple Silicon should pay close attention to the 'unified_memory' flag in the system output, as this significantly changes the recommendation profile for memory-heavy tasks.
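The quantization trade-off described above can be illustrated as a walk down a quality ladder: try the highest-bit variant first and step down until the model fits. This is a hypothetical sketch of that idea, not the advisor's algorithm. The ladder entries and their bits-per-weight values are approximate, and overhead_gb is an assumed allowance for context and runtime buffers.

```python
from typing import Optional

# Ordered from highest quality (most memory) to lowest.
# Bits-per-weight values are approximate and for illustration only.
QUANT_LADDER = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.85),
    ("Q3_K_M", 3.9),
]


def pick_quant(params_billion: float, budget_gb: float,
               overhead_gb: float = 1.5) -> Optional[str]:
    """Return the highest-bit quantization that fits the memory budget.

    Returns None when even the lowest entry on the ladder is too large.
    """
    for name, bits in QUANT_LADDER:
        needed_gb = params_billion * bits / 8 + overhead_gb
        if needed_gb <= budget_gb:
            return name
    return None
```

For an 8B model, pick_quant(8, 12) returns "Q8_0", while a tighter budget such as pick_quant(8, 8) steps down to "Q5_K_M". On unified-memory systems the budget should be the share of memory actually available to the model, which is smaller than the total installed RAM.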
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-alexsjones-llmfit": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-read