cascadeflow
Set up CascadeFlow as an OpenClaw custom provider with fast, copy-paste steps. Use when users want quick install, preset selection (OpenAI-only, Anthropic-only, mixed), OpenClaw model alias setup, and safe production defaults for cascading with streaming and agent loops.
Why use this skill?
Optimize OpenClaw with CascadeFlow. Implement intelligent LLM cascading to lower API costs and reduce latency using preset routing strategies for OpenAI and Anthropic.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/saschabuehrle/cascadeflow

What This Skill Does
CascadeFlow is a high-performance optimization layer for OpenClaw that acts as a custom provider to reduce AI operational costs and latency. By implementing a sophisticated cascading architecture, it allows developers to route requests dynamically between different LLM providers (OpenAI, Anthropic, etc.) based on defined strategy presets. The tool exposes an OpenAI-compatible /v1/chat/completions endpoint, ensuring a seamless integration experience for OpenClaw agents without requiring complex infrastructure changes. It is specifically designed to maintain full support for streaming responses and multi-step agent loops, ensuring that cost-saving measures do not hinder agent performance or user experience.
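Because the endpoint mirrors the OpenAI chat-completions schema, any OpenAI-compatible client can talk to it. A minimal sketch of a request body and how you would send it — the model name `cascadeflow`, the port `8084`, and the bearer token are taken from the setup described below, and the token value is a placeholder:

```shell
# Build an OpenAI-compatible chat-completions request body.
cat > request.json <<'EOF'
{
  "model": "cascadeflow",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Summarize this repo in one sentence."}
  ]
}
EOF

# Sanity-check that the body is valid JSON.
python3 -m json.tool request.json

# With the server from the Installation section running, send it like so
# (the bearer token is whatever you passed as the auth token at launch):
# curl -sS http://127.0.0.1:8084/v1/chat/completions \
#   -H "Authorization: Bearer $CASCADEFLOW_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d @request.json
```

Set `"stream": true` to exercise the streaming path, which CascadeFlow passes through unchanged.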
Installation
To integrate CascadeFlow, ensure you have Python 3.10+ installed. Begin by cloning the repository from GitHub: git clone https://github.com/lemony-ai/cascadeflow.git. Navigate into the folder, create a virtual environment with python3 -m venv .venv and activate it. Install dependencies using pip install -e ".[openclaw,providers]". Once installed, select a preset configuration from examples/configs/—such as mixed-anthropic-openai.yaml—and define your ANTHROPIC_API_KEY and OPENAI_API_KEY in a .env file. Launch the server using the provided command targeting your chosen config and specifying auth tokens for security. Finally, map the baseUrl to http://127.0.0.1:8084/v1 in OpenClaw and set the model to cascadeflow.
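Condensed, the steps above look like this. The exact server launch command and its auth-token flag vary by version, so take them from the repo README rather than this sketch; the key values here are placeholders:

```shell
# Clone and enter the repository (Python 3.10+ required).
git clone https://github.com/lemony-ai/cascadeflow.git
cd cascadeflow

# Create and activate an isolated environment, then install with extras.
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[openclaw,providers]"

# Provide keys for the providers your chosen preset references.
cat > .env <<'EOF'
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
EOF

# Launch with a preset such as examples/configs/mixed-anthropic-openai.yaml,
# using the server command and auth-token flag documented in the repo README.
```

Once the server is up, point OpenClaw's `baseUrl` at `http://127.0.0.1:8084/v1` and set the model to `cascadeflow`.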
Use Cases
CascadeFlow is ideal for high-scale agent deployments where throughput costs become prohibitive. It is perfect for developers building complex agentic workflows that require a mix of model capabilities—using a cheaper 'fast' model for routine tasks and escalating to more powerful models only when necessary. This cascading behavior is transparent to the agent loop, making it highly effective for RAG (Retrieval-Augmented Generation) pipelines, complex code generation tasks, and interactive chat sessions where latency spikes need to be strictly controlled.
Example Prompts
- "/model cflow; Analyze this codebase and suggest a refactor pattern that minimizes unnecessary re-runs."
- "/model cflow; Research the latest developments in modular AI architecture and summarize them in a concise report."
- "/model cflow; Help me iterate on this prompt chain to ensure the final output is formatted as valid JSON."
Tips & Limitations
Always run CascadeFlow behind a TLS reverse proxy if you intend to expose the service beyond 127.0.0.1. While the cascading logic significantly lowers costs, it relies heavily on the accuracy of your chosen preset. Start with the openai-only.yaml or anthropic-only.yaml to baseline your performance before experimenting with the mixed configurations. Note that CascadeFlow adds a small overhead; ensure your network latency between the proxy and the LLM providers is accounted for in your latency budgets.
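If you do expose the service beyond localhost, a typical nginx front looks like the fragment below — the hostname and certificate paths are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name cascadeflow.example.com;                  # placeholder hostname

    ssl_certificate     /etc/ssl/certs/cascadeflow.pem;   # placeholder paths
    ssl_certificate_key /etc/ssl/private/cascadeflow.key;

    location / {
        proxy_pass http://127.0.0.1:8084;
        proxy_http_version 1.1;     # keep chunked/streamed responses working
        proxy_buffering off;        # don't buffer token streams
        proxy_read_timeout 300s;    # agent loops can hold connections open
    }
}
```

Disabling proxy buffering matters here: streamed completions arrive as incremental chunks, and a buffering proxy would turn them back into one delayed response.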
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-saschabuehrle-cascadeflow": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags: AI
Flags: network-access, external-api