kafka-observability
Kafka monitoring and observability expert for Prometheus, Grafana, and JMX metrics. Use when setting up Kafka monitoring, configuring alerting rules, or building performance dashboards.
Why use this skill?
Master Kafka monitoring with automated JMX exporter setup, pre-built Grafana dashboards, and Prometheus configuration for cluster health, performance tracking, and consumer lag.
What This Skill Does
The kafka-observability skill acts as a specialized assistant for monitoring Apache Kafka clusters. It provides a structured approach to setting up the JMX Exporter, configuring Prometheus scraping, and deploying production-ready Grafana dashboards. By automating the integration of JVM-level performance data with Kafka-specific throughput metrics, it enables teams to move from reactive troubleshooting to proactive cluster management. The skill encapsulates best-practice configurations for heap, garbage collection, and thread monitoring alongside critical broker-level health metrics like replica lag and controller status.
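The Prometheus side of this setup amounts to a scrape job that points at each broker's exporter endpoint. The Python sketch below builds a minimal prometheus.yml structure and prints it as JSON, which YAML parsers such as the one Prometheus uses accept in practice. The broker hostnames, the "kafka" job name, the 30s interval, and port 7071 are hypothetical placeholders, not values taken from the plugin.

```python
import json

# Hypothetical broker hosts and exporter port; replace them with your own.
BROKERS = ["kafka-1.example.internal", "kafka-2.example.internal", "kafka-3.example.internal"]
EXPORTER_PORT = 7071  # assumption: the port the JMX exporter agent listens on


def kafka_scrape_config(brokers, port, interval="30s"):
    """One scrape_configs entry that scrapes every broker's exporter endpoint."""
    return {
        "job_name": "kafka",
        "scrape_interval": interval,
        "static_configs": [
            {"targets": [f"{host}:{port}" for host in brokers]}
        ],
    }


def prometheus_config(brokers, port):
    """Wrap the Kafka job in a minimal prometheus.yml structure."""
    return {"scrape_configs": [kafka_scrape_config(brokers, port)]}


if __name__ == "__main__":
    # JSON output; YAML parsers accept it, so it can be saved as prometheus.yml.
    print(json.dumps(prometheus_config(BROKERS, EXPORTER_PORT), indent=2))
```

Listing the targets statically keeps the sketch simple; on Kubernetes you would more likely use service discovery instead of fixed hostnames.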
Installation
To integrate this skill into your environment, use the OpenClaw CLI or interface:
clawhub install openclaw/skills/skills/anton-abyzov/sw-kafka-observability
Once installed, confirm that the monitoring components are present under plugins/specweave-kafka/monitoring/. For self-hosted deployments, attach the JMX Prometheus agent by adding the KAFKA_OPTS definition given in the plugin documentation to the Kafka service's environment. On Kubernetes, mount the provided kafka-jmx-exporter.yml config file into the JMX exporter sidecar container.
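On a bare-metal or VM broker managed by systemd, attaching the agent comes down to adding a -javaagent flag to KAFKA_OPTS, which Kafka's start scripts pass to the JVM. The JMX exporter agent takes its argument in the form <jar>=<port>:<config file>. The sketch below renders that flag and a systemd drop-in; the jar path, port 7071, config path, and the kafka.service unit name are assumptions, and the plugin documentation's own KAFKA_OPTS definition should take precedence.

```python
# Hypothetical paths and port; replace them with your own installation's values.
AGENT_JAR = "/opt/jmx_exporter/jmx_prometheus_javaagent.jar"
EXPORTER_CONFIG = "/etc/kafka/kafka-jmx-exporter.yml"
EXPORTER_PORT = 7071


def javaagent_flag(jar: str, port: int, config: str) -> str:
    """JVM flag that loads the JMX exporter agent, in the form jar=port:config."""
    return f"-javaagent:{jar}={port}:{config}"


def systemd_dropin(kafka_opts: str) -> str:
    """Text of a systemd drop-in that sets KAFKA_OPTS for the broker service."""
    return "[Service]\n" + f'Environment="KAFKA_OPTS={kafka_opts}"\n'


if __name__ == "__main__":
    print(systemd_dropin(javaagent_flag(AGENT_JAR, EXPORTER_PORT, EXPORTER_CONFIG)))
```

Save the output as a drop-in (for example /etc/systemd/system/kafka.service.d/jmx-exporter.conf), run systemctl daemon-reload, and restart the broker. If the unit already sets KAFKA_OPTS with Environment=, merge the flag into that value instead, since a later assignment of the same variable replaces the earlier one.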
Use Cases
- Proactive Cluster Health: Watch controller elections and under-replicated partitions so you can act on replication problems before they cause production outages.
- Performance Tuning: Track request latencies and produce/fetch rates to find which brokers or request types are the bottleneck.
- Capacity Planning: Use historical consumer lag and throughput trends to decide when to add partitions or broker nodes.
- Operational Troubleshooting: Correlate JVM garbage collection spikes with periodic latency spikes or throughput drops in message processing.
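Consumer lag for a partition is the log-end offset minus the consumer group's committed offset, the same LAG column that kafka-consumer-groups.sh --describe reports. The sketch below computes per-partition lag from two offset snapshots and summarizes it as total lag plus the worst partition, which helps tell a uniformly slow group from a single stuck partition. The input dictionaries are hypothetical; in practice the offsets would come from Kafka's admin tooling or a lag exporter.

```python
from typing import Dict, Optional, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def partition_lag(end_offsets: Dict[Partition, int],
                  committed: Dict[Partition, int]) -> Dict[Partition, int]:
    """Per-partition lag: log-end offset minus the group's committed offset."""
    lag: Dict[Partition, int] = {}
    for tp, end in end_offsets.items():
        if tp not in committed:
            # No committed offset yet: lag is undefined, so leave it out.
            continue
        # Clamp at zero: the two snapshots are read at slightly different
        # times, so a committed offset can briefly appear ahead of the end.
        lag[tp] = max(end - committed[tp], 0)
    return lag


def lag_summary(lag: Dict[Partition, int]) -> Tuple[int, Optional[Partition]]:
    """Return total lag and the partition with the largest lag (None if empty)."""
    total = sum(lag.values())
    worst = max(lag, key=lambda tp: lag[tp]) if lag else None
    return total, worst


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition "orders" topic.
    end = {("orders", 0): 1500, ("orders", 1): 900, ("orders", 2): 400}
    committed = {("orders", 0): 1200, ("orders", 1): 900}
    lag = partition_lag(end, committed)
    print(lag)               # {('orders', 0): 300, ('orders', 1): 0}
    print(lag_summary(lag))  # (300, ('orders', 0))
```

Tracking these values over time, rather than as a single snapshot, is what makes them useful for the capacity-planning case: steadily growing total lag points at insufficient consumer capacity, while one partition dominating the total points at skew or a stuck consumer.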
Example Prompts
- "I need to monitor under-replicated partitions; can you show me the Prometheus query for the under-replicated partition count and how to alert on it?"
- "Help me set up the JMX exporter on our bare-metal Kafka cluster; what environment variables do I need to add to the systemd unit?"
- "My consumers are slowing down. How can I use the kafka-consumer-lag dashboard to identify if it is a consumer processing issue or a broker throughput bottleneck?"
Tips & Limitations
- Memory Overhead: The JMX exporter Java agent runs inside the Kafka JVM and adds a small memory overhead; size the broker heap with this in mind.
- Dashboards: The provided Grafana dashboards expect the Prometheus data source to be named 'Prometheus'. If your data source uses a different name, rename it or update the dashboards to match.
- Alerting: The dashboards only visualize metrics. For PagerDuty or Slack notifications, you must define alerting rules in Prometheus yourself and configure Alertmanager receivers to route the resulting alerts.
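Since alerting is configured outside the dashboards, here is a sketch of a Prometheus rule file covering the two health checks named in the use cases: any under-replicated partitions, and a cluster without exactly one active controller. It prints the rules as JSON, which Prometheus loads as YAML in practice; validate the file with promtool check rules and list it under rule_files in prometheus.yml. The metric names assume a typical JMX exporter rule set for Kafka that lowercases MBeans into kafka_<domain>_<type>_<name>; confirm them against your kafka-jmx-exporter.yml. Thresholds, durations, and severity labels are illustrative.

```python
import json

# Assumed metric names: typical JMX exporter rules for Kafka turn
# kafka.<domain><type=..., name=...> into kafka_<domain>_<type>_<name>.
# Verify them against the rules in your kafka-jmx-exporter.yml.
UNDER_REPLICATED = "kafka_server_replicamanager_underreplicatedpartitions"
ACTIVE_CONTROLLER = "kafka_controller_kafkacontroller_activecontrollercount"


def kafka_alert_rules() -> dict:
    """Build a Prometheus rule file with two broker-health alerts."""
    return {
        "groups": [{
            "name": "kafka-broker-health",
            "rules": [
                {
                    # Fires when a broker reports under-replicated partitions for 5 minutes.
                    "alert": "KafkaUnderReplicatedPartitions",
                    "expr": f"sum by (instance) ({UNDER_REPLICATED}) > 0",
                    "for": "5m",
                    "labels": {"severity": "critical"},
                    "annotations": {
                        # Plain string (not an f-string), so Prometheus sees {{ $labels.instance }}.
                        "summary": "Under-replicated partitions on {{ $labels.instance }}",
                    },
                },
                {
                    # A healthy cluster has exactly one active controller.
                    "alert": "KafkaActiveControllerCountNotOne",
                    "expr": f"sum({ACTIVE_CONTROLLER}) != 1",
                    "for": "1m",
                    "labels": {"severity": "critical"},
                    "annotations": {"summary": "Active controller count is not exactly 1"},
                },
            ],
        }]
    }


if __name__ == "__main__":
    # Save the output as e.g. kafka-alerts.yml and reference it from rule_files.
    print(json.dumps(kafka_alert_rules(), indent=2))
```

Routing these alerts to Slack or PagerDuty then happens in Alertmanager, through receivers such as slack_configs or pagerduty_configs.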
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-anton-abyzov-sw-kafka-observability": {
      "enabled": true,
      "auto_update": true
    }
  }
}
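If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The Python sketch below merges just this entry, assuming the file is plain JSON with a top-level "plugins" object as the snippet suggests; that layout is inferred from the snippet, not from ClawHub documentation, and the helper names are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-anton-abyzov-sw-kafka-observability"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update one plugin entry in place, leaving other keys untouched."""
    plugins = config.setdefault("plugins", {})
    plugins.setdefault(plugin_id, {}).update({"enabled": True, "auto_update": True})
    return config


def enable_in_file(path: Path, plugin_id: str = PLUGIN_ID) -> None:
    """Load clawhub.json (or start from an empty object), merge, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    enable_plugin(config, plugin_id)
    path.write_text(json.dumps(config, indent=2) + "\n")
```

Call enable_in_file(Path("clawhub.json")) from the directory that holds the file; existing plugin entries and any other top-level keys are preserved.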
Flags: file-read, file-write, code-execution