Official Verified developer tools Safety 4/5

kafka-observability

Kafka monitoring and observability expert for Prometheus, Grafana, and JMX metrics. Use when setting up Kafka monitoring, configuring alerting rules, or building performance dashboards.

Why use this skill?

Master Kafka monitoring with automated JMX exporter setup, pre-built Grafana dashboards, and Prometheus configuration for cluster health, performance tracking, and consumer lag.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/anton-abyzov/sw-kafka-observability

What This Skill Does

The kafka-observability skill acts as a specialized assistant for monitoring Apache Kafka clusters. It provides a structured approach to setting up the JMX Exporter, configuring Prometheus scraping, and deploying production-ready Grafana dashboards. By automating the integration of JVM-level performance data with Kafka-specific throughput metrics, it enables teams to move from reactive troubleshooting to proactive cluster management. The skill encapsulates best-practice configurations for heap, garbage collection, and thread monitoring alongside critical broker-level health metrics like replica lag and controller status.

Installation

To integrate this skill into your environment, use the OpenClaw CLI or interface:

clawhub install openclaw/skills/skills/anton-abyzov/sw-kafka-observability

Once installed, confirm that the monitoring component path exists: plugins/specweave-kafka/monitoring/. For self-hosted deployments, set KAFKA_OPTS in the Kafka service environment as specified in the plugin documentation so that the JMX Prometheus agent attaches at broker startup. On Kubernetes, mount the provided kafka-jmx-exporter.yml config file into the JMX exporter sidecar container.
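As a rough illustration of the self-hosted step, the sketch below assembles a KAFKA_OPTS value using the JMX exporter's `-javaagent:<jar>=<port>:<config>` argument form. The jar location, the port 7071, and the config directory are hypothetical placeholders, not values taken from the plugin documentation; only the kafka-jmx-exporter.yml file name comes from this page.

```python
def build_kafka_opts(agent_jar: str, port: int, config_path: str,
                     existing_opts: str = "") -> str:
    """Return a KAFKA_OPTS string with the JMX Prometheus agent appended.

    All paths and the port are caller-supplied assumptions; check the
    plugin documentation for the values your deployment should use.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")

    agent_flag = f"-javaagent:{agent_jar}={port}:{config_path}"
    parts = existing_opts.split()

    # Avoid attaching the same agent jar twice if the flag is already present.
    if any(p.startswith(f"-javaagent:{agent_jar}") for p in parts):
        return " ".join(parts)

    parts.append(agent_flag)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical install locations, for illustration only.
    print(build_kafka_opts(
        "/opt/jmx/jmx_prometheus_javaagent.jar",
        7071,
        "/opt/jmx/kafka-jmx-exporter.yml",
    ))
```

The resulting string would typically go into the broker's systemd unit or environment file as the value of KAFKA_OPTS. Running the function again on its own output leaves the string unchanged.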

Use Cases

  • Proactive Cluster Health: Monitor controller elections and under-replicated partitions before they trigger production outages.
  • Performance Tuning: Track request latencies and produce/fetch rates to identify bottlenecks in your broker network.
  • Capacity Planning: Use historical consumer lag and throughput metrics to determine when to scale partitions or add new broker nodes.
  • Operational Troubleshooting: Quickly correlate JVM garbage collection spikes with periodic latency spikes in message processing.
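For the capacity-planning case, it may help to recall what consumer lag measures: for each partition, the gap between the log end offset and the group's committed offset. The sketch below is a minimal, library-free illustration of that arithmetic. The function names are hypothetical, and treating a partition with no committed offset as fully lagging is a simplifying assumption.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int],
                  committed: Dict[int, int]) -> Dict[int, int]:
    """Compute per-partition lag as log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: no committed offset means nothing was consumed yet.
        done = committed.get(partition, 0)
        # Clamp at zero in case the two snapshots were taken at different times.
        lag[partition] = max(end - done, 0)
    return lag


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum lag across all partitions of a topic for one consumer group."""
    return sum(partition_lag(end_offsets, committed).values())


if __name__ == "__main__":
    ends = {0: 100, 1: 50, 2: 30}
    done = {0: 90, 1: 50}
    print(partition_lag(ends, done))
    print(total_lag(ends, done))
```

A total that keeps growing over days, while produce rates stay flat, points toward adding consumers or partitions rather than brokers.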

Example Prompts

  1. "I need to monitor under-replicated partitions; can you show me the Prometheus query for the under-replicated partition count and how to alert on it?"
  2. "Help me set up the Kafka-JMX exporter on our bare-metal cluster; what environment variables do I need to add to the systemd script?"
  3. "My consumers are slowing down. How can I use the kafka-consumer-lag dashboard to identify if it is a consumer processing issue or a broker throughput bottleneck?"
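To give a sense of what the first prompt might produce, here is a hedged sketch that renders a Prometheus alerting rule for under-replicated partitions. The metric name `kafka_server_replicamanager_underreplicatedpartitions` is an assumption: the actual name depends on the rename rules in your JMX exporter config, so check what your brokers expose before using it. The group name, alert name, and thresholds are likewise illustrative.

```python
def under_replicated_rule(
    metric: str = "kafka_server_replicamanager_underreplicatedpartitions",
    hold_for: str = "5m",
    severity: str = "critical",
) -> str:
    """Render a Prometheus rule file (YAML text) for under-replicated partitions.

    The default metric name is an assumed exporter naming convention,
    not something confirmed by this skill's documentation.
    """
    # Fire when any broker reports at least one under-replicated partition.
    expr = f"sum({metric}) > 0"
    lines = [
        "groups:",
        "  - name: kafka-replication",
        "    rules:",
        "      - alert: KafkaUnderReplicatedPartitions",
        f"        expr: {expr}",
        f"        for: {hold_for}",
        "        labels:",
        f"          severity: {severity}",
        "        annotations:",
        '          summary: "Kafka has under-replicated partitions"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(under_replicated_rule())
```

The `for` clause delays firing until the condition has held for the given duration, which helps avoid paging on brief blips during rolling restarts.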

Tips & Limitations

  • Memory Overhead: Be aware that attaching the Java agent adds a small memory overhead to your Kafka process; ensure your heap sizes are configured to account for this.
  • Dashboards: The provided Grafana dashboards expect the Prometheus datasource to be named 'Prometheus'. Adjust your data source configurations if your setup uses a different alias.
  • Alerting: The dashboards only provide visualization. You must separately define alerting rules in Prometheus and configure notification routing in Alertmanager for PagerDuty or Slack.
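If your datasource is not named 'Prometheus', one option is to rewrite the dashboard JSON before importing it. The sketch below assumes the dashboards reference the datasource by a plain name string under a `datasource` key. Dashboards that instead use an object with a `uid`, or a template variable, are left untouched and would need a different approach. The function name and the replacement name are hypothetical.

```python
from typing import Any


def rename_datasource(node: Any, new: str, old: str = "Prometheus") -> Any:
    """Return a copy of a dashboard JSON tree with datasource names swapped.

    Only string values stored under a "datasource" key and equal to `old`
    are replaced; object-style references are copied through unchanged.
    """
    if isinstance(node, dict):
        return {
            key: new if key == "datasource" and value == old
            else rename_datasource(value, new, old)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_datasource(item, new, old) for item in node]
    return node


if __name__ == "__main__":
    dashboard = {
        "title": "Prometheus overview",
        "panels": [{"datasource": "Prometheus", "targets": []}],
    }
    print(rename_datasource(dashboard, "prod-metrics"))
```

Load the dashboard file with `json.load`, pass the result through the function, and write it back with `json.dump` before importing it into Grafana.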

Metadata

Stars: 1054
Views: 1
Updated: 2026-02-16

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-anton-abyzov-sw-kafka-observability": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#kafka #prometheus #grafana #observability #devops
Safety Score: 4/5

Flags: file-read, file-write, code-execution