As a developer or administrator, you can view metrics in OpenShift Streams for Apache Kafka to visualize the performance and data usage for Kafka instances and topics that you have access to. You can view metrics directly in the Streams for Apache Kafka web console, or use the metrics API endpoint provided by Streams for Apache Kafka to import the data into your own metrics monitoring tool, such as Prometheus.
OpenShift Streams for Apache Kafka supports the following metrics for Kafka instances and topics. In the Streams for Apache Kafka web console, the Dashboard page of a Kafka instance displays a subset of these metrics. To learn more about the limits associated with both trial and production Kafka instance types, see Red Hat OpenShift Streams for Apache Kafka Service Limits.
Cluster metrics

kafka_namespace:haproxy_server_bytes_in_total:rate5m
    Number of incoming bytes per second for the cluster in the last five minutes. This ingress metric represents all the data that producers are sending to topics in the cluster. The Kafka instance type determines the maximum incoming byte rate.

kafka_namespace:haproxy_server_bytes_out_total:rate5m
    Number of outgoing bytes per second for the cluster in the last five minutes. This egress metric represents all the data that consumers are receiving from topics in the cluster. The Kafka instance type determines the maximum outgoing byte rate.

kafka_namespace:kafka_server_socket_server_metrics_connection_count:sum
    Number of current client connections to the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. For example, a consumer holds a connection to each broker that it receives data from and a connection to its group coordinator. The Kafka instance type determines the maximum number of active connections.

kafka_namespace:kafka_server_socket_server_metrics_connection_creation_rate:sum
    Number of client connections created per second for the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. A consistently high rate of connection creation might indicate a client issue. The Kafka instance type determines the maximum connection creation rate.

kafka_topic:kafka_topic_partitions:count
    Number of topics in the cluster. This metric does not include internal Kafka topics, such as __consumer_offsets and __transaction_state.

kafka_topic:kafka_topic_partitions:sum
    Number of partitions across all topics in the cluster. This metric does not include partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state. The Kafka instance type determines the maximum number of partitions.

kas_broker_partition_log_size_bytes_top50
    Sizes, in bytes, of the fifty largest topic partitions on each broker in the cluster. The total amount of storage used by all topic partitions on a broker is shown by the kafka_broker_quota_totalstorageusedbytes broker metric. The total usage for a broker must stay below the kafka_broker_quota_softlimitbytes value to avoid throttling of producers.

kas_topic_partition_log_size_bytes
    Size, in bytes, of each topic partition on each broker in the cluster. The total amount of storage used by all topic partitions on a broker is shown by the kafka_broker_quota_totalstorageusedbytes broker metric. The total usage for a broker must stay below the kafka_broker_quota_softlimitbytes value to avoid throttling of producers.
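For example, you can track how close the cluster is to its partition limit with a simple PromQL expression. This is a minimal sketch, and the <max_partitions> placeholder is an assumed value that you replace with the limit for your instance type:

predict: kafka_topic:kafka_topic_partitions:sum / <max_partitions> > 0.9

The expression returns a result only when more than 90 percent of the available partitions are in use.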
Broker metrics

kafka_broker_quota_softlimitbytes
    Maximum amount of storage, in bytes, for this broker before producers are throttled. When this limit is reached, the broker starts throttling producers to prevent them from sending additional data. The Kafka instance type determines the maximum storage in the broker.

kafka_broker_quota_totalstorageusedbytes
    Amount of storage, in bytes, that is currently used by partitions in the broker. The storage usage depends on the number and retention configurations of the partitions. This metric must stay below the kafka_broker_quota_softlimitbytes value.

kafka_controller_kafkacontroller_global_partition_count
    Number of partitions in the cluster. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report a value of 0. This count includes partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state. This metric is similar to the kafka_topic:kafka_topic_partitions:sum cluster metric.

kafka_controller_kafkacontroller_offline_partitions_count
    Number of partitions in the cluster that are currently offline. Offline partitions cannot be used by clients for producing or consuming data. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report 0.

kubelet_volume_stats_available_bytes
    Amount of disk space, in bytes, that is available in the broker.

kubelet_volume_stats_used_bytes
    Amount of disk space, in bytes, that is currently used in the broker. This metric is similar to the kafka_broker_quota_totalstorageusedbytes broker metric.
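As an illustration, the two storage quota metrics can be combined to flag brokers that are approaching the throttling threshold. This PromQL expression is a minimal sketch, not a predefined query in Streams for Apache Kafka:

kafka_broker_quota_totalstorageusedbytes / kafka_broker_quota_softlimitbytes > 0.9

For each broker, the expression computes the fraction of the soft limit that is currently used and returns only brokers above 90 percent.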
Topic metrics

kafka_server_brokertopicmetrics_bytes_in_total
    Number of incoming bytes to topics in the instance.

kafka_server_brokertopicmetrics_bytes_out_total
    Number of outgoing bytes from topics in the instance.

kafka_server_brokertopicmetrics_messages_in_total
    Number of messages received by one or more topics in the instance.

kafka_topic:kafka_server_brokertopicmetrics_bytes_in_total:rate5m
    Number of incoming bytes per second to topics in the instance in the last five minutes.

kafka_topic:kafka_server_brokertopicmetrics_bytes_out_total:rate5m
    Number of outgoing bytes per second from topics in the instance in the last five minutes.

kafka_topic:kafka_server_brokertopicmetrics_messages_in_total:rate5m
    Number of messages per second received by one or more topics in the instance in the last five minutes.

kafka_topic:kafka_log_log_size:sum
    Log size, in bytes, of each topic and replica across all brokers in the cluster.
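For example, you can rank topics by ingress with a PromQL query. This sketch assumes that the rate5m recording rule exposes a topic label:

topk(5, kafka_topic:kafka_server_brokertopicmetrics_bytes_in_total:rate5m)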
After you produce and consume messages in your services using methods such as Kafka scripts, Kcat, or a Quarkus application, you can return to the Kafka instance in the web console and use the Dashboard page to view metrics for the instance and topics. The metrics help you understand the performance and data usage for your Kafka instance and topics.
- You have access to a running Kafka instance in Streams for Apache Kafka that contains topics. For more information about access management in Streams for Apache Kafka, see Managing account access in OpenShift Streams for Apache Kafka.
- In the Kafka Instances page of the web console, click the name of the Kafka instance and select the Dashboard tab.
When you create a Kafka instance and add new topics, the Dashboard page is initially empty. After you start producing and consuming messages in your services, you can return to this page to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with OpenShift Streams for Apache Kafka.
Note
In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.
As an alternative to viewing metrics for a Kafka instance in the OpenShift Streams for Apache Kafka web console, you can export the metrics to Prometheus and integrate them with your own metrics monitoring platform. Streams for Apache Kafka provides a kafkas/{id}/metrics/federate API endpoint that you can configure as a scrape target so that Prometheus can collect and store the metrics. You can then access the metrics in the Prometheus expression browser or in a data-graphing tool such as Grafana.
This procedure follows the Configuration File method defined by Prometheus for integrating third-party metrics. If you use the Prometheus Operator in your monitoring environment, you can also follow the Additional Scrape Configuration method.
- You have access to a running Kafka instance in Streams for Apache Kafka that contains topics. For more information about access management in Streams for Apache Kafka, see Managing account access in OpenShift Streams for Apache Kafka.
- You have the ID and the SASL/OAUTHBEARER token endpoint for the Kafka instance. To locate these values, select your Kafka instance in the Streams for Apache Kafka web console, open the options menu (three vertical dots), and click one of the following:
  - Details to locate the ID of the Kafka instance.
  - Connection to locate the SASL/OAUTHBEARER token endpoint URL.
- You have the generated credentials for your service account that has access to the Kafka instance. To reset the credentials, use the Service Accounts page in the Application Services section of the Red Hat Hybrid Cloud Console.
- You have installed a Prometheus instance in your monitoring environment. For installation instructions, see Getting Started in the Prometheus documentation.
- In your Prometheus configuration file, add the following information. Replace the variable values with your own Kafka instance and service account information.
The <kafka_instance_id> is the ID of the Kafka instance. The <client_id> and <client_secret> are the generated credentials for your service account that you copied previously. The <token_url> is the SASL/OAUTHBEARER token endpoint for the Kafka instance.

Required information for Prometheus configuration file

- job_name: "kafka-federate"
  static_configs:
    - targets: ["api.openshift.com"]
  scheme: "https"
  metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
  oauth2:
    client_id: "<client_id>"
    client_secret: "<client_secret>"
    token_url: "<token_url>"
The new scrape target becomes available after the configuration has reloaded.
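If you manage the Prometheus configuration file directly, you can validate it and trigger the reload yourself. The following sketch assumes the configuration file is named prometheus.yml and that Prometheus was started with the --web.enable-lifecycle flag, which enables the reload endpoint:

promtool check config prometheus.yml
curl -X POST http://localhost:9090/-/reload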
- View your collected metrics in the Prometheus expression browser at http://<host>:<port>/graph, or integrate your Prometheus data source with a data-graphing tool such as Grafana. For information about Prometheus metrics in Grafana, see Grafana Support for Prometheus in the Grafana documentation.

If you use Grafana with your Prometheus instance, you can import the predefined OpenShift Streams for Apache Kafka Grafana dashboard to set up your metrics display. For import instructions, see Importing a dashboard in the Grafana documentation.
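For example, to confirm that the federated metrics are arriving, you can enter one of the cluster metric names directly in the expression browser:

kafka_namespace:kafka_server_socket_server_metrics_connection_count:sum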
When you create a Kafka instance and add new topics, the metrics are initially empty. After you start producing and consuming messages in your services, you can return to your monitoring tool to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with OpenShift Streams for Apache Kafka.
Note
In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.
Note
If you use the Prometheus Operator in your monitoring environment, you can alternatively create a Kubernetes secret that contains the scrape configuration, and then reference that secret from the Prometheus custom resource, following the Additional Scrape Configuration method.

Example kafka-federate.yaml file

- job_name: "kafka-federate"
  static_configs:
    - targets: ["api.openshift.com"]
  scheme: "https"
  metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
  oauth2:
    client_id: "<client_id>"
    client_secret: "<client_secret>"
    token_url: "<token_url>"

Example command to create and apply a Kubernetes secret
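A minimal sketch of such a command, assuming the scrape configuration above is saved as kafka-federate.yaml and your Prometheus instance runs in a namespace named monitoring (the secret name additional-scrape-configs is illustrative):

kubectl create secret generic additional-scrape-configs \
  --from-file=kafka-federate.yaml \
  --dry-run=client -o yaml | kubectl apply -n monitoring -f -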
Example Prometheus custom resource with new secret
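A minimal sketch, assuming the secret name and key from the previous command; the metadata and service account values are illustrative, and the additionalScrapeConfigs stanza is what wires the secret into the Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus  # Illustrative name
spec:
  serviceAccountName: prometheus  # Illustrative service account
  additionalScrapeConfigs:
    # Secret and key that hold the additional scrape configuration
    name: additional-scrape-configs
    key: kafka-federate.yaml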
- You have successfully configured metrics monitoring for a Kafka instance in Prometheus.
- You use the Prometheus Operator in your monitoring environment.
- You can define alerting rules in Prometheus and can deploy an Alertmanager cluster in Prometheus Operator.
- Create a PrometheusRule custom resource with alerts defined for the capacity of your Kafka instance.
- Apply the PrometheusRule custom resource to the cluster that you are federating the metrics to.
PrometheusRule custom resource for a Kafka broker storage limit alert

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-alerts  # Illustrative name; metadata is required for a valid custom resource
spec:
  groups:
    - name: limits
      rules:
        - alert: KafkaBrokerStorageFillingUp
          # Fires when the available broker volume space is predicted to run out within four days
          expr: predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-(.+)-kafka-[0-9]+"}[1h], 4 * 24 * 3600) < 0
          labels:
            severity: <SOME_SEVERITY>
          annotations:
            summary: 'Broker PersistentVolume is filling up.'
            description: 'Based on recent sampling, the Broker PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} is expected to fill up within four days.'
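For example, assuming the resource is saved as kafka-alerts.yaml:

kubectl apply -f kafka-alerts.yaml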
- Getting started with the rhoas CLI for OpenShift Streams for Apache Kafka
- Getting Started in the Prometheus documentation
- Prometheus Data Source in the Grafana documentation