Monitoring Kyuubi - Server Metrics
Kyuubi has a configurable metrics system based on the Dropwizard Metrics Library.
This allows users to report Kyuubi metrics to a variety of sinks, selected with `kyuubi.metrics.reporters`.
The metrics instrument both specific activities and the Kyuubi server itself.
Configurations
The metrics system is configured via $KYUUBI_HOME/conf/kyuubi-defaults.conf.
| Key | Default | Meaning | Type | Since |
|---|---|---|---|---|
| `kyuubi.metrics.enabled` | true | Set to true to enable the Kyuubi metrics system | boolean | 1.2.0 |
| `kyuubi.metrics.reporters` | JSON | A comma-separated list of metrics reporters | seq | 1.2.0 |
| `kyuubi.metrics.console.interval` | PT5S | How often to report metrics to the console | duration | 1.2.0 |
| `kyuubi.metrics.json.interval` | PT5S | How often to report metrics to the JSON file | duration | 1.2.0 |
| `kyuubi.metrics.json.location` | metrics | Where the JSON metrics file is located | string | 1.2.0 |
| `kyuubi.metrics.prometheus.path` | /metrics | URI context path of the Prometheus metrics HTTP server | string | 1.2.0 |
| `kyuubi.metrics.prometheus.port` | 10019 | Port of the Prometheus metrics HTTP server | int | 1.2.0 |
| `kyuubi.metrics.slf4j.interval` | PT5S | How often to report metrics to the SLF4J logger | duration | 1.2.0 |
Metrics
These metrics include:
| Metrics Prefix | Metrics Suffix | Type | Since | Description |
|---|---|---|---|---|
| `kyuubi.exec.pool.threads.alive` | | gauge | 1.2.0 | alive threads in the backend execution thread pool |
| `kyuubi.exec.pool.threads.active` | | gauge | 1.2.0 | active threads in the backend execution thread pool |
| `kyuubi.exec.pool.work_queue.size` | | gauge | 1.7.0 | work queue size of the backend execution thread pool |
| `kyuubi.connection.total` | | counter | 1.2.0 | cumulative connection count |
| `kyuubi.connection.total` | `${sessionType}` | counter | 1.7.0 | cumulative connection count with session type `${sessionType}` |
| `kyuubi.connection.opened` | | gauge | 1.2.0 | current active connection count |
| `kyuubi.connection.opened` | `${user}` | counter | 1.2.0 | current active connection count requested by a `${user}` |
| `kyuubi.connection.opened` | `${user}${sessionType}` | counter | 1.7.0 | current active connection count requested by a `${user}` with session type `${sessionType}` |
| `kyuubi.connection.opened` | `${sessionType}` | counter | 1.7.0 | current active connection count with session type `${sessionType}` |
| `kyuubi.connection.failed` | | counter | 1.2.0 | cumulative failed connection count |
| `kyuubi.connection.failed` | `${user}` | counter | 1.2.0 | cumulative failed connection count for a `${user}` |
| `kyuubi.connection.failed` | `${sessionType}` | counter | 1.7.0 | cumulative failed connection count with session type `${sessionType}` |
| `kyuubi.operation.total` | | counter | 1.5.0 | cumulative opened operation count |
| `kyuubi.operation.total` | `${operationType}` | counter | 1.5.0 | cumulative opened count for the operation `${operationType}` |
| `kyuubi.operation.opened` | | gauge | 1.5.0 | current opened operation count |
| `kyuubi.operation.opened` | `${operationType}` | counter | 1.5.0 | current opened count for the operation `${operationType}` |
| `kyuubi.operation.failed` | `${operationType}.${errorType}` | counter | 1.5.0 | cumulative failed count for the operation `${operationType}` with a particular `${errorType}`, e.g. `execute_statement.AnalysisException` |
| `kyuubi.operation.state` | `${operationState}` | meter | 1.5.0 | Kyuubi operation state rate |
| `kyuubi.operation.exec_time` | `${operationType}` | histogram | 1.7.0 | execution time histogram for the operation `${operationType}`; currently only ExecuteStatement is enabled |
| `kyuubi.engine.total` | | counter | 1.2.0 | cumulative count of created engines |
| `kyuubi.engine.timeout` | | counter | 1.2.0 | cumulative count of timed-out engines |
| `kyuubi.engine.failed` | `${user}` | counter | 1.2.0 | cumulative explicitly failed engine count for a `${user}` |
| `kyuubi.engine.failed` | `${errorType}` | counter | 1.2.0 | cumulative explicitly failed engine count for a particular `${errorType}`, e.g. `ClassNotFoundException` |
| `kyuubi.backend_service.open_session` | | timer | 1.5.0 | Kyuubi backend service openSession method execution time and rate |
| `kyuubi.backend_service.close_session` | | timer | 1.5.0 | Kyuubi backend service closeSession method execution time and rate |
| `kyuubi.backend_service.get_info` | | timer | 1.5.0 | Kyuubi backend service getInfo method execution time and rate |
| `kyuubi.backend_service.execute_statement` | | timer | 1.5.0 | Kyuubi backend service executeStatement method execution time and rate |
| `kyuubi.backend_service.get_type_info` | | timer | 1.5.0 | Kyuubi backend service getTypeInfo method execution time and rate |
| `kyuubi.backend_service.get_catalogs` | | timer | 1.5.0 | Kyuubi backend service getCatalogs method execution time and rate |
| `kyuubi.backend_service.get_schemas` | | timer | 1.5.0 | Kyuubi backend service getSchemas method execution time and rate |
| `kyuubi.backend_service.get_tables` | | timer | 1.5.0 | Kyuubi backend service getTables method execution time and rate |
| `kyuubi.backend_service.get_table_types` | | timer | 1.5.0 | Kyuubi backend service getTableTypes method execution time and rate |
| `kyuubi.backend_service.get_columns` | | timer | 1.5.0 | Kyuubi backend service getColumns method execution time and rate |
| `kyuubi.backend_service.get_functions` | | timer | 1.5.0 | Kyuubi backend service getFunctions method execution time and rate |
| `kyuubi.backend_service.get_operation_status` | | timer | 1.5.0 | Kyuubi backend service getOperationStatus method execution time and rate |
| `kyuubi.backend_service.cancel_operation` | | timer | 1.5.0 | Kyuubi backend service cancelOperation method execution time and rate |
| `kyuubi.backend_service.close_operation` | | timer | 1.5.0 | Kyuubi backend service closeOperation method execution time and rate |
| `kyuubi.backend_service.get_result_set_metadata` | | timer | 1.5.0 | Kyuubi backend service getResultSetMetadata method execution time and rate |
| `kyuubi.backend_service.fetch_results` | | timer | 1.5.0 | Kyuubi backend service fetchResults method execution time and rate |
| `kyuubi.backend_service.fetch_log_rows_rate` | | meter | 1.5.0 | rate of log rows fetched by the Kyuubi backend service fetchResults method |
| `kyuubi.backend_service.fetch_result_rows_rate` | | meter | 1.5.0 | rate of result rows fetched by the Kyuubi backend service fetchResults method |
| `kyuubi.backend_service.get_primary_keys` | | meter | 1.6.0 | Kyuubi backend service getPrimaryKeys method execution time and rate |
| `kyuubi.backend_service.get_cross_reference` | | meter | 1.6.0 | Kyuubi backend service getCrossReference method execution time and rate |
| `kyuubi.operation.state` | `${operationType}.${state}` | meter | 1.6.0 | rate of `${operationType}` in a particular `${state}`, e.g. `BatchJobSubmission.pending`, `BatchJobSubmission.finished`. Note that the terminal states are cumulative, but the intermediate ones are not. |
| `kyuubi.metadata.request.opened` | | counter | 1.6.1 | current opened count of metadata requests |
| `kyuubi.metadata.request.total` | | meter | 1.6.0 | metadata request time and rate |
| `kyuubi.metadata.request.failed` | | meter | 1.6.0 | failed metadata request time and rate |
| `kyuubi.metadata.request.retrying` | | meter | 1.6.0 | retrying metadata request time and rate; it is not cumulative |
| `kyuubi.operation.batch_pending_max_elapse` | | gauge | 1.10.1 | maximum elapsed pending time of batches on the current Kyuubi instance |
Before v1.5.0, if you use these metrics:

- `kyuubi.statement.total`
- `kyuubi.statement.opened`
- `kyuubi.statement.failed.${errorType}`

Since v1.5.0, you can replace them with the following metrics:

- `kyuubi.operation.total.ExecuteStatement`
- `kyuubi.operation.opened.ExecuteStatement`
- `kyuubi.operation.failed.ExecuteStatement.${errorType}`
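If dashboards or alert rules still reference the pre-1.5.0 names listed above, the renaming can be applied mechanically. The following is a minimal sketch using `sed`; the function name `migrate_metric_name` is hypothetical, and it only handles the three Dropwizard-style names from the mapping above, printing any other name unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a pre-1.5.0 statement metric name to its
# v1.5.0+ operation-based replacement. Unrecognized names pass through.
migrate_metric_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^kyuubi\.statement\.total$/kyuubi.operation.total.ExecuteStatement/' \
    -e 's/^kyuubi\.statement\.opened$/kyuubi.operation.opened.ExecuteStatement/' \
    -e 's/^kyuubi\.statement\.failed\.(.+)$/kyuubi.operation.failed.ExecuteStatement.\1/'
}

# Example usage: the ${errorType} suffix is carried over as-is.
migrate_metric_name kyuubi.statement.failed.AnalysisException
```

The patterns are anchored at both ends so that names which merely contain `kyuubi.statement` are not rewritten by accident.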
Grafana and Prometheus
Grafana is a popular open and composable observability platform. Kyuubi provides
a Grafana Dashboard template at `<KYUUBI_HOME>/grafana/dashboard-template.json` to help users monitor
the Kyuubi server.
To use the provided Grafana Dashboard, Prometheus must collect the Kyuubi server's metrics.
With the `PROMETHEUS` reporter enabled, the Kyuubi server exposes Prometheus metrics at `http://<host>:10019/metrics` by default;
you can change the port and path through the related configurations in `kyuubi-defaults.conf`:
```properties
kyuubi.metrics.enabled true
kyuubi.metrics.reporters PROMETHEUS
kyuubi.metrics.prometheus.port 10019
kyuubi.metrics.prometheus.path /metrics
```
To verify the Prometheus metrics endpoint, run `curl http://<host>:10019/metrics`; the output should look like:

```
# HELP kyuubi_buffer_pool_mapped_count Generated from Dropwizard metric import (metric=kyuubi.buffer_pool.mapped.count, type=com.codahale.metrics.jvm.JmxAttributeGauge)
# TYPE kyuubi_buffer_pool_mapped_count gauge
kyuubi_buffer_pool_mapped_count 0.0
# HELP kyuubi_memory_usage_pools_PS_Eden_Space_max Generated from Dropwizard metric import (metric=kyuubi.memory_usage.pools.PS-Eden-Space.max, type=com.codahale.metrics.jvm.MemoryUsageGaugeSet$$Lambda$231/207471778)
# TYPE kyuubi_memory_usage_pools_PS_Eden_Space_max gauge
kyuubi_memory_usage_pools_PS_Eden_Space_max 2.064646144E9
# HELP kyuubi_gc_PS_MarkSweep_time Generated from Dropwizard metric import (metric=kyuubi.gc.PS-MarkSweep.time, type=com.codahale.metrics.jvm.GarbageCollectorMetricSet$$Lambda$218/811207775)
# TYPE kyuubi_gc_PS_MarkSweep_time gauge
kyuubi_gc_PS_MarkSweep_time 831.0
...
```
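Judging from the sample output above, each Dropwizard metric name (for example `kyuubi.gc.PS-MarkSweep.time`) appears in Prometheus with characters such as `.` and `-` replaced by `_`. The sketch below assumes the general rule that any character outside `[a-zA-Z0-9:_]` becomes `_`; the sample is consistent with this, but this document does not spell it out. The helper names `to_prom_name` and `metric_value` are hypothetical, and `metric_value` only matches unlabeled samples, so it suits single-valued metrics such as gauges rather than timers or histograms.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Dropwizard metric name from the tables
# above into the Prometheus name the endpoint is assumed to expose.
to_prom_name() {
  printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9:_]/_/g'
}

# Hypothetical helper: read Prometheus text output on stdin and print the
# value of the unlabeled sample whose name equals $1. Comment lines
# ("# HELP", "# TYPE") never match because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Example usage against a live server (replace <host> with a real host):
#   curl -s http://<host>:10019/metrics \
#     | metric_value "$(to_prom_name kyuubi.connection.opened)"
```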
Configure Prometheus to scrape the Kyuubi server cluster endpoints, for example:

```shell
cat > /etc/prometheus/prometheus.yml <<EOF
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: "kyuubi-server"
    scheme: "http"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "kyuubi-server-1:10019"
          - "kyuubi-server-2:10019"
EOF
```
Grafana has built-in support for Prometheus: add Prometheus as a data source, then import
`<KYUUBI_HOME>/grafana/dashboard-template.json` into Grafana and customize it as needed.
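If you prefer configuration files over the Grafana UI, Grafana can also provision the Prometheus data source from a YAML file in its `provisioning/datasources` directory (for example `/etc/grafana/provisioning/datasources/` on package-based installs). A minimal sketch, assuming Prometheus is reachable at the URL you pass in; the function name and the data source name `Prometheus` are illustrative choices, not something the dashboard template is known to require, so check the data source selection when importing the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Grafana data source provisioning file that
# points Grafana at the Prometheus server scraping Kyuubi.
# $1: Prometheus base URL, e.g. http://prometheus:9090 (assumed address).
prometheus_datasource_yaml() {
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
EOF
}

# Example usage (restart Grafana afterwards so it reads the new file):
#   prometheus_datasource_yaml http://prometheus:9090 \
#     > /etc/grafana/provisioning/datasources/prometheus.yaml
```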
If you have ideas for improving the dashboard, please reach out by opening
GitHub Issues/PRs
or by sending an email to dev@kyuubi.apache.org.