kyuubi/docs/monitor/metrics.md
hezhao2 7e8275b7b4
[KYUUBI #5834] Add Grafana dashboard template
### _Why are the changes needed?_

This PR adds a basic Grafana Dashboard template, also updates the metrics docs to guide users to use Prometheus and Grafana to monitor the Kyuubi server.

The Grafana Dashboard template is exported from the Grafana OSS v11.4.0

Closes #5147 from zhaohehuhu/Improvement-0809.

Closes #5834

f6fc2d71e [Cheng Pan] fix style
465f0546a [Cheng Pan] update dashboard
3fa2d237e [hezhao2] add status chart
4b2bd3dbc [hezhao2] add status chart
185f2cccf [hezhao2] make it compatible with kyuubi 1.8
457085be5 [hezhao2] add REAMDE.md to guide users
45e3ba3e5 [hezhao2] add docker file build a grafana image and load dashboards available
dbc22108b [hezhao2] Add Grafana dashboard template

Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-12-24 10:30:50 +08:00

Monitoring Kyuubi - Server Metrics

Kyuubi has a configurable metrics system based on the Dropwizard Metrics Library. It lets users report Kyuubi metrics to a variety of reporters, selected through kyuubi.metrics.reporters. The metrics provide instrumentation for specific activities of the Kyuubi server.

Configurations

The metrics system is configured via $KYUUBI_HOME/conf/kyuubi-defaults.conf.

| Key | Default | Meaning | Type | Since |
|---|---|---|---|---|
| kyuubi.metrics.enabled | true | Set to true to enable the Kyuubi metrics system | boolean | 1.2.0 |
| kyuubi.metrics.reporters | JSON | A comma-separated list of metrics reporters: CONSOLE (ConsoleReporter, which outputs measurements to the console periodically); JMX (JmxReporter, which listens for new metrics and exposes them as MBeans); JSON (JsonReporter, which outputs measurements to a JSON file periodically); PROMETHEUS (PrometheusReporter, which exposes metrics in Prometheus format); SLF4J (Slf4jReporter, which outputs measurements to the system log periodically) | seq | 1.2.0 |
| kyuubi.metrics.console.interval | PT5S | How often to report metrics to the console | duration | 1.2.0 |
| kyuubi.metrics.json.interval | PT5S | How often to report metrics to the JSON file | duration | 1.2.0 |
| kyuubi.metrics.json.location | metrics | Where the JSON metrics file is located | string | 1.2.0 |
| kyuubi.metrics.prometheus.path | /metrics | URI context path of the Prometheus metrics HTTP server | string | 1.2.0 |
| kyuubi.metrics.prometheus.port | 10019 | Prometheus metrics HTTP server port | int | 1.2.0 |
| kyuubi.metrics.slf4j.interval | PT5S | How often to report metrics to the SLF4J logger | duration | 1.2.0 |
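The interval defaults above (PT5S) are written as ISO-8601 durations. As a rough illustration of how such a value maps to seconds, here is a minimal Python sketch. The helper name duration_to_seconds is hypothetical, and the sketch only handles the time-only PT..H..M..S form; it makes no claim about which other formats Kyuubi itself accepts.

```python
import re

# Matches the time part of an ISO-8601 duration such as PT5S, PT1M30S or PT2H.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def duration_to_seconds(text: str) -> float:
    """Convert a time-only ISO-8601 duration (e.g. 'PT5S') to seconds."""
    match = _DURATION.match(text.strip().upper())
    # Reject strings that do not match, and a bare 'PT' with no components.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    seconds = float(match.group("s") or 0)
    return hours * 3600 + minutes * 60 + seconds


# The default reporter interval from the table above.
print(duration_to_seconds("PT5S"))
```

A helper like this can be handy when comparing a reporter interval with, say, a Prometheus scrape interval expressed in seconds.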

Metrics

These metrics include:

| Metrics Prefix | Metrics Suffix | Type | Since | Description |
|---|---|---|---|---|
| kyuubi.exec.pool.threads.alive | | gauge | 1.2.0 | keep-alive threads in the backend execution thread pool |
| kyuubi.exec.pool.threads.active | | gauge | 1.2.0 | active threads in the backend execution thread pool |
| kyuubi.exec.pool.work_queue.size | | gauge | 1.7.0 | work queue size in the backend execution thread pool |
| kyuubi.connection.total | | counter | 1.2.0 | cumulative connection count |
| kyuubi.connection.total | ${sessionType} | counter | 1.7.0 | cumulative connection count with session type ${sessionType} |
| kyuubi.connection.opened | | gauge | 1.2.0 | current active connection count |
| kyuubi.connection.opened | ${user} | counter | 1.2.0 | current active connection count requested by a ${user} |
| kyuubi.connection.opened | ${user}.${sessionType} | counter | 1.7.0 | current active connection count requested by a ${user} with session type ${sessionType} |
| kyuubi.connection.opened | ${sessionType} | counter | 1.7.0 | current active connection count with session type ${sessionType} |
| kyuubi.connection.failed | | counter | 1.2.0 | cumulative failed connection count |
| kyuubi.connection.failed | ${user} | counter | 1.2.0 | cumulative failed connection count for a ${user} |
| kyuubi.connection.failed | ${sessionType} | counter | 1.7.0 | cumulative failed connection count with session type ${sessionType} |
| kyuubi.operation.total | | counter | 1.5.0 | cumulative opened operation count |
| kyuubi.operation.total | ${operationType} | counter | 1.5.0 | cumulative opened count for the operation ${operationType} |
| kyuubi.operation.opened | | gauge | 1.5.0 | current opened operation count |
| kyuubi.operation.opened | ${operationType} | counter | 1.5.0 | current opened count for the operation ${operationType} |
| kyuubi.operation.failed | ${operationType}.${errorType} | counter | 1.5.0 | cumulative failed count for the operation ${operationType} with a particular ${errorType}, e.g. execute_statement.AnalysisException |
| kyuubi.operation.state | ${operationState} | meter | 1.5.0 | kyuubi operation state rate |
| kyuubi.operation.exec_time | ${operationType} | histogram | 1.7.0 | execution time histogram for the operation ${operationType}; currently only enabled for ExecuteStatement |
| kyuubi.engine.total | | counter | 1.2.0 | cumulative created engines |
| kyuubi.engine.timeout | | counter | 1.2.0 | cumulative timeout engines |
| kyuubi.engine.failed | ${user} | counter | 1.2.0 | cumulative explicitly failed engine count for a ${user} |
| kyuubi.engine.failed | ${errorType} | counter | 1.2.0 | cumulative explicitly failed engine count for a particular ${errorType}, e.g. ClassNotFoundException |
| kyuubi.backend_service.open_session | | timer | 1.5.0 | kyuubi backend service openSession method execution time and rate |
| kyuubi.backend_service.close_session | | timer | 1.5.0 | kyuubi backend service closeSession method execution time and rate |
| kyuubi.backend_service.get_info | | timer | 1.5.0 | kyuubi backend service getInfo method execution time and rate |
| kyuubi.backend_service.execute_statement | | timer | 1.5.0 | kyuubi backend service executeStatement method execution time and rate |
| kyuubi.backend_service.get_type_info | | timer | 1.5.0 | kyuubi backend service getTypeInfo method execution time and rate |
| kyuubi.backend_service.get_catalogs | | timer | 1.5.0 | kyuubi backend service getCatalogs method execution time and rate |
| kyuubi.backend_service.get_schemas | | timer | 1.5.0 | kyuubi backend service getSchemas method execution time and rate |
| kyuubi.backend_service.get_tables | | timer | 1.5.0 | kyuubi backend service getTables method execution time and rate |
| kyuubi.backend_service.get_table_types | | timer | 1.5.0 | kyuubi backend service getTableTypes method execution time and rate |
| kyuubi.backend_service.get_columns | | timer | 1.5.0 | kyuubi backend service getColumns method execution time and rate |
| kyuubi.backend_service.get_functions | | timer | 1.5.0 | kyuubi backend service getFunctions method execution time and rate |
| kyuubi.backend_service.get_operation_status | | timer | 1.5.0 | kyuubi backend service getOperationStatus method execution time and rate |
| kyuubi.backend_service.cancel_operation | | timer | 1.5.0 | kyuubi backend service cancelOperation method execution time and rate |
| kyuubi.backend_service.close_operation | | timer | 1.5.0 | kyuubi backend service closeOperation method execution time and rate |
| kyuubi.backend_service.get_result_set_metadata | | timer | 1.5.0 | kyuubi backend service getResultSetMetadata method execution time and rate |
| kyuubi.backend_service.fetch_results | | timer | 1.5.0 | kyuubi backend service fetchResults method execution time and rate |
| kyuubi.backend_service.fetch_log_rows_rate | | meter | 1.5.0 | rate of log rows fetched by the kyuubi backend service fetchResults method |
| kyuubi.backend_service.fetch_result_rows_rate | | meter | 1.5.0 | rate of result rows fetched by the kyuubi backend service fetchResults method |
| kyuubi.backend_service.get_primary_keys | | meter | 1.6.0 | kyuubi backend service getPrimaryKeys method execution time and rate |
| kyuubi.backend_service.get_cross_reference | | meter | 1.6.0 | kyuubi backend service getCrossReference method execution time and rate |
| kyuubi.operation.state | ${operationType}.${state} | meter | 1.6.0 | The ${operationType} with a particular ${state} rate, e.g. BatchJobSubmission.pending, BatchJobSubmission.finished. Note that the terminal states are cumulative, but the intermediate ones are not. |
| kyuubi.metadata.request.opened | | counter | 1.6.1 | current opened count for the metadata requests |
| kyuubi.metadata.request.total | | meter | 1.6.0 | metadata requests time and rate |
| kyuubi.metadata.request.failed | | meter | 1.6.0 | metadata requests failure time and rate |
| kyuubi.metadata.request.retrying | | meter | 1.6.0 | retrying metadata requests time and rate; not cumulative |
| kyuubi.operation.batch_pending_max_elapse | | gauge | 1.10.1 | the maximum elapsed pending time of batches on the current Kyuubi instance |
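The names above are Dropwizard-style dotted names, while the sample Prometheus output in the Grafana and Prometheus section shows the same names with underscores (for example kyuubi.memory_usage.pools.PS-Eden-Space.max appears as kyuubi_memory_usage_pools_PS_Eden_Space_max). The following Python sketch reproduces that mapping under the assumption that every character outside the set allowed in Prometheus metric names is replaced by an underscore. The function name to_prometheus_name is hypothetical, and the rule has only been checked against the gauge samples shown in this document; other metric types may carry extra suffixes or labels.

```python
import re

# Characters allowed in a Prometheus metric name; everything else is replaced.
_INVALID = re.compile(r"[^a-zA-Z0-9:_]")


def to_prometheus_name(dropwizard_name: str) -> str:
    """Guess the Prometheus name for a Dropwizard metric name (assumed rule)."""
    return _INVALID.sub("_", dropwizard_name)


for name in (
    "kyuubi.buffer_pool.mapped.count",
    "kyuubi.memory_usage.pools.PS-Eden-Space.max",
    "kyuubi.gc.PS-MarkSweep.time",
):
    print(name, "->", to_prometheus_name(name))
```

This is useful mainly for finding the right series name when writing a dashboard query from a metric listed in the table.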

If you used any of the following metrics before v1.5.0:

  • kyuubi.statement.total
  • kyuubi.statement.opened
  • kyuubi.statement.failed.${errorType}

use these replacements since v1.5.0:

  • kyuubi.operation.total.ExecuteStatement
  • kyuubi.operation.opened.ExecuteStatement
  • kyuubi.operation.failed.ExecuteStatement.${errorType}
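For alert rules or scripts that still reference the old names, the renaming can be applied mechanically. The Python sketch below encodes only the three mappings listed above; the function name migrate_statement_metric is hypothetical, and names outside the kyuubi.statement prefix are returned unchanged.

```python
_OLD_PREFIX = "kyuubi.statement."
_NEW_OPERATION = "ExecuteStatement"


def migrate_statement_metric(name: str) -> str:
    """Map a pre-1.5.0 kyuubi.statement.* name to its kyuubi.operation.* form."""
    if not name.startswith(_OLD_PREFIX):
        return name  # not one of the renamed metrics
    rest = name[len(_OLD_PREFIX):]          # e.g. "failed.AnalysisException"
    kind, _, suffix = rest.partition(".")   # kind is total, opened or failed
    new_name = f"kyuubi.operation.{kind}.{_NEW_OPERATION}"
    # Keep a trailing error type, if any, after the operation type.
    return f"{new_name}.{suffix}" if suffix else new_name


print(migrate_statement_metric("kyuubi.statement.failed.AnalysisException"))
```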

Grafana and Prometheus

Grafana is a popular open and composable observability platform. Kyuubi provides a Grafana Dashboard template at <KYUUBI_HOME>/grafana/dashboard-template.json to help users monitor the Kyuubi server.

To use the provided Grafana Dashboard, Prometheus must be used to collect Kyuubi server's metrics.

With the PROMETHEUS reporter enabled, the Kyuubi server exposes Prometheus metrics at http://<host>:10019/metrics by default. You can change the port and path through the related configurations in kyuubi-defaults.conf.

kyuubi.metrics.enabled          true
kyuubi.metrics.reporters        PROMETHEUS
kyuubi.metrics.prometheus.port  10019
kyuubi.metrics.prometheus.path  /metrics
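To see how these four settings combine into the endpoint URL, here is a small Python sketch. It assumes the simple "key, whitespace, value" layout shown above and skips blank lines and # comments; it does not try to cover every syntax the real configuration loader may accept. The function names parse_kyuubi_conf and prometheus_url are hypothetical, and the fallback values are the defaults from the Configurations table.

```python
from typing import Dict, Optional


def parse_kyuubi_conf(text: str) -> Dict[str, str]:
    """Parse 'key<whitespace>value' lines, skipping blanks and # comments."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def prometheus_url(conf: Dict[str, str], host: str) -> Optional[str]:
    """Return the metrics URL, or None if the Prometheus reporter is off."""
    enabled = conf.get("kyuubi.metrics.enabled", "true").lower() == "true"
    reporters = {
        r.strip().upper()
        for r in conf.get("kyuubi.metrics.reporters", "JSON").split(",")
    }
    if not enabled or "PROMETHEUS" not in reporters:
        return None
    port = conf.get("kyuubi.metrics.prometheus.port", "10019")
    path = conf.get("kyuubi.metrics.prometheus.path", "/metrics")
    return f"http://{host}:{port}{path}"


example = """
kyuubi.metrics.enabled          true
kyuubi.metrics.reporters        PROMETHEUS
kyuubi.metrics.prometheus.port  10019
kyuubi.metrics.prometheus.path  /metrics
"""
print(prometheus_url(parse_kyuubi_conf(example), "kyuubi-server-1"))
```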

To verify the Prometheus metrics endpoint, run curl http://<host>:10019/metrics. The output should look like:

# HELP kyuubi_buffer_pool_mapped_count Generated from Dropwizard metric import (metric=kyuubi.buffer_pool.mapped.count, type=com.codahale.metrics.jvm.JmxAttributeGauge)
# TYPE kyuubi_buffer_pool_mapped_count gauge
kyuubi_buffer_pool_mapped_count 0.0
# HELP kyuubi_memory_usage_pools_PS_Eden_Space_max Generated from Dropwizard metric import (metric=kyuubi.memory_usage.pools.PS-Eden-Space.max, type=com.codahale.metrics.jvm.MemoryUsageGaugeSet$$Lambda$231/207471778)
# TYPE kyuubi_memory_usage_pools_PS_Eden_Space_max gauge
kyuubi_memory_usage_pools_PS_Eden_Space_max 2.064646144E9
# HELP kyuubi_gc_PS_MarkSweep_time Generated from Dropwizard metric import (metric=kyuubi.gc.PS-MarkSweep.time, type=com.codahale.metrics.jvm.GarbageCollectorMetricSet$$Lambda$218/811207775)
# TYPE kyuubi_gc_PS_MarkSweep_time gauge
kyuubi_gc_PS_MarkSweep_time 831.0
...
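For a quick scripted check of this output, the sample lines can be parsed with a few lines of Python. This is a minimal sketch rather than a full parser for the Prometheus text format: it assumes one "name value" pair per line without timestamps, ignores # HELP and # TYPE comments, and skips anything it cannot read as a number. The function name parse_metrics is hypothetical.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Collect 'name value' samples from Prometheus text output."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # no separator, e.g. a truncation marker
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # not a numeric sample
    return samples


sample_output = """
# TYPE kyuubi_buffer_pool_mapped_count gauge
kyuubi_buffer_pool_mapped_count 0.0
# TYPE kyuubi_memory_usage_pools_PS_Eden_Space_max gauge
kyuubi_memory_usage_pools_PS_Eden_Space_max 2.064646144E9
# TYPE kyuubi_gc_PS_MarkSweep_time gauge
kyuubi_gc_PS_MarkSweep_time 831.0
...
"""
for metric, value in parse_metrics(sample_output).items():
    print(metric, value)
```

In practice the text would come from the curl command above rather than from an inline string.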

Configure Prometheus to scrape the endpoints of every Kyuubi server in the cluster, for example:

cat > /etc/prometheus/prometheus.yml <<EOF
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: "kyuubi-server"
    scheme: "http"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "kyuubi-server-1:10019"
          - "kyuubi-server-2:10019"
EOF
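When the list of Kyuubi servers changes often, the same configuration can be generated from a list of targets. The Python sketch below renders an equivalent prometheus.yml as text using only the standard library; the function name render_scrape_config and its parameters are hypothetical, and the values are not escaped, so it assumes plain host:port targets without quotes.

```python
from typing import Iterable


def render_scrape_config(
    targets: Iterable[str],
    job_name: str = "kyuubi-server",
    interval: str = "10s",
    metrics_path: str = "/metrics",
) -> str:
    """Render a minimal prometheus.yml that scrapes the given targets."""
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        f'  - job_name: "{job_name}"',
        '    scheme: "http"',
        f'    metrics_path: "{metrics_path}"',
        "    static_configs:",
        "      - targets:",
    ]
    # One list entry per Kyuubi server endpoint.
    lines += [f'          - "{target}"' for target in targets]
    return "\n".join(lines) + "\n"


print(render_scrape_config(["kyuubi-server-1:10019", "kyuubi-server-2:10019"]))
```

The rendered text can then be written to the Prometheus configuration file in place of the heredoc.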

Grafana has built-in support for Prometheus. Add the Prometheus data source, then import <KYUUBI_HOME>/grafana/dashboard-template.json into Grafana and customize it as needed.

If you have good ideas to improve the dashboard, please don't hesitate to reach out to us by opening GitHub Issues/PRs or sending an email to dev@kyuubi.apache.org.