[KYUUBI #5834] Add Grafana dashboard template

### _Why are the changes needed?_

This PR adds a basic Grafana Dashboard template and updates the metrics docs to guide users in monitoring the Kyuubi server with Prometheus and Grafana.

The Grafana Dashboard template is exported from Grafana OSS v11.4.0.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

<img width="1484" alt="image" src="https://github.com/user-attachments/assets/417b35fa-cd12-4e51-b73f-2955282aa187" />

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5147 from zhaohehuhu/Improvement-0809.

Closes #5834

f6fc2d71e [Cheng Pan] fix style
465f0546a [Cheng Pan] update dashboard
3fa2d237e [hezhao2] add status chart
4b2bd3dbc [hezhao2] add status chart
185f2cccf [hezhao2] make it compatible with kyuubi 1.8
457085be5 [hezhao2] add REAMDE.md to guide users
45e3ba3e5 [hezhao2] add docker file build a grafana image and load dashboards available
dbc22108b [hezhao2] Add Grafana dashboard template

Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
hezhao2 2024-12-24 10:30:50 +08:00 committed by Cheng Pan
parent 1d1e8a0a3b
commit 7e8275b7b4
4 changed files with 2349 additions and 2 deletions


@ -28,5 +28,5 @@ SPARK_VERSION=3.4.3
SPARK_BINARY_VERSION=3.4
SPARK_HADOOP_VERSION=3.3.4
ZOOKEEPER_VERSION=3.6.3
-PROMETHEUS_VERSION=2.45.2
-GRAFANA_VERSION=10.0.10
+PROMETHEUS_VERSION=2.53.3
+GRAFANA_VERSION=11.4.0


@ -101,3 +101,61 @@ Since v1.5.0, you can use the following metrics to replace:
- `kyuubi.operation.total.ExecuteStatement`
- `kyuubi.operation.opened.ExecuteStatement`
- `kyuubi.operation.failed.ExecuteStatement.${errorType}`
## Grafana and Prometheus
[Grafana](https://grafana.com/) is a popular open and composable observability platform. Kyuubi provides
a Grafana Dashboard template at `<KYUUBI_HOME>/grafana/dashboard-template.json` to help users monitor
the Kyuubi server.
To use the provided Grafana Dashboard, [Prometheus](https://prometheus.io/) must be used to collect the Kyuubi
server's metrics.
By default, the Kyuubi server exposes Prometheus metrics at `http://<host>:10019/metrics`. You can change this
through the related configurations in `kyuubi-defaults.conf`.
```
kyuubi.metrics.enabled true
kyuubi.metrics.reporters PROMETHEUS
kyuubi.metrics.prometheus.port 10019
kyuubi.metrics.prometheus.path /metrics
```
To verify the Prometheus metrics endpoint, run `curl http://<host>:10019/metrics`. The output should look like:
```
# HELP kyuubi_buffer_pool_mapped_count Generated from Dropwizard metric import (metric=kyuubi.buffer_pool.mapped.count, type=com.codahale.metrics.jvm.JmxAttributeGauge)
# TYPE kyuubi_buffer_pool_mapped_count gauge
kyuubi_buffer_pool_mapped_count 0.0
# HELP kyuubi_memory_usage_pools_PS_Eden_Space_max Generated from Dropwizard metric import (metric=kyuubi.memory_usage.pools.PS-Eden-Space.max, type=com.codahale.metrics.jvm.MemoryUsageGaugeSet$$Lambda$231/207471778)
# TYPE kyuubi_memory_usage_pools_PS_Eden_Space_max gauge
kyuubi_memory_usage_pools_PS_Eden_Space_max 2.064646144E9
# HELP kyuubi_gc_PS_MarkSweep_time Generated from Dropwizard metric import (metric=kyuubi.gc.PS-MarkSweep.time, type=com.codahale.metrics.jvm.GarbageCollectorMetricSet$$Lambda$218/811207775)
# TYPE kyuubi_gc_PS_MarkSweep_time gauge
kyuubi_gc_PS_MarkSweep_time 831.0
...
```
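The same check can be scripted. The following is a minimal sketch, not part of Kyuubi: the helper names `parse_metrics` and `fetch_metrics` are hypothetical, and it assumes the endpoint returns the plain Prometheus text format shown above.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format output into {sample_name: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>[{labels}] <value> [timestamp]".
        if "}" in line:
            cut = line.rindex("}") + 1
            name, rest = line[:cut], line[cut:].split()
        else:
            name, *rest = line.split()
        if rest:
            try:
                samples[name] = float(rest[0])
            except ValueError:
                pass  # ignore lines whose value is not numeric
    return samples


def fetch_metrics(url):
    """Download and parse the metrics page (the URL is supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics("http://<host>:10019/metrics")` should return a dictionary whose keys start with `kyuubi_` if the Prometheus reporter is enabled.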
Configure Prometheus to scrape the endpoints of each Kyuubi server in the cluster, for example:
```
cat > /etc/prometheus/prometheus.yml <<EOF
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: "kyuubi-server"
    scheme: "http"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "kyuubi-server-1:10019"
          - "kyuubi-server-2:10019"
EOF
```
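After reloading Prometheus, you may want to confirm that the Kyuubi targets are actually being scraped. The sketch below reads Prometheus's `/api/v1/targets` HTTP API and reports the health of each instance in the `kyuubi-server` job. The helper names and the default address `http://localhost:9090` are assumptions for illustration; adjust them to your deployment.

```python
import json
import urllib.request


def target_health(payload, job="kyuubi-server"):
    """Map instance -> health for one scrape job from a /api/v1/targets response."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {
        t["labels"].get("instance", t.get("scrapeUrl", "?")): t.get("health", "unknown")
        for t in targets
        if t.get("labels", {}).get("job") == job
    }


def fetch_targets(prometheus_url="http://localhost:9090"):
    """Fetch the targets list; the default URL is an assumed local Prometheus."""
    with urllib.request.urlopen(prometheus_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

Calling `target_health(fetch_targets())` should list every configured Kyuubi instance with the health value `up` once scraping works.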
Grafana has built-in support for Prometheus. Add the Prometheus data source, then import
`<KYUUBI_HOME>/grafana/dashboard-template.json` into Grafana and customize it as needed.
If you have good ideas to improve the dashboard, please don't hesitate to reach out to us by opening
GitHub [Issues](https://github.com/apache/kyuubi/issues)/[PRs](https://github.com/apache/kyuubi/pulls)
or sending an email to `dev@kyuubi.apache.org`.

grafana/REAMDE.md Normal file

@ -0,0 +1,31 @@
# Kyuubi Grafana Dashboard
[Grafana](https://grafana.com/) is a popular open and composable observability platform. Kyuubi provides
a Grafana Dashboard template `dashboard-template.json` to help users monitor the Kyuubi server.
## For Users
By default, the Kyuubi server enables the metrics system and exposes a Prometheus endpoint at `http://<host>:10019/metrics`.
To use the Kyuubi Grafana Dashboard, you need a running Prometheus and Grafana service. Configure Prometheus to
scrape Kyuubi metrics, add the Prometheus data source to Grafana, then import `dashboard-template.json` into
Grafana and customize it. For more details, please read the
[Kyuubi Docs](https://kyuubi.readthedocs.io/en/master/monitor/metrics.html#grafana-and-prometheus).
## For Developers
If you have good ideas to improve the dashboard, please don't hesitate to reach out to us by opening
GitHub [Issues](https://github.com/apache/kyuubi/issues)/[PRs](https://github.com/apache/kyuubi/pulls)
or sending an email to `dev@kyuubi.apache.org`.
### Export Grafana Dashboard template
Depending on your Grafana version, the export steps might differ slightly.
Using Grafana 11.4 as an example: after modifying the dashboard, save your changes and click the "Share" button
in the top-right corner, choose the "Export" tab, and enable "Export for sharing externally". Finally,
click the "View JSON" button and update `dashboard-template.json` with that JSON content.
We encourage developers to use a Grafana version similar to the one that produced the existing `dashboard-template.json`,
and to focus on one topic in each PR, to avoid introducing unnecessary, large diffs to `dashboard-template.json`.
Additionally, to make it easy for reviewers to understand your changes, don't forget to attach screenshots of the
current and updated dashboards to your PR description.
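
Before opening a PR, a quick sanity check of the exported file can catch a broken export. The sketch below is hypothetical and not part of this repository. It assumes that a dashboard exported with "Export for sharing externally" carries a non-empty top-level `__inputs` list and a `panels` list; verify this against your Grafana version.

```python
import json


def check_template(text):
    """Return a list of problems found in an exported dashboard JSON string."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    # Assumption: external-sharing exports declare data source inputs here.
    if not dashboard.get("__inputs"):
        problems.append("missing __inputs: was 'Export for sharing externally' enabled?")
    if not dashboard.get("panels"):
        problems.append("no panels found")
    return problems
```

An empty list from `check_template(open("dashboard-template.json").read())` means none of these basic problems were detected; it does not replace a visual review in Grafana.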
