celeborn/docs
onebox-li 0c869ac9a0
[CELEBORN-642] Improve metrics and update grafana
### What changes were proposed in this pull request?
Changes in Grafana:

(ALL)
add:
JVMCPUTime
LastMinuteSystemLoad
AvailableProcessors
(For Master)
add:
LostWorkers
IsActiveMaster
PartitionSize
(For Worker)
add:
PushDataFailCount -> WriteDataFailCount
ReplicateDataFailCount
ReplicateDataWriteFailCount
ReplicateDataCreateConnectionFailCount
ReplicateDataConnectionExceptionCount
ReplicateDataTimeoutCount
SortedFileSize
PushDataHandshakeFailCount
RegionStartFailCount
RegionFinishFailCount
MasterPushDataHandshakeTime
SlavePushDataHandshakeTime
MasterRegionStartTime
SlaveRegionStartTime
MasterRegionFinishTime
SlaveRegionFinishTime
PotentialConsumeSpeed
UserProduceSpeed
WorkerConsumeSpeed
DeviceOSFreeBytes
DeviceCelebornFreeBytes
push usedHeapMemory/usedDirectMemory
fetch usedHeapMemory/usedDirectMemory
replicate usedHeapMemory/usedDirectMemory
remove:
duplicate ReserveSlotsTime

Change dashboard layout.

Fix support for multiple labels.

Modify some metrics docs.

### Why are the changes needed?
For better use of metrics.

### Does this PR introduce _any_ user-facing change?
The following metrics are renamed, and some values are extracted into labels.
DeviceOSFreeCapacity(B) -> DeviceOSFreeBytes
DeviceOSTotalCapacity(B) -> DeviceOSTotalBytes
DeviceCelebornFreeCapacity(B) -> DeviceCelebornFreeBytes
DeviceCelebornTotalCapacity(B) -> DeviceCelebornTotalBytes
push usedHeapMemory/usedDirectMemory
fetch usedHeapMemory/usedDirectMemory
replicate usedHeapMemory/usedDirectMemory
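
For anyone updating saved dashboards or alert rules after these renames, the four `Device*` mappings above can be applied mechanically. The following is a minimal sketch, not part of the patch: the helper name `migrate_metric_names` is hypothetical, and it assumes the old names appear verbatim in the text being migrated.

```python
# Old -> new metric names, copied from the rename list above.
RENAMED_METRICS = {
    "DeviceOSFreeCapacity(B)": "DeviceOSFreeBytes",
    "DeviceOSTotalCapacity(B)": "DeviceOSTotalBytes",
    "DeviceCelebornFreeCapacity(B)": "DeviceCelebornFreeBytes",
    "DeviceCelebornTotalCapacity(B)": "DeviceCelebornTotalBytes",
}


def migrate_metric_names(text: str) -> str:
    """Replace every old metric name found in `text` with its new name.

    Hypothetical helper: it does a plain substring replacement and knows
    nothing about the format of the text it is given.
    """
    for old, new in RENAMED_METRICS.items():
        text = text.replace(old, new)
    return text
```

Text that mentions none of the old names is returned unchanged, so the helper can be run safely over files that have already been migrated.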

### How was this patch tested?
Tested on a cluster.

Closes #1557 from onebox-li/improve-metrics.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-08 18:10:06 +08:00
assets/logo [CELEBORN-499] Move version specific resource to main repo (#1429) 2023-04-14 16:20:51 +08:00
configuration [CELEBORN-610][FLINK] Eliminate pluginconf and merge its content to CelebornConf 2023-06-05 14:08:53 +08:00
celeborn_ratis_shell.md [CELEBORN-623][FOLLUPUP] Refine doc about use ratis shell with RSS cluster 2023-06-02 22:09:05 +08:00
cluster_planning.md [CELEBORN-570] Update docs about monitor and deployment. (#1478) 2023-05-08 17:07:42 +08:00
deploy_on_k8s.md [CELEBORN-450][HELM] Configurable volumes in the values.yaml (#1508) 2023-05-29 13:48:23 +08:00
deploy.md [CELEBORN-570] Update docs about monitor and deployment. (#1478) 2023-05-08 17:07:42 +08:00
migration.md [CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR (#1494) 2023-05-17 17:57:27 +08:00
monitoring.md [CELEBORN-642] Improve metrics and update grafana 2023-06-08 18:10:06 +08:00
README.md [CELEBORN-499] Move version specific resource to main repo (#1429) 2023-04-14 16:20:51 +08:00
upgrade.md [CELEBORN-499] Move version specific resource to main repo (#1429) 2023-04-14 16:20:51 +08:00

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Apache Celeborn (Incubating)