celeborn/worker
onebox-li 0c869ac9a0
[CELEBORN-642] Improve metrics and update grafana
### What changes were proposed in this pull request?
Change in grafana

(ALL)
add:
JVMCPUTime
LastMinuteSystemLoad
AvailableProcessors
(For Master)
add:
LostWorkers
IsActiveMaster
PartitionSize
(For Worker)
add:
PushDataFailCount -> WriteDataFailCount
ReplicateDataFailCount
ReplicateDataWriteFailCount
ReplicateDataCreateConnectionFailCount
ReplicateDataConnectionExceptionCount
ReplicateDataTimeoutCount
SortedFileSize
PushDataHandshakeFailCount
RegionStartFailCount
RegionFinishFailCount
MasterPushDataHandshakeTime
SlavePushDataHandshakeTime
MasterRegionStartTime
SlaveRegionStartTime
MasterRegionFinishTime
SlaveRegionFinishTime
PotentialConsumeSpeed
UserProduceSpeed
WorkerConsumeSpeed
DeviceOSFreeBytes
DeviceCelebornFreeBytes
push usedHeapMemory/usedDirectMemory
fetch usedHeapMemory/usedDirectMemory
replicate usedHeapMemory/usedDirectMemory
remove:
dup ReserveSlotsTime

Change dashboard layout.

Fix support for multiple labels.

Modify some metrics docs.

### Why are the changes needed?
For better use of metrics.

### Does this PR introduce _any_ user-facing change?
Below metrics change name, extract some value to the label.
DeviceOSFreeCapacity(B) -> DeviceOSFreeBytes
DeviceOSTotalCapacity(B) -> DeviceOSTotalBytes
DeviceCelebornFreeCapacity(B) -> DeviceCelebornFreeBytes
DeviceCelebornTotalCapacity(B) -> DeviceCelebornTotalBytes
push usedHeapMemory/usedDirectMemory
fetch usedHeapMemory/usedDirectMemory
replicate usedHeapMemory/usedDirectMemory

### How was this patch tested?
Cluster test.

Closes #1557 from onebox-li/improve-metrics.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-08 18:10:06 +08:00
..
src [CELEBORN-642] Improve metrics and update grafana 2023-06-08 18:10:06 +08:00
pom.xml [CELEBORN-83] Fix various bug when using HDFS as storage. 2022-11-30 19:33:18 +08:00