[CELEBORN-2083] For WorkerStatusTracker, log error for recordWorkerFailure
### What changes were proposed in this pull request?
In `WorkerStatusTracker`, log at error level in `recordWorkerFailure`, so that these messages are separated from the status changes reported in application heartbeat responses.
### Why are the changes needed?
Currently, `WorkerStatusTracker` logs at warning level in two cases:
1. status change from application heartbeat response
ae40222351/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala (L213-L214)
2. `recordWorkerFailure` on certain failures, such as `connectFailedWorkers`.
In our use case, the Celeborn cluster is very large and worker status changes frequently, so the log output for case 1 is very noisy.
I think that:
1. Case 2 is more critical and should use the error level.
2. Case 1 may be normal for a large Celeborn cluster, so the warning level is fine.
With separate log levels, we can mute the noisy status-change messages from application heartbeat responses by setting the log level for `WorkerStatusTracker` to error.
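
As an illustration of the muting described above, here is a minimal sketch of a logger override. It assumes the application uses a log4j2 properties file. The logger id `workerStatusTracker` is an arbitrary label chosen for this example, and the package name is inferred from the source path referenced above. Treat it as a starting point, not a verified configuration for any particular deployment.

```properties
# Assumption: log4j2 properties format; "workerStatusTracker" is a hypothetical logger id.
# Only error-level messages (e.g. those from recordWorkerFailure) pass this logger;
# warning-level status changes from heartbeat responses are suppressed.
logger.workerStatusTracker.name = org.apache.celeborn.client.WorkerStatusTracker
logger.workerStatusTracker.level = error
```

Some older log4j2 releases also require the logger id to be listed in a `loggers = ...` entry, so check this against the log4j2 version in use.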
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Code review.
Closes #3392 from turboFei/log_level_worker_status.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
parent ae40222351
commit 7ab6268e38
@@ -124,7 +124,7 @@ class WorkerStatusTracker(
       val failedWorkersMsg = failedWorkers.asScala.map { case (worker, (status, time)) =>
         s"${worker.readableAddress()} ${status.name()} ${Utils.formatTimestamp(time)}"
       }.mkString("\n")
-      logWarning(
+      logError(
         s"""
            |Reporting failed workers:
            |$failedWorkersMsg$currentFailedWorkers""".stripMargin)