celeborn/service
Fei Wang 493e0f10cf [CELEBORN-1317][FOLLOWUP] Fix threadDump UT stuck issue
### What changes were proposed in this pull request?

Try to fix the issue where the `ApiWorkerResourceSuite::threadDump` UT gets stuck.
1. Obtain the thread dump programmatically instead of through an external process.

The related code is copied from apache/spark:
https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/util/Utils.scala
https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/status/api/v1/api.scala

### Why are the changes needed?
I found that the UT for the threadDump API sometimes gets stuck.
For example: https://github.com/apache/celeborn/actions/runs/8462056188/job/23182806487?pr=2428
<img width="1291" alt="image" src="https://github.com/apache/celeborn/assets/6757692/f39d7bb9-6e31-4ce3-a573-1ff86f335318">

<img width="762" alt="image" src="https://github.com/apache/celeborn/assets/6757692/437592dd-fc9c-404d-a452-834fcf630bd1">

The threadDump API UT was newly introduced in [CELEBORN-1317](https://issues.apache.org/jira/browse/CELEBORN-1317).

Previously there was no UT covering this API, and the new UT now gets stuck intermittently.

Before this change, getThreadDump relied on a `ProcessBuilder` to obtain the thread info.

I suspect that the spawned process gets stuck for some unknown reason, so this PR obtains the thread info programmatically instead.
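
For illustration only, here is a minimal sketch of an in-process thread dump built on the JDK's `ThreadMXBean`. It is not the code from this PR (which is adapted from the Spark files linked above). The class name `ThreadDumpSketch` and the output format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

// Hypothetical helper: collects a thread dump from inside the JVM,
// so no external process has to be spawned and waited on.
public class ThreadDumpSketch {

    /** Returns a text dump of all live threads in this JVM, sorted by thread id. */
    public static String getThreadDump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Ask for lock details only when the JVM supports them; otherwise
        // dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = bean.dumpAllThreads(
            bean.isObjectMonitorUsageSupported(),
            bean.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        Arrays.stream(infos)
            .filter(Objects::nonNull) // defensive: skip any missing entries
            .sorted(Comparator.comparingLong(ThreadInfo::getThreadId))
            .forEach(info -> {
                sb.append('"').append(info.getThreadName()).append("\" Id=")
                  .append(info.getThreadId()).append(' ')
                  .append(info.getThreadState()).append('\n');
                for (StackTraceElement frame : info.getStackTrace()) {
                    sb.append("\tat ").append(frame).append('\n');
                }
                sb.append('\n');
            });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(getThreadDump());
    }
}
```

Because the dump is collected inside the calling JVM, there is no child process whose output or exit the caller has to wait for, which is the failure mode suspected above.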

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

UT.

![image](https://github.com/apache/celeborn/assets/6757692/51aaa44e-0523-4b60-b6c8-f4e83c709497)

Closes #2429 from turboFei/thread_dump.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-05-27 15:12:50 +08:00
..
src [CELEBORN-1317][FOLLOWUP] Fix threadDump UT stuck issue 2024-05-27 15:12:50 +08:00
pom.xml [CELEBORN-1317] Refine celeborn http server and support swagger ui 2024-03-27 23:18:18 +08:00