[CELEBORN-2046] Specify extractionDir of AsyncProfilerLoader with celeborn.worker.jvmProfiler.localDir

### What changes were proposed in this pull request?

Specify `extractionDir` of `AsyncProfilerLoader` with `celeborn.worker.jvmProfiler.localDir`.

### Why are the changes needed?

By default, `AsyncProfilerLoader` stores its extracted libraries in the `user.home` directory. When the `user.home` directory is not initialized, `AsyncProfilerLoader#load` fails. Specifying the `extractionDir` of `AsyncProfilerLoader` with `celeborn.worker.jvmProfiler.localDir` avoids this loading failure.
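
A minimal sketch of the idea, assuming only the JDK: it creates a fresh extraction directory under a configured base directory instead of relying on `user.home`. The object `ExtractionDirSketch` and the helper `createExtractionDir` are hypothetical names used for illustration. They stand in for Celeborn's `Utils.createTempDir`, whose implementation is not shown here and may differ.

```scala
import java.nio.file.{Files, Path, Paths}

object ExtractionDirSketch {

  // Hypothetical helper (not Celeborn's Utils.createTempDir): creates a
  // uniquely named directory under `baseDir`, creating `baseDir` if needed.
  def createExtractionDir(baseDir: String, prefix: String = "profiler"): Path = {
    val base = Paths.get(baseDir)
    Files.createDirectories(base)
    Files.createTempDirectory(base, prefix)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the value of celeborn.worker.jvmProfiler.localDir (assumption).
    val localDir = Files.createTempDirectory("celeborn-local").toString
    val extractionDir = createExtractionDir(localDir)
    println(s"Profiler extraction directory: $extractionDir")
    // With ap-loader on the classpath, the patch then passes this path on:
    //   AsyncProfilerLoader.setExtractionDirectory(extractionDir)
    //   AsyncProfilerLoader.load()
  }
}
```

Because the directory is derived from an explicitly configured location, loading no longer depends on `user.home` being set up on the worker host.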

Backport https://github.com/apache/spark/pull/51229.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes #3345 from SteNicholas/CELEBORN-2046.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: 子懿 <ziyi.jxf@antgroup.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
SteNicholas 2025-06-25 10:15:38 -07:00 committed by Wang, Fei
parent 8ae9737601
commit 3ee3a26220


@@ -23,13 +23,14 @@ import one.profiler.{AsyncProfiler, AsyncProfilerLoader}
import org.apache.celeborn.common.CelebornConf
import org.apache.celeborn.common.internal.Logging
+import org.apache.celeborn.common.util.Utils
/**
* The JVM profiler provides code profiling of worker based on the async profiler, a low overhead sampling profiler for Java.
* This allows a worker to capture CPU and memory profiles for worker which can later be analyzed for performance issues.
* The profiler captures Java Flight Recorder (jfr) files for each worker read by tools including Java Mission Control and Intellij.
*
-* <p> Note: The profiler writes the jfr files to the worker's working directory in the worker's local file system and the files can grow to be large so it is advisable
+* <p>Note: The profiler writes the jfr files to the worker's working directory in the worker's local file system and the files can grow to be large so it is advisable
* that the worker machines have adequate storage.
*
* <p>Note: code copied from Apache Spark.
@@ -46,9 +47,15 @@ class JVMProfiler(conf: CelebornConf) extends Logging {
private val startcmd = s"start,$profilerOptions,file=$profilerLocalDir/profile.jfr"
private val stopcmd = s"stop,$profilerOptions,file=$profilerLocalDir/profile.jfr"
+private lazy val extractionDir = Utils.createTempDir(profilerLocalDir, "profiler").toPath
val profiler: Option[AsyncProfiler] = {
Option(
-if (enableProfiler && AsyncProfilerLoader.isSupported) AsyncProfilerLoader.load() else null)
+if (enableProfiler && AsyncProfilerLoader.isSupported) {
+logInfo(s"Profiler extraction directory: ${extractionDir.toString}.")
+AsyncProfilerLoader.setExtractionDirectory(extractionDir)
+AsyncProfilerLoader.load()
+} else { null })
}
def start(): Unit = {