Commit Graph

  • dfeaef1355 [MINOR] Add spec link to JavaSerializer Cheng Pan 2025-04-02 14:26:32 +0800
  • 0d923c37bf [CELEBORN-1956] Forward GitHub discussion to ASF mailing list Cheng Pan 2025-04-02 14:25:37 +0800
  • 621afaa5d7 [CELEBORN-1949][FOLLOWUP] Fix typo for kind:deploy label Wang, Fei 2025-04-01 23:11:29 -0700
  • 951b626a98 [CELEBORN-1844][CIP-8] introduce tier writer proxy and simplify partition data writer mingji 2025-04-02 13:29:39 +0800
  • 8dbbebc644 [CELEBORN-1954][HELM] Add a new value image.registry Yi Chen 2025-04-02 11:02:35 +0800
  • 6e5bd2403c [CELEBORN-1952][HELM] Define template helpers for master/worker respectively Yi Chen 2025-04-02 11:01:27 +0800
  • 5e12b7d607 [CELEBORN-1921] Broadcast large GetReducerFileGroupResponse to prevent Spark driver network exhausted Wang, Fei 2025-04-01 08:29:21 -0700
  • 1e30f159b9 [CELEBORN-1577][FOLLOWUP] Add UpdateResourceConsumptionTime timer and prevent NPE if metrics not found Wang, Fei 2025-04-01 19:24:23 +0800
  • 5adce2b408 [CELEBORN-1949] Add a labeler github action to triage PRs Wang, Fei 2025-04-01 19:13:34 +0800
  • 193dc6cf8b [CELEBORN-1929] Avoid unnecessary buffer loss to get better buffer reusability Saurabh Dubey 2025-03-30 23:53:48 -0700
  • 99ca4dffe8 [CELEBORN-1918] Add batchOpenStream time to fetch wait time zhengtao 2025-03-30 21:51:53 -0700
  • 3038942233 [CELEBORN-1900][FOLLOWUP] push celeborn docker image Björn Boschman 2025-03-31 11:32:15 +0800
  • d8495e5b65 [CELEBORN-1532][HELM] Read log4j2 and metrics configurations from file Yi Chen 2025-03-31 11:00:38 +0800
  • 56bf87d3c1 [CELEBORN-1543][FOLLOWUP] celeborn-flink-it project should set FLINK_VERSION environment variable for HybridShuffleWordCountTest SteNicholas 2025-03-31 10:51:12 +0800
  • f1c963d0b0 [CELEBORN-1947] Reduce log for CelebornShuffleReader sleeping before inputStream ready sychen 2025-03-28 10:50:12 -0700
  • 3cf1802e78 [CELEBORN-1928][CIP-12] Support HARD_SPLIT in PushMergedData should support handle older worker success response Angerszhuuuu 2025-03-27 15:06:01 +0800
  • 15ea5d3664 [CELEBORN-1930][CIP-12] Support HARD_SPLIT in PushMergedData should handle congestion control NPE issue Xianming Lei 2025-03-26 23:44:04 -0700
  • 5f298f5ce2 [CELEBORN-1190][FOLLOWUP] Use -XepDisableWarningsInGeneratedCode to disable warnings for openapi-client module SteNicholas 2025-03-26 12:04:07 +0800
  • d5645a98d7 [CELEBORN-1900][FOLLOWUP] use github actions to login to docker hub Björn Boschman 2025-03-25 19:58:53 +0800
  • 9bae3fbd5e [CELEBORN-1915][CIP-14] Add reader's ShuffleClient to cppClient HolyLow 2025-03-25 17:54:34 +0800
  • 0a97ca0aa9 [CELEBORN-1577][PHASE2] QuotaManager should support interrupt shuffle Xianming Lei 2025-03-24 22:05:45 +0800
  • 4bacd1f211 [CELEBORN-1856] Support stage-rerun when read partition by chunkOffsets when enable optimize skew partition read wangshengjie3 2025-03-24 22:03:15 +0800
  • 192213dafb [CELEBORN-1900][FOLLOWUP] fixed wrong CI parameter Björn Boschman 2025-03-24 11:27:18 +0800
  • 151fd35676 [CELEBORN-1923] Correct Celeborn available slots calculation logic Angerszhuuuu 2025-03-23 15:27:06 -0700
  • 7b73f59173 [CELEBORN-1678][FOLLOWUP] Update master and worker commands in celeborn_cli.md SteNicholas 2025-03-21 15:50:55 +0800
  • 9e8f9f6b19 [CELEBORN-1900] docker images Björn Boschman 2025-03-19 15:37:20 +0800
  • 7174275533 [CELEBORN-1914] incWriteTime when ShuffleWriter invoke pushGiantRecord TheodoreLx 2025-03-18 15:38:36 +0800
  • 7571e10ad5 [CELEBORN-1894] Allow skipping already read chunks during unreplicated shuffle read retried Saurabh Dubey 2025-03-18 11:37:33 +0800
  • 38f3bdd375 [CELEBORN-1909] Support pre-run static code blocks of TransportMessages to improve performance of protobuf serialization SteNicholas 2025-03-18 11:34:39 +0800
  • d96457909d [CELEBORN-1911] Move multipart-uploader to multipart-uploader/multipart-uploader-s3 for extensibility veli.yang 2025-03-14 22:34:32 +0800
  • a5214e2535 [CELEBORN-1906][CIP-14] Add CelebornInputStream to cppClient HolyLow 2025-03-14 22:31:51 +0800
  • b5fab42604 [CELEBORN-1822] Respond to RegisterShuffle with max epoch PartitionLocation to avoid revive zhengtao 2025-03-14 16:08:05 +0800
  • c1fb94d6e3 [CELEBORN-1910] Remove redundant synchronized of isTerminated in ThreadUtils#sameThreadExecutorService SteNicholas 2025-03-13 16:10:07 +0800
  • 464a3842e3 [CELEBORN-1899] Fix configuration bug in shuffle s3 veli.yang 2025-03-12 10:40:52 +0800
  • 8d6c49aed3 [CELEBORN-1901] updated base docker image tag Björn Boschman 2025-03-11 15:09:57 +0800
  • 05b6ad4a7b [MINOR] Change config versions Sanskar Modi 2025-03-11 07:39:32 +0800
  • 595ab41f5e [CELEBORN-1881][CIP-14] Add WorkerPartitionReader to cppClient HolyLow 2025-03-10 20:47:17 +0800
  • df809159d1 [CELEBORN-1898] SparkOutOfMemoryError compatible with Spark 4.0 and 4.1 Cheng Pan 2025-03-10 15:19:50 +0800
  • 18b268d085 [CELEBORN-1897] Avoid calling toString for too long messages CodingCat 2025-03-10 11:33:13 +0800
  • 3d05c8998f [CELEBORN-1895] Bump log4j2 version to 2.24.3 Wang, Fei 2025-03-10 11:30:52 +0800
  • 6f5ad2dde8 [MINOR] Refine the log for fetch failure and rpc metrics dump Wang, Fei 2025-03-10 10:56:53 +0800
  • 196ad607cd [CELEBORN-1792][FOLLOWUP] Keep resume for a while after resumeByPinnedMemory TheodoreLx 2025-03-05 09:37:59 +0800
  • fa4327e093 [CELEBORN-1885] Fix nullptr exceptions in FetchChunk after worker restart Sanskar Modi 2025-03-04 22:26:38 +0800
  • e85207e2c7 [CELEBORN-1413][FOLLOWUP] Rename celeborn-client-spark-3-4 back to celeborn-client-spark-3 Cheng Pan 2025-03-04 22:25:10 +0800
  • 15e34eca6e [CELEBORN-1890] Bump Spark from 3.5.4 to 3.5.5 SteNicholas 2025-03-04 14:15:04 +0800
  • 3a83ac7693 [CELEBORN-1889] Fix scala 2.13 complie error Chongchen Chen 2025-03-03 15:41:02 +0800
  • 660cf24deb [CELEBORN-1319][FOLLOWUP] Fix IndexOutOfBoundsException when using old celeborn client Wang, Fei 2025-02-28 11:34:14 +0800
  • d90cf0d427 [CELEBORN-1884] Bump rocksdbjni version from 9.5.2 to 9.10.0 SteNicholas 2025-02-28 11:29:42 +0800
  • 44d772df75 [CELEBORN-1882] Support configuring the SSL handshake timeout for SSLHandler Minchu Yang 2025-02-27 15:43:32 -0600
  • cc501928ce [CELEBORN-1875][FOLLOWUP] Support master --show-workers-topology command to show registered workers topology SteNicholas 2025-02-27 10:44:51 +0800
  • a4ce369eef [CELEBORN-1883] Replace HashSet with ConcurrentHashMap.newKeySet for ShuffleFileGroups Aidar Bariev 2025-02-27 10:27:12 +0800
  • ef30c23916 [CELEBORN-1858] Support DfsPartitionReader read partition by chunkOffsets when enable optimize skew partition read wuziyi 2025-02-26 23:15:34 +0800
  • 18655e2869 [CELEBORN-1879] Ignore invalid chunk range generated by splitSkewedPartitionLocations wuziyi 2025-02-26 23:12:45 +0800
  • e10aefd046 [CELEBORN-1857] Support LocalPartitionReader read partition by chunkOffsets when enable optimize skew partition read wangshengjie3 2025-02-26 23:11:35 +0800
  • fc056a3c3a [CELEBORN-1875] Support to get workers topology information with RESTful api Wang, Fei 2025-02-26 16:27:32 +0800
  • 79b49805e8 [CELEBORN-1877] Bump zstd-jni version from 1.5.2-1 to 1.5.7-1 Nicholas Jiang 2025-02-25 11:08:44 +0800
  • 8c04e5e8a0 [CELEBORN-1871][CIP-14] Add NettyRpcEndpointRef to cppClient HolyLow 2025-02-24 18:04:48 +0800
  • e244d01af9 [CELEBORN-1876] Log remote address on RPC exception for TransportRequestHandler Nicholas Jiang 2025-02-24 18:04:05 +0800
  • f09482108f [CELEBORN-1867][FLINK] Fix flink client memory leak of TransportResponseHandler#outstandingRpcs for handling addCredit and notifyRequiredSegment response codenohup 2025-02-21 13:50:54 +0800
  • 27c6605c4a [CELEBORN-1865] Update master endpointRef when master leader is abnormal zhengtao 2025-02-20 10:26:11 +0800
  • d659e06d45 [CELEBORN-1319] Optimize skew partition logic for Reduce Mode to avoid sorting shuffle files wangshengjie 2025-02-19 16:57:44 +0800
  • 7ca69e200f [CELEBORN-1861] Support celeborn.worker.storage.baseDir.diskType option to specify disk type of base directory for worker Nicholas Jiang 2025-02-19 15:54:42 +0800
  • 2097fcdfea [CELEBORN-1870] Fix typos in in 'Developer' documents KenGeng 2025-02-19 14:31:14 +0800
  • f1eec656f4 [CELEBORN-1866][FLINK] Fix CelebornChannelBufferReader request more buffers than needed zuosi.hx 2025-02-19 10:53:03 +0800
  • 5b507aed72 [CELEBORN-1872] Bump Flink from 1.19.1, 1.20.0 to 1.19.2, 1.20.1 Nicholas Jiang 2025-02-19 10:49:44 +0800
  • fc459c0f7d [CELEBORN-1757] Add retry when sending RPC to LifecycleManager zhengtao 2025-02-17 11:27:02 -0800
  • 6a836f9523 [CELEBORN-1859] DfsPartitionReader and LocalPartitionReader should reuse pbStreamHandlers get from BatchOpenStream request wuziyi 2025-02-17 09:46:46 +0800
  • 2dd26936e8 [CELEBORN-1864] Bump Netty version from 4.1.115.Final to 4.1.118.Final Nicholas Jiang 2025-02-15 11:46:28 +0800
  • 113c7eadb7 [CELEBORN-1863][CIP-14] Add TransportClient to cppClient HolyLow 2025-02-15 11:44:23 +0800
  • b5c00ea645 [CELEBORN-1862] Bump Ratis version from 3.1.2 to 3.1.3 madlnu 2025-02-12 17:46:58 +0800
  • c45197c0c1 [CELEBORN-1843] Optimize roundrobin for more balanced disk slot allocation gaoyajun02 2025-02-12 14:30:08 +0800
  • f9526021c7 [CELEBORN-1846] Fix the StreamHandler usage in fetching chunk when task attempt is odd onebox-li 2025-02-12 14:22:33 +0800
  • 1455b6e2f3 [CELEBORN-1860] Remove unused celeborn.<module>.io.enableVerboseMetrics option Nicholas Jiang 2025-02-12 11:42:26 +0800
  • 749a9798ee [CELEBORN-1854] Change receive revive request log level to debug Sanskar Modi 2025-02-11 17:54:00 +0800
  • 9f8a89e61e [CELEBORN-1841] Support custom implementation of EventExecutorChooser to avoid deadlock when calling await in EventLoop thread xinyuwang1 2025-02-10 22:25:55 +0800
  • 6f7647e4b4 [CELEBORN-1847][CIP-8] Introduce local and DFS tier writer mingji 2025-02-10 13:55:17 +0800
  • b9e4bbb5a7 [MINOR] Change some config version sychen 2025-02-08 17:55:56 +0800
  • 2e4f36f9d4 [CELEBORN-1792][FOLLOWUP] Suppress noisy logs when there is no memory pressure mingji 2025-02-08 09:44:08 +0800
  • e78c9b8ab5 [CELEBORN-1721][FOLLOWUP] Fix the problem of getting partition location in ShuffleClientImpl during soft split xinyuwang1 2025-02-08 09:40:32 +0800
  • 2b0a755870 [CELEBORN-1851] Disable spark ui when run celeborn spark-it binjie yang 2025-02-06 16:22:58 +0800
  • fdf1883f25 [CELEBORN-1850] Setup worker endpoint after initalizing controller Sanskar Modi 2025-02-06 16:08:26 +0800
  • 75b697d815 [CELEBORN-1838] Interrupt spark task should not report fetch failure mingji 2025-01-23 14:46:36 +0800
  • a77a64b89a [CELEBORN-1835][CIP-8] Add tier writer base and memory tier writer mingji 2025-01-23 09:48:14 +0800
  • f28ba6e728 [CELEBORN-1810] Using Operation description instead of ApiResponse description for RESTful APIs Wang, Fei 2025-01-23 09:42:03 +0800
  • 39a40dd2a1 [CELEBORN-1845][CIP-14] Add MessageDispatcher to cppClient HolyLow 2025-01-22 20:07:22 +0800
  • 9131c1e07a [CELEBORN-1792] MemoryManager resume should use pinnedDirectMemory instead of usedDirectMemory Xianming Lei 2025-01-22 14:30:20 +0800
  • 30e46eee28 [CELEBORN-1842] Bump ap-loader version from 3.0-8 to 3.0-9 SteNicholas 2025-01-21 12:22:00 +0800
  • 6bd0cfe2f9 [CELEBORN-1720][FOLLOWUP] Fix compilation error of CelebornTezReader for ShuffleClient#readPartition SteNicholas 2025-01-20 17:41:37 +0800
  • f2751c2802 [CELEBORN-1829] Replace waitThreadPoll's thread pool with ScheduledExecutorService in Controller zhengtao 2025-01-18 13:00:04 +0800
  • 35a14d2469 [CELEBORN-1836][CIP-14] Add Message to cppClient HolyLow 2025-01-18 12:52:32 +0800
  • ac0d335f40 [CELEBORN-1831] Add ratis commitIndex metrics zhengtao 2025-01-17 10:58:06 +0800
  • ad933815b6 [CELEBORN-1720] Prevent stage re-run if another task attempt is running or successful Wang, Fei 2025-01-16 11:09:44 +0800
  • 45450e793c [CELEBORN-1832] MapPartitionData should create fixed thread pool with registration of ThreadPoolSource SteNicholas 2025-01-15 15:15:34 +0800
  • eb950c82e5 [CELEBORN-1827][CIP-14] Add messageDecoder to cppClient HolyLow 2025-01-10 16:42:31 +0800
  • 893a7449e0 [CELEBORN-1830] Chart statefulset resources key duplicate pengqli 2025-01-10 14:30:55 +0800
  • df2512994d [CELEBORN-1482][CIP-8] Add partition meta handler mingji 2025-01-09 21:30:17 +0800
  • b74e05b603 [CELEBORN-1821][CIP-14] Add controlMessages to cppClient HolyLow 2025-01-08 13:35:09 +0800
  • 2962c11493 [CELEBORN-1823] Remove unused remote-shuffle.job.min.memory-per-partition and remote-shuffle.job.min.memory-per-gate SteNicholas 2025-01-07 20:58:52 +0800
  • 19fecadcd7 [CELEBORN-1413][FOLLOWUP] Bump zstd-jni version to 1.5.6-5 for 4.0.0-preview2 SteNicholas 2025-01-07 17:37:22 +0800
  • eb9e164800 [CELEBORN-1820] Failing to write and flush StreamChunk data should be counted as FETCH_CHUNK_FAIL wuziyi 2025-01-07 13:54:59 +0800