Commit Graph

  • 3efc60cc9f [CELEBORN-2047] Reuse FileChannel/FSDataInputStream in PartitionDataReader main daowu.hzy 2025-09-03 16:14:56 +0800
  • 449cb5d588 [CELEBORN-2127] When fileWriter is closed, it should return HARD_SPLIT StatusCode xxx 2025-09-03 16:04:32 +0800
  • 750aeefbc6 [CELEBORN-2139] Fix the condition for using OSS storage xxx 2025-09-03 15:40:42 +0800
  • 6c102441c3 [CELEBORN-2138] Avoiding multiple accesses to HDFS when writing index file taowenjun 2025-09-03 10:43:13 +0800
  • 7f08eb8f1d [CELEBORN-2137] Remove unused MAPGROUP PartitionType SteNicholas 2025-09-03 09:58:07 +0800
  • d038dd2b32 [CELEBORN-1258] Support to register application info with user identifier and extra info Wang, Fei 2025-09-01 11:15:40 +0800
  • 2817f7fb9e [CELEBORN-2104] Clean up sources of NettyRpcEnv, Master and Worker to avoid thread leaks dz 2025-08-29 19:04:19 +0800
  • ffdaef98c3 [CELEBORN-2097] Support Zstd Compression in CppClient Jray 2025-08-29 18:58:22 +0800
  • 185890381b [CELEBORN-2135] Rename Blaze to Auron sychen 2025-08-29 18:55:45 +0800
  • db70473de2 [CELEBORN-2134] When creating a DiskFile, retrieve the storage type b… xxx 2025-08-28 17:21:51 +0800
  • 1be3094fb2 [CELEBORN-2132] Enhance ratis peer add operation to support clientAddress & adminAddress gaoyajun02 2025-08-28 11:29:51 +0800
  • 1a3b9f35b5 [CELEBORN-2129] CelebornBufferStream should invoke openStreamInternal in moveToNextPartitionIfPossible to avoid client creation timeout SteNicholas 2025-08-27 14:21:15 +0800
  • f590fb275d [CELEBORN-2122] Avoiding multiple accesses to HDFS when retrieving in… xxx 2025-08-27 14:16:59 +0800
  • d4e13b6ba2 [CELEBORN-2128] Close hadoopFs FileSystem when worker is closed xxx 2025-08-27 14:12:55 +0800
  • 20e36ca72d [CELEBORN-2133] LifecycleManager should log stack trace of Throwable for invoking appShuffleTrackerCallback SteNicholas 2025-08-27 10:31:56 +0800
  • 679df6c0f5 [CELEBORN-2125] Improve PartitionFilesSorter sort timeout log sychen 2025-08-26 19:23:22 +0800
  • a9490d6e24 [CELEBORN-2118] Introduce IsHighWorkload metric to monitor worker overload status xxx 2025-08-25 20:46:17 +0800
  • 1d6299717f [CELEBORN-2123] Add log for commit file size xxx 2025-08-25 17:18:32 +0800
  • d6df794ae7 [CELEBORN-2115][CIP-14] Support PushData in cppClient HolyLow 2025-08-25 15:03:33 +0800
  • 5f3884f5fb [MINOR] Fix migration doc style sychen 2025-08-22 20:52:31 +0800
  • 3c9fe9897c [MINOR] Fix doc about PushMergedData split sychen 2025-08-22 20:50:05 +0800
  • 686242cc1b [MINOR] Fix appVersion in Chart.yaml SteNicholas 2025-08-22 10:55:33 +0800
  • 8effb735f7 [CELEBORN-2066] Release workers only with high workload when the number of excluded worker set is too large yuanzhen 2025-08-22 10:14:38 +0800
  • 661a096b77 [CELEBORN-2112] Introduce PausePushDataStatus and PausePushDataAndReplicateStatus metric to record status of pause push data xxx 2025-08-21 11:17:44 +0800
  • 8a37a7ca17 [CELEBORN-2106] CommitFile/Reserved location shows detail primary location UniqueId dz 2025-08-21 10:45:30 +0800
  • 11b41f97ad [CELEBORN-2102] Introduce SorterCacheHitRate metric to monitor the hit rate of index cache for sorter dz 2025-08-20 10:47:38 +0800
  • b537798e37 [CELEBORN-2108] Remove redundant PartitionType xxx 2025-08-19 16:38:19 +0800
  • adfc563828 [CELEBORN-2119] DfsTierWriter should close s3MultipartUploadHandler and ossMultipartUploadHandler to close resources SteNicholas 2025-08-19 14:57:16 +0800
  • a8f6de5cc6 [CELEBORN-2117] Use git submodules for Chart Actions sychen 2025-08-19 14:25:48 +0800
  • 0882db926d [CELEBORN-2096] Support Lz4 Compression in CppClient Jray 2025-08-19 10:05:13 +0800
  • 7e13c9934f [CELEBORN-2098][CIP-14] Support Revive/Response in cppClient HolyLow 2025-08-16 12:59:32 +0800
  • 180a74146c [CELEBORN-2100] Fix performance issue on readToReadOnlyBuffer Jray 2025-08-16 12:56:38 +0800
  • 1ed2abc6bf [CELEBORN-2095][CIP-14] Support RegisterShuffle/Response in cppClient HolyLow 2025-08-13 10:29:34 +0800
  • eb5e8a46f8 [CELEBORN-2091] Support Zstd Decompression in CppClient Jray 2025-08-11 15:55:32 +0800
  • cfb490c938 [CELEBORN-2090] Support Lz4 Decompression in CppClient Jray 2025-08-08 18:19:48 +0800
  • 1ead784fa1 [CELEBORN-2085] Use a fixed buffer for flush copying to reduce GC TheodoreLx 2025-08-08 13:57:21 +0800
  • 5a459250b0 [CELEBORN-1844][FOLLOWUP] Fix the condition of StoragePolicy that worker uses memory storage SteNicholas 2025-08-06 10:36:07 +0800
  • fdcc108689 [CELEBORN-1792][FOLLOWUP] Add missing break in resumeByPinnedMemory liuyang62 2025-08-05 20:32:55 +0800
  • 75446a05d3 [CELEBORN-2093] Support Flink 2.1 SteNicholas 2025-08-04 14:12:55 +0800
  • a61d6a517f [CELEBORN-2064] Fix the issue where reading replica partition that returns zero chunk causes tasks to hang xinyuwang1 2025-08-01 13:47:07 -0700
  • a498b1137f [CELEBORN-1984] Merge ResourceRequest to transportMessageProtobuf zhaohehuhu 2025-08-01 23:28:32 +0800
  • 20a629a432 [CELEBORN-2088] Fix NPE if celeborn.client.spark.fetch.cleanFailedShuffle enabled Wang, Fei 2025-07-31 21:15:51 -0700
  • 604485779c [CELEBORN-2092] Inc COMMIT_FILES_FAIL_COUNT when TimerWriter::close timeout Wang, Fei 2025-07-31 21:12:21 -0700
  • f3c6f306c1 [CELEBORN-2070][CIP-14] Support MapperEnd/Response in CppClient HolyLow 2025-07-30 14:40:55 +0800
  • 0a465adea7 [CELEBORN-2087] Refine the docs configuration table view Wang, Fei 2025-07-30 10:33:07 +0800
  • 392f6186df [CELEBORN-2086] S3FlushTask and OssFlushTask should close ByteArrayInputStream to avoid resource leak SteNicholas 2025-07-29 17:19:18 +0800
  • 4540b5772b [MINOR] Document introduced metrics into monitoring.md SteNicholas 2025-07-29 14:33:46 +0800
  • 3ff44fae3f [CELEBORN-894][CELEBORN-474][FOLLOWUP] PushState uses JavaUtils#newConcurrentHashMap to speed up ConcurrentHashMap#computeIfAbsent SteNicholas 2025-07-29 10:30:31 +0800
  • c6e68fddfa [CELEBORN-2053] Refactor remote storage configuration usage Kalvin2077 2025-07-28 16:56:32 +0800
  • 7ab6268e38 [CELEBORN-2083] For WorkerStatusTracker, log error for recordWorkerFailure Wang, Fei 2025-07-27 22:46:20 -0700
  • ae40222351 [CELEBORN-2047] Support MapPartitionData on DFS SteNicholas 2025-07-26 22:11:32 +0800
  • df0def6701 [CELEBORN-2082] Add the log of excluded workers with high workloads sychen 2025-07-25 20:56:18 +0800
  • abd6233a50 [CELEBORN-2081] PushDataHandler onFailure log shuffle key sychen 2025-07-25 20:53:46 +0800
  • c587f33aaf [CELEBORN-1793] Add netty pinned memory metrics Wang, Fei 2025-07-25 17:09:42 +0800
  • 29ab16989d [CELEBORN-2056] Make the wait time for the client to read non shuffle partitions configurable duanhao-jk 2025-07-24 23:20:34 -0700
  • 4656bcb98a [CELEBORN-2071] Fix the issue where some gauge metrics were not registered to the metricRegistry TheodoreLx 2025-07-24 16:15:21 +0800
  • 4ced621534 [CELEBORN-2080] Bump Flink from 1.19.2, 1.20.1 to 1.19.3, 1.20.2 SteNicholas 2025-07-24 14:13:12 +0800
  • b8253b0864 [CELEBORN-2078] Fix wrong grafana metrics units Wang, Fei 2025-07-23 15:59:32 +0800
  • 66856f21b3 [CELEBORN-2077] Improve toString by JEP-280 instead of ToStringBuilder SteNicholas 2025-07-22 22:50:01 -0700
  • 0ed590dc81 [CELEBORN-1917] Support celeborn.client.push.maxBytesSizeInFlight DDDominik 2025-07-22 23:07:56 +0800
  • 8a0d0d5fd4 [CELEBORN-2075] Fix OpenStreamTime metrics for PbOpenStreamList request Wang, Fei 2025-07-22 17:52:38 +0800
  • d09b424756 [CELEBORN-2061] Introduce metrics to count the amount of data flushed into different storage types TheodoreLx 2025-07-21 16:40:57 +0800
  • b92820c635 [CELEBORN-2072] Add missing instance filter to grafana dashboard Wang, Fei 2025-07-21 14:27:22 +0800
  • 979f2e2148 [CELEBORN-2073] Fix PartitionFileSizeBytes metrics Wang, Fei 2025-07-21 14:25:06 +0800
  • 05fca23ed2 [MINOR] Fix a typo buffer to body in ChunkFetchSuccess.toString SteNicholas 2025-07-21 13:31:18 +0800
  • cf3c05d668 [CELEBORN-2068] TransportClientFactory should close channel explicitly to avoid resource leak for timeout or failure SteNicholas 2025-07-18 17:50:08 +0800
  • 6a0e19c076 [CELEBORN-2067] Clean up deprecated Guava API usage SteNicholas 2025-07-18 17:39:50 +0800
  • 765265a87d [CELEBORN-2031] Interruption Aware Slot Selection Aravind Patnam 2025-07-15 17:33:00 +0800
  • cfb4438ade [CELEBORN-2057] Bump ap-loader version from 3.0-9 to 4.0-10 SteNicholas 2025-07-10 16:18:28 +0800
  • 532cedbfd2 [CELEBORN-1844][FOLLOWUP] always try to use memory storage if available mingji 2025-07-10 15:55:29 +0800
  • 0fa600ade1 [CELEBORN-2055] Fix some typos codenohup 2025-07-10 12:01:02 +0800
  • cd5d9cd93d [CELEBORN-2052] Fix unexpected warning logs in Flink caused by duplicate BufferStreamEnd messages codenohup 2025-07-07 17:56:58 +0800
  • a649823e1a [INFRA] More contributors name mapping Wang, Fei 2025-07-07 16:39:48 +0800
  • 41b5154030 [CELEBORN-2051] Support write MapPartition to DFS daowu.hzy 2025-07-03 11:41:10 +0800
  • d2474e0402 [CELEBORN-894][FOLLOWUP] update commitMeta before update subPartition… lijianfu03 2025-07-01 14:05:45 +0800
  • cde33d953b [CELEBORN-894] End to End Integrity Checks Gaurav Mittal 2025-06-28 09:19:57 +0800
  • 7a0eee332a [CELEBORN-2045] Add logger sinks to allow persist metrics data and avoid possible worker OOM mingji 2025-06-26 18:42:20 -0700
  • 0fc7827ab8 [CELEBORN-2036] Fix NPE when TransportMessage has null payload Jray 2025-06-26 10:43:12 +0800
  • 3ee3a26220 [CELEBORN-2046] Specify extractionDir of AsyncProfilerLoader with celeborn.worker.jvmProfiler.localDir SteNicholas 2025-06-25 10:15:38 -0700
  • 8ae9737601 [CELEBORN-2044] Proactively cleanup stream state from ChunkStreamManager when the stream ends Mridul Muralidharan 2025-06-24 12:51:00 -0500
  • 582726fff8 [CELEBORN-1721][FOLLOWUP] Return softsplit if there is no hardsplit for pushMergeData Shuang 2025-06-23 17:32:10 -0700
  • 676beca616 [CELEBORN-2043] Fix IndexOutOfBoundsException exception in getEvictedFileWriter mingji 2025-06-23 00:49:26 -0700
  • 4d4012e4c3 [CELEBORN-2040] Avoid throwing FetchFailedException when GetReducerFileGroupResponse failed via broadcast caohaotian 2025-06-22 23:59:29 -0700
  • dac0f56e94 [CELEBORN-1056][FOLLOWUP] Support testing of dynamic configuration management cli SteNicholas 2025-06-22 21:25:09 -0700
  • 6a097944cf [CELEBORN-2042] Fix FetchFailure handling when TaskSetManager is not found gaoyajun02 2025-06-18 10:22:10 -0700
  • d44242ec20 [MINOR] Batch few celeborn client logs Sanskar Modi 2025-06-17 15:34:48 -0700
  • 46c998067e [CELEBORN-1056][FOLLOWUP] Support upsert and delete of dynamic configuration management SteNicholas 2025-06-17 14:54:50 -0700
  • 3d614f848e [CELEBORN-1931][FOLLOWUP] Update config version for worker local flusher gather api Wang, Fei 2025-06-17 11:54:26 -0700
  • 2a2c6e4687 [CELEBORN-2024] Publish commit files fail count metrics Sanskar Modi 2025-06-17 11:52:45 -0700
  • a0a4260013 [CELEBORN-1817][FOLLOWUP] Correct the problematic metrics Shuang 2025-06-16 21:11:40 -0700
  • 6f1c10527f [CELEBORN-1413][FOLLOWUP] Check JAVA_HOME variables for release Wang, Fei 2025-06-16 21:10:54 -0700
  • cc13c1e643 [CELEBORN-2011][FOLLOWUP][INFRA] Write sorted authors for release contributors Wang, Fei 2025-06-13 11:45:35 -0700
  • cfc3f1b13a [CELEBORN-1319][FOLLOWUP] Support celeborn optimize skew partitions patch for Spark v3.5.6 and v4.0.0 SteNicholas 2025-06-12 11:04:17 -0700
  • 03f97e6166 [CELEBORN-1577][FOLLOWUP] Improve check quota message Xianming Lei 2025-06-12 11:01:18 -0700
  • 80bdb46801 [CELEBORN-1892] Adding register with master fail count metric for worker Sanskar Modi 2025-06-11 11:04:59 -0700
  • bbd3bb4814 [CELEBORN-2033] updateProduceBytes should be called even if updateProduceBytes throws exception Xianming Lei 2025-06-11 10:54:24 -0700
  • edeeb4b30a [CELEBORN-1719][FOLLOWUP] Rename throwsFetchFailure to stageRerunEnabled Xianming Lei 2025-06-11 19:33:19 +0800
  • 68f32303cd [CELEBORN-1572][FOLLOWUP] Support to show Celeborn CLI version for sub command Wang, Fei 2025-06-11 14:00:20 +0800
  • 9a689b7482 [CELEBORN-2028] Setup GA for grafana dashboard Wang, Fei 2025-06-10 16:14:49 +0800
  • 919ece8ad2 [CELEBORN-2015][FOLLOWUP] Retry IOException failures for RPC requests Sanskar Modi 2025-06-09 11:53:48 -0700