[CELEBORN-2011][INFRA] Add a script to simplify the process of creating release notes

### What changes were proposed in this pull request?

Add release utility scripts for generating the release notes and the contributor list.

Copied from:
https://github.com/apache/kyuubi/blob/master/build/release/pre_gen_release_notes.py
https://github.com/apache/kyuubi/blob/master/build/release/release_utils.py

### Why are the changes needed?

To reduce the manual effort of preparing a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
RELEASE_TAG=v0.6.0-rc0 PREVIOUS_RELEASE_TAG=v0.5.0 build/release/pre_gen_release_notes.py
```
```
(base) ➜  celeborn-p2 git:(release_utils) RELEASE_TAG=v0.6.0-rc0 PREVIOUS_RELEASE_TAG=v0.5.0 build/release/pre_gen_release_notes.py

Gathering new commits between tags v0.5.0 and v0.6.0-rc0

==================================================================================
Release tag: v0.6.0-rc0
Previous release tag: v0.5.0
Number of commits in this range: 535

Show all commits? [y/n]: y
  2d3c48460 Wang, Fei [RELEASE] Bump 0.6.0
  f7be34194 Jinqian Fan [CELEBORN-1902] Read client throws PartitionConnectionException (Closes #3147)
  2a847ba90 Wang, Fei [MINOR] Change some config version (Closes #3269)
  ...
  142d0a053 mingji Bump 0.6.0-SNAPSHOT
==================================================================================

Does this look correct? [y/n]: y

==================================================================================
Found 1 release commits
Found 2 revert commits
Found 19 commits with no Ticket
==================== Warning: these commits will be ignored ======================

Release (1)
  2d3c48460 Wang, Fei [RELEASE] Bump 0.6.0
Revert (2)
  c316fdbdf zaynt4606 Revert "[CELEBORN-1376] Push data failed should always release request body" (Closes #2992) (Reverts b65b5433d)
  8d0b4cf4c waitinfuture [CELEBORN-1506][BUG] Revert "[CELEBORN-1036][FOLLOWUP] totalInflightReqs should decrement when batchIdSet contains the batchId to avoid duplicate caller of removeBatch" (Closes #2621)
No Ticket (19)
  2a847ba90 Wang, Fei [MINOR] Change some config version (Closes #3269)
  54732c7b3 Nicolas Fraison Update celeborn conf to add S3 in default and doc for policy (Closes #3218)
  a06362259 Cheng Pan [MINOR][INFRA] Do not cancel GHA jobs on committing to main/branch-* branches (Closes #3235)
  529fd6e01 cxzl25 [MINOR] Avoid use `_$eq` in Scala file (Closes #3208)
  dfeaef135 Cheng Pan [MINOR] Add spec link to JavaSerializer (Closes #3194)
  05b6ad4a7 Sanskar Modi [MINOR] Change config versions (Closes #3142)
  6f5ad2dde Wang, Fei [MINOR] Refine the log for fetch failure and rpc metrics dump (Closes #3136)
  b9e4bbb5a cxzl25 [MINOR] Change some config version (Closes #3082)
  4ccb0c7fc SteNicholas [MINOR] Rename org.apache.celeborn.plugin.flink.readclient to org.apache.celeborn.plugin.flink.client (Closes #3048)
  80523214e Sanskar Modi [MINOR] Add documentation for `CELEBORN_NO_DAEMONIZE` (Closes #3020)
  43e1b8a24 FMX [MINOR] Update DingTalk group link (Closes #2948)
  7fbf0e2fa Wang, Fei [MINOR] Fix missing blanks in docs (Closes #2917)
  71e3c03a1 Wang, Fei [MINOR] Fix docs typo (Closes #2890)
  d44b23c85 Weijie Guo [MINOR] Remove unused TODO comments in CelebornTierProducerAgent#processBuffer (Closes #2883)
  7018996e2 jiang13021 [MINOR] Fix typo in ExceptionUtils (Closes #2841)
  8bd5ac0b9 SteNicholas [MINOR] Add navigation for REST API document (Closes #2775)
  3cc043a17 SteNicholas [MINOR] Delete DEPLOY_ON_K8S.md (Closes #2752)
  f226424b9 Bowen Liang [CLEBORN-1555] Replace deprecated config celeborn.storage.activeTypes in docs and tests (Closes #2675)
  142d0a053 mingji Bump 0.6.0-SNAPSHOT
==================== Warning: the above commits will be ignored ==================

513 effective commits left to process after filtering. OK to proceed? [y/n]: y

=========================== Compiling contributor list ===========================
  Processed commit f7be34194 authored by Jinqian Fan on Wed May 21 16:58:30 2025 -0700
  Processed commit 082f0dd8c authored by Sanskar Modi on Wed May 21 16:37:38 2025 -0700
  Processed commit 45b94bf05 authored by Yi Chen on Wed May 21 01:21:45 2025 -0700
  ...
  Processed commit 450dac824 authored by Nicholas Jiang on Mon Jun 3 17:47:01 2024 +0800
==================================================================================

Commits list is successfully written to commits-v0.6.0-rc0.txt!
Contributors list is successfully written to contributors-v0.6.0-rc0.txt!

============ Warnings encountered while creating the contributor list ============
Found the following invalid authors:
        avishnus
        Madhukar525722
        xy2953396112
Please update 'known_translations'.
Please correct these in the final contributors list at contributors-v0.6.0-rc0.txt.
==================================================================================

```
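The grouping in the output above follows three title checks in `pre_gen_release_notes.py`, applied in a fixed order. A minimal sketch of that classification is shown here; the `classify` wrapper and its return labels are hypothetical names added for illustration, while the checks themselves mirror `is_release`, `is_revert`, and `has_no_ticket` from the script.

```python
import re

# Same ticket pattern the script uses, compiled once.
TICKET_PATTERN = re.compile(r"\[CELEBORN-[0-9]+\]")


def classify(title):
    """Return an illustrative bucket label for a commit title."""
    if not title:
        return "skipped"
    lowered = title.lower()
    if "[release]" in lowered:
        return "release"
    # Any title containing "revert" is treated as a revert, ticket or not.
    if "revert" in lowered:
        return "revert"
    if not TICKET_PATTERN.search(title.upper()):
        return "no_ticket"
    return "effective"


print(classify("[RELEASE] Bump 0.6.0"))  # release
print(classify("[CELEBORN-1902] Read client throws PartitionConnectionException"))  # effective
print(classify("[CLEBORN-1555] Replace deprecated config celeborn.storage.activeTypes in docs and tests"))  # no_ticket
```

Because the pattern requires the exact `CELEBORN-` prefix, the misspelled `[CLEBORN-1555]` title ends up in the "No Ticket" group, as seen in the output.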

```
 cat build/release/contributors-v0.6.0-rc0.txt
* Tao Zheng
* Amandeep Singh
* Ziyi Wu
* Sanskar Modi
* Yuting Wang
* Cheng Pan
* Aravind Patnam
* Zhao Zhao
* Saurabh Dubey
* xy2953396112
* Bowen Liang
* Shlomi Uubul
* Erik Fang
* Yi Chen
* Leo Li
* Nicolas Fraison
* Jiashu Xiong
* Pengqi Li
* Jiaming Xie
* Keyong Zhou
* Jinqian Fan
* Guangwei Hong
* Yi Zhu
* Madhukar525722
* Jianfu Li
* Chongchen Chen
* Biao Geng
* Lianne Li
* Fei Wang
* Mridul Muralidharan
* Wang, Fei
* avishnus
* Xu Huang
* Weijie Guo
* Xinyu Wang
* Yajun Gao
* He Zhao
* Björn Boschman
* Shaoyun Chen
* Kerwin Zhang
* Kun Wan
* Zhengqi Zhang
* Minchu Yang
* Haotian Cao
* Xianming Lei
* Shengjie Wang
* Veli Yang
* Arsen Gumin
* Mingxiao Feng
* Yuxin Tan
* Aidar Bariev
* Nan Zhu
* Fu Chen
* Binjie Yang
* Yanze Jiang
* Nicholas Jiang
```
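The contributor names above come from a three-step lookup in the script: a `known_translations` hit wins, otherwise a name that looks like a full name is capitalized, otherwise the name is kept and reported as invalid. The sketch below reproduces `is_valid_author` and `capitalize_author` from `release_utils.py`; the `canonicalize` wrapper and the two-entry mapping are hypothetical and only illustrate the flow.

```python
import re


def is_valid_author(author):
    # Valid means: contains a space and no digits.
    if not author:
        return False
    return " " in author and not re.findall("[0-9]", author)


def capitalize_author(author):
    if not author:
        return None
    words = [w[0].capitalize() + w[1:] for w in author.split(" ") if w]
    return " ".join(words)


def canonicalize(author, known_translations):
    """Hypothetical wrapper: return (name, needs_translation)."""
    if author in known_translations:
        return known_translations[author], False
    if is_valid_author(author):
        return capitalize_author(author), False
    return author, True


# Small excerpt of the mapping, for illustration only.
known = {"turboFei": "Fei Wang", "pan3793": "Cheng Pan"}

print(canonicalize("turboFei", known))        # ('Fei Wang', False)
print(canonicalize("Wang, Fei", known))       # ('Wang, Fei', False)
print(canonicalize("Madhukar525722", known))  # ('Madhukar525722', True)
```

This would also explain why both `Fei Wang` and `Wang, Fei` appear in the list: `Wang, Fei` already passes the validity check, so it is never looked up by GitHub ID.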

```
cat build/release/commits-v0.6.0-rc0.txt|wc -l
532
```
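The line count is consistent with how the script assembles the commit list: 513 effective commits plus 19 no-ticket commits gives 532, assuming no effective commit in this range matched a recorded revert hash. A minimal sketch of that write step follows; the `Commit` tuple is a simplified stand-in for the script's class, `titles_to_write` is a hypothetical helper, and the sample hashes and titles are made up.

```python
from collections import namedtuple

# Simplified stand-in for the script's Commit class (illustration only).
Commit = namedtuple("Commit", ["hash", "title", "revert_hash"])


def titles_to_write(effective_commits, no_tickets, reverts):
    """Drop effective commits that a revert points at, then append no-ticket commits."""
    reverted_hashes = {r.revert_hash for r in reverts if r.revert_hash}
    kept = [c.title for c in effective_commits if c.hash not in reverted_hashes]
    return kept + [c.title for c in no_tickets]


# Made-up sample data; hashes and ticket numbers are placeholders.
effective = [
    Commit("aaaaaaaaa", "[CELEBORN-1] Add feature", None),
    Commit("bbbbbbbbb", "[CELEBORN-2] Risky change", None),
]
no_tickets = [Commit("ccccccccc", "[MINOR] Fix typo", None)]
reverts = [Commit("ddddddddd", 'Revert "[CELEBORN-2] Risky change"', "bbbbbbbbb")]

print(titles_to_write(effective, no_tickets, reverts))
# ['[CELEBORN-1] Add feature', '[MINOR] Fix typo']
```

Note that although the console warning says the no-ticket commits "will be ignored", the script still appends their titles to the commits file; only release and revert commits are left out.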

Closes #3280 from turboFei/release_utils.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
Fei Wang 2025-05-25 18:50:06 -07:00 committed by Wang, Fei
parent 637c42338e
commit 48fb71ee7c
4 changed files with 460 additions and 0 deletions

.gitignore

@ -27,6 +27,7 @@
build/apache-maven*
build/sbt-launch-*.jar
build/sbt-config/repositories-local
build/release/*.txt
cache
checkpoint
conf/*.cmd

build/release/known_translations

@ -0,0 +1,64 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This is a mapping of names to be translated.
# The format expected on each line should be: <GitHub ID> - <Full Name>
akpatnam25 - Aravind Patnam
AmandeepSingh285 - Amandeep Singh
AngersZhuuuu - Yi Zhu
ashangit - Nicolas Fraison
bgeng777 - Biao Geng
buska88 - Jianfu Li
cfmcgrady - Fu Chen
ChenYi015 - Yi Chen
codenohup - Xu Huang
CodingCat - Nan Zhu
cxzl25 - Shaoyun Chen
dev-lpq - Pengqi Li
ErikFang - Erik Fang
FMX - Mingxiao Feng
gaoyajun02 - Yajun Gao
GH-Gloway - Guangwei Hong
HolyLow - Jiaming Xie
jiang13021 - Yanze Jiang
jiaoqingbo - Qingbo Jiao
kerwin-zk - Kerwin Zhang
leixm - Xianming Lei
littlexyw - Xinyu Wang
mridulm - Mridul Muralidharan
onebox-li - Leo Li
otterc - Chandni Singh
pan3793 - Cheng Pan
reswqa - Weijie Guo
RexXiong - Jiashu Xiong
s0nskar - Sanskar Modi
shlomitubul - Shlomi Uubul
shouwangyw - Veli Yang
SteNicholas - Nicholas Jiang
TheodoreLx - Zhengqi Zhang
turboFei - Fei Wang
vastian180 - Haotian Cao
waitinfuture - Keyong Zhou
wangshengjie123 - Shengjie Wang
wankunde - Kun Wan
YutingWang98 - Yuting Wang
Z1Wu - Ziyi Wu
zaynt4606 - Tao Zheng
zhaohehuhu - He Zhao
zhaostu4 - Zhao Zhao
zhongqiangczq - Zhongqiang Chen
zwangsheng - Binjie Yang

build/release/pre_gen_release_notes.py

@ -0,0 +1,236 @@
#!/usr/bin/env python3
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This script is inspired by Apache Spark.
#
# This script simplifies the process of creating release notes. It
# - folds the original and the revert commits
# - filters out unrelated commits
# - generates the contributor list
# - canonicalizes the contributors' names with the known_translations
#
# TODO
# - canonicalize the commits' titles
#
# Usage:
#   set the environment variables RELEASE_TAG and PREVIOUS_RELEASE_TAG, then run
#   ./pre_gen_release_notes.py
# Example:
#   RELEASE_TAG=v0.6.0 PREVIOUS_RELEASE_TAG=v0.5.0 ./pre_gen_release_notes.py
# It outputs
# - commits-${RELEASE_TAG}.txt: the canonical commit list
# - contributors-${RELEASE_TAG}.txt: the canonical contributor list
import os
import re
import sys

from release_utils import (
    tag_exists,
    get_commits,
    yes_or_no_prompt,
    get_date,
    is_valid_author,
    capitalize_author,
    print_indented
)

RELEASE_TAG = os.environ.get("RELEASE_TAG")
if RELEASE_TAG is None:
    sys.exit("RELEASE_TAG is required")
if not tag_exists(RELEASE_TAG):
    sys.exit("RELEASE_TAG: %s does not exist!" % RELEASE_TAG)

PREVIOUS_RELEASE_TAG = os.environ.get("PREVIOUS_RELEASE_TAG")
if PREVIOUS_RELEASE_TAG is None:
    sys.exit("PREVIOUS_RELEASE_TAG is required")
if not tag_exists(PREVIOUS_RELEASE_TAG):
    sys.exit("PREVIOUS_RELEASE_TAG: %s does not exist!" % PREVIOUS_RELEASE_TAG)

release_dir = os.path.dirname(os.path.abspath(__file__))
commits_file_name = "commits-%s.txt" % RELEASE_TAG
contributors_file_name = "contributors-%s.txt" % RELEASE_TAG
# Gather commits found in the new tag but not in the old tag.
# This filters commits based on both the git hash and the PR number.
# If either is present in the old tag, then we ignore the commit.
print("Gathering new commits between tags %s and %s" % (PREVIOUS_RELEASE_TAG, RELEASE_TAG))
release_commits = get_commits(RELEASE_TAG)
previous_release_commits = get_commits(PREVIOUS_RELEASE_TAG)
previous_release_hashes = set()
previous_release_prs = set()
for old_commit in previous_release_commits:
    previous_release_hashes.add(old_commit.get_hash())
    if old_commit.get_pr_number():
        previous_release_prs.add(old_commit.get_pr_number())
new_commits = []
for this_commit in release_commits:
    this_hash = this_commit.get_hash()
    this_pr_number = this_commit.get_pr_number()
    if this_hash in previous_release_hashes:
        continue
    if this_pr_number and this_pr_number in previous_release_prs:
        continue
    new_commits.append(this_commit)
if not new_commits:
    sys.exit("There are no new commits between %s and %s!" % (PREVIOUS_RELEASE_TAG, RELEASE_TAG))
# Prompt the user for confirmation that the commit range is correct
print("\n==================================================================================")
print("Release tag: %s" % RELEASE_TAG)
print("Previous release tag: %s" % PREVIOUS_RELEASE_TAG)
print("Number of commits in this range: %s" % len(new_commits))
print("")
if yes_or_no_prompt("Show all commits?"):
    print_indented(new_commits)
print("==================================================================================\n")
if not yes_or_no_prompt("Does this look correct?"):
    sys.exit("Ok, exiting")
# Filter out special commits
releases = []
reverts = []
no_tickets = []
effective_commits = []


def is_release(commit_title):
    return "[release]" in commit_title.lower()


def has_no_ticket(commit_title):
    return not re.findall("\\[CELEBORN\\-[0-9]+\\]", commit_title.upper())


def is_revert(commit_title):
    return "revert" in commit_title.lower()


for c in new_commits:
    t = c.get_title()
    if not t:
        continue
    elif is_release(t):
        releases.append(c)
    elif is_revert(t):
        reverts.append(c)
    elif has_no_ticket(t):
        no_tickets.append(c)
    else:
        effective_commits.append(c)
# Warn against ignored commits
if releases or reverts or no_tickets:
    print("\n==================================================================================")
    if releases:
        print("Found %d release commits" % len(releases))
    if reverts:
        print("Found %d revert commits" % len(reverts))
    if no_tickets:
        print("Found %d commits with no Ticket" % len(no_tickets))
    print("==================== Warning: these commits will be ignored ======================\n")
    if releases:
        print("Release (%d)" % len(releases))
        print_indented(releases)
    if reverts:
        print("Revert (%d)" % len(reverts))
        print_indented(reverts)
    if no_tickets:
        print("No Ticket (%d)" % len(no_tickets))
        print_indented(no_tickets)
    print("==================== Warning: the above commits will be ignored ==================\n")
prompt_msg = "%d effective commits left to process after filtering. OK to proceed?" % len(effective_commits)
if not yes_or_no_prompt(prompt_msg):
    sys.exit("OK, exiting.")
# Load known author translations that are cached locally
known_translations = {}
known_translations_file_name = "known_translations"
known_translations_file = open(os.path.join(release_dir, known_translations_file_name), "r")
for line in known_translations_file:
    if line.startswith("#") or not line.strip():
        continue
    [old_name, new_name] = line.strip("\n").split(" - ")
    known_translations[old_name] = new_name
known_translations_file.close()
# Keep track of warnings to tell the user at the end
warnings = []
# The author names that still need to be translated
invalid_authors = set()
authors = set()
print("\n=========================== Compiling contributor list ===========================")
for commit in effective_commits:
    _hash = commit.get_hash()
    title = commit.get_title()
    issues = re.findall("\\[CELEBORN\\-[0-9]+\\]", title.upper())
    author = commit.get_author()
    date = get_date(_hash)
    # Translate the known author name
    if author in known_translations:
        author = known_translations[author]
    elif is_valid_author(author):
        # The name already looks like a full name, so only normalize its capitalization
        author = capitalize_author(author)
    else:
        # If the author name is invalid, keep track of it
        # so we can translate it later
        invalid_authors.add(author)
    authors.add(author)
    print("  Processed commit %s authored by %s on %s" % (_hash, author, date))
print("==================================================================================\n")
commits_file = open(os.path.join(release_dir, commits_file_name), "w")
for commit in effective_commits:
    if commit.get_hash() not in map(lambda revert: revert.get_revert_hash(), reverts):
        commits_file.write(commit.title + "\n")
for commit in no_tickets:
    commits_file.write(commit.title + "\n")
commits_file.close()
print("Commits list is successfully written to %s!" % commits_file_name)
# Write to contributors file ordered by author names
# Each line takes the format " * Author Name"
# e.g. * Cheng Pan
# e.g. * Fu Chen
contributors_file = open(os.path.join(release_dir, contributors_file_name), "w")
sorted_authors = list(authors)
sorted_authors.sort(key=lambda author: author.split(" ")[-1])
# Iterate over the sorted list; iterating over the `authors` set would discard the ordering
for author in sorted_authors:
    contributors_file.write("* %s\n" % author)
contributors_file.close()
print("Contributors list is successfully written to %s!" % contributors_file_name)
# Prompt the user to translate author names if necessary
if invalid_authors:
    warnings.append("Found the following invalid authors:")
    for a in invalid_authors:
        warnings.append("\t%s" % a)
    warnings.append("Please update 'known_translations'.")
# Log any warnings encountered in the process
if warnings:
    print("\n============ Warnings encountered while creating the contributor list ============")
    for w in warnings:
        print(w)
    print("Please correct these in the final contributors list at %s." % contributors_file_name)
    print("==================================================================================\n")

build/release/release_utils.py Executable file

@ -0,0 +1,159 @@
#!/usr/bin/env python3
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This script is inspired by Apache Spark
# This file contains helper methods used in creating a release.
import re
import sys
from subprocess import Popen, PIPE


# Prompt the user to answer yes or no until they do so
def yes_or_no_prompt(msg):
    response = input("%s [y/n]: " % msg)
    while response != "y" and response != "n":
        return yes_or_no_prompt(msg)
    return response == "y"


def run_cmd(cmd):
    return Popen(cmd, stdout=PIPE).communicate()[0].decode("utf8")


def run_cmd_error(cmd):
    return Popen(cmd, stdout=PIPE, stderr=PIPE).communicate()[1].decode("utf8")


def get_date(commit_hash):
    return run_cmd(["git", "show", "--quiet", "--pretty=format:%cd", commit_hash])


def tag_exists(tag):
    stderr = run_cmd_error(["git", "show", tag])
    return "error" not in stderr and "fatal" not in stderr
# A type-safe representation of a commit
class Commit:
    def __init__(self, _hash, author, title, pr_number=None, revert_hash=None):
        self._hash = _hash
        self.author = author
        self.title = title
        self.pr_number = pr_number
        self.revert_hash = revert_hash

    def get_hash(self):
        return self._hash

    def get_author(self):
        return self.author

    def get_title(self):
        return self.title

    def get_pr_number(self):
        return self.pr_number

    def get_revert_hash(self):
        return self.revert_hash

    def __str__(self):
        closes_pr = "(Closes #%s)" % self.pr_number if self.pr_number else ""
        revert_commit = "(Reverts %s)" % self.revert_hash if self.revert_hash else ""
        return "%s %s %s %s %s" % (self._hash, self.author, self.title, closes_pr, revert_commit)
# Return all commits that belong to the specified tag.
#
# Under the hood, this runs a `git log` on that tag and parses the fields
# from the command output to construct a list of Commit objects. Note that
# because certain fields reside in the commit description, we need to do
# some intelligent regex parsing to extract those fields.
def get_commits(tag):
    commit_start_marker = "|=== COMMIT START MARKER ===|"
    commit_end_marker = "|=== COMMIT END MARKER ===|"
    field_end_marker = "|=== COMMIT FIELD END MARKER ===|"
    log_format = (
        commit_start_marker
        + "%h"
        + field_end_marker
        + "%an"
        + field_end_marker
        + "%s"
        + commit_end_marker
        + "%b"
    )
    output = run_cmd(["git", "log", "--quiet", "--pretty=format:" + log_format, tag])
    commits = []
    raw_commits = [c for c in output.split(commit_start_marker) if c]
    for commit in raw_commits:
        if commit.count(commit_end_marker) != 1:
            print("Commit end marker not found in commit: ")
            for line in commit.split("\n"):
                print(line)
            sys.exit(1)
        # Separate commit digest from the body
        # From the digest we extract the hash, author and the title
        # From the body, we extract the PR number and the github username
        [commit_digest, commit_body] = commit.split(commit_end_marker)
        if commit_digest.count(field_end_marker) != 2:
            sys.exit("Unexpected format in commit: %s" % commit_digest)
        [_hash, author, title] = commit_digest.split(field_end_marker)
        # The PR number and github username are in the commit message
        # itself and cannot be accessed through any GitHub API
        pr_number = None
        match = re.search("Closes #([0-9]+) from ([^/\\s]+)/", commit_body)
        if match:
            [pr_number, github_username] = match.groups()
            # If the author name is not valid, use the github
            # username so we can translate it properly later
            if not is_valid_author(author):
                author = github_username
        author = author.strip()
        revert_hash = None
        match = re.search("This reverts commit ([0-9a-f]+)", commit_body)
        if match:
            [revert_hash] = match.groups()
            # Keep the first 9 characters so it can be compared with the abbreviated %h hash
            revert_hash = revert_hash[:9]
        commit = Commit(_hash, author, title, pr_number, revert_hash)
        commits.append(commit)
    return commits
# Return whether the given name is in the form <First Name><space><Last Name>
def is_valid_author(author):
    if not author:
        return False
    return " " in author and not re.findall("[0-9]", author)


# Capitalize the first letter of each word in the given author name
def capitalize_author(author):
    if not author:
        return None
    words = author.split(" ")
    words = [w[0].capitalize() + w[1:] for w in words if w]
    return " ".join(words)


def print_indented(_list):
    for x in _list:
        print("  %s" % x)