celeborn/conf
Wang, Fei 90ece9665c [CELEBORN-2002][MASTER] Audit shuffle lifecycle in separate log file
### What changes were proposed in this pull request?
Audit the shuffle lifecycle in a separate log file. The audited operations are:
- OFFER_SLOTS
- EXPIRE
- REVIVE
- UNREGISTER

### Why are the changes needed?
Remove the redundant logs of expired shuffles in the master-worker heartbeat; see https://github.com/apache/celeborn/pull/3244

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
```
(base) ➜  celeborn git:(shuffle_audit) grep ShuffleAuditLogger tests/spark-it/target/unit-tests.log
25/05/19 20:05:27,031 INFO [celeborn-dispatcher-41] ShuffleAuditLogger: shuffleKey=local-1747710326897-0        op=OFFER_SLOTS  numReducers=4   workerNum=5     extraSlots=1
25/05/19 20:05:27,719 INFO [celeborn-dispatcher-44] ShuffleAuditLogger: shuffleKey=local-1747710326897-1        op=OFFER_SLOTS  numReducers=2   workerNum=5     extraSlots=3
25/05/19 20:05:28,094 INFO [celeborn-dispatcher-47] ShuffleAuditLogger: shuffleKey=local-1747710326897-2        op=OFFER_SLOTS  numReducers=2   workerNum=5     extraSlots=3
25/05/19 20:05:28,467 INFO [celeborn-dispatcher-52] ShuffleAuditLogger: shuffleKey=local-1747710326897-3        op=OFFER_SLOTS  numReducers=8   workerNum=5     extraSlots=0
25/05/19 20:05:28,769 INFO [celeborn-dispatcher-53] ShuffleAuditLogger: shuffleKey=local-1747710326897-4        op=OFFER_SLOTS  numReducers=8   workerNum=5     extraSlots=0
25/05/19 20:05:29,720 INFO [celeborn-dispatcher-56] ShuffleAuditLogger: shuffleKey=local-1747710326897-5        op=OFFER_SLOTS  numReducers=200 workerNum=5     extraSlots=0
25/05/19 20:05:30,349 INFO [celeborn-dispatcher-59] ShuffleAuditLogger: shuffleKey=local-1747710326897-6        op=OFFER_SLOTS  numReducers=4   workerNum=5     extraSlots=1
25/05/19 20:05:40,534 INFO [celeborn-dispatcher-11] ShuffleAuditLogger: shuffleKey=local-1747710340484-0        op=OFFER_SLOTS  numReducers=4   workerNum=5     extraSlots=1
25/05/19 20:05:41,101 INFO [celeborn-dispatcher-14] ShuffleAuditLogger: shuffleKey=local-1747710340484-1        op=OFFER_SLOTS  numReducers=2   workerNum=5     extraSlots=3
25/05/19 20:05:41,480 INFO [celeborn-dispatcher-17] ShuffleAuditLogger: shuffleKey=local-1747710340484-2        op=OFFER_SLOTS  numReducers=2   workerNum=5     extraSlots=3
25/05/19 20:05:41,848 INFO [celeborn-dispatcher-26] ShuffleAuditLogger: shuffleKey=local-1747710340484-3        op=OFFER_SLOTS  numReducers=8   workerNum=5     extraSlots=0
25/05/19 20:05:42,136 INFO [celeborn-dispatcher-18] ShuffleAuditLogger: shuffleKey=local-1747710340484-4        op=OFFER_SLOTS  numReducers=8   workerNum=5     extraSlots=0
25/05/19 20:05:43,058 INFO [celeborn-dispatcher-21] ShuffleAuditLogger: shuffleKey=local-1747710340484-5        op=OFFER_SLOTS  numReducers=200 workerNum=5     extraSlots=0
25/05/19 20:05:43,542 INFO [celeborn-dispatcher-31] ShuffleAuditLogger: shuffleKey=local-1747710340484-6        op=OFFER_SLOTS  numReducers=4   workerNum=5     extraSlots=1
25/05/19 20:05:44,436 INFO [celeborn-dispatcher-29] ShuffleAuditLogger: shuffleKeys=local-1747710326897-0,local-1747710326897-1,local-1747710326897-2,local-1747710326897-3,local-1747710326897-4,local-1747710326897-5 op=EXPIRE       worker=127.0.0.1:59932:59934:59948:59941
25/05/19 20:05:44,436 INFO [celeborn-dispatcher-27] ShuffleAuditLogger: shuffleKeys=local-1747710326897-0,local-1747710326897-3,local-1747710326897-4,local-1747710326897-5,local-1747710326897-6       op=EXPIRE       worker=127.0.0.1:59930:59938:59944:59940
25/05/19 20:05:44,436 INFO [celeborn-dispatcher-32] ShuffleAuditLogger: shuffleKeys=local-1747710326897-1,local-1747710326897-2,local-1747710326897-3,local-1747710326897-4,local-1747710326897-5,local-1747710326897-6 op=EXPIRE       worker=127.0.0.1:59931:59936:59945:59939
25/05/19 20:05:44,436 INFO [celeborn-dispatcher-33] ShuffleAuditLogger: shuffleKeys=local-1747710326897-0,local-1747710326897-3,local-1747710326897-4,local-1747710326897-5,local-1747710326897-6       op=EXPIRE       worker=127.0.0.1:59933:59935:59946:59943
25/05/19 20:05:44,436 INFO [celeborn-dispatcher-28] ShuffleAuditLogger: shuffleKeys=local-1747710326897-0,local-1747710326897-3,local-1747710326897-4,local-1747710326897-5,local-1747710326897-6       op=EXPIRE       worker=127.0.0.1:59929:59937:59947:59942

```
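
The audit lines above are whitespace-separated `key=value` fields after the `ShuffleAuditLogger: ` prefix. As a rough illustration of how such lines could be consumed, here is a minimal Python sketch. It is not part of this patch: the function name `parse_audit_line` is hypothetical, and the field layout is assumed only from the sample output shown above.

```python
import re

# Assumption: each field is a key=value token whose value contains no whitespace,
# as in the sample output above.
_FIELD = re.compile(r"(\w+)=(\S+)")
_MARKER = "ShuffleAuditLogger: "


def parse_audit_line(line):
    """Return the key=value fields of one audit log line as a dict.

    Returns None when the line was not written by ShuffleAuditLogger.
    """
    _, found, payload = line.partition(_MARKER)
    if not found:
        return None
    fields = dict(_FIELD.findall(payload))
    # EXPIRE entries carry a comma-separated list under "shuffleKeys".
    if "shuffleKeys" in fields:
        fields["shuffleKeys"] = fields["shuffleKeys"].split(",")
    return fields


if __name__ == "__main__":
    sample = (
        "25/05/19 20:05:27,031 INFO [celeborn-dispatcher-41] ShuffleAuditLogger: "
        "shuffleKey=local-1747710326897-0\top=OFFER_SLOTS\tnumReducers=4\t"
        "workerNum=5\textraSlots=1"
    )
    print(parse_audit_line(sample))
```

All values are kept as strings; converting `numReducers`, `workerNum`, or `extraSlots` to integers is left to the caller.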

Closes #3265 from turboFei/shuffle_audit.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-05-20 05:45:34 -07:00
..
celeborn-defaults.conf.template [CELEBORN-1455] Remove improper configs from config template 2024-06-11 15:35:53 +08:00
celeborn-env.sh.template [MINOR] Add documentation for CELEBORN_NO_DAEMONIZE 2024-12-23 10:31:37 +08:00
dynamicConfig.yaml.template [CELEBORN-1594] Refine dynamicConfig template and prevent NPE 2024-09-15 22:11:23 +08:00
hosts.template [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354) 2023-03-16 11:33:32 +08:00
log4j2.xml.template [CELEBORN-2002][MASTER] Audit shuffle lifecycle in separate log file 2025-05-20 05:45:34 -07:00
metrics.properties.template [CELEBORN-1122] Metrics supports json format 2023-12-06 09:24:28 +08:00
ratis-log4j.properties.template [CELEBORN-360] Add celeborn ratis shell command line (#1294) 2023-03-02 16:30:45 +08:00