celeborn/worker
sunjunjie aa9dfd0566 [CELEBORN-1005][BUG] Clean Expired App Dirs will delete the running a…
### What changes were proposed in this pull request?
While shuffle data was being read, the file was deleted by mistake, as the worker log shows:

`2023-09-22 16:32:36,810 [storage-scheduler] INFO  org.apache.celeborn.service.deploy.worker.storage.StorageManager[51]: Delete expired app dir /data8/rssdata/celeborn-worker/shuffle_data/application_1689848866482_12296544_1.
2023-09-22 16:32:36,810 [Disk-cleaner-/data8-6] DEBUG org.apache.celeborn.service.deploy.worker.storage.StorageManager[47]: Deleted expired shuffle file /data8/rssdata/celeborn-worker/shuffle_data/application_1689848866482_12296544_1/32/924-0-0.
2023-09-22 16:32:53,304 [fetch-server-11-31] DEBUG org.apache.celeborn.service.deploy.worker.FetchHandler[47]: Received chunk fetch request application_1689848866482_12296544_1-32 924-0-0 0 2147483647 get file info FileInfo{file=/data8/rssdata/celeborn-worker/shuffle_data/application_1689848866482_12296544_1/32/924-0-0, chunkOffsets=0,558, userIdentifier=`default`.`default`, partitionType=REDUCE}
java.io.FileNotFoundException: /data8/rssdata/celeborn-worker/shuffle_data/application_1689848866482_12296544_1/32/924-0-0 (No such file or directory)`

The cause is a race with the cleanup of expired app directories. The shuffle file directory is created first and only afterwards added to the fileInfos collection. If the cleaner reads the shuffleKeySet between those two steps, the running app is not in it yet, so its directory is treated as expired and its files are mistakenly deleted.
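The interleaving described above can be reproduced in a small, single-threaded sketch. Everything here is hypothetical and for illustration only: the class `ExpiredAppDirSketch`, its methods, and the `activeApps` set (a stand-in for the keys of fileInfos) are not Celeborn's actual `StorageManager` code, and tracking the app before creating its directory is just one possible way to close the window, not necessarily what this patch does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the race; not Celeborn's real StorageManager.
final class ExpiredAppDirSketch {
    // Assumed stand-in for the app IDs derived from fileInfos / shuffleKeySet.
    private final Set<String> activeApps = ConcurrentHashMap.newKeySet();
    private final Path shuffleRoot;

    ExpiredAppDirSketch(Path shuffleRoot) {
        this.shuffleRoot = shuffleRoot;
    }

    // Convenience factory so callers do not deal with checked exceptions.
    static ExpiredAppDirSketch inTempDir() {
        try {
            return new ExpiredAppDirSketch(Files.createTempDirectory("shuffle_data"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step A: make the app directory visible on disk.
    void createDir(String appId) {
        try {
            Files.createDirectories(shuffleRoot.resolve(appId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step B: mark the app as running, so the cleaner will skip it.
    void register(String appId) {
        activeApps.add(appId);
    }

    boolean dirExists(String appId) {
        return Files.isDirectory(shuffleRoot.resolve(appId));
    }

    // Cleaner pass: any directory whose name is not a tracked app is "expired".
    List<String> cleanExpiredAppDirs() {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(shuffleRoot)) {
            for (Path dir : dirs) {
                candidates.add(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> deleted = new ArrayList<>();
        for (Path dir : candidates) {
            String appId = dir.getFileName().toString();
            if (!activeApps.contains(appId)) {
                try {
                    Files.delete(dir); // directories are empty in this sketch
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                deleted.add(appId);
            }
        }
        return deleted;
    }
}
```

Calling `createDir`, then `cleanExpiredAppDirs`, then `register` mimics the failing order: the cleaner runs while the directory exists but the app is untracked, and removes it. Calling `register` before `createDir` leaves no point at which the cleaner can see an untracked directory for a running app.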

https://issues.apache.org/jira/browse/CELEBORN-1005
### Why are the changes needed?
Bug fix: the cleanup of expired app directories must not delete the shuffle files of running apps.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1937 from wilsonjie/CELEBORN-1005.

Lead-authored-by: sunjunjie <sunjunjie@zto.com>
Co-authored-by: junjie.sun <40379361+wilsonjie@users.noreply.github.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-25 23:20:48 +08:00
src [CELEBORN-1005][BUG] Clean Expired App Dirs will delete the running a… 2023-09-25 23:20:48 +08:00
pom.xml [CELEBORN-977] Support RocksDB as recover DB backend 2023-09-19 09:20:33 +08:00