shared cache: address O(P) scalability of SCR metadata files #493

Open
adammoody opened this issue Mar 5, 2022 · 0 comments

adammoody (Contributor) commented Mar 5, 2022

When using a global file system as cache, we need to examine the number of metadata files that SCR creates for each dataset. SCR stores a filemap for each process, plus a number of files produced by the redundancy encoding, even when using the SINGLE scheme. In all, SCR writes 9 metadata files per process per dataset.

```
ls -ltr /dev/shm/$USER/scr.${SLURM_JOBID}/scr.dataset.12/.scr
-rw------- 1  520 Mar  5 14:35 filemap_0

-rw------- 1   44 Mar  5 14:35 reddescmap.er.er
-rw------- 1  312 Mar  5 14:35 reddescmap.er.shuffile
-rw------- 1  178 Mar  5 14:35 reddescmap.er.0.redset
-rw------- 1  548 Mar  5 14:35 reddescmap.er.0.single.grp_1_of_2.mem_1_of_1.redset

-rw------- 1   44 Mar  5 14:35 reddesc.er.er
-rw------- 1  303 Mar  5 14:35 reddesc.er.shuffile
-rw------- 1  178 Mar  5 14:35 reddesc.er.0.redset
-rw------- 1  548 Mar  5 14:35 reddesc.er.0.single.grp_1_of_2.mem_1_of_1.redset
```

When cache is node-local storage, these files are distributed across the compute nodes: each node holds only a small subset, and they are written in parallel. When cache is a global file system, however, they all pile into a single scr.dataset.<id>/.scr directory, and the number of files written to that one directory scales as O(9*P), where P is the number of processes.

That feels extreme, especially since the application itself may write only a single shared file in the dataset. For a large-scale run with P = 16,000, SCR would produce 144,000 metadata files!
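
As a quick way to see this in practice, here is a small Python sketch (illustrative only, not part of SCR) that counts the entries in a dataset's .scr directory and compares the count against the 9-files-per-rank figure above; the directory path and rank count are passed on the command line:

```python
# Count metadata files in a dataset's .scr directory. The directory layout
# and the 9-files-per-rank figure come from the listing and discussion above;
# everything else here is illustrative.
import os
import sys

FILES_PER_RANK = 9  # observed per-process count with SINGLE

scr_meta_dir = sys.argv[1]    # e.g. <cache>/scr.<jobid>/scr.dataset.12/.scr
num_procs = int(sys.argv[2])  # number of MPI ranks in the run

entries = os.listdir(scr_meta_dir)
print(f"{len(entries)} entries in {scr_meta_dir}")
print(f"expected about {FILES_PER_RANK * num_procs} for {num_procs} ranks "
      "when cache is a shared (global) file system")
```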

We have a few options:

  1. Modify SCR to keep these metadata files in node-local storage even when cache is a global file system.
  2. Modify er/shuffile/redset to avoid creating (so many) redundancy metadata files when using SINGLE.
  3. Modify scr/er/shuffile/redset to merge data into fewer physical files, combining data from multiple processes and compute nodes (see the sketch after this list).
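
For option 3, here is a minimal sketch of the kind of aggregation that could shrink the file count (Python with mpi4py, purely illustrative: the function name, group size, JSON format, and filemap_group_<n> file names are assumptions, not part of the scr/er/shuffile/redset APIs). Ranks are split into fixed-size writer groups, each group gathers its per-rank filemap data to one writer, and that writer emits a single combined file:

```python
# Illustrative only: gather per-rank filemap data into one file per group of
# ranks, so the .scr directory holds O(P / RANKS_PER_WRITER) files instead of
# O(P). None of these names come from SCR itself.
import json
import os

from mpi4py import MPI

RANKS_PER_WRITER = 128  # tuning knob: ranks whose metadata share one file


def write_combined_filemap(scr_dir, my_filemap):
    world = MPI.COMM_WORLD
    rank = world.Get_rank()

    # Put every RANKS_PER_WRITER consecutive ranks into their own group;
    # a group may span multiple compute nodes.
    group_id = rank // RANKS_PER_WRITER
    group = world.Split(color=group_id, key=rank)

    # Gather (rank, filemap) pairs to the lowest rank in the group.
    entries = group.gather((rank, my_filemap), root=0)

    if group.Get_rank() == 0:
        # One combined file per group instead of one file per rank.
        path = os.path.join(scr_dir, "filemap_group_%d" % group_id)
        with open(path, "w") as f:
            json.dump({str(r): fm for r, fm in entries}, f)

    group.Free()
```

With RANKS_PER_WRITER = 128, a P = 16,000 run would leave 125 combined filemaps in the directory instead of 16,000 per-rank filemaps; the same idea could be applied to the er/shuffile/redset files.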
adammoody changed the title from "Address O(P) scalability of SCR cache metadata files" to "shared cache: address O(P) scalability of SCR cache metadata files" on Mar 5, 2022
adammoody changed the title from "shared cache: address O(P) scalability of SCR cache metadata files" to "shared cache: address O(P) scalability of SCR metadata files" on Mar 5, 2022