Should we make a YGM RNG helper? #131

Open
rogerpearce opened this issue Feb 21, 2023 · 6 comments

@rogerpearce
Collaborator

@steiltre @bwpriest @LFletch1

What are your thoughts on making a ygm::random that helps initialize distributed std RNGs from a common base seed?

It could be templated by the RNG engine, defaulting to std::default_random_engine.

It could take a base seed, or a single random_device value, and then apply some deterministic rank offset mechanism.

It could also provide helper functions for pulling out the common distribution types we use.
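
To make the idea concrete, here's a rough sketch of the shape I have in mind (names and the hash(seed + rank) offset are just placeholders, and it assumes ygm::comm exposes rank()):

```cpp
#include <cstdint>
#include <functional>
#include <random>

#include <ygm/comm.hpp>

namespace ygm {

// Sketch only: templated on the engine, defaulting to std::default_random_engine.
template <typename Engine = std::default_random_engine>
class random {
 public:
  random(ygm::comm &comm, std::uint64_t base_seed)
      : m_engine(derive_rank_seed(base_seed, comm.rank())) {}

  // Example helper for one of the common distribution types.
  int uniform_int(int lo, int hi) {
    return std::uniform_int_distribution<int>(lo, hi)(m_engine);
  }

 private:
  // Deterministic rank offset: mix the shared base seed with the rank.
  static std::uint64_t derive_rank_seed(std::uint64_t base_seed, int rank) {
    return std::hash<std::uint64_t>{}(base_seed +
                                      static_cast<std::uint64_t>(rank));
  }

  Engine m_engine;
};

}  // namespace ygm
```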

@bwpriest
Member

There are currently two open pull requests that use randomization, so it would make sense for ygm::comm to have a unified random interface. I'd be happy to move that stuff out of my PR and create one.

I'm thinking that a ygm::comm object will need to initialize an RNG on creation, which means that we will need to generalize the constructors to accept RNG and seed arguments. The alternative is to add a ygm::comm::init_random() method or similar. But then we have to either initialize a replaceable RNG in existing constructors so that methods requiring randomness do not barf, allow them to barf if the RNG is not created, or do some tricky constexpr compile time shenanigans that will be hard to maintain.
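
To make that trade-off concrete, the init_random() route would look something like this (purely illustrative, none of these names are real YGM API):

```cpp
#include <cstdint>
#include <optional>
#include <random>
#include <stdexcept>

// Illustration of the lazy-initialization alternative: the comm owns an
// optional engine that randomized methods must check before use.
class comm_with_lazy_rng {
 public:
  // Called explicitly by the user before any method that needs randomness.
  void init_random(std::uint64_t seed) { m_engine.emplace(seed); }

  std::default_random_engine &engine() {
    if (!m_engine) {
      // ... or we silently fall back to a default-seeded engine here.
      throw std::runtime_error("init_random() was never called");
    }
    return *m_engine;
  }

 private:
  std::optional<std::default_random_engine> m_engine;
};
```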

@bwpriest
Member

I also concur that we'll want to set rank RNG seeds via something like hash(seed + rank) if we want each rank to have a separate RNG.

Are there scenarios where we might want ranks to share seeds? I've encountered this in distributed-memory optimization, where you're evaluating an objective function globally via all_reduce, but the ranks are all running the same outer loop and making the same decisions about the parameter values to evaluate next. That doesn't sound like a YGM workflow to me, but there might be a similar task that I haven't thought of.

@rogerpearce
Collaborator Author

We may want to have multiple independent RNGs that are seeded differently. That's why I was thinking about something outside of comm, so that you could define more than one of them, as needed. Obviously it will need the comm to set itself up correctly, much like the containers do.

How about:

ygm::random rng(comm, 42); // sets up based on a global seed of 42, using default std::default_random_engine

ygm::random rng2(comm); // uses std::random_device to initialize global seed, then follows same rank offset mechanisms.

ygm::random<std::mt19937> rng3(comm, 42); // customizes underlying RNG engine based on standard C++ interface.

auto dice = rng.uniform_int(1,6); // helper functions for the standard distributions, add esoteric ones as needed.

Saving and restoring the distributed RNGs is also important for some apps.
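
For the save/restore piece, the standard engines already support writing and reading their full state via stream operators, so a sketch (the per-rank file naming is just illustrative) could be:

```cpp
#include <fstream>
#include <random>
#include <string>

// Serialize the engine state; std::mt19937 (and the other standard engines)
// define operator<< / operator>> for exactly this purpose.
void save_engine(const std::mt19937 &engine, int rank) {
  std::ofstream ofs("rng_state_rank_" + std::to_string(rank) + ".txt");
  ofs << engine;
}

// Restore an engine that will continue the sequence where it left off.
std::mt19937 restore_engine(int rank) {
  std::mt19937 engine;
  std::ifstream ifs("rng_state_rank_" + std::to_string(rank) + ".txt");
  ifs >> engine;
  return engine;
}
```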

@rogerpearce
Collaborator Author

hash(seed + rank) works, but at a very high rank count we should probably do a global duplicate-seed detection check.

Supporting algorithms that want to sample many independent runs of code without ever encountering a duplicate seed might take something like hash((seed << 24) + rank), where we just assume that we never have more than 2^24 ranks and that our base seed is in [0, 2^40).
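
Roughly, with those assumptions spelled out (sketch only):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Reserve the low 24 bits for the rank and keep the base seed below 2^40, so
// (seed << 24) + rank is unique per rank before hashing.
inline std::uint64_t rank_seed(std::uint64_t base_seed, std::uint64_t rank) {
  assert(rank < (std::uint64_t{1} << 24));       // at most 2^24 ranks
  assert(base_seed < (std::uint64_t{1} << 40));  // base seed in [0, 2^40)
  return std::hash<std::uint64_t>{}((base_seed << 24) + rank);
}
```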

There probably are some use cases where we would want the same RNG seed on every rank; however, that is easy enough to do with just standard sequential code and a broadcast for the seed. Perhaps we don't try to support this directly in YGM.

@steiltre
Collaborator

> We may want to have multiple independent RNGs that are seeded differently. That's why I was thinking about something outside of comm, so that you could define more than one of them, as needed. Obviously it will need the comm to set itself up correctly, much like the containers do.
>
> How about:
>
> ygm::random rng(comm, 42); // sets up based on a global seed of 42, using default std::default_random_engine
>
> ygm::random rng2(comm); // uses std::random_device to initialize global seed, then follows same rank offset mechanisms.
>
> ygm::random<std::mt19937> rng3(comm, 42); // customizes underlying RNG engine based on standard C++ interface.
>
> auto dice = rng.uniform_int(1,6); // helper functions for the standard distributions, add esoteric ones as needed.
>
> Saving and restoring the distributed RNGs is also important for some apps.

I'm in favor of a YGM rng interface that is a standalone object in the spirit you describe above. The one aspect I'm not positive on is adding the distributions as methods on the rng object. This is definitely a less confusing interface than sticking with STL, but it may have drawbacks. My main concerns are:

  1. Generating a new distribution object with every call. I know this is "free" for a uniform_int_distribution, but is there a nontrivial set-up for some of the others?
  2. An STL-like interface may still be necessary if a user wants to define a custom distribution that isn't appropriate for inclusion in YGM. This can certainly be worked around (see the sketch after this list), so I'm not quite as concerned.
  3. Manual step needed to include new distributions. Sticking with STL, everything should "just work".
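
For points 2 and 3, one option (sketch only, hypothetical names) would be to make the wrapper itself satisfy the standard UniformRandomBitGenerator requirements by forwarding to its engine, so any STL or user-defined distribution can consume it directly without YGM having to wrap each one:

```cpp
#include <random>

template <typename Engine = std::default_random_engine>
class rng_wrapper {
 public:
  using result_type = typename Engine::result_type;

  explicit rng_wrapper(result_type seed) : m_engine(seed) {}

  // UniformRandomBitGenerator requirements: min(), max(), operator().
  static constexpr result_type min() { return Engine::min(); }
  static constexpr result_type max() { return Engine::max(); }
  result_type operator()() { return m_engine(); }

 private:
  Engine m_engine;
};

// Any distribution, STL or custom, can then be used directly:
//   rng_wrapper<> rng(42);
//   std::poisson_distribution<int> poisson(4.0);
//   int sample = poisson(rng);
```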

@rogerpearce
Collaborator Author

I see your point. We could have the YGM interface decay into a reference to its underlying std RNG engine so that users can construct their own distributions.
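
Something along these lines (sketch only, hypothetical names):

```cpp
#include <random>

template <typename Engine = std::default_random_engine>
class ygm_random_sketch {
 public:
  explicit ygm_random_sketch(typename Engine::result_type seed) : m_engine(seed) {}

  // "Decay" into a reference to the underlying engine.
  operator Engine &() { return m_engine; }

  // An explicit accessor is what you'd actually pass to std distributions,
  // since their operator() deduces the generator type and won't trigger the
  // implicit conversion.
  Engine &engine() { return m_engine; }

 private:
  Engine m_engine;
};

// Users can then build whatever distribution they need around it:
//   ygm_random_sketch<std::mt19937> rng(42);
//   std::gamma_distribution<double> gamma(2.0, 3.0);
//   double x = gamma(rng.engine());
```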
