KEP-2170: Create MPI Runtime #2217

Open
andreyvelich opened this issue Aug 14, 2024 · 4 comments

Comments

@andreyvelich
Member

andreyvelich commented Aug 14, 2024

Related: #2170

As part of this KEP, we will migrate to the MPI V2 implementation.

We should add support for the MPI Runtime.

/area runtime

@tenzen-y
Member

Note that we need to extend KEP-2170 to cover MPI before we implement anything.

@tenzen-y
Member

> Note that we need to extend KEP-2170 to cover MPI before we implement anything.

Oh, we already added the MPI design here: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#the-mpi-spec-api

NVM

@andreyvelich
Member Author

> Note that we need to extend KEP-2170 to cover MPI before we implement anything.
>
> Oh, we already added the MPI design here: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#the-mpi-spec-api
>
> NVM

Once we are ready to implement the MPI runtime, we should probably update this ClusterTrainingRuntime: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#mpi-runtime.

It might have incorrect values, since we didn't get a chance to finalize it.

@tenzen-y
Member

> Note that we need to extend KEP-2170 to cover MPI before we implement anything.
>
> Oh, we already added the MPI design here: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#the-mpi-spec-api
>
> NVM
>
> Once we are ready to implement the MPI runtime, we should probably update this ClusterTrainingRuntime: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#mpi-runtime.
>
> It might have incorrect values, since we didn't get a chance to finalize it.

That sounds good to me.
