Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ray job #1123

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Support ray job #1123

wants to merge 1 commit into from

Conversation

qile123
Copy link

@qile123 qile123 commented Aug 30, 2024

What this PR does / why we need it:
This PR is used to support ray job in arena.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #1120

@qile123
Copy link
Author

qile123 commented Aug 30, 2024

@cheyang @Syulin7

@Syulin7
Copy link
Collaborator

Syulin7 commented Aug 30, 2024

@qile123 Thank you for this great contribution! Please rebase your PR.

go.mod Outdated Show resolved Hide resolved
Comment on lines 43 to 44
// RayCluster defines the ray cluster
RayJob TrainingJobType = "rayjob"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RayCluster defines the ray cluster
RayJob TrainingJobType = "rayjob"
// RayJob defines the ray job
RayJob TrainingJobType = "rayjob"

@ChenYi015
Copy link
Collaborator

@qile123 Are you willing to contribute some docs about how to submit an example Ray job?

@qile123
Copy link
Author

qile123 commented Sep 11, 2024

Are you willing to contribute some docs about how to submit an example Ray job?

Certainly, I'm more than happy to contribute documentation on submitting a sample Ray job.

@qile123 qile123 force-pushed the feature/ray-job branch 3 times, most recently from 578fa03 to f319e32 Compare September 12, 2024 03:40
@@ -0,0 +1,280 @@
// Copyright 2018 The Kubeflow Authors
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright 2018 The Kubeflow Authors
// Copyright 2024 The Kubeflow Authors.

allocatedGPU int64
trainerType types.TrainingJobType // return trainer type: rayjob
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RayJob implements the TrainingJob interface.
var _ TrainingJob = &RayJob{}

It would be better to add the above code to ensure that RayJob implements the TrainingJob interface.

// check if it's enabled
enabled bool
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RayJobTrainer implements the Trainer interface.
var _ Trainer = &RayJobTrainer{}

pkg/training/trainer_ray.go Outdated Show resolved Hide resolved
pkg/training/logs.go Outdated Show resolved Hide resolved
@ChenYi015
Copy link
Collaborator

@qile123 Could you rebase this PR please? There are some merge conflicts due to the recent merged PRs.

@qile123
Copy link
Author

qile123 commented Sep 13, 2024

@qile123 Could you rebase this PR please? There are some merge conflicts due to the recent merged PRs.

sure

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015
Copy link
Collaborator

@qile123 Thanks for the great contribution of adding support for Ray job! LGTM, will wait for another approval @cheyang @Syulin7 .

@qile123
Copy link
Author

qile123 commented Sep 14, 2024

@qile123 Thanks for the great contribution of adding support for Ray job! LGTM, will wait for another approval @cheyang @Syulin7 .

Thank you for your positive feedback and for reviewing the changes that add support for Ray jobs! I appreciate your time and input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ray job
3 participants