Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix measure bottlenecks #337

Merged
merged 17 commits into from
Nov 20, 2023
Merged

fix measure bottlenecks #337

merged 17 commits into from
Nov 20, 2023

Conversation

RaphaelS1
Copy link
Collaborator

Depends on xoopR/distr6#296

@bblodfon
Copy link
Collaborator

Above commits may seem strange but I had to fix these for the new RCLL tests

Comment on lines 135 to 141
ok = inherits(distr, c("VectorDistribution", "Matdist", "Arrdist")) &&
length(keep) > 1 # edge case: Arrdist(1xYxZ) and keep = FALSE
if (ok) {
pdata$distr = distr[keep] # we can subset row/samples like this
} else {
pdata$distr = base::switch(keep, distr) # one distribution only
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bblodfon can you tell me what's happening here?

Copy link
Collaborator

@bblodfon bblodfon Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to filter Distribution objects (like the prediction object from surv.parametric) since I noticed we don't do any filtering on those and I needed on the tests to compare RCLL old and new code kind-of. The above is the result of testing some edge cases.

In general you can subset with distr[keep] when you have a Matdist, Arrdist or an VectorDistribution if you have two or more observations. The problem starts when 1 observation is left. distr[keep] with a VectorDistribution leaves a Distribution (e.g. WeibullAFT-something in surv.parametric), a WeightedDiscrete() for Matdist, and an Arrdist of dim 1x... (it doesn't degenerate to a Matdist). And that's all fine, we still use pdata$distr = distr[keep] to do this. But then if you redo some more filtering (on that same specific row index or another to go to 0 observations - a very edge case :), keep will be either TRUE or FALSE, and most of the above are not subsettable or don't work as expected:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
arrdist_logical_subsetting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking this - based on the above, we might want to make an Arrdist to a Matdist when only one observation is left by design in distr6?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaphaelS1 Do you have an opinion on this?

R/MeasureSurvRCLL.R Outdated Show resolved Hide resolved
R/MeasureSurvRCLL.R Outdated Show resolved Hide resolved
Comment on lines +87 to 93
distr = prediction$distr
if (inherits(distr, c("Matdist", "Arrdist"))) {
si = diag(distr$survival(true_times))
} else { # VectorDistribution or single Distribution, e.g. WeightDisc()
si = as.numeric(distr$survival(data = matrix(true_times, nrow = 1L)))
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is from using lrn('surv.parametric') and filtering to one observation, p$distr is a WeibullAFT() which is not a VectorDistribution, so diag(distr$survival(true_times)) returned 0 and later NaN in the score

@RaphaelS1 RaphaelS1 merged commit 083e685 into main Nov 20, 2023
3 of 4 checks passed
@RaphaelS1 RaphaelS1 deleted the fix_rcll_bottleneck branch November 20, 2023 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants