-
-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getdatalink() exits messily when there isn't a datalink #328
Labels
Comments
On Wed, May 11, 2022 at 12:42:23PM -0700, trjaffe wrote:
return DatalinkResults.from_result_url(self.getdataurl(), session=self._session)
This came in in a commit to "add support for datalink without
resource elements" in November 2018. Stefan, if you're still reading
this, feel free to comment.
Anyway, the idea here is that certain services have datalink-valued
columns. Much of these perhaps shouldn't be there and were hacks to
get around lack of datalink support in clients, but then there are
legitimate cases. The classic are services with datasets too large
to unleash them on unsuspecting clients. And these can come mixed
with cases where there is a straight dataset at the other end. As
examples, consider the result of
select top 5 access_url, access_format
from ivoa.obscore
where obs_collection='HDAP'
on the TAP service at <http://dc.g-vo.org/tap> (and then other
records in that table).
Whether such cases should indeed be silently turned into datalink
results is an interesting question. However, changing that
particular API now would probably need a stronger justification than
"it probably wasn't a good idea".
The problem is that the service is returning a URL to a FITS file.
When the get_adhocservice_by_ivoid() fails to find a datalink in
the metadata of the result, it excepts into the call to
DatalinkResults.from_result_url(). The problem in this case is
that the data URL of this result is to a FITS file, and it's
expecting a VOTable result, and it barfs trying to parse the FITS
file as a VOTable.
Yeah, that really shouldn't happen, and the exception *is* far too
messy. The problem here is in getdataurl; it can't just go returning
any old access_url, and I'm not sure why the code was written that it
does that. I don't have the time to look more closely at all this
code, but getdataurl needs to make an attempt to ensure what's at the
other end really is a datalink.
It probably can do a few things before talking to the remote server.
If you are looking at an obscore-compatible row (a case that I think
we can heuristically cover), then inspecting access_format is a
reasonable thing to do. There is also a convention that aladin
understands and may deserve wider takeup; you'd see it in the
metadata of the dlurl column of the result of
select top 1 dlurl from rosat.images
It's a
<LINK
content-type="application/x-votable+xml;content=datalink"
title="Datalink"/>
child of the FIELD; although I'm probably to blame that such a thing
exists, I don't think we should follow this path.
I can't say if we should *require* a positive indication that the
access_url is a datalink. I guess at this point we could.
But even if we did not (and there's something to be said for that,
too), we should at least do a HEAD request to see if there's a
datalink document at the other end before retrieving and parsing it.
And access url can really point at just about anything, and we should
not torture our XML parser with potentially gigabytes of weird data.
Datalink is clear that datalink tables must come with a specific
media type, and those that don't we must not touch.
Personally, I think that if there is no datalink, the call to
getdatalink() should return None rather than an exception.
That I'm not so sure about. I'd say a clear exception "Cannot find
datalinks on this dataset" would be preferable. You see, I'd expect
typical usage of datalink to look like
next(self.getdatalink().bysemantics('#this'))
If you return None from getdatalink, that will result in an
AttributeError ("NoneType object has no attribute 'getdatalink'") and
that I think is quite a bit less helpful than coming clean about
missing datalink.
Oh, by the way: If you touch all this, I'd totally vote for renaming
adhoc.py to datalink.py or something like that. Adhoc is just the
"prefix" of the utype of the descriptor group, and we chose that way
back when because we expected there would at some point a non-ad hoc
way to construct utypes...
|
Oh, I see, this is also how to get nested datalinks, which aren't defined as resources either. I should have realized that. So indeed, all it needs is a small fix to check the type. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Most simply,
This results in an expected exception because skyview doesn't offer datalinks with its results:
but then there's another exception because getdatalink() is doing something I don't understand. Here's the code in adhoc.py around line 243:
The problem is that the service is returning a URL to a FITS file. When the get_adhocservice_by_ivoid() fails to find a datalink in the metadata of the result, it excepts into the call to DatalinkResults.from_result_url(). The problem in this case is that the data URL of this result is to a FITS file, and it's expecting a VOTable result, and it barfs trying to parse the FITS file as a VOTable.
At first I thought that there should be a check for the current content_type, e.g., only call that when it's XML or VOTable or something. But I don't understand what the use case is for this anyway, so I'm not sure what fix to propose.
Personally, I think that if there is no datalink, the call to getdatalink() should return None rather than an exception.
The text was updated successfully, but these errors were encountered: