Adding an .original_row attribute to DatalinkResults.
This is intended to fix #542.

It also makes some minor improvements to the documentation of how to use
datalink. But that part could really use a lot more love...
msdemlei committed Jun 20, 2024
1 parent efecda8 commit a8e7643
Showing 4 changed files with 80 additions and 24 deletions.
3 changes: 3 additions & 0 deletions CHANGES.rst
@@ -26,6 +26,9 @@ Enhancements and Fixes
any model serialized in VO-DML. This package dynamically generates python objects
whose structure corresponds to the classes of the mapped models. [#497]

- Where datalink records are made from table rows, the table row is
now accessible as datalinks.original_row. []


Deprecations and Removals
-------------------------
66 changes: 49 additions & 17 deletions docs/dal/index.rst
@@ -786,23 +786,54 @@ as quantities):
>>> astropy_table = resultset.to_table()
>>> astropy_qtable = resultset.to_qtable()

Multiple datasets
-----------------
PyVO supports multiple datasets exposed on record level through the datalink.
To get an iterator yielding specific datasets, call
:py:meth:`pyvo.dal.adhoc.DatalinkResults.bysemantics` with the identifier
identifying the dataset you want it to return.
Datalink
--------

.. remove skip once https://github.com/astropy/pyvo/issues/361 is fixed
.. doctest-skip::
Datalink lets operators associate multiple artefacts with a dataset.
Examples include linking raw data, applicable or applied calibration
data, derived datasets such as extracted sources, extra documentation,
and much more.

>>> preview = next(row.getdatalink().bysemantics('#preview')).getdataset()
Datalink can be used both on result rows of queries and with
datalink-valued URLs. The typical use is to call ``iter_datalinks()``
on some DAL result; this iterates over all datalinks pyVO finds in a
document and yields :py:class:`pyvo.dal.adhoc.DatalinkResults` instances
for them. In those, you can, for instance, pick out items by semantics;
the standard vocabulary used in datalink documents is documented at
http://www.ivoa.net/rdf/datalink/core. Here is how to find URLs for
previews:

.. note::
Since the creation of datalink objects requires a network roundtrip, it is
recommended to call ``getdatalink`` only once.
.. doctest-remote-data::
>>> rows = pyvo.dal.TAPService("http://dc.g-vo.org/tap"
... ).run_sync("select top 5 * from califadr3.cubes order by califaid")
>>> for dl in rows.iter_datalinks():
... print(next(dl.bysemantics("#preview"))["access_url"])
http://dc.zah.uni-heidelberg.de/getproduct/califa/datadr3/V1200/IC5376.V1200.rscube.fits?preview=True
http://dc.zah.uni-heidelberg.de/getproduct/califa/datadr3/COMB/IC5376.COMB.rscube.fits?preview=True
http://dc.zah.uni-heidelberg.de/getproduct/califa/datadr3/V500/IC5376.V500.rscube.fits?preview=True

The call to ``next`` in this example picks the first link marked
*preview*. For previews, this may be enough, but in general there can
be multiple links for a given semantics value for one dataset.
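When several links share a semantics value, they can be collected rather than cut off with ``next``. The following is a minimal, pyVO-independent sketch: each datalink row is modelled as a plain dict with hypothetical ``semantics`` and ``access_url`` keys, and matching is an exact string comparison, whereas pyVO's own ``bysemantics`` may also match narrower vocabulary terms. With a real ``DatalinkResults``, wrapping ``dl.bysemantics("#preview")`` in ``list()`` collects all matches instead of only the first.

```python
# Hypothetical stand-in: each datalink row is a plain dict holding the
# "semantics" and "access_url" columns of a datalink document.
def links_by_semantics(links, semantics):
    """Return the access URLs of every link with the given semantics."""
    return [link["access_url"] for link in links if link["semantics"] == semantics]


links = [
    {"semantics": "#this", "access_url": "http://example.org/data.fits"},
    {"semantics": "#preview", "access_url": "http://example.org/small.png"},
    {"semantics": "#preview", "access_url": "http://example.org/large.png"},
]

previews = links_by_semantics(links, "#preview")
print(previews)
```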

Of course one can also build a datalink object from its url.
It is sometimes useful to go back to the original row the datalink was
generated from; use the ``original_row`` attribute for that (it may
be ``None`` if pyVO does not know which row the datalink came from):

.. doctest-remote-data::
>>> dl.original_row["obs_title"]
'CALIFA V500 IC5376'
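Because ``original_row`` can be ``None`` (for instance, when the datalink document was loaded from a URL rather than produced from a result row), code relying on it should check for that case. A minimal sketch of such a guard follows; ``FakeDatalinks`` and ``title_for`` are hypothetical stand-ins, not part of pyVO, and only the ``original_row`` attribute is modelled.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FakeDatalinks:
    """Hypothetical stand-in for DatalinkResults; only original_row is modelled."""
    original_row: Optional[Mapping[str, Any]] = None


def title_for(dl, column="obs_title", default="<unknown origin>"):
    # original_row is None when the datalink was not built from a result
    # row (e.g. loaded from a URL), so guard before indexing into it.
    if dl.original_row is None:
        return default
    return dl.original_row[column]


from_query = FakeDatalinks(original_row={"obs_title": "CALIFA V500 IC5376"})
from_url = FakeDatalinks()
print(title_for(from_query))  # CALIFA V500 IC5376
print(title_for(from_url))    # <unknown origin>
```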

Rows from tables supporting datalink also have a ``getdatalink()``
method that returns a ``DatalinkResults`` instance. In general, this is
less flexible than using ``iter_datalinks``, and it may also cause more
network traffic, because each such call makes a separate network request.
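``iter_datalinks`` can instead batch its requests: IDs from several rows are grouped so that one request covers a whole batch, while each yielded result still remembers the row it came from. The sketch below is a simplified model of that idea, not pyVO's implementation; ``fetch``, the ``id`` key and ``batch_size`` are hypothetical.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items (itertools.batched needs Python 3.12)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def iter_links_batched(rows, fetch, batch_size=50):
    """Yield (original_row, links) pairs, calling `fetch` once per batch.

    `fetch` is a hypothetical callable standing in for one datalink
    request: it takes a list of IDs and returns a dict {id: [links, ...]}.
    """
    for batch in batched(rows, batch_size):
        links_by_id = fetch([row["id"] for row in batch])
        for row in batch:
            # Keep the row next to its links so callers can get back to it.
            yield row, links_by_id.get(row["id"], [])


calls = []


def fake_fetch(ids):
    calls.append(list(ids))
    return {i: [f"http://example.org/{i}.fits"] for i in ids}


rows = [{"id": f"obs{n}"} for n in range(5)]
pairs = list(iter_links_batched(rows, fake_fetch, batch_size=2))
print(len(calls))  # 3 requests for 5 rows, against 5 with one request per row
```

The batch size trades request count against response size; any real service may impose its own limits on how many IDs one request can carry.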

When one has a link to a Datalink document (for instance, from an
obscore or SIAP service, where the media type is
``application/x-votable+xml;content=datalink``), one can build a
``DatalinkResults`` using
:py:meth:`~pyvo.dal.adhoc.DatalinkResults.from_result_url`:

.. doctest-remote-data::

@@ -811,14 +842,15 @@ Of course one can also build a datalink object from its url.
>>> url = 'https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=ivo%3A%2F%2Fcadc.nrc.ca%2FHSTHLA%3Fhst_12477_28_acs_wfc_f606w_01%2Fhst_12477_28_acs_wfc_f606w_01_drz'
>>> datalink = DatalinkResults.from_result_url(url)


Server-side processing
----------------------
Some services support server-side processing of record datasets,
such as spatial cutouts of 2D images, restricting spectra to a given
waveband range, and much more, depending on the service.

Datalink
^^^^^^^^
Generic Datalink Processing Service
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Generic access to processing services is provided through the datalink
interface.

@@ -828,8 +860,8 @@ interface.
>>> datalink_proc = next(row.getdatalink().bysemantics('#proc'))

.. note::
most times there is only one processing service per result, and thats all you
need.
Most datalink documents only have one processing service per dataset,
which is why there is the ``get_first_proc`` shortcut mentioned below.
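The "first match, else ``IndexError``" behaviour of such a shortcut can be sketched in plain Python. Here datalink rows are dicts, and a row counts as a processing service when its hypothetical ``service_def`` entry is non-empty; that selection rule is an assumption made for illustration, and ``first_proc_row`` is not a pyVO function.

```python
# Hypothetical stand-in: datalink rows as dicts; the "service_def" rule
# below is an illustrative assumption, not pyVO's selection logic.
def first_proc_row(rows):
    """Return the first row linking to a processing service, else raise IndexError."""
    for row in rows:
        if row.get("service_def"):
            return row
    raise IndexError("No processing service found in datalink result")


rows = [
    {"semantics": "#this", "service_def": ""},
    {"semantics": "#proc", "service_def": "cutout-svc"},
    {"semantics": "#proc", "service_def": "other-svc"},
]
print(first_proc_row(rows)["service_def"])  # cutout-svc

try:
    first_proc_row([{"semantics": "#this", "service_def": ""}])
except IndexError as exc:
    print(exc)
```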


The returned object lets you access the available input parameters which you
27 changes: 21 additions & 6 deletions pyvo/dal/adhoc.py
@@ -220,7 +220,9 @@ def iter_datalinks(self):
id1 = current_ids.pop(0)
processed_ids.append(id1)
remaining_ids.remove(id1)
yield current_batch.clone_byid(id1)
yield current_batch.clone_byid(
id1,
original_row=row)
elif row.access_format == DATALINK_MIME_TYPE:
yield DatalinkResults.from_result_url(
row.getdataurl(),
@@ -371,6 +373,8 @@ def from_resource(cls, rows, resource, *, session=None, **kwargs):
ref="srcGroup"/>
</GROUP>
"""
original_row = kwargs.pop("original_row", None)

input_params = _get_input_params_from_resource(resource)
# get params outside of any group
dl_params = _get_params_from_resource(resource)
@@ -407,7 +411,11 @@ def from_resource(cls, rows, resource, *, session=None, **kwargs):
except KeyError:
query_params[name] = query_param

return cls(accessurl, session=session, **query_params)
return cls(
accessurl,
session=session,
original_row=original_row,
**query_params)

def __init__(
self, baseurl, *, id=None, responseformat=None, session=None, **keywords):
@@ -425,6 +433,8 @@ def __init__(
session : object
optional session to use for network requests
"""
self.original_row = keywords.pop("original_row", None)

super().__init__(baseurl, session=session, **keywords)

if id is not None:
@@ -446,8 +456,11 @@ def execute(self, post=False):
DALFormatError
for errors parsing the VOTable response
"""
return DatalinkResults(self.execute_votable(post=post),
url=self.queryurl, session=self._session)
return DatalinkResults(
self.execute_votable(post=post),
url=self.queryurl,
original_row=self.original_row,
session=self._session)


class DatalinkResults(DatalinkResultsMixin, DALResults):
@@ -494,7 +507,7 @@ class DatalinkResults(DatalinkResultsMixin, DALResults):
"""

def __init__(self, *args, **kwargs):
self.original_row = kwargs.pop("original_row")
self.original_row = kwargs.pop("original_row", None)
super().__init__(*args, **kwargs)

def getrecord(self, index):
@@ -638,8 +651,10 @@ def get_first_proc(self):
return proc
raise IndexError("No processing service found in datalink result")

@classmethod
def from_result_url(cls, result_url, *, session=None, original_row=None):
res = DALResults(result_url, session=session)
res = super(DatalinkResults, cls).from_result_url(
result_url, session=session)
res.original_row = original_row
return res

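The switch from ``kwargs.pop("original_row")`` to ``kwargs.pop("original_row", None)`` in ``DatalinkResults.__init__`` matters because ``dict.pop`` without a default raises ``KeyError`` when the key is absent, so every caller had to pass ``original_row``. The sketch below reproduces both patterns with hypothetical classes (``Base``, ``Strict``, ``Lenient``); ``Base`` is made to reject unknown keywords, as many base classes do, to show why the key must also be removed before calling ``super().__init__``.

```python
class Base:
    """Hypothetical base class that refuses keywords it does not know."""

    def __init__(self, url, **kwargs):
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        self.url = url


class Strict(Base):
    def __init__(self, *args, **kwargs):
        # Old pattern: raises KeyError whenever original_row is not passed.
        self.original_row = kwargs.pop("original_row")
        super().__init__(*args, **kwargs)


class Lenient(Base):
    def __init__(self, *args, **kwargs):
        # New pattern: default to None; pop() also removes the key so that
        # Base never sees it.
        self.original_row = kwargs.pop("original_row", None)
        super().__init__(*args, **kwargs)


print(Lenient("http://example.org/dl").original_row)  # None
print(Lenient("http://example.org/dl", original_row={"obs_id": 1}).original_row)
try:
    Strict("http://example.org/dl")
except KeyError as exc:
    print("Strict failed:", exc)
```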
8 changes: 7 additions & 1 deletion pyvo/dal/tests/test_datalink.py
@@ -110,6 +110,8 @@ def test_datalink():

datalinks = next(results.iter_datalinks())

assert datalinks.original_row["accsize"] == 100800

row = datalinks[0]
assert row.semantics == "#progenitor"

@@ -132,7 +134,9 @@ def test_datalink_batch():
results = vo.dal.imagesearch(
'http://example.com/obscore', (30, 30))

assert len([_ for _ in results.iter_datalinks()]) == 3
dls = list(results.iter_datalinks())
assert len(dls) == 3
assert dls[0].original_row["obs_collection"] == "MACHO"


@pytest.mark.usefixtures('proc', 'datalink_vocabulary')
@@ -143,6 +147,8 @@ def test_datalink_batch():
class TestSemanticsRetrieval:
def test_access_with_string(self):
datalinks = DatalinkResults.from_result_url('http://example.com/proc')

assert datalinks.original_row is None
res = [r["access_url"] for r in datalinks.bysemantics("#this")]
assert len(res) == 1
assert res[0].endswith("eq010000ms/20100927.comb_avg.0001.fits.fz")