This notebook demonstrates some of the functions of the github.com/msbentley/psa_utils package.
Please raise any bugs or criticisms as GitHub issues.
from psa_utils import download, tap, pdap, packager, geogen, internal
The pdap module uses requests to make simple calls to a PDAP service. By default it uses the PSA unless another URL is given:
p = pdap.Pdap()
The first function uses the meta-data endpoint to list available datasets:
get_datasets
dsets = p.get_datasets()
dsets.head()
| | DATA_SET.DATA_SET_ID | DATA_SET.DATA_SET_NAME | DATA_SET.DATA_ACCESS_REFERENCE | DATA_SET.XML_DESCRIPTION | DATA_SET.PRODUCER.FULL_NAME | DATA_SET.PRODUCER.INSTITUTION_NAME | DATA_SET.PRODUCER.NODE_NAME | DATA_SET.START_TIME | DATA_SET.STOP_TIME | DATA_SET.NPRODUCTS | DATA_SET.MISSION_NAME | DATA_SET.INSTRUMENT_ID | DATA_SET.INSTRUMENT_NAME | DATA_SET.TARGET_NAME | RESOURCE_CLASS | DATA_SET.RELEASE_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0 | AIRUB-HALLEY-PHOTOGRAPHIC-PROJECT-EDR-1986-V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='AIR... | | | | 1986-02-17T06:44:00 | 1986-04-17T09:13:00 | 1833 | EARTH | 300 | PENTACON-OPTICS-F4-300MM | 1P/HALLEY | DATA_SET | 2006-03-01 |
| 1 | CH1ORB-L-C1XS-2-NPO-EDR-V1.0 | CHANDRAYAAN-1-ORBITER MOON C1XS 2 NPO EDR V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='CH1... | | | | 2008-10-22T03:37:35 | 2009-08-28T18:21:31 | 11738 | CHANDRAYAAN-1 | C1XS | C1XS | MOON | DATA_SET | 2019-07-27 |
| 2 | CH1ORB-L-C1XS-4-NPO-REFDR-V1.0 | CHANDRAYAAN-1-ORBITER MOON C1XS 4 NPO REFDR V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='CH1... | | | | 2008-11-20T18:32:11.358 | 2009-08-28T18:21:01.505 | 1675 | CHANDRAYAAN-1 | C1XS | C1XS | MOON | DATA_SET | 2019-07-27 |
| 3 | CH1ORB-L-SARA-2-NPO-EDR-CENA-V1.0 | CHANDRAYAAN-1-ORBITER MOON SARA 2 NPO EDR CENA... | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='CH1... | | | | 2008-12-08T11:54:53.374 | 2009-08-12T17:02:49.417 | 835 | CHANDRAYAAN-1 | SARA | SARA | MOON | DATA_SET | 2019-11-26 |
| 4 | CH1ORB-L-SARA-2-NPO-EDR-SWIM-V1.0 | CHANDRAYAAN-1-ORBITER MOON SARA 2 NPO EDR SWIM... | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='CH1... | | | | 2008-12-08T13:53:52.117 | 2009-08-12T17:02:49.319 | 1424 | CHANDRAYAAN-1 | SARA | SARA | MOON | DATA_SET | 2019-12-05 |
We now have all of the datasets and their basic meta-data, which we can easily slice and dice.
For example, let's find those with Mars as a target:
dsets[dsets['DATA_SET.TARGET_NAME'].str.contains('mars', case=False)]
| | DATA_SET.DATA_SET_ID | DATA_SET.DATA_SET_NAME | DATA_SET.DATA_ACCESS_REFERENCE | DATA_SET.XML_DESCRIPTION | DATA_SET.PRODUCER.FULL_NAME | DATA_SET.PRODUCER.INSTITUTION_NAME | DATA_SET.PRODUCER.NODE_NAME | DATA_SET.START_TIME | DATA_SET.STOP_TIME | DATA_SET.NPRODUCTS | DATA_SET.MISSION_NAME | DATA_SET.INSTRUMENT_ID | DATA_SET.INSTRUMENT_NAME | DATA_SET.TARGET_NAME | RESOURCE_CLASS | DATA_SET.RELEASE_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT1-V1.0 | MARS EXPRESS ASPERA-3 RAW-CAL NTRL PARTICLE IM... | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='MEX... | | | | 2006-01-01T22:42:28.096 | 2007-10-01T01:42:54.52 | 1532 | MARS EXPRESS | ASPERA-3 | ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (... | MARS | DATA_SET | 2007-01-31 |
| 35 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT2-V1.0 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT2-V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='MEX... | | | | 2007-10-01T04:39:48.77 | 2009-12-31T22:38:25.988 | 2302 | MARS EXPRESS | ASPERA-3 | ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (... | MARS | DATA_SET | 2007-01-31 |
| 36 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT3-V1.0 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT3-V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='MEX... | | | | 2010-01-01T15:49:39.192 | 2012-12-31T22:26:30.631 | 2026 | MARS EXPRESS | ASPERA-3 | ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (... | MARS | DATA_SET | 2011-01-19 |
| 37 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT4-V1.0 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT4-V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='MEX... | | | | 2013-01-01T01:34:28.707 | 2015-01-01T03:40:04.823 | 1948 | MARS EXPRESS | ASPERA-3 | ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (... | MARS | DATA_SET | 2017-02-07 |
| 38 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT5-V1.0 | MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT5-V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='MEX... | | | | 2015-01-01T04:00:26.611 | 2017-01-01T02:45:17.442 | 1465 | MARS EXPRESS | ASPERA-3 | ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (... | MARS | DATA_SET | 2017-02-07 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7080 | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='RO-... | | | | 2006-07-29T00:00:27.741 | 2007-05-28T23:57:32.769 | 304 | INTERNATIONAL ROSETTA MISSION | N/A | STANDARD RADIATION ENVIROMENT MONITOR | MARS | DATA_SET | 2020-12-08 |
| 7129 | RO-X-SREM-5-MARS-V1.0 | ROSETTA-ORBITER X SREM 5 MARS V1.0 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='RO-... | | | | 2006-07-29T00:00:27.741 | 2007-05-28T23:57:32.769 | 303 | INTERNATIONAL ROSETTA MISSION | N/A | STANDARD RADIATION ENVIROMENT MONITOR | MARS | DATA_SET | 2020-12-08 |
| 7177 | urn:esa:psa:em16_tgo_acs | ACS | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='urn... | | | | 2016-03-14T00:00:00 | 2021-03-07T20:06:09.614 | 2281756 | ExoMars 2016 | ACS | ACS | urn:nasa:pds:context:target:planet.mars | DATA_SET | 2020-07-31 |
| 7178 | urn:esa:psa:em16_tgo_cas | Instrument CASSIS | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='urn... | | | | 2016-03-14T00:00:00 | 2021-03-07T21:44:08.425 | 4166552 | ExoMars 2016 | CaSSIS | CASSIS | urn:nasa:pds:context:target:planet.mars | DATA_SET | 2020-04-30 |
| 7180 | urn:esa:psa:em16_tgo_nmd | NOMAD | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | http://psa.esa.int/pdap/files?DATA_SET_ID='urn... | | | | 2016-04-04T06:00:00 | 2021-03-07T23:10:23.793 | 149874 | ExoMars 2016 | NOMAD | NOMAD | urn:nasa:pds:context:target:planet.mars | DATA_SET | 2020-02-29 |
2751 rows × 16 columns
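Because dsets is an ordinary pandas DataFrame, you can combine boolean masks and aggregate over them. The sketch below runs offline: it uses a three-row stand-in built from values shown in the outputs above, not a live query. It combines two masks and counts matches per mission. Passing na=False treats a row with a missing target as a non-match, so NaN does not propagate into the mask.

```python
import pandas as pd

# A tiny stand-in for the DataFrame returned by p.get_datasets(); the values
# are copied from the outputs above and only three columns are kept.
dsets = pd.DataFrame({
    "DATA_SET.DATA_SET_ID": [
        "CH1ORB-L-C1XS-2-NPO-EDR-V1.0",
        "RO-X-SREM-2-MARS-V1.0",
        "urn:esa:psa:em16_tgo_acs",
    ],
    "DATA_SET.MISSION_NAME": ["CHANDRAYAAN-1", "INTERNATIONAL ROSETTA MISSION", "ExoMars 2016"],
    "DATA_SET.TARGET_NAME": ["MOON", "MARS", "urn:nasa:pds:context:target:planet.mars"],
})

# A case-insensitive substring match catches both PDS3 names ("MARS") and
# PDS4 target URNs ("...planet.mars").
is_mars = dsets["DATA_SET.TARGET_NAME"].str.contains("mars", case=False, na=False)

# Masks combine with & (and), | (or) and ~ (not).
not_rosetta = ~dsets["DATA_SET.MISSION_NAME"].str.contains("ROSETTA", case=False, na=False)
mars_no_rosetta = dsets[is_mars & not_rosetta]

# Number of Mars datasets per mission.
per_mission = dsets[is_mars]["DATA_SET.MISSION_NAME"].value_counts()
print(mars_no_rosetta["DATA_SET.DATA_SET_ID"].tolist())
print(per_mission)
```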
get_products
p.get_products?
Signature: p.get_products(dataset_id)
Docstring: Queries the meta-data endpoint for products in the dataset ID given in the call
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/pdap.py
Type:      method
products = p.get_products(dataset_id='RO-X-SREM-2-MARS-V1.0')
products.head()
| | PRODUCT.PRODUCT_ID | PRODUCT.DATA_ACCESS_REFERENCE | DATA_SET.DATA_SET_ID | DATA_SET.DATA_SET_NAME | DATA_SET.MISSION_NAME | DATA_SET.PRODUCER.FULL_NAME | DATA_SET.PRODUCER.INSTITUTION_NAME | DATA_SET.PRODUCER.NODE_NAME | PRODUCT.TARGET_NAME | PRODUCT.TARGET_TYPE | PRODUCT.INSTRUMENT_ID | PRODUCT.INSTRUMENT_NAME | PRODUCT.START_TIME | PRODUCT.STOP_TIME | PRODUCT.ICON_ACCESS_REFERENCE | RESOURCE_CLASS | VID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | INTERNATIONAL ROSETTA MISSION | | | | MARS | PLANET | N/A | STANDARD RADIATION ENVIROMENT MONITOR | 2007-05-28T00:03:03.755 | 2007-05-28T23:57:32.769 | | PRODUCT | 1.0 |
| 1 | RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070527 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | INTERNATIONAL ROSETTA MISSION | | | | MARS | PLANET | N/A | STANDARD RADIATION ENVIROMENT MONITOR | 2007-05-27T00:00:04.24 | 2007-05-27T23:59:04.255 | | PRODUCT | 1.0 |
| 2 | RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070526 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | INTERNATIONAL ROSETTA MISSION | | | | MARS | PLANET | N/A | STANDARD RADIATION ENVIROMENT MONITOR | 2007-05-26T00:00:28.225 | 2007-05-26T23:56:03.74 | | PRODUCT | 1.0 |
| 3 | RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070525 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | INTERNATIONAL ROSETTA MISSION | | | | MARS | PLANET | N/A | STANDARD RADIATION ENVIROMENT MONITOR | 2007-05-25T00:00:00.211 | 2007-05-25T23:56:32.225 | | PRODUCT | 1.0 |
| 4 | RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070524 | http://psa.esa.int/pdap/download?RESOURCE_CLAS... | RO-X-SREM-2-MARS-V1.0 | ROSETTA-ORBITER X SREM 2 MARS V1.0 | INTERNATIONAL ROSETTA MISSION | | | | MARS | PLANET | N/A | STANDARD RADIATION ENVIROMENT MONITOR | 2007-05-24T00:02:32.696 | 2007-05-24T23:56:00.211 | | PRODUCT | 1.0 |
Let's look at what one entry contains:
products.iloc[0]
PRODUCT.PRODUCT_ID                    RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528
PRODUCT.DATA_ACCESS_REFERENCE         http://psa.esa.int/pdap/download?RESOURCE_CLAS...
DATA_SET.DATA_SET_ID                  RO-X-SREM-2-MARS-V1.0
DATA_SET.DATA_SET_NAME                ROSETTA-ORBITER X SREM 2 MARS V1.0
DATA_SET.MISSION_NAME                 INTERNATIONAL ROSETTA MISSION
DATA_SET.PRODUCER.FULL_NAME
DATA_SET.PRODUCER.INSTITUTION_NAME
DATA_SET.PRODUCER.NODE_NAME
PRODUCT.TARGET_NAME                   MARS
PRODUCT.TARGET_TYPE                   PLANET
PRODUCT.INSTRUMENT_ID                 N/A
PRODUCT.INSTRUMENT_NAME               STANDARD RADIATION ENVIROMENT MONITOR
PRODUCT.START_TIME                    2007-05-28T00:03:03.755
PRODUCT.STOP_TIME                     2007-05-28T23:57:32.769
PRODUCT.ICON_ACCESS_REFERENCE
RESOURCE_CLASS                        PRODUCT
VID                                   1.0
Name: 0, dtype: object
get_product
p.get_product?
Signature: p.get_product(product_id)
Docstring: <no docstring>
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/pdap.py
Type:      method
p.get_product('RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528')
PRODUCT.PRODUCT_ID                    RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528
PRODUCT.DATA_ACCESS_REFERENCE         http://psa.esa.int/pdap/download?RESOURCE_CLAS...
DATA_SET.DATA_SET_ID                  RO-X-SREM-2-MARS-V1.0
DATA_SET.DATA_SET_NAME                ROSETTA-ORBITER X SREM 2 MARS V1.0
DATA_SET.MISSION_NAME                 INTERNATIONAL ROSETTA MISSION
DATA_SET.PRODUCER.FULL_NAME
DATA_SET.PRODUCER.INSTITUTION_NAME
DATA_SET.PRODUCER.NODE_NAME
PRODUCT.TARGET_NAME                   MARS
PRODUCT.TARGET_TYPE                   PLANET
PRODUCT.INSTRUMENT_ID                 N/A
PRODUCT.INSTRUMENT_NAME               STANDARD RADIATION ENVIROMENT MONITOR
PRODUCT.START_TIME                    2007-05-28T00:03:03.755
PRODUCT.STOP_TIME                     2007-05-28T23:57:32.769
PRODUCT.ICON_ACCESS_REFERENCE
RESOURCE_CLASS                        PRODUCT
VID                                   1.0
Name: 0, dtype: object
get_files
Uses the files endpoint to retrieve a list of files in a given dataset.
files = p.get_files(dataset_id='RO-X-SREM-2-MARS-V1.0')
files.head()
| | Reference | DataSetId | ProductId | RELATIVE_DIRECTORY | Filename |
|---|---|---|---|---|---|
| 0 | http://psa.esa.int/pdap/fileaccess?ID=INTERNAT... | RO-X-SREM-2-MARS-V1.0 | SREM_L2_20060918 | /DATA/ | SREM_L2_20060918.TAB |
| 1 | http://psa.esa.int/pdap/fileaccess?ID=INTERNAT... | RO-X-SREM-2-MARS-V1.0 | SREM_L2_20061227 | /DATA/ | SREM_L2_20061227.TAB |
| 2 | http://psa.esa.int/pdap/fileaccess?ID=INTERNAT... | RO-X-SREM-2-MARS-V1.0 | SREM_L2_20060926 | /DATA/ | SREM_L2_20060926.TAB |
| 3 | http://psa.esa.int/pdap/fileaccess?ID=INTERNAT... | RO-X-SREM-2-MARS-V1.0 | | /EXTRAS/ | SREM_ROSETTA_PACC_20070508.CDF |
| 4 | http://psa.esa.int/pdap/fileaccess?ID=INTERNAT... | RO-X-SREM-2-MARS-V1.0 | | /EXTRAS/ | SREM_ROSETTA_PACC_20070419.CDF |
files.iloc[0]
Reference             http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...
DataSetId             RO-X-SREM-2-MARS-V1.0
ProductId             SREM_L2_20060918
RELATIVE_DIRECTORY    /DATA/
Filename              SREM_L2_20060918.TAB
Name: 0, dtype: object
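Together, the RELATIVE_DIRECTORY and Filename columns give each file's location within the dataset. The sketch below works on a stand-in DataFrame: values are copied from the output above and the truncated Reference column is left out. It rebuilds those paths and keeps only the .TAB tables. It uses posixpath because the directories use forward slashes whatever the local platform. In the output above, the /EXTRAS/ files have no ProductId.

```python
import posixpath
import pandas as pd

# Stand-in for the DataFrame returned by p.get_files(); values copied from the
# output above, with the (truncated) Reference column omitted.
files = pd.DataFrame({
    "DataSetId": ["RO-X-SREM-2-MARS-V1.0"] * 3,
    "ProductId": ["SREM_L2_20060918", "SREM_L2_20061227", None],
    "RELATIVE_DIRECTORY": ["/DATA/", "/DATA/", "/EXTRAS/"],
    "Filename": ["SREM_L2_20060918.TAB", "SREM_L2_20061227.TAB",
                 "SREM_ROSETTA_PACC_20070508.CDF"],
})

# Rebuild each file's path inside the dataset from its directory and name.
files["path"] = [
    posixpath.join(d, f) for d, f in zip(files["RELATIVE_DIRECTORY"], files["Filename"])
]

# Keep only the tabulated data files (extension compared case-insensitively).
tables = files[files["Filename"].str.lower().str.endswith(".tab")]
print(tables["path"].tolist())
```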
The tap module contains a single class, PsaTap, and various convenience functions that call this class. It is basically a very thin wrapper around astroquery's TAP functionality.
psa = tap.PsaTap()
Currently PsaTap includes only a single method, query, which itself calls the astroquery Tap function and converts the returned data to a Pandas DataFrame.
query
psa.query?
Signature:
psa.query(
    q,
    sync=True,
    dropna=True,
    verbose=False,
    job_wait_cycles=10,
    job_wait_time=2,
)
Docstring: Make a simple query and return the data as a pandas DataFrame
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/tap.py
Type:      method
By default you can simply pass a query, which runs a synchronous query job and returns the results. For queries that may return more than 2000 results, set sync=False. You can then adjust the number of wait cycles (job_wait_cycles) and the time in seconds of each cycle (job_wait_time) before the query aborts.
Note also the dropna boolean - if True, any columns which are all NaN will be dropped from the returned DataFrame. This is useful because the epn_core schema contains a lot of fields which are not yet populated in the PSA database.
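The effect described for dropna=True matches pandas' own DataFrame.dropna with how="all" applied to columns. This is a sketch of the behaviour, not necessarily how psa_utils implements it, and the column name unpopulated is hypothetical. Empty strings are not NaN, so a column made entirely of blank strings survives.

```python
import numpy as np
import pandas as pd

# Stand-in result with made-up contents: "unpopulated" is entirely NaN,
# "thumbnail_url" holds empty strings, and the other columns carry values.
raw = pd.DataFrame({
    "instrument_name": ["C1XS", "C1XS"],
    "unpopulated": [np.nan, np.nan],
    "thumbnail_url": ["", ""],
    "processing_level": [3, 3],
})

# Drop only the columns in which every value is NaN; partially filled
# columns, and columns of empty strings, are kept.
trimmed = raw.dropna(axis="columns", how="all")
print(list(trimmed.columns))  # ['instrument_name', 'thumbnail_url', 'processing_level']
```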
top10 = psa.query('select top 10 * from epn_core')
top10.head()
| | access_estsize | access_format | access_url | creation_date | dataproduct_type | granule_gid | granule_uid | instrument_host_name | instrument_name | measurement_type | ... | processing_level | release_date | service_title | spatial_frame_type | s_region | target_class | target_name | thumbnail_url | time_max | time_min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24 | application/x-pds-zip | https://archives.esac.esa.int/psa/pdap/downloa... | 2021-03-08T14:04:30.978734 | ci | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R... | Chandrayaan-1 | C1XS | | ... | 3 | 2019-07-27T00:00:00.0 | psa | none | | satellite | Moon | | 2008-11-20 18:55:51.000009472 | 2008-11-20 18:27:32.000011520 |
| 1 | 25 | application/x-pds-zip | https://archives.esac.esa.int/psa/pdap/downloa... | 2021-03-08T14:04:30.978734 | ci | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R... | Chandrayaan-1 | C1XS | | ... | 3 | 2019-07-27T00:00:00.0 | psa | none | | satellite | Moon | | 2008-11-23 00:24:13.999999232 | 2008-11-22 23:38:54.999981568 |
| 2 | 24 | application/x-pds-zip | https://archives.esac.esa.int/psa/pdap/downloa... | 2021-03-08T14:04:30.978734 | ci | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R... | Chandrayaan-1 | C1XS | | ... | 3 | 2019-07-27T00:00:00.0 | psa | none | | satellite | Moon | | 2008-11-24 02:31:26.000002816 | 2008-11-24 02:31:26.000002816 |
| 3 | 24 | application/x-pds-zip | https://archives.esac.esa.int/psa/pdap/downloa... | 2021-03-08T14:04:30.978734 | ci | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R... | Chandrayaan-1 | C1XS | | ... | 3 | 2019-07-27T00:00:00.0 | psa | none | | satellite | Moon | | 2008-11-24 04:00:05.999994368 | 2008-11-24 04:00:05.999994368 |
| 4 | 24 | application/x-pds-zip | https://archives.esac.esa.int/psa/pdap/downloa... | 2021-03-08T14:04:30.978734 | ci | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA | CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R... | Chandrayaan-1 | C1XS | | ... | 3 | 2019-07-27T00:00:00.0 | psa | none | | satellite | Moon | | 2008-11-28 14:45:38.000008448 | 2008-11-28 14:45:38.000008448 |
5 rows × 22 columns
Let's look at one single entry:
top10.iloc[0]
access_estsize          24
access_format           application/x-pds-zip
access_url              https://archives.esac.esa.int/psa/pdap/downloa...
creation_date           2021-03-08T14:04:30.978734
dataproduct_type        ci
granule_gid             CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA
granule_uid             CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...
instrument_host_name    Chandrayaan-1
instrument_name         C1XS
measurement_type
modification_date       2021-03-08T14:04:30.978734
obs_id                  CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...
processing_level        3
release_date            2019-07-27T00:00:00.0
service_title           psa
spatial_frame_type      none
s_region
target_class            satellite
target_name             Moon
thumbnail_url
time_max                2008-11-20 18:55:51.000009472
time_min                2008-11-20 18:27:32.000011520
Name: 0, dtype: object
Note also that time_min and time_max have been converted from Julian dates to standard datetimes.
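The sketch below shows one way to do such a conversion with pandas alone. It is not necessarily the method psa_utils uses. It subtracts the Julian Date of the Unix epoch (JD 2440587.5) and treats the remainder as days. It ignores time-scale differences such as UTC versus TT, and leap seconds. A float64 day count near 2.45 million days resolves only about 40 microseconds. That is the likely origin of stray digits such as 18:55:51.000009472 in the output above, so rounding to milliseconds is shown as well.

```python
import pandas as pd

# Julian Date of the Unix epoch, 1970-01-01T00:00:00.
JD_UNIX_EPOCH = 2440587.5

def jd_to_timestamp(jd: pd.Series) -> pd.Series:
    """Convert Julian Dates (float days) to pandas Timestamps.

    Plain day arithmetic only: no time-scale (UTC/TT) or leap-second handling.
    """
    return pd.to_datetime(jd - JD_UNIX_EPOCH, unit="D")

# JD 2440587.5 is the Unix epoch; JD 2451545.0 is noon on 2000-01-01.
jd = pd.Series([2440587.5, 2451545.0])
timestamps = jd_to_timestamp(jd)

# Rounding to milliseconds removes float64 noise in the trailing digits.
rounded = timestamps.dt.round("ms")
print(rounded.tolist())
```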
Now let's try exceeding that 2k limit, also increasing astroquery's verbosity:
size_test = psa.query("select * from epn_core where instrument_name='OSIRIS'", verbose=True)
Launched query: 'select TOP 2000 * from epn_core where instrument_name='OSIRIS''
------>https
host = archives.esac.esa.int:443
context = /psa/epn-tap/tap//sync
Content-type = application/x-www-form-urlencoded
200
[('Date', 'Mon, 08 Mar 2021 17:46:50 GMT'), ('Server', 'Apache/2.4.6 (Red Hat Enterprise Linux)'), ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'), ('Pragma', 'no-cache'), ('Expires', '0'), ('X-XSS-Protection', '1; mode=block'), ('X-Frame-Options', 'DENY'), ('X-Content-Type-Options', 'nosniff'), ('Content-Type', 'application/x-votable+xml'), ('Set-Cookie', 'JSESSIONID=1C2A96F3B34D2B9EAF6F85EDAD9FCA3A;path=/psa/epn-tap;HttpOnly'), ('Vary', 'Accept-Encoding'), ('Transfer-Encoding', 'chunked')]
Retrieving sync. results...
Query finished.
WARNING 2021-03-08 18:46:51 (psa_utils.tap): results incomplete due to synchronous query limit - repeat with sync=false
len(size_test)
2000
So here we see that astroquery actually inserts an extra TOP 2000 clause into the query statement. Note that if you specify TOP in your query, you automatically override this:
size_test = psa.query("select top 3000 * from epn_core where instrument_name='OSIRIS'", verbose=True)
Launched query: 'select top 3000 * from epn_core where instrument_name='OSIRIS''
------>https
host = archives.esac.esa.int:443
context = /psa/epn-tap/tap//sync
Content-type = application/x-www-form-urlencoded
200
[('Date', 'Mon, 08 Mar 2021 17:46:55 GMT'), ('Server', 'Apache/2.4.6 (Red Hat Enterprise Linux)'), ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'), ('Pragma', 'no-cache'), ('Expires', '0'), ('X-XSS-Protection', '1; mode=block'), ('X-Frame-Options', 'DENY'), ('X-Content-Type-Options', 'nosniff'), ('Content-Type', 'application/x-votable+xml'), ('Set-Cookie', 'JSESSIONID=BC585CFC3BAFDB3CE865FD0271FC609F;path=/psa/epn-tap;HttpOnly'), ('Vary', 'Accept-Encoding'), ('Transfer-Encoding', 'chunked')]
Retrieving sync. results...
Query finished.
len(size_test)
3000
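You can check from the query text and the row count whether a synchronous result might have been capped. The sketch below assumes the 2000-row limit seen in the first log above and the ADQL form SELECT [DISTINCT] TOP n. The helpers explicit_top and maybe_truncated are hypothetical and not part of psa_utils, which already logs its own warning in this situation.

```python
import re
from typing import Optional

# Matches "SELECT [DISTINCT] TOP n" at the start of an ADQL query, any case.
_TOP_RE = re.compile(r"^\s*select\s+(?:distinct\s+)?top\s+(\d+)\b", re.IGNORECASE)

SYNC_ROW_LIMIT = 2000  # the implicit limit seen in the verbose log above

def explicit_top(query: str) -> Optional[int]:
    """Return the TOP n value of an ADQL query, or None if it has none."""
    match = _TOP_RE.match(query)
    return int(match.group(1)) if match else None

def maybe_truncated(query: str, n_rows: int) -> bool:
    """True if a synchronous result without TOP may have hit the implicit cap."""
    return explicit_top(query) is None and n_rows >= SYNC_ROW_LIMIT

print(explicit_top("select top 3000 * from epn_core"))  # 3000
print(maybe_truncated("select * from epn_core where instrument_name='OSIRIS'", 2000))  # True
```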
Especially for larger queries, you can also run the query asynchronously:
asyn = psa.query("select top 10000 * from epn_core where instrument_name='ACS'",
sync=False, job_wait_cycles=2, job_wait_time=10)
len(asyn)
10000
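In the call above, job_wait_cycles=2 and job_wait_time=10 give the asynchronous job roughly 2 × 10 = 20 seconds before the query aborts. The loop below is a generic sketch of this kind of bounded polling, not the psa_utils or astroquery implementation. The names wait_for_job and is_done, and the injectable sleep function, are hypothetical. They are chosen so that the example runs without a server.

```python
import time
from typing import Callable

def wait_for_job(is_done: Callable[[], bool], cycles: int = 10,
                 wait_time: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait up to `cycles` times `wait_time` seconds, checking after each wait.

    Returns True as soon as is_done() reports completion, or False if the
    job is still unfinished after the last cycle (the caller would then abort).
    """
    for _ in range(cycles):
        sleep(wait_time)
        if is_done():
            return True
    return False

# A fake job that finishes on its third status check, and a fake sleep that
# records the requested delays instead of blocking.
checks = iter([False, False, True])
slept = []
finished = wait_for_job(lambda: next(checks), cycles=5, wait_time=2.0, sleep=slept.append)
print(finished, slept)  # True [2.0, 2.0, 2.0]
```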
You can of course also perform queries that return values other than a product list:
psa.query("select count(*) from epn_core where instrument_name='MCAM'")
| | count_all |
|---|---|
| 0 | 1715 |
Note that products have an access_url which will download the product; this is used in the download module.
psa.query("select count(*) from epn_core where instrument_name='CaSSIS'")
| | count_all |
|---|---|
| 0 | 4087604 |
psa.query("select count(*) from epn_core where instrument_name='CaSSIS' and granule_uid like '%sti%'")
| | count_all |
|---|---|
| 0 | 24435 |
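The queries above differ only in their column list, TOP value and WHERE clause, so when exploring it can help to build them programmatically. Below is a minimal sketch. The helpers epn_query and adql_literal are hypothetical. Single quotes inside values are doubled, as SQL-style string literals (which ADQL inherits) require. The resulting strings can be passed straight to psa.query.

```python
from typing import Optional

def adql_literal(value: str) -> str:
    """Quote a string for ADQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def epn_query(instrument: str, uid_like: Optional[str] = None,
              top: Optional[int] = None, count: bool = False) -> str:
    """Build a simple epn_core query string (hypothetical helper)."""
    columns = "count(*)" if count else "*"
    head = f"select top {int(top)} {columns}" if top is not None else f"select {columns}"
    where = [f"instrument_name={adql_literal(instrument)}"]
    if uid_like is not None:
        # In LIKE patterns, % matches any run of characters.
        where.append(f"granule_uid like {adql_literal('%' + uid_like + '%')}")
    return f"{head} from epn_core where " + " and ".join(where)

print(epn_query("CaSSIS", uid_like="sti", count=True))
# select count(*) from epn_core where instrument_name='CaSSIS' and granule_uid like '%sti%'
```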
cassis = psa.query("select top 10 * from epn_core where instrument_name='CaSSIS' and granule_uid like '%sti%'")
cassis.iloc[0]
access_estsize          112442
access_format           application/x-pds-zip
access_url              https://archives.esac.esa.int/psa/pdap/downloa...
creation_date           2021-03-08T14:04:30.978734
dataproduct_type        ci
granule_gid             urn:esa:psa:em16_tgo_cas:data_calibrated
granule_uid             urn:esa:psa:em16_tgo_cas:data_calibrated:cas_c...
instrument_host_name    ExoMars 2016
instrument_name         CaSSIS
measurement_type
modification_date       2021-03-08T14:04:30.978734
obs_id                  urn:esa:psa:em16_tgo_cas:data_calibrated:cas_c...
processing_level        3
release_date            2018-10-14T00:00:00.0
service_title           psa
spatial_frame_type      none
s_region
target_class            planet
target_name             Mars
thumbnail_url           https://archives.esac.esa.int/psa/pdap/fileacc...
time_max                2018-04-14 16:15:30.000011520
time_min                2018-04-14 16:15:16.000007168
Name: 0, dtype: object
So you can always use this access_url to download the data product; the download module has functions to do just this.
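In practice the download module is the tool to use. For illustration, here is a minimal sketch of fetching a single access_url. It uses the standard library's urllib rather than requests (which psa_utils uses) so that it has no dependencies. The name download_url is hypothetical. The code assumes that the server may name the file through a Content-Disposition header. The Download-<timestamp>.zip names in the logs below suggest this, but that is an inference. Otherwise the name is taken from the URL path.

```python
import os
import shutil
import urllib.parse
import urllib.request

def download_url(url: str, output_dir: str = ".") -> str:
    """Stream the resource at url into output_dir and return the local path.

    The file name comes from the Content-Disposition header if the server
    sends one, otherwise from the last component of the URL path.
    """
    with urllib.request.urlopen(url) as response:
        name = response.headers.get_filename()
        if not name:
            url_path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
            name = os.path.basename(url_path) or "download"
        # basename() drops any directory components in a server-supplied name.
        path = os.path.join(output_dir, os.path.basename(name))
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return path

# With network access, for example:
# download_url(cassis.iloc[0]["access_url"], output_dir="/tmp")
```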
download_by_query
This is the key function in the download module. It accepts an ADQL query string, uses the tap module to run the query, downloads the referenced files (if they are public) and unzips them to the location of your choice.
download.download_by_query?
Signature: download.download_by_query(query, output_dir='.', unzip=True, tidy=True)
Docstring:
Runs a query against the PSA's EPN-TAP interface. Any products which match, and are public (have a download URL) will be downloaded and the zips placed into output_dir. If unzip=True they will be unzipped into output_dir and if tidy=True the zips will be removed after use
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/download.py
Type:      function
q = "select top 2 * from epn_core where instrument_name='OSIRIS' and granule_uid like '%.FIT%'"
filelist = download.download_by_query(q, output_dir='/tmp')
INFO 2021-03-08 18:47:36 (psa_utils.download): downloading product N20080814T023137516ID20F22.FIT
INFO 2021-03-08 18:47:39 (psa_utils.download): downloaded file Download-20210308184736.zip
INFO 2021-03-08 18:47:39 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:40 (psa_utils.download): downloaded file Download-20210308184739.zip
filelist
['/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080814T023137516ID20F22.LBL', '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080804T023124677ID20F22.LBL', '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080804T023124677ID20F22.FIT', '/tmp/inventory.txt', '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080814T023137516ID20F22.FIT']
In most cases these default settings are what you want when retrieving products. If you prefer to keep the zips after extraction, set tidy=False; you will still get the list of individual files returned. If you just want to download the zips, set unzip=False and you will get the list of zips returned:
filelist = download.download_by_query(q, output_dir='/tmp', unzip=False)
WARNING 2021-03-08 18:47:40 (psa_utils.download): cannot remove source files without decompressiong - setting tidy=False
INFO 2021-03-08 18:47:46 (psa_utils.download): downloading product N20080814T023137516ID20F22.FIT
INFO 2021-03-08 18:47:48 (psa_utils.download): downloaded file Download-20210308184746.zip
INFO 2021-03-08 18:47:48 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:49 (psa_utils.download): downloaded file Download-20210308184748.zip
filelist
['/tmp/Download-20210308184746.zip', '/tmp/Download-20210308184748.zip']
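The unzip and tidy steps described for download_by_query can be sketched with the standard zipfile module. This sketch is not the psa_utils implementation, and extract_zips is a hypothetical name. Because the paths are collected in a set, a file that appears in more than one zip is listed only once, and sorting gives a stable order. ZipFile.extract also strips absolute paths and ".." components from member names.

```python
import os
import zipfile
from typing import Iterable, List

def extract_zips(zips: Iterable[str], output_dir: str = ".",
                 tidy: bool = True) -> List[str]:
    """Extract each zip into output_dir and return the extracted file paths.

    If tidy is True, each zip file is deleted once it has been extracted.
    """
    extracted = set()
    for zip_path in zips:
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.infolist():
                if member.is_dir():
                    continue  # directories are created as needed by extract()
                # extract() returns the normalised path it wrote to.
                extracted.add(zf.extract(member, output_dir))
        if tidy:
            os.remove(zip_path)
    return sorted(extracted)
```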
There is a convenience function to download by the logical identifier, or product ID:
download.download_by_lid?
Signature: download.download_by_lid(lid, output_dir='.', unzip=True, tidy=True)
Docstring: <no docstring>
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/download.py
Type:      function
download.download_by_lid('N20080804T023124677ID20F22', output_dir='/tmp')
INFO 2021-03-08 18:47:54 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:55 (psa_utils.download): downloaded file Download-20210308184755.zip
INFO 2021-03-08 18:47:55 (psa_utils.download): downloading product N20080804T023124677ID20F22.IMG
INFO 2021-03-08 18:47:57 (psa_utils.download): downloaded file Download-20210308184755.zip
Note that all this really does is look for records whose granule_uid contains the substring passed. In this case the same data were archived both as a PDS3 .IMG image and as a FITS file, so both were matched.
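The matching behaviour can be reproduced locally on a DataFrame of granule_uid values. The values below are invented stand-ins, and the real granule_uid format may differ. A plain substring match on the product ID finds both the FITS and the IMG version. Adding the extension to the substring narrows it to one. The analogous ADQL condition would be granule_uid like '%N20080804T023124677ID20F22.FIT%', used with download_by_query.

```python
import pandas as pd

# Invented granule_uid values: one product archived as both FITS and PDS3 IMG.
products = pd.DataFrame({"granule_uid": [
    "EXAMPLE-DATASET:DATA:N20080804T023124677ID20F22.FIT",
    "EXAMPLE-DATASET:DATA:N20080804T023124677ID20F22.IMG",
    "EXAMPLE-DATASET:DATA:N20080814T023137516ID20F22.FIT",
]})

lid = "N20080804T023124677ID20F22"
# Plain substring match (regex=False), like ADQL "granule_uid like '%<lid>%'".
both = products[products["granule_uid"].str.contains(lid, regex=False)]
# Including the extension in the substring keeps only the FITS version.
fits_only = products[products["granule_uid"].str.contains(lid + ".FIT", regex=False)]
print(len(both), len(fits_only))  # 2 1
```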
Sometimes you cannot narrow down a search using TAP or PDAP, but need to look into the custom meta-data in the products. In this case you do not want to download gigabytes of data, but only the labels.
This is a bit of a "hack" using the PDAP files endpoint with EPN-TAP, but it works with the following caveats:
download.download_labels_by_query("select top 10 * from epn_core where instrument_name='MCAM'", output_dir='/tmp')
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170632_01_f__a0100.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170738_02_f__t0005.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170748_03_f__t0020.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170758_04_f__t0040.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170808_05_f__t0080.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170818_06_f__t0200.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181026t103952_00_f__t0020.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181027t060956_39_f__t0020.xml
INFO 2021-03-08 18:48:01 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181027t061002_40_f__t0020.xml
INFO 2021-03-08 18:48:01 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181027t061008_41_f__t0020.xml
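Once the labels are on disk, they can be searched locally without touching the data. Below is a minimal sketch using the standard library's ElementTree. It assumes the labels sit in a directory of their own and reads the standard PDS4 element Time_Coordinates/start_date_time. Mission-specific meta-data typically lives in other namespaces, under Mission_Area. For those fields you would add the relevant namespace to the map and change the path. The name label_start_times is hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import Dict

PDS4_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}

def label_start_times(label_dir: str) -> Dict[str, str]:
    """Map each PDS4 label file name in label_dir to its start_date_time."""
    times = {}
    for path in sorted(glob.glob(os.path.join(label_dir, "*.xml"))):
        root = ET.parse(path).getroot()
        node = root.find(".//pds:Time_Coordinates/pds:start_date_time", PDS4_NS)
        if node is not None:
            times[os.path.basename(path)] = (node.text or "").strip()
    return times

# For example, on a directory holding only the downloaded MCAM labels:
# label_start_times("/path/to/labels")
```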
The packager module is specifically designed to package PDS4 products into a delivery package recognised by the PSA. It is probably not useful unless you are an external data provider preparing PDS4 products (not bundles), or an internal user!
packager.Packager?
Init signature:
packager.Packager(
    products='*.xml',
    input_dir='.',
    recursive=True,
    output_dir='.',
    template=None,
    use_dir=False,
    clean=True,
)
Docstring:      <no docstring>
Init docstring:
Initialise the packager class. Accepts the following:
products - file pattern to match labels (*.xml default)
input_dir - the root directory for the labels (default=.)
use_dir - uses the product directory structure for the archive (if false, a minimal structure will be adopted)
clean - if True, removes the generated files are generating the tarball
File:           ~/Dropbox/work/bepi/software/psa_utils/psa_utils/packager.py
Type:           type
Subclasses:
Most inputs should be clear: you need to specify the location of the input files and a wildcard to match them. Note that a delivery can only be for a single bundle, so use these filters to ensure you select products belonging to a single bundle!
A template PDS4 label is needed to complete the product delivery label. A default template is bundled with the source code, but should you wish to use a different template, you can specify it with the template keyword.
The main additional option is use_dir. If true, the path from input_dir will be used to build the archive structure (i.e. the directory structure users will see when downloading the files). If false, all files will be placed in the root of the collection.
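Given the single-bundle requirement described above, it can be worth checking a selection of labels before packaging. Below is a minimal sketch with the standard library. The helper bundles_in is hypothetical and not part of Packager; its pattern, input_dir and recursive arguments only mirror Packager's. A PDS4 logical identifier has the form urn:agency:authority:bundle[:collection[:product]], so its first four colon-separated fields identify the bundle.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import Set

PDS4_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}

def bundles_in(input_dir: str, pattern: str = "*.xml", recursive: bool = True) -> Set[str]:
    """Return the set of bundle LIDs that the matched labels belong to."""
    if recursive:
        # "**" with recursive=True matches zero or more directories.
        paths = glob.glob(os.path.join(input_dir, "**", pattern), recursive=True)
    else:
        paths = glob.glob(os.path.join(input_dir, pattern))
    bundles = set()
    for path in paths:
        lid = ET.parse(path).getroot().findtext(
            "pds:Identification_Area/pds:logical_identifier", namespaces=PDS4_NS)
        if lid:
            # urn:<agency>:<authority>:<bundle>[:<collection>[:<product>]]
            bundles.add(":".join(lid.strip().split(":")[:4]))
    return bundles

# Usage: refuse to package a mixed selection.
# if len(bundles_in("my_labels")) != 1:
#     raise ValueError("labels span more than one bundle")
```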
The geogen module contains some routines to help when working with the PSA geometry generator (geogen). These are likely only of interest to internal users.
generate_plf
geogen.generate_plf?
Signature:
geogen.generate_plf(
    config_file,
    files=None,
    directory='.',
    table=None,
    extras={},
)
Docstring:
Generates a GEOGEN plf input file. pds4_utils.Database() is used to scrape meta-data according to the config_file.
files= specifies the label file pattern (defaults to *.xml)
directory= specified the root of the input files (and the output location)
table= specifies the table name in case the input file is configured to produce more than one. Default (None) assumes only one table.
extras = a dictionary which provides extra static key/value pairs to be added to every entry (e.g. product type or similar). If an identical value exists in the table and extras, extras has priority.
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/geogen.py
Type:      function
Note that for this to work, pds4_utils must be installed. The config_file listed here is a pds4_utils.Database configuration file, which is used to scrape relevant meta-data from a collection of PDS4 products and output it to a JSON file in the format expected by geogen.
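The docstring's rule for extras (static key/value pairs added to every entry, with extras winning on conflicts) corresponds to a simple dictionary merge. The sketch below only illustrates that rule; it is not geogen's code. The row keys and values are hypothetical, and the actual PLF layout is not shown here. Building new dicts, instead of updating the rows in place, leaves the caller's data and any shared default dict untouched.

```python
import json
from typing import Any, Dict, List

def merge_extras(rows: List[Dict[str, Any]], extras: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Add static key/value pairs to every row; extras override existing keys."""
    # In {**row, **extras} later entries win, which gives extras priority.
    return [{**row, **extras} for row in rows]

# Hypothetical scraped rows.
rows = [
    {"product_id": "prod_001", "product_type": "raw"},
    {"product_id": "prod_002"},
]
merged = merge_extras(rows, {"product_type": "calibrated"})
print(json.dumps(merged))
```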
Ingest_Test
This class, in the internal module, uses a configuration file and a template to effectively replicate a single PDS4 product according to the names, types, sub-instruments etc. given in the configuration file.
internal.Ingest_Test?
Init signature:
internal.Ingest_Test(
    config_file='ingestion_test.yml',
    template_label='test_product.xml',
    output_dir='.',
    package=False,
)
Docstring:
A class for generating test products from a label and data product template and a configuration file specifying the instrument-specific data
File:           ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:           type
Subclasses:
build_context_json
This routine builds a local_context_products.json file as used by the PDS validate tool. It is designed for PSA internal use.
It requires pds4_utils.
internal.build_context_json?
Signature:
internal.build_context_json(
    config_file,
    input_dir='.',
    output_dir='.',
    json_name='local_context_products.json',
    table='context_bundle',
)
Docstring:
Generates a json file listing the name, type and LIDVID of all context files in input_dir.
Generates a local context json file which can be used by the PDS validate tool and writes it to output_dir
pds4_utils.Database() is used to scrape meta-data according to the config_file.
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:      function
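The docstring says the file lists the name, type and LIDVID of each context product. In PDS4, a LIDVID is the logical identifier and the version identifier joined by "::". The sketch below shows how such entries could be assembled. The JSON layout, meaning the Product_Context key and the list-valued name and type fields, is an assumption modelled on the context-product files distributed with validate. It is not taken from psa_utils. The name context_entry and all values are made up.

```python
import json
from typing import Dict, List

def context_entry(name: str, ctype: str, lid: str, vid: str) -> Dict[str, object]:
    """Build one context-product entry; the LIDVID is LID and VID joined by '::'."""
    return {"name": [name], "type": [ctype], "lidvid": f"{lid}::{vid}"}

# Hypothetical values; real ones would be scraped from context product labels.
entries: List[Dict[str, object]] = [
    context_entry("Example Instrument", "Instrument",
                  "urn:esa:psa:context:instrument:example", "1.0"),
]

# Assumed top-level layout; check it against the validate version you use.
document = {"Product_Context": entries}
print(json.dumps(document, indent=2))
```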
collection_summary
This routine generates an HTML table of specific meta-data scraped from instrument and mission context products. It is used to prototype the information needed to populate Google Dataset Search and a DOI landing page.
internal.collection_summary?
Signature:
internal.collection_summary(
    config_file,
    input_dir='.',
    output_dir='.',
    context_dir='.',
)
Docstring:
collection_summary accesses meta-data in a collection label or referenced from it, to produce a set of summary information needed to register a DOI and/or create a Google Dataset Search landing page.
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:      function