Reading from and writing to S3 object storage is a bit different from working with a regular filesystem. Here we read from both public and requester-pays buckets, and write to a bucket that the specified AWS credentials allow. We rely heavily on fsspec, which offers filesystem interfaces to S3 (as well as HTTPS, FTP and many others) in Python.
import fsspec
import pandas as pd
fs = fsspec.filesystem('s3', anon=True)
fs.ls('anaconda-public-datasets')
['anaconda-public-datasets/enron-email', 'anaconda-public-datasets/fashion-mnist', 'anaconda-public-datasets/gdelt', 'anaconda-public-datasets/iris', 'anaconda-public-datasets/nyc-taxi', 'anaconda-public-datasets/reddit']
You can also use glob to explore items:
fs.glob('anaconda-public-datasets/*/*.csv')
['anaconda-public-datasets/iris/iris.csv', 'anaconda-public-datasets/nyc-taxi/boros.csv']
# Open the public CSV anonymously in text mode and read it with pandas
infile = fsspec.open("s3://anaconda-public-datasets/iris/iris.csv",
                     mode='rt', anon=True)
with infile as f:
    df = pd.read_csv(f)
df
 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
---|---|---|---|---|---|
0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
... | ... | ... | ... | ... | ... |
144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
148 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
149 rows × 5 columns
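The column labels in the output above are data values (5.1, 3.5, ...) and the frame has 149 rows, which suggests that iris.csv has no header row and that pd.read_csv used the first record as the header. Below is a minimal sketch of one way to handle that, assuming the file really is headerless. The column names are illustrative labels chosen here, not names taken from the file, and the in-memory sample stands in for the S3 object.

```python
import io

import pandas as pd

# Two rows copied from the output above, standing in for the headerless file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)

# Hypothetical column labels (assumption: they are not present in the file).
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# header=None tells pandas that the first line is data, not column names.
df_sample = pd.read_csv(sample, header=None, names=columns)
print(df_sample.shape)
```

The same keyword arguments can be passed to pd.read_csv(f) inside the with block when reading from S3.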
To write data, you must first set up your AWS credentials. Open a terminal and type aws configure --profile esip-qhub to run a script that asks for your AWS credentials (the aws_access_key_id and aws_secret_access_key provided to you). These are stored in /home/jovyan/.aws/credentials under the profile name esip-qhub. Make sure you don't commit or share that file anywhere!
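Before writing, it can help to confirm that the profile was actually saved. The AWS credentials file is INI-style, with one section per profile, so a minimal check can be sketched with the standard library. The helper name profile_is_configured is made up for this example; the path and profile name are the ones given above.

```python
import configparser
from pathlib import Path


def profile_is_configured(credentials_path, profile):
    """Return True if `profile` has both keys in an AWS credentials file."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so a missing file yields False below.
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


# Path and profile name as given in the text above.
credentials_file = Path("/home/jovyan/.aws/credentials")
print(profile_is_configured(credentials_file, "esip-qhub"))
```

The check only looks for the presence of the two keys. It never prints or returns the secret values themselves.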
Once the credentials are in place, you should be able to write to buckets where your credentials have permission. On the ESIP qhub, this is the s3://esip-qhub bucket.
Write CSV
# Write the DataFrame as CSV using the credentials stored under the esip-qhub profile
outfile = fsspec.open("s3://esip-qhub/usgs/testing/iris.csv",
                      mode='wt', profile='esip-qhub')
with outfile as f:
    df.to_csv(f)
List items in a bucket
fs = fsspec.filesystem('s3', anon=False, profile='esip-qhub')
fs.ls('esip-qhub/usgs/testing/')
['esip-qhub/usgs/testing/2651-A.nc', 'esip-qhub/usgs/testing/8544pcs-cal_z3.nc', 'esip-qhub/usgs/testing/cesmLE-20C-SSH.zarr', 'esip-qhub/usgs/testing/iris.csv', 'esip-qhub/usgs/testing/zarr_test']
import xarray as xr
infile = fsspec.open('simplecache::https://geoport.usgs.esipfed.org/erddap/files/8544pcs-cal_z3/8544pcs-cal_z3.nc')
ds = xr.open_dataset(infile.open(), engine='h5netcdf')
/home/conda/store/1275b47015e2a68aaa499ef3ef564199cf91d4ca75ac4fd29f3f296fd0ee70d0-pangeo/lib/python3.8/site-packages/h5netcdf/core.py:763: FutureWarning: String decoding changed with h5py >= 3.0. Currently backwards compatibility with h5py < 3.0 is kept by decoding strings per default. This will change in future versions for consistency with h5py >= 3.0. Setting 'decode_strings=True' forces string decoding. warnings.warn(msg, FutureWarning, stacklevel=0)
ds
<xarray.Dataset> Dimensions: (feature_type_instance: 11, time: 2276) Coordinates: * feature_type_instance (feature_type_instance) |S1 b'8' b'5' ... b'a' b'l' latitude float32 35.18 longitude float32 -75.42 * time (time) datetime64[ns] 2009-01-12T23:51:17 ... 2009... z float64 -12.95 Data variables: crs int32 -2147483647 platform int32 -2147483647 P_4023 (time) float32 23.7 23.81 23.77 ... 26.34 26.37 26.35 SDP_850 (time) float32 10.45 11.63 11.02 ... 48.2 50.49 48.5 Attributes: creator_email: rsignell@usgs.gov ExtPressInstalled: ParosFreq PCADPProbeProfRange: [1.586] PCADPProbeFrequency: [1500000.] CTDInstalled: No CREATION_DATE: 15-Jul-2010 11:13:09 creator_url: http://www.usgs.gov DATA_TYPE: TIME+PROFILE CellSize: [0.063] POS_CONST: [0] PCADPUserSetupPCMode: Yes PCADPProbeSerial: H96 PCADPUserSetupBurstMode: Enabled Deployment_date: 12-Jan-2009 EXPERIMENT: Carolinas Coastal Change Processes Project RecorderInstalled: Yes source: USGS publisher_phone: +1 (508) 548-8700 contributor_name: J. Warner PCADPProbeResMaxHorizVel: [0.87298013] PCADPProbeXformMat: [ 2638. -1319. -1319. 0. -2284. 228... Recovery_date: 11-May-2009 ExtSensorInstalled: Yes PCADPProbeBlankDistance: [0.2] id: 8544pcs-cal_z3 PressScale_2: [0.] creator_phone: +1 (508) 548-8700 PressOffset: [0.] FILL_FLAG: [1] PCADPProbeProfMaxVertVel: [0.22594388] PCADPProbeProfMaxHorizVel: [0.87298013] note: velocity data replaced with fillValue af... PCADPUserSetupNcells: [22.] PCADPUserSetupAvgInterval: [1.] PCADPUserSetupBurstInterval: [3600.] original_folder: DIAMONDSHOALS publisher_email: emontgomery@usgs.gov latitude: [35.17863] standard_name_vocabulary: CF-1.6 CompassInstalled: Yes TempInstalled: Yes DATA_SUBTYPE: DESCRIPT: Sontek PCADP raw data burst file nco_openmp_thread_number: [1] PressInstalled: No SensorOrientation: down PCADPProbeResMaxVertVel: [0.60968984] PCADPProbeNbeams: [3.] WATER_DEPTH: [14.72184] dspSoftwareVerNum: [4.] PCADPProbeNpingsPerBeam: [5.] PCADPUserSetupProfilesPerBurst: [1050.] 
DEPTH_CONST: [1] COORD_SYSTEM: GEOGRAPHICAL+PROFILE keywords: Oceans > Ocean Pressure > Water Pressure... COMPOSITE: [0] project_summary: This experiment was designed to investig... stop_time: 17-Apr-2009 23:51:16 contributor_role: principalInvestigator DATA_CMNT: Instrument failed on 4/17 in the storm t... summary: USGS-CMG time-series data from the Cape ... DELTA_T: 3600 publisher_url: http://www.usgs.gov PCADPProbeSampInterval: [5.] DATA_ORIGIN: USGS WHSC Sed Trans Group longitude: [-75.42292] title: USGS-CMG time-series data: DIAMONDSHOALS... VAR_FILL: [1.e+35] institution: USGS Coastal and Marine Geology Program start_time: 12-Jan-2009 23:51:17 INST_TYPE: Sontek PCADP PressFreqOffset: [21.] CompassOffset: [0.] magnetic_variation: [-10.5] PCADPUserSetupPingInterval: [0.] ProfilesPerBurst: [1050.] publisher_name: Ellyn Montgomery creator_name: Rich Signell PCADPUserSetupProfileInterval: [1.] WATER_MASS: ? PCADPProbeHeight: [1.1] PCADPProbeSlantAngle: [15.] keywords_vocabulary: GCMD Science Keywords project_title: Cape Hatteras- Diamond Shoals VAR_DESC: burst:u:v:w:USTD:VSTD:WSTD:u_min:v_min:w... cpuSoftwareVerNum: [17.] PROJECT: USGS Coastal Marine Geology Program PressScale: [0.] 
PCADPUserSetupCellSize: [0.063] MOORING: [854] naming_authority: gov.usgs.cmgp original_filename: 8544pcs-cal.nc Conventions: CF-1.6,ACDD-1.3 date_created: 2017-04-11T21:56:00Z date_modified: 2017-04-11T21:56:00Z date_issued: 2017-04-11T21:56:00Z date_metadata_modified: 2017-04-11T21:56:00Z cdm_data_type: Station geospatial_lat_min: [35.17863] geospatial_lat_max: [35.17863] geospatial_lat_resolution: [0] geospatial_lat_units: degrees_north geospatial_lon_min: [-75.42292] geospatial_lon_max: [-75.42292] geospatial_lon_resolution: [0] geospatial_lon_units: degrees_east geospatial_bounds: POINT(-75.42292022705078 35.17863082885742) geospatial_bounds_crs: EPSG:4326 time_coverage_start: 2009-01-12T23:51:17 time_coverage_end: 2009-04-17T18:51:16 time_coverage_duration: PT8189999S time_coverage_resolution: PT3600S geospatial_vertical_units: meters geospatial_vertical_positive: up featureType: timeSeries geospatial_vertical_resolution: 0 geospatial_vertical_min: [-12.9518] geospatial_vertical_max: [-12.9518] ncei_template_version: NCEI_NetCDF_TimeSeries_Orthogonal_Templa... history: Fri Nov 1 20:17:03 2019: ncatted -a pro... project: U.S. Geological Survey Oceanographic Tim... NCO: netCDF Operators version 4.8.1 (Homepage...
# Write the dataset as NetCDF to a local cache file, which is uploaded to S3 on close
outfile = fsspec.open('simplecache::s3://esip-qhub/usgs/testing/8544pcs-cal_z3.nc',
                      mode='wb', s3=dict(profile='esip-qhub'))
with outfile as f:
    ds.to_netcdf(f)
ncfile = fsspec.open('s3://esip-qhub/usgs/testing/8544pcs-cal_z3.nc')
ds = xr.open_dataset(ncfile.open())
ds
<xarray.Dataset> Dimensions: (feature_type_instance: 11, time: 2276) Coordinates: * time (time) datetime64[ns] 2009-01-12T23:51:17 ... 2009... * feature_type_instance (feature_type_instance) |S1 b'8' b'5' ... b'a' b'l' latitude float32 35.18 longitude float32 -75.42 z float64 -12.95 Data variables: P_4023 (time) float32 23.7 23.81 23.77 ... 26.34 26.37 26.35 SDP_850 (time) float32 10.45 11.63 11.02 ... 48.2 50.49 48.5 crs int32 -2147483647 platform int32 -2147483647 Attributes: creator_email: rsignell@usgs.gov ExtPressInstalled: ParosFreq PCADPProbeProfRange: 1.586 PCADPProbeFrequency: 1500000.0 CTDInstalled: No CREATION_DATE: 15-Jul-2010 11:13:09 creator_url: http://www.usgs.gov DATA_TYPE: TIME+PROFILE CellSize: 0.063 POS_CONST: 0 PCADPUserSetupPCMode: Yes PCADPProbeSerial: H96 PCADPUserSetupBurstMode: Enabled Deployment_date: 12-Jan-2009 EXPERIMENT: Carolinas Coastal Change Processes Project RecorderInstalled: Yes source: USGS publisher_phone: +1 (508) 548-8700 contributor_name: J. Warner PCADPProbeResMaxHorizVel: 0.8729801288148435 PCADPProbeXformMat: [ 2638. -1319. -1319. 0. -2284. 228... Recovery_date: 11-May-2009 ExtSensorInstalled: Yes PCADPProbeBlankDistance: 0.2 id: 8544pcs-cal_z3 PressScale_2: 0.0 creator_phone: +1 (508) 548-8700 PressOffset: 0.0 FILL_FLAG: 1 PCADPProbeProfMaxVertVel: 0.22594388333333335 PCADPProbeProfMaxHorizVel: 0.8729801288148435 note: velocity data replaced with fillValue af... 
PCADPUserSetupNcells: 22.0 PCADPUserSetupAvgInterval: 1.0 PCADPUserSetupBurstInterval: 3600.0 original_folder: DIAMONDSHOALS publisher_email: emontgomery@usgs.gov latitude: 35.17863 standard_name_vocabulary: CF-1.6 CompassInstalled: Yes TempInstalled: Yes DATA_SUBTYPE: DESCRIPT: Sontek PCADP raw data burst file nco_openmp_thread_number: 1 PressInstalled: No SensorOrientation: down PCADPProbeResMaxVertVel: 0.609689843915344 PCADPProbeNbeams: 3.0 WATER_DEPTH: 14.72184 dspSoftwareVerNum: 4.0 PCADPProbeNpingsPerBeam: 5.0 PCADPUserSetupProfilesPerBurst: 1050.0 DEPTH_CONST: 1 COORD_SYSTEM: GEOGRAPHICAL+PROFILE keywords: Oceans > Ocean Pressure > Water Pressure... COMPOSITE: 0 project_summary: This experiment was designed to investig... stop_time: 17-Apr-2009 23:51:16 contributor_role: principalInvestigator DATA_CMNT: Instrument failed on 4/17 in the storm t... summary: USGS-CMG time-series data from the Cape ... DELTA_T: 3600 publisher_url: http://www.usgs.gov PCADPProbeSampInterval: 5.0 DATA_ORIGIN: USGS WHSC Sed Trans Group longitude: -75.42292 title: USGS-CMG time-series data: DIAMONDSHOALS... VAR_FILL: 1e+35 institution: USGS Coastal and Marine Geology Program start_time: 12-Jan-2009 23:51:17 INST_TYPE: Sontek PCADP PressFreqOffset: 21.0 CompassOffset: 0.0 magnetic_variation: -10.5 PCADPUserSetupPingInterval: 0.0 ProfilesPerBurst: 1050.0 publisher_name: Ellyn Montgomery creator_name: Rich Signell PCADPUserSetupProfileInterval: 1.0 WATER_MASS: ? PCADPProbeHeight: 1.1 PCADPProbeSlantAngle: 15.0 keywords_vocabulary: GCMD Science Keywords project_title: Cape Hatteras- Diamond Shoals VAR_DESC: burst:u:v:w:USTD:VSTD:WSTD:u_min:v_min:w... 
cpuSoftwareVerNum: 17.0 PROJECT: USGS Coastal Marine Geology Program PressScale: 0.0 PCADPUserSetupCellSize: 0.063 MOORING: 854 naming_authority: gov.usgs.cmgp original_filename: 8544pcs-cal.nc Conventions: CF-1.6,ACDD-1.3 date_created: 2017-04-11T21:56:00Z date_modified: 2017-04-11T21:56:00Z date_issued: 2017-04-11T21:56:00Z date_metadata_modified: 2017-04-11T21:56:00Z cdm_data_type: Station geospatial_lat_min: 35.17863 geospatial_lat_max: 35.17863 geospatial_lat_resolution: 0 geospatial_lat_units: degrees_north geospatial_lon_min: -75.42292 geospatial_lon_max: -75.42292 geospatial_lon_resolution: 0 geospatial_lon_units: degrees_east geospatial_bounds: POINT(-75.42292022705078 35.17863082885742) geospatial_bounds_crs: EPSG:4326 time_coverage_start: 2009-01-12T23:51:17 time_coverage_end: 2009-04-17T18:51:16 time_coverage_duration: PT8189999S time_coverage_resolution: PT3600S geospatial_vertical_units: meters geospatial_vertical_positive: up featureType: timeSeries geospatial_vertical_resolution: 0 geospatial_vertical_min: -12.9518 geospatial_vertical_max: -12.9518 ncei_template_version: NCEI_NetCDF_TimeSeries_Orthogonal_Templa... history: Fri Nov 1 20:17:03 2019: ncatted -a pro... project: U.S. Geological Survey Oceanographic Tim... NCO: netCDF Operators version 4.8.1 (Homepage...
ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf')
ds
<xarray.Dataset> Dimensions: (time: 27617) Coordinates: latitude float32 41.65 longitude float32 -70.69 * time (time) datetime64[ns] 1982-08-20T16:31:52 ... 1982... z float64 -13.0 Data variables: feature_type_instance |S64 b'2651-A' crs int32 -2147483647 platform int32 -2147483647 NEP_56 (time) float32 0.2945 0.3092 0.3141 ... 2.096 2.096 P_4022 (time) float32 20.9 20.87 20.84 ... 21.08 21.05 21.03 P_4023 (time) float32 20.89 20.87 20.84 ... 21.05 21.03 SDP_850 (time) float32 0.03115 0.01675 ... 0.008073 0.00885 T_20 (time) float32 21.95 21.94 21.92 ... 12.72 12.72 V00_1900 (time) float32 2.133 2.206 2.338 ... 2.389 2.413 V01_1901 (time) float32 0.2555 0.2701 0.275 ... 2.155 2.106 V02_1902 (time) float32 2.802 3.13 2.93 ... 2.971 3.234 3.143 rtrn_4012 (time) float32 2.482 2.286 2.404 ... 2.228 2.279 Attributes: creator_email: rsignell@usgs.gov institution: USGS Coastal and Marine Geology Program DATA_TYPE: TIME DATA_SUBTYPE: publisher_phone: +1 (508) 548-8700 DRIFTER: 0 creator_url: http://www.usgs.gov project_title: Currents and Sediment Transport in Buzza... DEPTH_CONST: 0 COORD_SYSTEM: GEOGRAPHICAL keywords: Oceans > Ocean Pressure > Water Pressure... COMPOSITE: 0 contributor_name: B. Butman project_summary: Investigation of the near-bottom circula... stop_time: 31-Oct-1982 14:31:52 contributor_role: principalInvestigator DATA_CMNT: BUZZARDS BAY - A. SPEEDS DIED EARLY. TR... cycles: 27617 summary: USGS-CMG time-series data from the Curre... DELTA_T: 3.75 minutes \n EXPERIMENT: ? source: USGS DATA_ORIGIN: USGS/WHFC longitude: -70.68755 original_folder: BUZZ_BAY WHOI_Buoy_Group_summary: 2651-A start time = 20 Aug 1982 ... title: USGS-CMG time-series data: BUZZ_BAY - 26... 
original_filename: 2651-A.cdf VAR_FILL: POS_CONST: 0 id: 2651-A CREATION_DATE: 08:53 25-APR-97 creator_phone: +1 (508) 548-8700 start_time: 20-Aug-1982 16:31:52 INST_TYPE: FILL_FLAG: 0 magnetic_variation%28deg%29: -15.0 days: 71 publisher_name: Ellyn Montgomery creator_name: Rich Signell publisher_email: emontgomery@usgs.gov latitude: 41.65089 WATER_MASS: ? standard_name_vocabulary: CF-1.6 WATER_DEPTH: 14 keywords_vocabulary: GCMD Science Keywords DESCRIPT: ? VAR_DESC: u v upper lower int.p pres psdev PROJECT: ? MOORING: 265 naming_authority: gov.usgs.cmgp publisher_url: http://www.usgs.gov Conventions: CF-1.6,ACDD-1.3 date_created: 2017-04-11T21:26:00Z date_modified: 2017-04-11T21:26:00Z date_issued: 2017-04-11T21:26:00Z date_metadata_modified: 2017-04-11T21:26:00Z geospatial_lat_min: 41.65089 geospatial_lat_max: 41.65089 geospatial_lat_resolution: 0 geospatial_lat_units: degrees_north geospatial_lon_min: -70.68755 geospatial_lon_max: -70.68755 geospatial_lon_resolution: 0 geospatial_lon_units: degrees_east geospatial_bounds: POINT(-70.68755340576172 41.6508903503418) geospatial_bounds_crs: EPSG:4326 time_coverage_start: 1982-08-20T16:31:52 time_coverage_end: 1982-10-31T14:31:52 time_coverage_duration: PT6213600S geospatial_vertical_units: meters geospatial_vertical_positive: up featureType: timeSeries geospatial_vertical_resolution: 0 geospatial_vertical_min: -13.0 geospatial_vertical_max: -13.0 cdm_data_type: TimeSeries cdm_timeseries_variables: latitude,longitude,z,feature_type_instance ncei_template_version: NCEI_NetCDF_TimeSeries_Orthogonal_Templa... history: Fri Nov 1 20:15:19 2019: ncatted -a pro... project: U.S. Geological Survey Oceanographic Tim... NCO: netCDF Operators version 4.8.1 (Homepage... DODS.strlen: 6 DODS.dimName: feature_type_instance
ds = xr.open_dataset('http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265')
ds
<xarray.Dataset> Dimensions: (s: 27617) Dimensions without coordinates: s Data variables: s.time (s) datetime64[ns] ... s.latitude (s) float64 41.6... s.longitude (s) float64 -70.... s.z (s) float64 -13.... s.backscatter_intensity_2651_a (s) float64 2.09... s.sea_water_velocity_to_direction_2651ds_a (s) float64 nan ... s.sea_water_speed_2651ds_a (s) float64 nan ... s.eastward_sea_water_velocity_2651ds_a (s) float64 nan ... s.northward_sea_water_velocity_2651ds_a (s) float64 nan ... s.sea_water_pressure_cm_time__standard_deviation_2651_a (s) float64 0.00... s.sea_water_pressure_2651_a (s) float64 21.0... s.sea_water_temperature_2651_a (s) float64 12.7... s.station (s) |S64 b'BUZZ_... Attributes: cdm_altitude_proxy: z cdm_data_type: TimeSeriesProfile cdm_profile_variables: time cdm_timeseries_variables: station,longitude,latitude contributor_email: None,feedback@axiomdatascience.com contributor_name: Mid-Atlantic Coastal Ocean Observing Syste... contributor_role: funder,processor contributor_role_vocabulary: NERC contributor_url: https://maracoos.org/,https://www.axiomdat... Conventions: IOOS-1.2, CF-1.6, ACDD-1.3, NCCSV-1.0 creator_country: USA creator_institution: USGS Coastal and Marine Geology Program (U... creator_name: USGS Coastal and Marine Geology Program (U... creator_sector: gov_federal creator_type: institution creator_url: http://marine.usgs.gov/ defaultDataQuery: sea_water_speed_2651ds_a,backscatter_inten... Easternmost_Easting: -70.68755340576172 featureType: TimeSeriesProfile geospatial_lat_max: 41.6508903503418 geospatial_lat_min: 41.6508903503418 geospatial_lat_units: degrees_north geospatial_lon_max: -70.68755340576172 geospatial_lon_min: -70.68755340576172 geospatial_lon_units: degrees_east geospatial_vertical_max: 13.0 geospatial_vertical_min: 0.0 geospatial_vertical_positive: up geospatial_vertical_units: m history: Downloaded from USGS Coastal and Marine Ge... id: 56575 infoUrl: https://sensors.ioos.us/#metadata/56575/st... 
institution: USGS Coastal and Marine Geology Program (U... license: The data may be used and redistributed for... naming_authority: com.axiomdatascience Northernmost_Northing: 41.6508903503418 platform: fixed platform_name: BUZZ_BAY - 265 platform_vocabulary: http://mmisw.org/ont/ioos/platform processing_level: Level 2 publisher_country: USA publisher_institution: USGS Coastal and Marine Geology Program (U... publisher_name: USGS Coastal and Marine Geology Program (U... publisher_sector: gov_federal publisher_type: institution publisher_url: http://marine.usgs.gov/ references: https://stellwagen.er.usgs.gov/buzz_bay.ht... sourceUrl: https://sensors.axds.co/api/ Southernmost_Northing: 41.6508903503418 standard_name_vocabulary: CF Standard Name Table v72 summary: Timeseries data from 'BUZZ_BAY - 265' (urn... time_coverage_end: 1982-10-31T14:31:52Z time_coverage_start: 1982-08-20T16:31:52Z title: BUZZ_BAY - 265 Westernmost_Easting: -70.68755340576172
url = 'https://geoport.usgs.esipfed.org/erddap/griddap/adcp_grid_5d6e_e2f9_148d'
ds = xr.open_dataset(url)
ds
<xarray.Dataset> Dimensions: (altitude: 16, time: 8992) Coordinates: * time (time) datetime64[ns] 2009-09-08T04:14:36 ... 2009-11-09T14:44:36 * altitude (altitude) float32 -6.57 -6.07 -5.57 ... -0.06964 0.4304 0.9304 Data variables: AGC_1202 (time, altitude) float32 ... CD_310 (time, altitude) float32 ... CS_300 (time, altitude) float32 ... PGd_1203 (time, altitude) float32 ... Werr_1201 (time, altitude) float32 ... u_1205 (time, altitude) float32 ... v_1206 (time, altitude) float32 ... w_1204 (time, altitude) float32 ... Attributes: beam_angle: 20 beam_pattern: convex beam_width: 2 beams_in_velocity_calculation: 4 bin_count: 37 bin_size: 0.5 blanking_distance: 0.44 cdm_data_type: Grid center_first_bin: 1.06 code_repetitions: 5 COMPOSITE: 0 conductivity_sensor: NO contributor_name: N. Ganju contributor_role: principalInvestigator Conventions: CF-1.6,ACDD-1.3, COARDS COORD_SYSTEM: GEOGRAPHIC creation_date: 01-Mar-2011 15:26:49 creator_email: rsignell@usgs.gov creator_name: Rich Signell creator_phone: +1 (508) 548-8700 creator_url: https://www.usgs.gov DATA_CMNT: 1200 kHz ADCP on north micropod DATA_ORIGIN: USGS WHSC Sed Trans Group DATA_SUB_TYPE: MOORED DATA_TYPE: ADCP date_created: 2017-02-06T18:53:00Z date_issued: 2017-02-06T18:53:00Z date_metadata_modified: 2017-02-06T18:53:00Z date_modified: 2017-02-06T18:53:00Z DELTA_T: 600 Deployment_date: 13-Aug-2009 DEPTH_CONST: 0 depth_sensor: YES DESCRIPT: Tripod at north Buzzards Bay Site description: WFAL - 859 - 8591wh-a.nc DRIFTER: 0 ED_taken_from_depth_sensor: YES EH_taken_from_transducer_heading_sensor: YES ending_water_layer: 5 EP_taken_from_transducer_pitch_sensor: YES ER_taken_from_transducer_roll_sensor: YES error_velocity_threshold: 2000 ET_taken_from_temperature_sensor: YES EXPERIMENT: Buzzards Bay Ecosystem Studies false_target_reject_values: [255 255] FILL_FLAG: 0 firmware_version: 16.31 frequency: 1200 geospatial_bounds: POINT(-70.72550201416016 41.634... 
geospatial_bounds_crs: 4326 geospatial_vertical_max: 0.9303613 geospatial_vertical_min: -6.569639 geospatial_vertical_positive: up geospatial_vertical_resolution: 0.50000002 geospatial_vertical_units: m heading_sensor: YES history: Trimmed using trunc_cdf, SVN $... id: 8591wh-a infoUrl: https://www.usgs.gov initial_instrument_height: 1.25 initial_instrument_height_note: height in meters above bottom: ... INST_TYPE: RD Instruments ADCP institution: USGS Coastal and Marine Geology... janus: 4 Beam keywords: 8591wh, 8591wh-a, agc, AGC_1202... keywords_vocabulary: GCMD Science Keywords lag_length: 53 latitude: 41.6346 license: The data may be used and redist... longitude: -70.7255 magnetic_variation_applied: -15.0333 magnetic_variation_at_site: -15.0333 minmax_percent_good: [ 0 100] modification_date: 06-Aug-2010 16:34:12 MOORING: 859.0 naming_authority: gov.usgs.cmgp ncei_template_version: NCEI_NetCDF_TimeSeriesProfile_O... nominal_sensor_depth: 7.6296387 nominal_sensor_depth_note: inst_depth = (water_depth - ins... NOTE_1: transmit_pulse_length units are cm NOTE_2: transmit_lag_distance units are cm NOTE_3: bin depths are relative to the ... orientation: UP original_filename: 8591wh-a.nc original_folder: WFAL pings_per_ensemble: 180 pitch_sensor: YES platform_type: Tripod POS_CONST: 0 pred_accuracy: 0.52 profiling_mode: 1 PROJECT: USGS Coastal Marine Geology Pro... project: U.S. Geological Survey Oceanogr... project_summary: Oceanographic and water-quality... project_title: West Falmouth Harbor Fluxes publisher_email: emontgomery@usgs.gov publisher_name: Ellyn Montgomery publisher_phone: +1 (508) 548-8700 publisher_url: https://www.usgs.gov Recovery_date: 9-Nov-2009 roll_sensor: YES salinity_set_by_user: 35.0 salinity_set_by_user_units: PPT SciPi: N. 
Ganju sensor_configuration: 1 serial_number: 6983 simulated_data: 0 Sound_speed_computed_from_ED_ES_ET: YES source: USGS sourceUrl: (local files) standard_name_vocabulary: CF Standard Name Table v29 starting_water_layer: 1 summary: United States Geological Survey... temperature_sensor: YES time_between_ping_groups: 1.0 time_coverage_duration: PT5394600S time_coverage_end: 2009-11-09T14:44:36Z time_coverage_start: 2009-09-08T04:14:36Z title: USGS-CMG time-series data: West... transducer_attached: 49 transducer_offset_from_bottom: 1.25 transform: EARTH transmit_lag_distance: 12 transmit_pulse_length_cm: 61 valid_correlation_range: [ 64 255] VAR_DESC: bindist:ensemble:u:v:w:Werr:AGC... VAR_FILL: 1e+35 WATER_DEPTH: 8.879639 WATER_DEPTH_datum: not yet assigned WATER_DEPTH_source: water depth = MSL from pressure...
ds['CS_300']
<xarray.DataArray 'CS_300' (time: 8992, altitude: 16)>
[143872 values with dtype=float32]
Coordinates:
  * time      (time) datetime64[ns] 2009-09-08T04:14:36 ... 2009-11-09T14:44:36
  * altitude  (altitude) float32 -6.57 -6.07 -5.57 ... -0.06964 0.4304 0.9304
Attributes:
    ancillary_variables:    platform
    colorBarMaximum:        0.5
    colorBarMinimum:        0.0
    coverage_content_type:  physicalMeasurement
    epic_code:              300
    grid_mapping:           crs
    ioos_category:          Currents
    long_name:              Current speed
    platform:               platform
    standard_name:          sea_water_speed
    units:                  m/s
List items on a Requester Pays bucket
fs = fsspec.filesystem('s3', anon=False, requester_pays=True)
fs.ls('esip-qhub/noaa/nwm/')
['esip-qhub/noaa/nwm/.zattrs', 'esip-qhub/noaa/nwm/.zgroup', 'esip-qhub/noaa/nwm/.zmetadata', 'esip-qhub/noaa/nwm/elevation', 'esip-qhub/noaa/nwm/feature_id', 'esip-qhub/noaa/nwm/latitude', 'esip-qhub/noaa/nwm/longitude', 'esip-qhub/noaa/nwm/order', 'esip-qhub/noaa/nwm/qBtmVertRunoff', 'esip-qhub/noaa/nwm/qBucket', 'esip-qhub/noaa/nwm/qSfcLatRunoff', 'esip-qhub/noaa/nwm/q_lateral', 'esip-qhub/noaa/nwm/streamflow', 'esip-qhub/noaa/nwm/time', 'esip-qhub/noaa/nwm/velocity']
In Zarr datasets, each chunk is stored as a separate object, so instead of opening a single file we pass xarray a mapper, a dict-like view of the keys under the bucket prefix.
ds = xr.open_zarr(fsspec.get_mapper('s3://esip-qhub/noaa/nwm',
                                    anon=False, requester_pays=True),
                  consolidated=True)
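To see what a mapper looks like without touching S3, here is a minimal sketch that uses fsspec's in-memory filesystem, so it runs without credentials. The store path, keys and byte values are invented for illustration; a real Zarr store holds many more metadata and chunk keys.

```python
import fsspec

# "memory://" is fsspec's in-process filesystem; the path is a made-up example.
store = fsspec.get_mapper("memory://mapper-demo/example.zarr")

# Each key becomes its own object, much like Zarr metadata and chunk objects on S3.
store[".zgroup"] = b'{"zarr_format": 2}'
store["streamflow/0"] = b"chunk-bytes"

print(sorted(store))          # keys relative to the mapper root
print(store["streamflow/0"])  # bytes stored under that key
```

Replacing the memory:// URL with an s3:// URL, plus the relevant credentials arguments, gives the kind of mapper that is passed to xr.open_zarr above.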
from dask.distributed import Client
client = Client()
client
# Mapper for the output Zarr store, using the esip-qhub profile for write access
s3store = fsspec.get_mapper('s3://esip-qhub/usgs/testing/zarr_test',
                            anon=False, profile='esip-qhub')
Write first time step of Zarr dataset to Bucket
%%time
ds.isel(time=0).load().to_zarr(store=s3store, mode='w', consolidated=True) #fails without .load()
CPU times: user 7.35 s, sys: 890 ms, total: 8.24 s Wall time: 1min 39s
<xarray.backends.zarr.ZarrStore at 0x7fc82aac6640>
client.close() # close the LocalCluster client
Here we spin up a Qhub Dask Gateway cluster.
from dask_gateway import Gateway
from dask.distributed import Client
gateway = Gateway()
# see Gateway options to use in new_cluster by doing: gateway.cluster_options()
cluster = gateway.new_cluster(environment='pangeo', profile='Pangeo Worker')
cluster.scale(4)
client = Client(cluster)
cluster
%%time
ds.isel(time=0).load().to_zarr(store=s3store, mode='w', consolidated=True) #fails without .load()
CPU times: user 2.66 s, sys: 435 ms, total: 3.09 s Wall time: 1min 43s
<xarray.backends.zarr.ZarrStore at 0x7fc82b0b27c0>
client.close(); cluster.close()