Just as when opening from a local filesystem, when opening from S3, xarray loads lazily: it reads only the metadata up front and fetches data values when they are actually needed.
We demonstrate this here by opening a 26 GB NetCDF file from S3 in under 10 seconds. The memory usage in the Notebook stays below 1 GB of RAM.
import fsspec
import xarray as xr

# Anonymous (read-only) access to an S3-compatible object store
fs = fsspec.filesystem('s3', anon=True, client_kwargs=dict(endpoint_url='https://ncsa.osn.xsede.org'))
ncfile_on_s3 = 's3://esip/examples/adcirc/adcirc_01.nc'
fs.size(ncfile_on_s3)/1e9  # file size in GB
26.140007264
%%time
ds = xr.open_dataset(fs.open(ncfile_on_s3), chunks={'time':10, 'node':141973})
CPU times: user 3.22 s, sys: 292 ms, total: 3.51 s
Wall time: 9.28 s
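To get a feel for what the `chunks` argument implies, the arithmetic can be sketched in plain Python. This is only a back-of-the-envelope estimate: it uses the `(720, 9228245)` shape and `float64` dtype that xarray reports for `zeta`, and the variable names are illustrative rather than part of any library.

```python
import math

# Values taken from the example: zeta has shape (time=720, node=9228245),
# dtype float64, and is opened with chunks of (time=10, node=141973).
shape = {"time": 720, "node": 9228245}
chunks = {"time": 10, "node": 141973}
itemsize = 8  # bytes per float64 value


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, so a partial last chunk is counted too."""
    return -(-a // b)


chunk_bytes = itemsize * math.prod(chunks.values())
n_chunks = math.prod(ceil_div(shape[d], chunks[d]) for d in shape)
total_bytes = itemsize * math.prod(shape.values())

print(f"one chunk: {chunk_bytes / 1e6:.1f} MB")    # one chunk: 11.4 MB
print(f"chunks:    {n_chunks}")                    # chunks:    4680
print(f"zeta:      {total_bytes / 1e9:.1f} GB")    # zeta:      53.2 GB
```

Each chunk is therefore small enough to process comfortably in memory, while the full variable, held as `float64`, would not fit on a machine with only a few GB of RAM.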
ds.zeta
<xarray.DataArray 'zeta' (time: 720, node: 9228245)>
dask.array<open_dataset-07e4e31a5b972cd686bab63d3a63abe9zeta, shape=(720, 9228245), dtype=float64, chunksize=(10, 141973), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2031-08-03T02:10:00 ... 2031-08-08T02:00:00
    x        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
    y        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
Dimensions without coordinates: node
Attributes:
    location:       node
    long_name:      water surface elevation above geoid
    mesh:           adcirc_mesh
    standard_name:  sea_surface_height_above_geoid
    units:          m
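Because `zeta` is still a lazy array at this point, data is only transferred once a result is requested. The following is a minimal sketch of that pattern. The helper `peak_elevation_at_step` is a hypothetical name, not part of xarray, and a small synthetic dataset stands in for the remote file so the example runs offline; with the S3-backed `ds` above, the same call would need to read only the chunks covering the selected time step.

```python
import numpy as np
import xarray as xr


def peak_elevation_at_step(ds: xr.Dataset, step: int) -> float:
    """Return the maximum of `zeta` over all nodes at one time index."""
    snapshot = ds["zeta"].isel(time=step)  # selection only, nothing computed yet
    # .compute() is what triggers reading and reducing the data
    return float(snapshot.max(skipna=True).compute())


# Synthetic stand-in for the remote dataset (an assumption for illustration):
# 3 time steps by 4 nodes, holding the values 0.0 through 11.0.
demo = xr.Dataset(
    {"zeta": (("time", "node"), np.arange(12, dtype="float64").reshape(3, 4))}
)

print(peak_elevation_at_step(demo, step=1))  # 7.0
```

Selecting first and computing last keeps the amount of data pulled over the network proportional to the subset requested, not to the size of the file.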