Minimal imports
import pyrasterframes
import pyrasterframes.rf_ipython
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import rf_crs, rf_extent, rf_tile
from pyspark.sql.functions import col
spark = create_rf_spark_session()
uri = 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/' \
'MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF'
# here we flatten the projected raster structure
df = spark.read.raster(uri) \
.withColumn('tile', rf_tile('proj_raster')) \
.withColumn('crs', rf_crs(col('proj_raster'))) \
.withColumn('ext', rf_extent(col('proj_raster'))) \
.drop('proj_raster')
df.printSchema()
root |-- proj_raster_path: string (nullable = false) |-- tile: tile (nullable = true) |-- crs: struct (nullable = true) | |-- crsProj4: string (nullable = false) |-- ext: struct (nullable = true) | |-- xmin: double (nullable = false) | |-- ymin: double (nullable = false) | |-- xmax: double (nullable = false) | |-- ymax: double (nullable = false)
Tile
object in Jupyter / IPython¶A pyrasterframes.rf_types.Tile
will automatically render nicely in Jupyter or IPython.
A pandas.DataFrame
containing a Tile
column will automatically render nicely in Jupyter or IPython.
tile = df.select(df.tile).first()['tile']
tile
You can also still access the string representation easily.
str(tile)
'Tile(dimensions=[256, 256], cell_type=CellType(int16ud32767, 32767), cells=\n[[1225 1244 1247 ... 1305 1245 1206]\n [1166 1188 1190 ... 1381 1251 1193]\n [1156 1110 1122 ... 1248 1245 1270]\n ...\n [1485 1749 1761 ... 1034 996 998]\n [1780 1777 1663 ... 1008 1027 1174]\n [1728 1647 1562 ... 1189 1297 1382]])'
And access the tile's cells
member which is a numpy ndarray, or more specifically in this case a numpy.ma.MaskedArray.
tile.cells
masked_array( data=[[1225, 1244, 1247, ..., 1305, 1245, 1206], [1166, 1188, 1190, ..., 1381, 1251, 1193], [1156, 1110, 1122, ..., 1248, 1245, 1270], ..., [1485, 1749, 1761, ..., 1034, 996, 998], [1780, 1777, 1663, ..., 1008, 1027, 1174], [1728, 1647, 1562, ..., 1189, 1297, 1382]], mask=[[False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], ..., [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False]], fill_value=32767, dtype=int16)
There is also a capability for HTML rendering of the spark DataFrame. Rendering work is done on the JVM and the HTML string representation is provided for Jupyter to display.
df
proj_raster_path | tile | crs | ext |
---|---|---|---|
https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | [+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ] | [1.4455356755667E7, -2342509.0947640934, 1.4573964811098093E7, -2223901.039333] | |
https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | [+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ] | [1.4573964811098093E7, -2342509.0947640934, 1.4692572866529187E7, -2223901.039333] | |
https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | [+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ] | [1.4692572866529185E7, -2342509.0947640934, 1.481118092196028E7, -2223901.039333] | |
https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | [+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ] | [1.481118092196028E7, -2342509.0947640934, 1.4929788977391373E7, -2223901.039333] | |
https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | [+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ] | [1.4929788977391373E7, -2342509.0947640934, 1.5048397032822467E7, -2223901.039333] |
pandas.DataFrame
example¶If a Pandas DataFrame contains a column of Tile
s, the same image rendering is done to the column.
In this output you may like to double-click a cell in the tile2
column to "expand" the rows to full size rendering of the tile image.
pandas_df = df.limit(10).toPandas()
pandas_df.head(4)
proj_raster_path | tile | crs | ext | |
---|---|---|---|---|
0 | https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | (+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ,) | (14455356.755667, -2342509.0947640934, 14573964.811098093, -2223901.039333) | |
1 | https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | (+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ,) | (14573964.811098093, -2342509.0947640934, 14692572.866529187, -2223901.039333) | |
2 | https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | (+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ,) | (14692572.866529185, -2342509.0947640934, 14811180.92196028, -2223901.039333) | |
3 | https://modis-pds.s3.amazonaws.com/MCD43A4.006/31/11/2017158/MCD43A4.A2017158.h31v11.006.2017171203421_B01.TIF | (+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs ,) | (14811180.92196028, -2342509.0947640934, 14929788.977391373, -2223901.039333) |
You still get the default string representatation of a pandas.Series
pandas_df.iloc[8]
proj_raster_path https://modis-pds.s3.amazonaws.com/MCD43A4.006... tile Tile(dimensions=[256, 256], cell_type=CellType... crs (+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.... ext (15404221.199115746, -2342509.0947640934, 1552... Name: 8, dtype: object
pandas_df.tile
0 Tile(dimensions=[256, 256], cell_type=CellType... 1 Tile(dimensions=[256, 256], cell_type=CellType... 2 Tile(dimensions=[256, 256], cell_type=CellType... 3 Tile(dimensions=[256, 256], cell_type=CellType... 4 Tile(dimensions=[256, 256], cell_type=CellType... 5 Tile(dimensions=[256, 256], cell_type=CellType... 6 Tile(dimensions=[256, 256], cell_type=CellType... 7 Tile(dimensions=[256, 256], cell_type=CellType... 8 Tile(dimensions=[256, 256], cell_type=CellType... 9 Tile(dimensions=[96, 256], cell_type=CellType(... Name: tile, dtype: object
And nothing different happens for a pandas.DataFrame
that doesn't have a Tile
in it.
import pandas
pandas.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv').head(10)