Creating hurricane tracks using GeoAnalytics¶

The sample code below uses big data analytics (GeoAnalytics) to reconstruct hurricane tracks from data registered on a big data file share in the GIS. Note that this functionality is currently available in ArcGIS Enterprise 10.5 and not yet in ArcGIS Online.

Reconstruct tracks¶

Reconstruct tracks is a data aggregation tool available in the arcgis.geoanalytics module. It works with a time-enabled layer of point or polygon features. It first determines which features belong to a track using an identification number or identification string. Using the time at each location, the features in each track are then ordered sequentially and transformed into a line representing the path of movement.
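
To make the idea concrete, here is a minimal pure-Python sketch of the same logic: group points by a track identifier, sort each group by time, and emit the ordered coordinates as a line. It is not the GeoAnalytics implementation. The sample records and the helper name build_tracks are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical observations: (track id, timestamp, longitude, latitude)
points = [
    ("A", "1932-01-01 06:00:00", 58.40, -18.50),
    ("B", "1932-02-01 00:00:00", -60.00, 15.00),
    ("A", "1932-01-01 00:00:00", 58.75, -18.08),
    ("A", "1932-01-01 12:00:00", 58.07, -18.90),
    ("B", "1932-02-01 06:00:00", -61.00, 15.50),
]


def build_tracks(records, time_format="%Y-%m-%d %H:%M:%S"):
    """Group records by track id and return {id: [(lon, lat), ...]} in time order."""
    grouped = defaultdict(list)
    for track_id, timestamp, lon, lat in records:
        when = datetime.strptime(timestamp, time_format)
        grouped[track_id].append((when, lon, lat))

    tracks = {}
    for track_id, observations in grouped.items():
        # Order the observations of one track by their timestamp
        observations.sort(key=lambda obs: obs[0])
        # Keep only the coordinates; together they form the vertices of the line
        tracks[track_id] = [(lon, lat) for _, lon, lat in observations]
    return tracks


tracks = build_tracks(points)
print(tracks["A"])
```

Each value in the returned dictionary is the vertex list of one track. The real tool additionally writes the result to a feature layer, which this sketch does not attempt.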

Data used¶

This sample uses hurricane data covering a period of 50 years, totalling about 150,000 points split across 5 shapefiles. The National Hurricane Center provides similar datasets that can be used for exploratory purposes.

To illustrate the nature of the data, a subset was published as a feature service and can be visualized as shown below:

In [1]:

from arcgis.gis import GIS

# Create an anonymous connection to ArcGIS Online
arcgis_online = GIS()
hurricane_pts = arcgis_online.content.search("Hurricane_tracks_points AND owner:atma.mani", "Feature Layer")[0]
hurricane_pts

Out[1]:

Hurricane_tracks_points
Years 1932 - 1942

Feature Layer Collection by atma.mani
Last Modified: September 15, 2016
0 comments, 0 views

In [2]:

subset_map = arcgis_online.map("USA")
subset_map

In [3]:

subset_map.add_layer(hurricane_pts)

Inspect the data attributes¶

Let us query the first layer in hurricane_pts and view its attribute table as a Pandas dataframe.

In [4]:

hurricane_pts.layers[0].query().df.head()

Out[4]:

	ATC_eye	ATC_grade	ATC_poci	ATC_pres	ATC_rmw	ATC_roci	ATC_w34_r1	ATC_w34_r2	ATC_w34_r3	ATC_w34_r4	...	hour	min_	month	wmo_pres	wmo_pres__	wmo_wind	wmo_wind__	year	geometry.x	geometry.y
FID
1	-999	-999.	-999	-999	-999	-999	-999	-999	-999	-999	...	0	0	1	-999	-999.0	-999	-999.0	1932	58.750000	-18.080000
2	-999	-999.	-999	-999	-999	-999	-999	-999	-999	-999	...	6	0	1	0	-100.0	0	-100.0	1932	58.400002	-18.500000
3	-999	-999.	-999	-999	-999	-999	-999	-999	-999	-999	...	12	0	1	-999	-999.0	-999	-999.0	1932	58.070000	-18.900000
4	-999	-999.	-999	-999	-999	-999	-999	-999	-999	-999	...	18	0	1	-999	-999.0	-999	-999.0	1932	57.730000	-19.309999
5	-999	-999.	-999	-999	-999	-999	-999	-999	-999	-999	...	0	0	1	-999	-999.0	-999	-999.0	1932	57.349998	-19.760000

5 rows × 148 columns

Create a data store¶

For the GeoAnalytics server to process your big data, it needs the data to be registered as a data store. In our case, the data is in multiple shape files and we will register the folder containing the files as a data store of type bigDataFileShare.

Let us connect to an ArcGIS Enterprise portal:

In [5]:

gis = GIS("https://yourportal.domain.com/webcontext", "username", "password")

Get the GeoAnalytics datastores and search them for the registered datasets:

In [6]:

# Query the data stores available
import arcgis
datastores = arcgis.geoanalytics.get_datastores()
bigdata_fileshares = datastores.search()
bigdata_fileshares

Out[6]:

[<Datastore title:"/bigDataFileShares/Chicago_accidents" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/hurricanes" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/hurricanes_1m_168yrs" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/hurricanes_all" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/Hurricane_tracks" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/NYCdata" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/NYC_taxi" type:"bigDataFileShare">]

The hurricanes_all dataset is already registered as a big data file share with the GeoAnalytics datastore, so we can reference it:

In [7]:

data_item = bigdata_fileshares[3]

If there is no big data file share for hurricane track data registered on the server, we can register one that points to the shared folder containing the shape files.

In [17]:

data_item = datastores.add_bigdata("Hurricane_tracks", r"\\path_to_hurricane_data")

Big Data file share exists for Hurricane_tracks

Once a big data file share is registered, the GeoAnalytics server processes all the valid file types to discern the schema of the data. This process can take a few minutes depending on the size of your data. Once processing is complete, querying the manifest property returns the schema. As you can see below, the schema is similar to that of the subset we observed earlier in this sample.

In [8]:

data_item.manifest['datasets'][0] #for brevity only a portion is printed

Out[8]:

{'format': {'extension': 'shp', 'type': 'shapefile'},
 'geometry': {'geometryType': 'esriGeometryPoint',
  'spatialReference': {'wkid': 4326}},
 'name': 'full_dataset',
 'schema': {'fields': [{'name': 'serial_num', 'type': 'esriFieldTypeString'},
   {'name': 'season', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'num', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'basin', 'type': 'esriFieldTypeString'},
   {'name': 'sub_basin', 'type': 'esriFieldTypeString'},
   {'name': 'name', 'type': 'esriFieldTypeString'},
   {'name': 'iso_time', 'type': 'esriFieldTypeString'},
   {'name': 'nature', 'type': 'esriFieldTypeString'},
   {'name': 'latitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'longitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'wind_wmo_', 'type': 'esriFieldTypeDouble'},
   {'name': 'pres_wmo_', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'center', 'type': 'esriFieldTypeString'},
   {'name': 'wind_wmo1', 'type': 'esriFieldTypeDouble'},
   {'name': 'pres_wmo1', 'type': 'esriFieldTypeDouble'},
   {'name': 'track_type', 'type': 'esriFieldTypeString'},
   {'name': 'size', 'type': 'esriFieldTypeString'},
   {'name': 'Wind', 'type': 'esriFieldTypeBigInteger'}]},
 'time': {'fields': [{'formats': ['yyyy-MM-dd HH:mm:ss'], 'name': 'iso_time'}],
  'timeReference': {'timeZone': 'UTC'},
  'timeType': 'instant'}}
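
Because the manifest is returned as nested dictionaries and lists, ordinary Python is enough to pull out the parts you need. The sketch below works on an abbreviated, hand-copied portion of the output above rather than on a live data_item. The helper name summarize_dataset is hypothetical.

```python
# Abbreviated copy of the manifest entry printed above (only a few fields kept)
dataset = {
    "name": "full_dataset",
    "geometry": {
        "geometryType": "esriGeometryPoint",
        "spatialReference": {"wkid": 4326},
    },
    "schema": {
        "fields": [
            {"name": "serial_num", "type": "esriFieldTypeString"},
            {"name": "season", "type": "esriFieldTypeBigInteger"},
            {"name": "iso_time", "type": "esriFieldTypeString"},
        ]
    },
    "time": {
        "fields": [{"formats": ["yyyy-MM-dd HH:mm:ss"], "name": "iso_time"}],
        "timeReference": {"timeZone": "UTC"},
        "timeType": "instant",
    },
}


def summarize_dataset(entry):
    """Collect the dataset name, spatial reference, field names and time fields."""
    return {
        "name": entry["name"],
        "wkid": entry["geometry"]["spatialReference"]["wkid"],
        "fields": [field["name"] for field in entry["schema"]["fields"]],
        # Map each time field to the list of formats declared for it
        "time_fields": {f["name"]: f["formats"] for f in entry["time"]["fields"]},
    }


summary = summarize_dataset(dataset)
print(summary["fields"])
print(summary["time_fields"])
```

With a live connection, the same function could presumably be applied to data_item.manifest['datasets'][0], assuming the entry has the keys shown in the output above.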

Perform data aggregation using reconstruct tracks tool¶

When you add a big data file share, a corresponding item gets created in your GIS. You can search for it like a regular item and query its layers.

In [9]:

search_result = gis.content.search("", item_type = "big data file share")
search_result

Out[9]:

[<Item title:"bigDataFileShares_hurricanes_1m_168yrs" type:Big Data File Share owner:admin>,
 <Item title:"bigDataFileShares_NYC_taxi" type:Big Data File Share owner:admin>,
 <Item title:"bigDataFileShares_hurricanes" type:Big Data File Share owner:admin>,
 <Item title:"bigDataFileShares_Chicago_accidents" type:Big Data File Share owner:admin>,
 <Item title:"bigDataFileShares_hurricanes_all" type:Big Data File Share owner:admin>,
 <Item title:"bigDataFileShares_NYCdata" type:Big Data File Share owner:admin>]

In [10]:

data_item = search_result[4]
data_item

Out[10]:

bigDataFileShares_hurricanes_all

Big Data File Share by admin
Last Modified: November 21, 2016
0 comments, 0 views
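
Positional indexing such as search_result[4] depends on the order in which the search results come back, which may change as items are added to the portal. A small sketch of selecting by title instead is shown here. It relies only on each result exposing a title attribute, as the item representations in Out[9] suggest. The helper pick_by_title and the stand-in class FakeItem are hypothetical and are used so that the example runs without a portal connection.

```python
from collections import namedtuple

# Stand-in for portal items; only the attributes used by the helper are modelled
FakeItem = namedtuple("FakeItem", ["title", "type"])


def pick_by_title(items, title):
    """Return the single item whose title matches exactly, or raise LookupError."""
    matches = [item for item in items if item.title == title]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one item titled {title!r}, found {len(matches)}"
        )
    return matches[0]


demo_items = [
    FakeItem("bigDataFileShares_NYC_taxi", "Big Data File Share"),
    FakeItem("bigDataFileShares_hurricanes", "Big Data File Share"),
    FakeItem("bigDataFileShares_hurricanes_all", "Big Data File Share"),
]

chosen = pick_by_title(demo_items, "bigDataFileShares_hurricanes_all")
print(chosen.title)
```

In the notebook, the equivalent call would be pick_by_title(search_result, "bigDataFileShares_hurricanes_all").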

In [11]:

years_50 = data_item.layers[0]
years_50

Out[11]:

<Layer url:"https://yourserver.domain.com/webcontext/rest/services/DataStoreCatalogs/bigDataFileShares_hurricanes_all/BigDataCatalogServer/full_dataset">

Reconstruct tracks tool¶

The reconstruct_tracks() function is available in the arcgis.geoanalytics.summarize_data module. In this example, we are using this tool to aggregate the numerous points into line segments showing the tracks followed by the hurricanes. The tool creates a feature layer item as an output which can be accessed once the processing is complete.

In [12]:

from arcgis.geoanalytics.summarize_data import reconstruct_tracks

In [13]:

agg_result = reconstruct_tracks(years_50, 
                                track_fields='Serial_Num',
                                method='GEODESIC')

Submitted.
Executing...
Executing (ReconstructTracks): ReconstructTracks "Feature Set" Serial_Num Geodesic # # # # {"itemProperties":{"itemId":"9b2d3c8e7418468fb2a89e3f27db1639"},"serviceProperties":{"serviceUrl":"http://yourserver.domain.com/webcontext/rest/services/Hosted/Reconstructed_Tracks_IOBYC0/FeatureServer","name":"Reconstructed_Tracks_IOBYC0"}} #
Start Time: Wed Dec 14 16:36:10 2016
Using URL based GPRecordSet param: https://yourserver.domain.com/webcontext/rest/services/DataStoreCatalogs/bigDataFileShares_hurricanes_all/BigDataCatalogServer/full_dataset?token=AYU95RBC50dIrjFxzzUIdYeI4Tqf-cXBdtRbuOKnpo3vvW1bu7FF4EAnwfwef1AVMEdRGXPTxCGWkWf81iuWCIBpQBP8-xoVp1eRrSeNbVgpOFekTsUokT6OW_LGgH1yn6DHC-Uul6ndvMSycCyp8ENol2_UMn7ksRRrQh_26Kc.
{"messageCode":"BD_101028","message":"Starting new distributed job with 8 tasks.","params":{"totalTasks":"8"}}
{"messageCode":"BD_101029","message":"8/8 distributed tasks completed.","params":{"completedTasks":"8","totalTasks":"8"}}
Finished writing
  extent = Some(Envelope: [-103.0, -39.8, 80.0, 63.0])
  interval = Some(Interval(MutableInstant(1848-01-11 06:00:00.000),MutableInstant(1899-12-26 06:00:00.000)))
  count = 568
{"messageCode":"BD_0","message":"Feature service layer created: http://yourserver.domain.com/webcontext/rest/services/Hosted/Reconstructed_Tracks_IOBYC0/FeatureServer/0","params":{"serviceUrl":"http://yourserver.domain.com/webcontext/rest/services/Hosted/Reconstructed_Tracks_IOBYC0/FeatureServer/0"}}
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'inputLayer'.","params":{"paramName":"inputLayer"}}
{"messageCode":"BD_101052","message":"Some records have either missing or invalid time values."}
      > GenericFeature(attributes=[1979260N16312,1979,16,NA,MM,UNNAMED,9/19/1979 12:00,TS,28.8,-51.5,30.0,0,atcf,17.071,-100.0,main,30000,30000],geometry={"x":-51.5,"y":28.8},time=null)
      > GenericFeature(attributes=[1979260N16312,1979,16,NA,MM,UNNAMED,9/19/1979 18:00,TS,30.2,-51.5,30.0,0,atcf,17.071,-100.0,main,30000,30000],geometry={"x":-51.5,"y":30.2},time=null)
      > GenericFeature(attributes=[1979260N16312,1979,16,NA,MM,UNNAMED,9/20/1979 0:00,TS,32.1,-51.2,30.0,0,atcf,17.071,-100.0,main,30000,30000],geometry={"x":-51.2,"y":32.1},time=null)
Succeeded at Wed Dec 14 16:36:28 2016 (Elapsed Time: 17.58 seconds)
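
The BD_101052 warning in the log deserves a closer look. The three sample records it lists have time=null, and their time strings (for example 9/19/1979 12:00) do not match the yyyy-MM-dd HH:mm:ss format declared in the manifest, which is a plausible reason they could not be assigned a time. The sketch below reproduces that check in plain Python. It assumes %Y-%m-%d %H:%M:%S is the strptime equivalent of the manifest format, and it illustrates the mismatch rather than describing how the GeoAnalytics server actually parses time.

```python
from datetime import datetime

# Assumed Python equivalent of the manifest format 'yyyy-MM-dd HH:mm:ss'
MANIFEST_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time(value, fmt=MANIFEST_FORMAT):
    """Return a datetime if value matches fmt, otherwise None."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


# One well-formed value plus two time strings copied from the warning above
samples = ["1979-09-19 12:00:00", "9/19/1979 12:00", "9/20/1979 0:00"]

invalid = [value for value in samples if parse_time(value) is None]
print(invalid)
```

Running a check like this over the source files before registering them can reveal how many records would be dropped from the tracks.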

Inspect the results¶

Let us create a map and load the processed result, which is a feature service:

In [14]:

processed_map = gis.map("USA")
processed_map

In [15]:

processed_map.add_layer(agg_result)

Thus we transformed a collection of points into tracks that represent the paths taken by the hurricanes over a period of 50 years. We can pull up another map and inspect the results a bit more closely.

Our input data and the map widget are both time enabled. Thus we can filter the data to show only the tracks from the years 1860 to 1870:

In [16]:

processed_map.set_time_extent('1860', '1870')
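
For readers who want to apply a comparable filter outside the map widget, the following sketch keeps only the tracks whose time span overlaps a given window. The track records and the helper within_extent are hypothetical. Whether the widget uses overlap or strict containment, and how it treats the end of the window, is not stated in this sample, so treat the overlap rule here as an assumption.

```python
from datetime import datetime

# Hypothetical tracks: (identifier, first observation, last observation)
track_spans = [
    ("T1", datetime(1855, 8, 1), datetime(1855, 8, 9)),
    ("T2", datetime(1859, 12, 28), datetime(1860, 1, 3)),
    ("T3", datetime(1865, 9, 10), datetime(1865, 9, 20)),
    ("T4", datetime(1871, 6, 1), datetime(1871, 6, 5)),
]


def within_extent(spans, start, end):
    """Return ids of tracks whose time span overlaps the closed interval [start, end]."""
    return [
        track_id
        for track_id, first, last in spans
        if first <= end and last >= start
    ]


selected = within_extent(track_spans, datetime(1860, 1, 1), datetime(1870, 12, 31))
print(selected)
```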

What can GeoAnalytics do for you?¶

With this sample we have only scratched the surface of what big data analysis can do for you. ArcGIS Enterprise 10.5 packs a powerful set of tools that let you derive a lot of value from your data, provided you ask the right questions. For instance, a weather dataset such as this one could be used to answer questions such as:

Did the number of hurricanes per season increase over the years?
Which hurricanes travelled the longest distance?
Which hurricanes lasted the longest? Do we see a trend?
How are wind speed and distance travelled correlated?
My assets are located in a tornado corridor. How many times in the past century was there a hurricane within 50 miles of my assets?
My industry is dependent on tourism, which is heavily impacted by the vagaries of weather. From historical weather data, can I correlate my profits with major weather events? How well is my business insulated from freak weather events?
Over the years, do we see any shifts in major weather events? For example, do we notice a shift in when the hurricane season starts?
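
As an illustration of how the first question in the list might be approached, the sketch below counts distinct storms per season. It uses the season and serial_num field names from the schema shown earlier, but the records themselves are made up, and the helper storms_per_season is hypothetical. With real data, the rows could come from the point layer's attribute table.

```python
from collections import defaultdict

# Made-up point records; each storm contributes several points per season
records = [
    {"season": 1851, "serial_num": "S1"},
    {"season": 1851, "serial_num": "S1"},
    {"season": 1851, "serial_num": "S2"},
    {"season": 1852, "serial_num": "S3"},
    {"season": 1853, "serial_num": "S4"},
    {"season": 1853, "serial_num": "S5"},
    {"season": 1853, "serial_num": "S6"},
    {"season": 1853, "serial_num": "S6"},
]


def storms_per_season(rows):
    """Return {season: number of distinct serial numbers}, ordered by season."""
    storms = defaultdict(set)
    for row in rows:
        # A set ignores repeated points that belong to the same storm
        storms[row["season"]].add(row["serial_num"])
    return {season: len(ids) for season, ids in sorted(storms.items())}


counts = storms_per_season(records)
print(counts)
```

A per-season count like this is only a starting point. Judging whether there is a real trend would need the full dataset and some care about how completely early seasons were recorded.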

The ArcGIS API for Python gives you a gateway to easily access the big data tools from your ArcGIS Enterprise. By combining it with other powerful libraries from the pandas and scipy stack and the rich visualization capabilities of the Jupyter notebook, you can extract a lot of value from your data, big or small.