This notebook demonstrates the current development version of MovingPandas.
For tutorials using the latest release visit https://github.com/movingpandas/movingpandas-examples.
import urllib
import os
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from fiona.crs import from_epsg
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
import sys
sys.path.append("..")
import movingpandas as mpd
mpd.show_versions()
import warnings
#warnings.simplefilter("ignore")
from movingpandas.trajectory import TimeZoneWarning
warnings.filterwarnings("ignore", category=TimeZoneWarning)
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2018,1,1,12,6,0)},
{'geometry':Point(6,6), 't':datetime(2018,1,1,12,10,0)},
{'geometry':Point(9,9), 't':datetime(2018,1,1,12,15,0)}
]).set_index('t')
geo_df = GeoDataFrame(df, crs=31256)
toy_traj = mpd.Trajectory(geo_df, 1)
toy_traj.df
toy_traj.to_line_gdf()
toy_traj.to_traj_gdf(wkt=True)
We can access key information about our trajectory by looking at the print output:
print(toy_traj)
Another useful piece of information is the sampling interval (median time difference between records):
toy_traj.get_sampling_interval()
We can also access the trajectories GeoDataFrame:
toy_traj.df
We can compute the distance, speed, and acceleration of movement along the trajectory (between consecutive points). The default distance units are meters (or CRS units, if the CRS units are not known or specified), and the default time units are seconds:
toy_traj.add_distance(overwrite=True).df
toy_traj.add_speed(overwrite=True).df
toy_traj.add_acceleration(overwrite=True).df
If you want to use different units, you can specify them. Allowed units include metric units from mm to km, imperial units from inch to mile, nautical miles, and non-standard units which are used as CRS distance units e.g. US Survey units.
toy_traj.add_distance(overwrite=True, name="distance (km)", units="km")
toy_traj.add_distance(overwrite=True, name="distance (yards)", units="yd")
toy_traj.add_speed(overwrite=True, name="speed (ft/min)", units=("ft", "min"))
toy_traj.add_speed(overwrite=True, name="speed (knots)", units=("nm", "h"))
toy_traj.add_acceleration(overwrite=True, name="acceleration (mph/s)", units=("mi", "h", "s"))
toy_traj.df
To visualize the trajectory, we can turn it into a linestring.
(The notebook environment automatically plots Shapely geometry objects like the LineString returned by to_linestring().)
toy_traj.to_linestring()
We can also visualize the speed values:
toy_traj.plot(column="speed", linewidth=5, capstyle='round', legend=True)
In contrast to the earlier example where we visualized the whole trajectory as one linestring, the trajectory plot() function draws each line segment individually and thus each can have a different color.
hvplot_defaults = {'tiles':None, 'frame_height':320, 'frame_width':320, 'cmap':'Viridis', 'colorbar':True}
toy_traj.hvplot(c="speed", **hvplot_defaults)
For example, let's have a look at the get_position_at() function:
toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method="nearest")
To see its coordinates, we can look at the print output:
print(toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method="nearest"))
The method parameter describes what the function should do if there is no entry in the trajectory GeoDataFrame for the specified timestamp.
For example, there is no entry at 2018-01-01 12:07:00
toy_traj.df
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="nearest"))
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="interpolated"))
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="ffill")) # from the previous row
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="bfill")) # from the following row
If the timestamp falls outside the time range between trajectory start and end time, we get an error:
try:
toy_traj.get_position_at(datetime(2018,1,1,13,0,0))
except ValueError as e:
print(f"ValueError: {e}")
pt = Point(1, 5)
line = LineString([(3,3), (3,9)])
pt_gdf = GeoDataFrame(pd.DataFrame([{'geometry':pt, 'id':1}]))
line_gdf = GeoDataFrame(pd.DataFrame([{'geometry':line, 'id':1}]))
ax = toy_traj.plot()
pt_gdf.plot(ax=ax, color='red')
line_gdf.plot(ax=ax, color='red')
print(f'Distance: {toy_traj.distance(pt)}')
print(f'Hausdorff distance: {toy_traj.hausdorff_distance(pt):.2f}')
pt.distance(Point(9, 9))
print(f'Distance: {toy_traj.distance(line)}')
print(f'Hausdorff distance: {toy_traj.hausdorff_distance(line)}')
First, let's extract the trajectory segment for a certain time period:
segment = toy_traj.get_segment_between(datetime(2018,1,1,12,6,0),datetime(2018,1,1,12,12,0))
print(segment)
ax = toy_traj.plot()
segment.plot(ax=ax, color='red', linewidth=5)
Now, let's extract the trajectory segment that intersects with a given polygon:
xmin, xmax, ymin, ymax = 2, 8, -10, 5
polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])
polygon_gdf = GeoDataFrame(pd.DataFrame([{'geometry':polygon, 'id':1}]), crs=4326)
intersections = toy_traj.clip(polygon)
intersections
ax = toy_traj.plot()
polygon_gdf.plot(ax=ax, color='lightgray')
intersections.plot(ax=ax, color='red', linewidth=5)
The MovingPandas repository contains a demo GeoPackage file that can be loaded as follows:
%%time
gdf = read_file('data/demodata_geolife.gpkg')
print("Finished reading {} rows".format(len(df)))
After reading the trajectory point data from file, we want to construct the trajectories.
TrajectoryCollection is a convenience class that takes care of creating trajectories from a GeoDataFrame:
traj_collection = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
print(traj_collection)
traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))
traj_collection.to_point_gdf()
traj_collection.to_line_gdf()
traj_collection.to_traj_gdf(wkt=True)
These GeoDataFrames can be exported to different file formats using GeoPandas, as documented in https://geopandas.org/docs/user_guide/io.html
traj_collection.to_traj_gdf(wkt=True).to_file("temp.gpkg", layer='trajectories', driver="GPKG")
read_file('temp.gpkg').plot()
my_traj = traj_collection.trajectories[1]
print(my_traj)
plot_defaults = {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}
my_traj.plot(column='speed', vmax=20, **plot_defaults)
To visualize trajectories in their geographical context, we can also create interactive plots with basemaps:
hvplot_defaults = {'tiles':'CartoLight', 'frame_height':400, 'frame_width':700, 'cmap':'Viridis', 'colorbar':True}
my_hvplot = my_traj.hvplot(c='speed', line_width=7.0, clim=(0,20), **hvplot_defaults)
my_hvplot
And even put other layers on top:
my_hvplot * gpd.GeoDataFrame([my_traj.get_row_at(datetime(2009,6,29,8,0,0))]).hvplot(geo=True, size=200, color='red')
xmin, xmax, ymin, ymax = 116.3685035,116.3702945,39.904675,39.907728
polygon = Polygon([(xmin,ymin), (xmin,ymax), (xmax,ymax), (xmax,ymin), (xmin,ymin)])
intersections = []
for traj in traj_collection:
for intersection in traj.clip(polygon):
intersections.append(intersection)
print("Found {} intersections".format(len(intersections)))
intersections[2].plot(linewidth=5.0, capstyle='round')
Alternatively, using TrajectoryCollection:
clipped = traj_collection.clip(polygon)
clipped.trajectories[2].plot(linewidth=5.0, capstyle='round')
Gaps are quite common in trajectories. For example, GPS tracks may contain gaps if moving objects enter tunnels where GPS reception is lost. In other use cases, moving objects may leave the observation area for longer time before returning and continuing their recorded track.
Depending on the use case, we therefore might want to split trajectories at observation gaps that exceed a certain minimum duration:
my_traj = traj_collection.trajectories[1]
print(my_traj)
my_traj.plot(column='speed', vmax=20, **plot_defaults)
split = mpd.ObservationGapSplitter(my_traj).split(gap=timedelta(minutes=5))
for traj in split:
print(traj)
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(19,4))
for i, traj in enumerate(split):
traj.plot(ax=axes[i], column='speed', vmax=20, **plot_defaults)
split = mpd.StopSplitter(my_traj).split(min_duration=timedelta(minutes=1), max_diameter=30, min_length=500)
for traj in split:
print(traj)
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(19,4))
for i, traj in enumerate(split):
traj.plot(ax=axes[i], column='speed', vmax=20, **plot_defaults)
Alternatively, using the whole TrajectoryCollection:
split = mpd.ObservationGapSplitter(traj_collection).split(gap=timedelta(minutes=15))
print(split)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16,4))
traj_collection.plot(ax=axes[0], column='trajectory_id', **plot_defaults)
axes[0].set_title('Original Trajectories')
split.plot(ax=axes[1], column='trajectory_id', **plot_defaults)
axes[1].set_title('Split Trajectories')
To reduce the size of trajectory objects, we can generalize them, for example, using the Douglas-Peucker algorithm:
original_traj = traj_collection.trajectories[1]
print(original_traj)
original_traj.plot(column='speed', vmax=20, **plot_defaults)
Try different tolerance settings and observe the results in line geometry and therefore also length:
dp_generalized = mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=0.001)
dp_generalized.plot(column='speed', vmax=20, **plot_defaults)
print('Original length: %s'%(original_traj.get_length()))
print('Generalized length: %s'%(dp_generalized.get_length()))
An alternative generalization method is to down-sample the trajectory to ensure a certain time delta between records:
time_generalized = mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))
time_generalized.plot(column='speed', vmax=20, **plot_defaults)
time_generalized.to_point_gdf().head(10)
original_traj.to_point_gdf().head(10)
tdtr_generalized = mpd.TopDownTimeRatioGeneralizer(original_traj).generalize(tolerance=0.001)
Let's compare this to the basic Douglas-Peucker result:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(19,4))
tdtr_generalized.plot(ax=axes[0], column='speed', vmax=20, **plot_defaults)
dp_generalized.plot(ax=axes[1], column='speed', vmax=20, **plot_defaults)
We use the speed-based OutlierCleaner
that cuts away spikes in the trajectory when the speed exceeds the provided speed threshold value. If no speed threshold is provided, p95*alpha is used (where p95 is the 95th percentile of speed values and alpha is 3).
cleaned = mpd.OutlierCleaner(split).clean(alpha=2) # .clean(v_max=100, units=("km", "h"))
cleaned.add_speed(units=("km", "h"), overwrite=True)
cleaned
hvplot_defaults = {'tiles':'CartoLight', 'frame_height':400, 'frame_width':500, 'cmap':'Viridis', 'colorbar':True}
( split.trajectories[10].hvplot(title='Trajectory Cleaning - Before & After', label='Before', color='red', line_width=4, **hvplot_defaults) *
cleaned.trajectories[10].hvplot(label='After', c='speed', tiles=None, line_width=4 ) )
We use the KalmanSmootherCV
trajectory smoother to smooth all trajectories based on the assumption of a nearly-constant velocity (CV) model. The process_noise_std
and measurement_noise_std
parameters can be used to tune the smoother:
process_noise_std
governs the uncertainty associated with the adherence of the new (smooth) trajectories to the CV model assumption; higher values relax the assumption, therefore leading to less-smooth trajectories, and vice-versa.measurement_noise_std
controls the assumed error in the original trajectories; higher values dictate that the original trajectories are expected to be noisier (and therefore, less reliable), thus leading to smoother trajectories, and vice-versa.Try tuning these parameters and observe the resulting trajectories:
smooth = mpd.KalmanSmootherCV(split).smooth(process_noise_std=0.1, measurement_noise_std=10)
print(smooth)
Let's visually compare the smooth and original (split) trajectories.
hvplot_defaults = {'tiles':'CartoLight', 'frame_height':320, 'frame_width':320, 'cmap':'Viridis', 'colorbar':True}
kwargs = {**hvplot_defaults, 'line_width':4}
( split.trajectories[9].hvplot(title='Original Trajectories', **kwargs) +
smooth.trajectories[9].hvplot(title='Smooth Trajectories', **kwargs))
And finally, let's compare the calculated speeds:
kwargs = {**hvplot_defaults, 'c':'speed', 'line_width':7, 'clim':(0,2)}
smooth.add_speed(overwrite=True)
(split.trajectories[9].hvplot(title='Original Trajectories', **kwargs) +
smooth.trajectories[9].hvplot(title='Smooth Trajectories', **kwargs))
As expected, the smooth trajectories are significantly less rugged (i.e. noisy) and exhibit smoother velocity transitions.