- Data source and analysis process are from Dataquest -- by Lu Tang
import pandas as pd
airlines = pd.read_csv('airlines.csv')
airports = pd.read_csv('airports.csv')
routes = pd.read_csv('routes.csv')
airports.head(2)
| | id | name | city | country | code | icao | latitude | longitude | altitude | offset | dst | timezone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Goroka | Goroka | Papua New Guinea | GKA | AYGA | -6.081689 | 145.391881 | 5282 | 10.0 | U | Pacific/Port_Moresby |
| 1 | 2 | Madang | Madang | Papua New Guinea | MAG | AYMD | -5.207083 | 145.788700 | 20 | 10.0 | U | Pacific/Port_Moresby |
airlines.head(2)
| | id | name | alias | iata | icao | callsign | country | active |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Private flight | \N | - | NaN | NaN | NaN | Y |
| 1 | 2 | 135 Airways | \N | NaN | GNL | GENERAL | United States | N |
routes.head(2)
| | airline | airline_id | source | source_id | dest | dest_id | codeshare | stops | equipment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2B | 410 | AER | 2965 | KZN | 2990 | NaN | 0 | CR2 |
| 1 | 2B | 410 | ASF | 2966 | KZN | 2990 | NaN | 0 | CR2 |
In most cases, we want to visualize latitude and longitude points on two-dimensional maps. Two-dimensional maps are faster to render, easier to view on a computer and to distribute, and closer to what people know from popular mapping software like Google Maps. Latitude and longitude values describe points on a sphere, which is three-dimensional. To plot the values on a two-dimensional plane, we need to convert the coordinates to the Cartesian coordinate system using a map projection.
A map projection transforms points on a sphere to a two-dimensional plane. When projecting down to the two-dimensional plane, some properties (such as area, shape, or distance) are distorted. Each map projection makes trade-offs in which properties it preserves. We'll use the Mercator projection, because it is commonly used by popular mapping software.
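To make the idea of a projection concrete, here is a minimal sketch of the textbook spherical Mercator formulas in plain Python. It is not basemap's implementation: the function name `mercator_xy` and the radius constant are assumptions made for illustration, and basemap additionally shifts the origin to the lower-left corner of the map, so its x and y values will differ from these.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def mercator_xy(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project one longitude/latitude pair (in degrees) to planar x, y (in metres)."""
    if not -90.0 < lat_deg < 90.0:
        # The Mercator y coordinate grows without bound towards the poles.
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


# Coordinates of the first two airports shown in airports.head(2).
for name, lon, lat in [("Goroka", 145.391881, -6.081689),
                       ("Madang", 145.788700, -5.207083)]:
    x, y = mercator_xy(lon, lat)
    print(f"{name}: x={x:.0f} m, y={y:.0f} m")
```

The y coordinate grows faster than the latitude itself, which is why Mercator maps stretch regions far from the equator and why the map in this notebook is cut off at 80 degrees north and south.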
Before we convert our flight data to Cartesian coordinates and plot it, let's learn more about the basemap toolkit. Basemap is an extension to Matplotlib that makes it easier to work with geographic data. The documentation for basemap provides a good high-level overview of what the library does:
The matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. Basemap does not do any plotting on its own, but provides the facilities to transform coordinates to one of 25 different map projections.
Basemap makes it easy to convert from the spherical coordinate system (latitudes & longitudes) to the Mercator projection. While basemap uses Matplotlib to actually draw and control the map, the library provides many methods that enable us to work with maps quickly.
Because basemap uses matplotlib, you'll want to import matplotlib.pyplot into your environment when you use Basemap.
Steps:
- Create a new basemap instance with the specific map projection we want to use and how much of the map we want included.
- Convert spherical coordinates to Cartesian coordinates using the basemap instance.
- Use the matplotlib and basemap methods to customize the map.
- Display the map.
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# Create a new basemap instance
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180)
As we mentioned before, we need to convert latitude and longitude values to Cartesian coordinates to display them on a two-dimensional map. We can pass lists of longitude and latitude values to the basemap instance, and it will return the corresponding lists of x and y coordinates in the projection we specified earlier.
The conversion happens when we call the basemap instance itself (not the constructor). The call takes scalars, sequences, or NumPy arrays, so we'll use Series.tolist() to convert the longitude and latitude columns from the airports dataframe to plain lists. Then we pass them to the basemap instance, longitude values first and latitude values second:
x, y = m(longitudes, latitudes)
The basemap object will return 2 list objects, which we assign to x and y.
# convert latitude and longitude values to Cartesian coordinates
longitudes = airports["longitude"].tolist()
latitudes = airports["latitude"].tolist()
x, y = m(longitudes, latitudes)
A scatter plot is the simplest way to plot points on a map, where each point is represented as an (x, y) coordinate pair. To create a scatter plot from a list of x and y coordinates, we use the basemap.scatter() method.
m.scatter(x,y)
The basemap.scatter() method has similar parameters to pyplot.scatter(). For example, we can customize the size of each marker using the s parameter:
# plot the map
m.scatter(x, y, s=4)
plt.show()
# put everything together with coastlines
fig, ax = plt.subplots(figsize=(15,20))
plt.title("Scaled Up Earth With Coastlines")
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
longitudes = airports["longitude"].tolist()
latitudes = airports["latitude"].tolist()
x, y = m(longitudes, latitudes)
m.scatter(x, y, s=1)
m.drawcoastlines()
plt.show()
geo_routes = pd.read_csv("geo_routes.csv")
print(geo_routes.info())
geo_routes.head(5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67428 entries, 0 to 67427
Data columns (total 8 columns):
airline      67428 non-null object
source       67428 non-null object
dest         67428 non-null object
equipment    67410 non-null object
start_lon    67428 non-null float64
end_lon      67428 non-null float64
start_lat    67428 non-null float64
end_lat      67428 non-null float64
dtypes: float64(4), object(4)
memory usage: 4.1+ MB
None
| | airline | source | dest | equipment | start_lon | end_lon | start_lat | end_lat |
|---|---|---|---|---|---|---|---|---|
| 0 | 2B | AER | KZN | CR2 | 39.956589 | 49.278728 | 43.449928 | 55.606186 |
| 1 | 2B | ASF | KZN | CR2 | 48.006278 | 49.278728 | 46.283333 | 55.606186 |
| 2 | 2B | ASF | MRV | CR2 | 48.006278 | 43.081889 | 46.283333 | 44.225072 |
| 3 | 2B | CEK | KZN | CR2 | 61.503333 | 49.278728 | 55.305836 | 55.606186 |
| 4 | 2B | CEK | OVB | CR2 | 61.503333 | 82.650656 | 55.305836 | 55.012622 |
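To better understand the flight routes, we can draw great circles to connect starting and ending locations on a map. A great circle is the largest circle that can be drawn on a sphere, and the shortest path between 2 points on a sphere runs along the great circle that passes through both of them.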
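As a rough illustration of what a great-circle path means for a route, the sketch below computes the great-circle distance between two points with the haversine formula. The function name `great_circle_distance_km` and the 6371 km radius are assumptions for this example and are not part of basemap. The argument order follows the longitude-first convention used elsewhere in this notebook.

```python
import math


def great_circle_distance_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# First row of geo_routes: AER -> KZN.
d = great_circle_distance_km(39.956589, 43.449928, 49.278728, 55.606186)
print(f"AER -> KZN: about {d:.0f} km along the great circle")
```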
We use the basemap.drawgreatcircle() method to display a great circle between 2 points. The basemap.drawgreatcircle() method requires four parameters in the following order:
- lon1 - longitude of the starting point.
- lat1 - latitude of the starting point.
- lon2 - longitude of the ending point.
- lat2 - latitude of the ending point.
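To give a feel for what drawing a great circle involves, the sketch below computes intermediate points along the great circle between two endpoints using the standard spherical interpolation formulas. This is an illustration only, not how basemap.drawgreatcircle() is implemented; the function name `great_circle_points` and the parameter `n` are hypothetical. The arguments use the same lon1, lat1, lon2, lat2 order as the list above.

```python
import math


def great_circle_points(lon1, lat1, lon2, lat2, n=50):
    """Return n + 1 (lon, lat) pairs in degrees, evenly spaced along the great circle."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # Angular distance between the endpoints (haversine form).
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(a)))
    if d < 1e-12:
        # Start and end coincide, so every point is the same.
        return [(lon1, lat1)] * (n + 1)
    if math.pi - d < 1e-9:
        # Antipodal points are joined by infinitely many great circles.
        raise ValueError("path is not unique for antipodal points")
    points = []
    for i in range(n + 1):
        f = i / n
        w1 = math.sin((1 - f) * d) / math.sin(d)  # weight of the start point
        w2 = math.sin(f * d) / math.sin(d)        # weight of the end point
        x = w1 * math.cos(phi1) * math.cos(lam1) + w2 * math.cos(phi2) * math.cos(lam2)
        y = w1 * math.cos(phi1) * math.sin(lam1) + w2 * math.cos(phi2) * math.sin(lam2)
        z = w1 * math.sin(phi1) + w2 * math.sin(phi2)
        lon = math.degrees(math.atan2(y, x))
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        points.append((lon, lat))
    return points


# Five points along the first geo_routes row, AER -> KZN.
for lon, lat in great_circle_points(39.956589, 43.449928, 49.278728, 55.606186, n=4):
    print(f"lon={lon:.3f}, lat={lat:.3f}")
```

Each intermediate point would still have to be projected to x and y before it could be drawn on a Mercator map.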
Unfortunately, basemap struggles to create great circles for routes where the absolute difference between the starting and ending values is larger than 180 degrees, for either latitude or longitude. This is because the basemap.drawgreatcircle() method isn't able to draw great circles properly when they go outside of the map boundaries.
# plotting for geo_routes
fig, ax = plt.subplots(figsize=(15,20))
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
m.drawcoastlines()
def create_great_circles(df):
    # Iterate over the rows in the dataframe using DataFrame.iterrows()
    for index, row in df.iterrows():
        end_lat, start_lat = row['end_lat'], row['start_lat']
        end_lon, start_lon = row['end_lon'], row['start_lon']
        # Draw a great circle using the four geographic coordinates only if:
        # - the absolute difference between the latitude values (end_lat and start_lat) is less than 180, and
        # - the absolute difference between the longitude values (end_lon and start_lon) is less than 180.
        if abs(end_lat - start_lat) < 180:
            if abs(end_lon - start_lon) < 180:
                # Use the basemap.drawgreatcircle() method to display a great circle between 2 points
                m.drawgreatcircle(start_lon, start_lat, end_lon, end_lat)
# Create a filtered dataframe containing just the routes that start at the DFW airport.
dfw = geo_routes[geo_routes['source'] == "DFW"]
# Pass dfw into create_great_circles()
create_great_circles(dfw)
plt.show()
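The 180-degree check inside create_great_circles() can also be pulled out into a small standalone predicate, which makes it easy to test without drawing anything. The sketch below is one possible way to do that in plain Python; the names `is_drawable` and `sample_routes` are hypothetical, and the last route in the sample is made up to show a case that gets filtered out.

```python
def is_drawable(start_lon, start_lat, end_lon, end_lat, max_span=180.0):
    """True if both the latitude and the longitude differences are below max_span degrees."""
    return (abs(end_lat - start_lat) < max_span
            and abs(end_lon - start_lon) < max_span)


sample_routes = [
    # First two rows of geo_routes.
    {"source": "AER", "dest": "KZN", "start_lon": 39.956589, "end_lon": 49.278728,
     "start_lat": 43.449928, "end_lat": 55.606186},
    {"source": "ASF", "dest": "KZN", "start_lon": 48.006278, "end_lon": 49.278728,
     "start_lat": 46.283333, "end_lat": 55.606186},
    # Made-up route crossing the 180th meridian, for illustration only.
    {"source": "XXX", "dest": "YYY", "start_lon": -170.0, "end_lon": 170.0,
     "start_lat": 10.0, "end_lat": 10.0},
]

drawable = [r for r in sample_routes
            if is_drawable(r["start_lon"], r["start_lat"], r["end_lon"], r["end_lat"])]
for r in drawable:
    print(r["source"], "->", r["dest"])
```

Latitudes only range from -90 to 90, so the latitude difference can never exceed 180 degrees. In practice the longitude condition is the one that removes routes.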