Transport for NSW (TfNSW) has very recently launched its Open Data website providing API access to its many static and realtime data sets. This notebook demonstrates how to access the current position of all tracked buses on the Sydney network.
Some familiarity with the Python programming language and HTTP requests will be helpful.
An application is a bunch of settings that govern how you can interact with the various TfNSW APIs. For this demonstration, add a new application. Here are the basic steps.
(By now you have probably clicked something and been redirected back to the Dashboard page instead of your destination. This is happening to a lot of people at the moment (20 April) and TfNSW is working on it. All you can do is retrace your steps and try again.)
Now that your application is created, you'll need to select Edit from the Actions dropdown menu next to your application's name. Go to the Auth tab and copy down the API Key and Shared Secret.
Every API request needs to be authorised to a particular application. For our purposes, the simplest way to do this is to pass TfNSW our API key and shared secret, and have them generate a token we can use to authorise our requests.
As detailed on the API basics page, we can get a bearer token back in response to a POST request like:
https://apikey:sharedsecret@api.transport.nsw.gov.au/auth/oauth/v2/token?grant_type=client_credentials&scope=user
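The `apikey:sharedsecret@` part of that URL is standard URL userinfo. HTTP clients such as Requests turn it into an HTTP Basic `Authorization` header (base64 of `key:secret`) instead of sending it in the request line. Here is a minimal standard-library sketch of what actually goes over the wire, using placeholder credentials. The helper name `build_token_request` is hypothetical, and it only builds the request without sending it:

```python
import base64
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://api.transport.nsw.gov.au/auth/oauth/v2/token"

def build_token_request(api_key, shared_secret):
    """Return (url, headers) for the client-credentials token request.

    Hypothetical helper for illustration: it does not send anything.
    """
    # Query string carrying the OAuth2 grant type and scope.
    query = urlencode({"grant_type": "client_credentials", "scope": "user"})
    # HTTP Basic auth is just base64("key:secret") after the word "Basic".
    raw = "{}:{}".format(api_key, shared_secret).encode("utf-8")
    headers = {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    return "{}?{}".format(TOKEN_ENDPOINT, query), headers

url, headers = build_token_request("apikey", "sharedsecret")
print(url)
print(headers["Authorization"])
```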
We'll use Python's Requests for this:
import requests
# Replace with your own information:
api_key = '17xx11111112222222333333344444455555'
shared_secret = 'a1b2c3d4e5f6b9999988877766655544'
# Leave these:
payload = {'grant_type': 'client_credentials', 'scope': 'user'}
auth_url = 'https://api.transport.nsw.gov.au/auth/oauth/v2/token'
# Send a POST request to get the token back. Passing auth= sends the key and
# secret as HTTP Basic credentials, the same as embedding them in the URL:
token_response = requests.post(auth_url, auth=(api_key, shared_secret), params=payload)
token_response.raise_for_status()  # stop early if the credentials were rejected
TfNSW will send back something like this:
{u'access_token': u'e1311756-ed35-456d-9f68-d5d970df2c2d',
u'expires_in': 3600,
u'scope': u'user',
u'token_type': u'Bearer'}
The access_token is what we need. Our requests have to have an HTTP header named Authorization with the value Bearer e1311756-ed35-456d-9f68-d5d970df2c2d (or whatever your bearer token is). So let's create that header now:
bearer_token = "Bearer " + token_response.json()['access_token']
print(bearer_token)
# Set the headers for our next request:
headers = {"Authorization":bearer_token}
Bearer e1311756-ed35-456d-9f68-d5d970df2c2d
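The token is only valid for `expires_in` seconds (3600, i.e. one hour, in the response above), so a script that polls the feed for longer than that will need a fresh one. One way to handle this is a small cache that re-requests the token shortly before it expires. This is a sketch under my own assumptions: `TokenCache` and the 60-second safety margin are my choices, and `fetch` stands in for any function returning the parsed JSON token response (for example a lambda wrapping the `requests.post(...).json()` call above).

```python
import time

class TokenCache:
    """Caches a bearer token and refreshes it shortly before it expires."""

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        self._fetch = fetch      # callable returning the token response as a dict
        self._margin = margin    # seconds to refresh ahead of expiry
        self._clock = clock      # injectable clock, handy for testing
        self._token = None
        self._expires_at = 0.0

    def header(self):
        """Return an Authorization header dict, fetching a new token if needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            response = self._fetch()
            self._token = response["access_token"]
            self._expires_at = now + response["expires_in"]
        return {"Authorization": "Bearer " + self._token}

# Example with a fake token endpoint and a fake clock:
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": "token-{}".format(len(calls)), "expires_in": 3600}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.header())   # first call fetches token-1
fake_now[0] = 3000.0
print(cache.header())   # still token-1, since 3000 < 3600 - 60
fake_now[0] = 3550.0
print(cache.header())   # within the margin, so token-2 is fetched
```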
Here's where we make the actual request. As we can see from the API explorer, we're sending our request to:
https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses
Depending on server load, this request may take some time to fulfill:
bus_positions = requests.get('https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses', headers=headers)
print("Retrieved {} bytes".format(len(bus_positions.content)))
Retrieved 186739 bytes
Realtime bus data comes back in a binary (machine-readable) format called GTFS-realtime, which is encoded with Google's Protocol Buffers. This makes perfect sense but can be a bit of a bugger to deal with. You'll need the gtfs-realtime-bindings and protobuf Python libraries; just calling pip install gtfs-realtime-bindings protobuf worked for me.
Let's import the GTFS-realtime library, then create a new FeedMessage object (note: this step could fail if the API returns a truncated response, which happens sometimes):
from google.transit import gtfs_realtime_pb2
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(bus_positions.content)
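Because truncated responses do happen, it may be worth wrapping the download-and-parse step in a retry loop. A minimal sketch, assuming a hypothetical helper `fetch_and_parse`: `fetch` would be something like a lambda returning `requests.get(url, headers=headers).content`, and `parse` would create a `FeedMessage` and call `ParseFromString` on the bytes. Which exception a bad payload raises depends on your protobuf version (typically `google.protobuf.message.DecodeError`), so the exceptions to retry on are passed in rather than assumed.

```python
import time

def fetch_and_parse(fetch, parse, retry_on=(Exception,), attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch() then parse(data), retrying up to `attempts` times.

    fetch:    returns the raw response bytes
    parse:    turns bytes into a parsed object, raising on bad data
    retry_on: exception types that should trigger another attempt
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return parse(fetch())
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)  # back off a little more each time
    raise last_error

# Example with a fake feed that is truncated on the first call only:
responses = [b"trunc", b"complete-feed"]

def fake_fetch():
    return responses.pop(0)

def fake_parse(data):
    if data != b"complete-feed":
        raise ValueError("truncated message")
    return data.decode()

result = fetch_and_parse(fake_fetch, fake_parse, retry_on=(ValueError,), sleep=lambda s: None)
print(result)
```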
We can now look in the feed object, which contains entities that are the individual vehicles. Let's take the first five entities and list their latitude, longitude, bearing (direction) and speed:
for entity in feed.entity[:5]:
    # print one tuple per bus: (latitude, longitude, bearing, speed)
    print((entity.vehicle.position.latitude,
           entity.vehicle.position.longitude,
           entity.vehicle.position.bearing,
           entity.vehicle.position.speed))
(-33.730384826660156, 150.97987365722656, 249.0, 1.2000000476837158)
(-33.766117095947266, 151.01039123535156, 253.0, 22.200000762939453)
(-33.759037017822266, 151.04502868652344, 83.0, 19.200000762939453)
(-34.049461364746094, 150.76402282714844, 129.0, 0.0)
(-34.049415588378906, 150.75608825683594, 204.0, 9.800000190734863)
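Per the GTFS-realtime specification, bearing is in degrees clockwise from north (0 is north, 90 is east) and speed is in metres per second. For a quick human-readable summary, here is a sketch that converts bearing to a 16-point compass direction and speed to km/h. `compass_point` and `ms_to_kmh` are hypothetical helpers, not part of the bindings, and the sample pairs are rounded values from the output above.

```python
POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def compass_point(bearing):
    """Map a bearing in degrees (0 = north, clockwise) to a 16-point compass label."""
    # Each point covers 22.5 degrees; adding half a sector rounds to the nearest point.
    index = int((bearing % 360) / 22.5 + 0.5) % 16
    return POINTS[index]

def ms_to_kmh(speed):
    """Convert metres per second to kilometres per hour."""
    return speed * 3.6

# (bearing, speed) for the first five buses, rounded from the output above:
for bearing, speed in [(249.0, 1.2), (253.0, 22.2), (83.0, 19.2), (129.0, 0.0), (204.0, 9.8)]:
    print(compass_point(bearing), round(ms_to_kmh(speed), 1))
```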
The feed only contains vehicle positions, but other GTFS-realtime feeds can also include trip updates and alerts. Here's a more complete list of fields and what they contain in the context of buses (though you should refer to the GTFS-realtime specification for full details):
Let's put the feed into a file we can use elsewhere:
import csv
positions_output = []  # a list of lists for bus position data
# put each bus's key data into the list
for entity in feed.entity:
    positions_output.append([entity.vehicle.timestamp,
                             entity.id,
                             entity.vehicle.trip.route_id,
                             entity.vehicle.trip.trip_id,
                             entity.vehicle.trip.start_time,
                             entity.vehicle.trip.start_date,
                             entity.vehicle.position.latitude,
                             entity.vehicle.position.longitude,
                             entity.vehicle.position.bearing,
                             entity.vehicle.position.speed,
                             entity.vehicle.position.speed * 3.6,  # speed in km/h, for convenience
                             entity.vehicle.trip.schedule_relationship,
                             entity.vehicle.congestion_level,
                             entity.vehicle.occupancy_status,
                             entity.vehicle.trip.route_id[5:]  # extracting the route number with string slicing
                             ])
# write the bus position data to positions.csv
# (Python 3 needs text mode with newline=''; on Python 2 use "wb" instead)
with open("positions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['timestamp', 'vehicle_id', 'route_id', 'trip_id',
                     'start_time', 'start_date', 'latitude', 'longitude',
                     'bearing', 'speed_ms', 'speed_kmh', 'schedule_relationship', 'congestion_level',
                     'occupancy_status', 'route_number'])
    writer.writerows(positions_output)
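In positions.csv the schedule_relationship, congestion_level and occupancy_status columns are written as raw enum numbers, and timestamp is POSIX time (seconds since 1970-01-01 UTC). A sketch of decoding a row into readable values, with the enum numbering copied from the GTFS-realtime specification (only a subset of the defined values; anything else is shown as `UNKNOWN(n)`). `decode_row` is a hypothetical helper, and the sample row is made up, containing only some of the positions.csv columns. If you'd rather not hard-code the tables, the generated bindings should also be able to do the lookup, e.g. `gtfs_realtime_pb2.VehiclePosition.CongestionLevel.Name(1)`.

```python
import csv
import io
from datetime import datetime, timezone

# Enum values from the GTFS-realtime specification (subset).
SCHEDULE_RELATIONSHIP = {0: "SCHEDULED", 1: "ADDED", 2: "UNSCHEDULED", 3: "CANCELED"}
CONGESTION_LEVEL = {0: "UNKNOWN_CONGESTION_LEVEL", 1: "RUNNING_SMOOTHLY",
                    2: "STOP_AND_GO", 3: "CONGESTION", 4: "SEVERE_CONGESTION"}
OCCUPANCY_STATUS = {0: "EMPTY", 1: "MANY_SEATS_AVAILABLE", 2: "FEW_SEATS_AVAILABLE",
                    3: "STANDING_ROOM_ONLY", 4: "CRUSHED_STANDING_ROOM_ONLY",
                    5: "FULL", 6: "NOT_ACCEPTING_PASSENGERS"}

def decode_row(row):
    """Return a copy of a positions.csv row (a dict) with readable status fields."""
    decoded = dict(row)
    # POSIX seconds -> ISO 8601 string in UTC
    ts = int(row["timestamp"])
    decoded["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    for column, names in [("schedule_relationship", SCHEDULE_RELATIONSHIP),
                          ("congestion_level", CONGESTION_LEVEL),
                          ("occupancy_status", OCCUPANCY_STATUS)]:
        code = int(row[column])
        decoded[column] = names.get(code, "UNKNOWN({})".format(code))
    return decoded

# Example with a made-up row:
sample = io.StringIO(
    "timestamp,vehicle_id,schedule_relationship,congestion_level,occupancy_status\n"
    "1461110400,bus1,0,1,2\n"
)
for row in csv.DictReader(sample):
    print(decode_row(row))
```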
Having CSV data is great, but we can go a step further and add the vehicles to a map directly. For this I'm using the Bokeh module (pip install bokeh) and adapting the Mapping Geo Data tutorial.
Bokeh works with GeoJSON data, so we'll have to convert our list of buses to a GeoJSON object, e.g.:
{
  "type": "FeatureCollection",
  "features": [
    {
      "geometry": {
        "type": "Point",
        "coordinates": [
          150.97987365723,
          -33.73038482666
        ]
      },
      "type": "Feature",
      "properties": {
        "route": "619"
      }
    }
  ]
}
We'll create a feature_collection dictionary and add each bus as a point, with its route as a property, then convert that dictionary to a JSON string:
# create the dictionary
feature_collection = {}
feature_collection['type'] = "FeatureCollection"
feature_collection['features'] = []
for bus in positions_output:
    if bus[7] > 0:  # included to filter out any buses which report their position as [0,0]
        this_object = {"type": "Feature",
                       "geometry": {"type": "Point",
                                    "coordinates": [bus[7], bus[6]]  # GeoJSON order: [longitude, latitude]
                                    },
                       "properties": {"route": bus[14]}
                       }
        feature_collection['features'].append(this_object)
# dump the dictionary to a json string
import json
bus_geojson = json.dumps(feature_collection)
print('Result: "{}..."'.format(bus_geojson[:200]))  # print a little bit of the result; you could also write to a file
Result: "{"type": "FeatureCollection", "features": [{"geometry": {"type": "Point", "coordinates": [150.97987365722656, -33.730384826660156]}, "type": "Feature", "properties": {"route": "619"}}, {"geometry": {"..."
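Indexing by position (bus[6], bus[7], bus[14]) is easy to get wrong if the column list ever changes. Here is a sketch of the same conversion that reads positions.csv back by column name instead. `csv_to_geojson` is a hypothetical helper, and unlike the loop above it skips a bus only when both coordinates are zero. As in the original, coordinates go in GeoJSON's [longitude, latitude] order. The sample data reuses the first bus from the output above plus a made-up (0, 0) row.

```python
import csv
import io
import json

def csv_to_geojson(csv_file):
    """Build a GeoJSON FeatureCollection string from positions.csv-style rows."""
    features = []
    for row in csv.DictReader(csv_file):
        lon = float(row["longitude"])
        lat = float(row["latitude"])
        if lon == 0 and lat == 0:
            continue  # bus reported no position fix
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"route": row["route_number"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

# With the real file: with open("positions.csv", newline="") as f: bus_geojson = csv_to_geojson(f)
sample = io.StringIO(
    "latitude,longitude,route_number\n"
    "-33.730384826660156,150.97987365722656,619\n"
    "0.0,0.0,999\n"
)
print(csv_to_geojson(sample))
```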
Now let's plot the bus_geojson object on Google Maps. The following block will save an HTML file, like this one, that you can open and explore. Bokeh can plot inline in notebooks like this, but I was having trouble getting it to display, so instead I've embedded an image of the resulting map.
The code below creates a map centred on Sydney, loads bus_geojson as the data source, and adds a blue circle for each bus. Think about how you could extend this: filtering by route number, adding hover labels with route number, changing colour to represent speed, or using an arrow to display the bus's direction.
from bokeh.plotting import save
from bokeh.io import output_file
from bokeh.models import (GeoJSONDataSource, GMapPlot, GMapOptions, Circle,
                          DataRange1d, PanTool, WheelZoomTool, BoxSelectTool)
# centre the map on Sydney
map_options = GMapOptions(lat=-33.87, lng=151.1, map_type="roadmap", zoom=11)
# GeoJSONDataSource exposes each point's coordinates as columns x (longitude) and y (latitude)
geo_source = GeoJSONDataSource(geojson=bus_geojson)
circle = Circle(x="x", y="y", size=4, fill_color="blue", fill_alpha=0.8, line_color=None)
# Note: newer Bokeh releases also require a Google Maps API key (GMapPlot(api_key=...))
# and rename plot_width/plot_height to width/height.
plot = GMapPlot(
    x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options, title="Sydney",
    plot_width=900, plot_height=900
)
plot.add_glyph(geo_source, circle)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
output_file('bus-positions.html')
save(plot)
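As a starting point for two of the extensions suggested earlier (filtering by route number and changing colour to represent speed), here is a sketch that filters a feature collection by route and attaches a colour to each feature based on speed. It works on the same dictionary structure as feature_collection, but it needs a `speed_kmh` entry in each feature's properties, which the code above doesn't store. That property, the speed thresholds and the colours are all my own assumptions, and `style_features` and `speed_colour` are hypothetical helpers. Since GeoJSONDataSource exposes feature properties as columns, a glyph should then be able to reference the colour by name, e.g. `fill_color="color"`.

```python
def speed_colour(speed_kmh):
    """Pick a colour bucket for a speed in km/h (thresholds are arbitrary)."""
    if speed_kmh < 5:
        return "red"      # stopped or crawling
    if speed_kmh < 30:
        return "orange"
    return "green"

def style_features(feature_collection, routes=None):
    """Return a new FeatureCollection, optionally filtered to `routes`,
    with a 'color' property added to each feature.

    Assumes each feature's properties include 'route' and 'speed_kmh'.
    The input collection is left unchanged.
    """
    features = []
    for feature in feature_collection["features"]:
        props = feature["properties"]
        if routes is not None and props["route"] not in routes:
            continue
        styled = dict(feature)
        styled["properties"] = dict(props, color=speed_colour(props["speed_kmh"]))
        features.append(styled)
    return {"type": "FeatureCollection", "features": features}

# Example with two made-up features:
collection = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [150.98, -33.73]},
     "properties": {"route": "619", "speed_kmh": 4.3}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [151.01, -33.77]},
     "properties": {"route": "999", "speed_kmh": 79.9}},
]}
print(style_features(collection, routes={"619"}))
```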
This is just a brief example of what you can do with a single API. The other TfNSW APIs offer loads of data, far more than I can cover here, and I'm honestly still exploring them myself.
The Open Data service may be in its infancy, but it is already showing great promise. I hope that, once the early bugs and the missing/truncated data issues are worked out, many people will find interesting and useful ways to use the service.