#!/usr/bin/env python
# coding: utf-8

# How to access realtime bus positions from Transport for NSW
# =======
#
# Transport for NSW (TfNSW) has very recently launched its [Open Data website](https://opendata.transport.nsw.gov.au/site/en_us/home.html), providing API access to its many static and realtime data sets. This notebook demonstrates how to access the current position of all tracked buses on the Sydney network.
#
# Some familiarity with the Python programming language and HTTP requests will be helpful.
#
# Getting access to the Open Data site
# -------
#
# * [Register for your free account](https://opendata.transport.nsw.gov.au/app/registration.html) and confirm your email address.
# * [Follow the user guide closely!](https://opendata.transport.nsw.gov.au/site/en_us/gs-intro.html) Don't be like me and assume you can work the controls out; you'll miss something simple and wonder why your API requests are all being denied.
#
# Creating your application
# -------
#
# An application is a bunch of settings that govern how you can interact with the various TfNSW APIs. For this demonstration, [add a new application](https://opendata.transport.nsw.gov.au/app/applications.html#add). Here are the basic steps.
#
# 1. Information tab
#     * Add a name and description.
# 2. API Management tab
#     * Click **Add** next to **Public Transport - Realtime - Vehicle Positions**.
#     * Accept the Terms and Conditions.
# 3. Auth tab
#     * You must provide a Callback/Redirect URL (I believe even http://127.0.0.1 will do).
#     * Your scope must be **user**.
#     * Your application type must be **Confidential**.
#     * Click **Create**.
#
# (By now you have probably clicked something and been redirected back to the Dashboard page instead of your destination. This is happening to a lot of people at the moment (20 April), and TfNSW is working on it. All you can do is retrace your steps and try again.)
#
# Getting authorised
# -------
#
# Now that your application is created, select **Edit** from the Actions dropdown menu next to your application's name. Go to the **Auth** tab and copy down the **API Key** and **Shared Secret**.
#
# Every API request needs to be authorised to a particular application. For our purposes, the simplest way to do this is to pass TfNSW our API key and shared secret and have them generate a token we can use to authorise our requests.
#
# [As detailed on the API basics page](https://opendata.transport.nsw.gov.au/site/en_us/gs-api-basics.html), we can get a bearer token back in response to a POST request like:
#
#     https://apikey:sharedsecret@api.transport.nsw.gov.au/auth/oauth/v2/token?grant_type=client_credentials&scope=user
#
# We'll use Python's [Requests](http://docs.python-requests.org/en/master/) library for this:

# In[1]:


import requests

# Replace with your own information:
api_key = '17xx11111112222222333333344444455555'
shared_secret = 'a1b2c3d4e5f6b9999988877766655544'

# Leave these:
payload = {'grant_type': 'client_credentials', 'scope': 'user'}
auth_url = 'https://api.transport.nsw.gov.au/auth/oauth/v2/token'

# Send a POST request to get the token back. Passing the key and secret with
# auth= is equivalent to embedding them in the URL as apikey:sharedsecret@...
token_response = requests.post(auth_url, auth=(api_key, shared_secret), params=payload)
token_response.raise_for_status()  # stop here if the credentials were rejected


# TfNSW will send back something like this:
#
#     {"access_token": "e1311756-ed35-456d-9f68-d5d970df2c2d",
#      "expires_in": 3600,
#      "scope": "user",
#      "token_type": "Bearer"}
#
# The access_token is what we need; it expires after expires_in seconds (an hour here), after which you'll need to request a new one. Our requests must carry an [HTTP header](https://www.httpwatch.com/httpgallery/headers/) named **Authorization** with the value **Bearer e1311756-ed35-456d-9f68-d5d970df2c2d** (or whatever your bearer token is).
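#
# Under the hood, the `apikey:sharedsecret@` part of the token URL is HTTP Basic authentication: Requests sends the key and secret, joined by a colon and base64-encoded, in an **Authorization: Basic ...** header (the same happens if you pass them with Requests' `auth=` argument). As a minimal sketch using only the standard library, the hypothetical helper below builds, but does not send, an equivalent token request. It assumes the endpoint and query parameters shown above.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = 'https://api.transport.nsw.gov.au/auth/oauth/v2/token'


def build_token_request(api_key, shared_secret):
    """Build (but do not send) a client-credentials token request.

    build_token_request is a hypothetical helper for illustration only.
    """
    # The grant type and scope travel in the query string, as in the URL above.
    query = urllib.parse.urlencode({'grant_type': 'client_credentials', 'scope': 'user'})
    # HTTP Basic auth: base64-encode "key:secret" and prefix it with "Basic ".
    credentials = '{}:{}'.format(api_key, shared_secret).encode('utf-8')
    basic = 'Basic ' + base64.b64encode(credentials).decode('ascii')
    return urllib.request.Request(TOKEN_URL + '?' + query,
                                  headers={'Authorization': basic},
                                  method='POST')


req = build_token_request('my-key', 'my-secret')
print(req.get_method(), req.full_url)
print(req.get_header('Authorization'))
```

# To actually send it, you could pass the request object to `urllib.request.urlopen()`.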
# Let's build the **Authorization: Bearer** header for our data requests now:

# In[2]:


bearer_token = "Bearer " + token_response.json()['access_token']
print(bearer_token)

# Set the headers for our next request:
headers = {"Authorization": bearer_token}


# Getting the bus data
# -------
# Here's where we make the actual request. As we can see from the [API explorer](https://opendata.transport.nsw.gov.au/app/api-explorer.html), we're sending our request to:
#
#     https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses
#
# Depending on server load, this request may take some time to complete:

# In[3]:


bus_positions = requests.get('https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses',
                             headers=headers)
bus_positions.raise_for_status()  # fail loudly on an expired token or server error
print("Retrieved {} bytes".format(len(bus_positions.content)))


# Parsing the response
# -------
# Realtime bus data comes back in a binary (machine-readable) format called [GTFS-realtime](https://developers.google.com/transit/gtfs-realtime/), which is built on Google's Protocol Buffers. This makes perfect sense but can be a bit of a bugger to deal with. You'll need the following Python libraries:
#
# * [gtfs-realtime-bindings](https://github.com/google/gtfs-realtime-bindings/blob/master/python/README.md): for handling the GTFS-realtime format
# * [protobuf](https://developers.google.com/protocol-buffers/): for handling Protocol Buffers generally
#
# Just calling **pip install gtfs-realtime-bindings protobuf** worked for me.
#
# Let's import the GTFS-realtime library, then create a new FeedMessage object and parse the response into it (note: this step can fail if the API returns a truncated response, which happens sometimes):

# In[4]:


from google.transit import gtfs_realtime_pb2

feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(bus_positions.content)


# We can now look in the **feed** object, which contains **entities** that are the individual vehicles.
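#
# Because parsing can fail when the API returns a truncated response, one way to cope is simply to fetch and parse again. The sketch below is a generic retry helper, not part of the TfNSW API or the GTFS-realtime bindings; `fetch_and_parse`, `fetch` and `parse` are hypothetical names, and it assumes a short, growing pause between attempts is acceptable.

```python
import time


def fetch_and_parse(fetch, parse, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Return parse(fetch()), retrying when either call raises one of retry_on.

    fetch: callable returning raw response bytes (supplied by you)
    parse: callable turning those bytes into a parsed object
    attempts: total number of tries (must be at least 1)
    delay: base pause in seconds; after failed attempt n we wait delay * n
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return parse(fetch())
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # back off a little longer each time
```

# For example, `fetch` could be `lambda: requests.get(url, headers=headers).content` (with `url` the vehicle positions URL) and `parse` a small function that creates a `FeedMessage`, calls `ParseFromString()` on the bytes and returns the message; `retry_on` could be narrowed to `(google.protobuf.message.DecodeError,)`.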
# Let's take the first five entities and list their latitude, longitude, bearing (direction) and speed:

# In[5]:


for entity in feed.entity[:5]:
    position = entity.vehicle.position
    print(position.latitude, position.longitude, position.bearing, position.speed)


# Field reference
# -------
#
# This feed only contains vehicle positions, but other GTFS-realtime feeds can also include trip updates and alerts. Here's a more complete list of fields and what they contain in the context of buses (though you should [refer to the GTFS-realtime specification](https://developers.google.com/transit/gtfs-realtime/reference) for full details):
#
# * entity.id: The vehicle's feed-unique identifying code.
# * entity.vehicle.timestamp: The moment at which the vehicle's position was measured, in POSIX time (seconds since 1 January 1970 UTC).
# * entity.vehicle.trip.route_id: The bus's route. The characters after the underscore are the route number, e.g. 2441\_**M50**. Corresponds with a separate GTFS list of routes.
# * entity.vehicle.trip.trip_id: The trip the bus is running, usually one scheduled journey along a route. Corresponds with a separate GTFS list of trips.
# * entity.vehicle.trip.start_time: The *scheduled* start time of the trip.
# * entity.vehicle.trip.start_date: The *scheduled* start date of the trip.
# * entity.vehicle.position.latitude: Latitude last reported.
# * entity.vehicle.position.longitude: Longitude last reported.
# * entity.vehicle.position.bearing: The vehicle's direction, in degrees clockwise from north (0 is north, 90 is east).
# * entity.vehicle.position.speed: The instantaneous vehicle speed, in **metres per second** (not km/h); multiply by 3.6 to get km/h.
# * entity.vehicle.trip.schedule_relationship: Whether the vehicle is running a regular scheduled service (0), an added service (1), an unscheduled service (2) or a cancelled one (3).
# * entity.vehicle.congestion_level: How bad the traffic around this vehicle is. (I'm not sure whether this field is actually used; even at high-traffic times it seems to be 1 (RUNNING_SMOOTHLY).) [Full values here.](https://developers.google.com/transit/gtfs-realtime/reference#CongestionLevel)
# * entity.vehicle.occupancy_status: How full the vehicle is, from 0 (EMPTY) to 6 (NOT_ACCEPTING_PASSENGERS).
#
# Exporting to a CSV
# -------
#
# Let's put the feed into a file we can use elsewhere:

# In[6]:


import csv

positions_output = []  # a list of lists for bus position data

# put each bus's key data into the list
for entity in feed.entity:
    vehicle = entity.vehicle
    positions_output.append([vehicle.timestamp,
                             entity.id,
                             vehicle.trip.route_id,
                             vehicle.trip.trip_id,
                             vehicle.trip.start_time,
                             vehicle.trip.start_date,
                             vehicle.position.latitude,
                             vehicle.position.longitude,
                             vehicle.position.bearing,
                             vehicle.position.speed,
                             vehicle.position.speed * 3.6,  # speed in km/h, for convenience
                             vehicle.trip.schedule_relationship,
                             vehicle.congestion_level,
                             vehicle.occupancy_status,
                             vehicle.trip.route_id.split('_', 1)[-1]  # route number: everything after the first underscore
                             ])

# write the bus position data to positions.csv
# (Python 3: open in text mode with newline='', as the csv module expects)
with open("positions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['timestamp', 'vehicle_id', 'route_id', 'trip_id',
                     'start_time', 'start_date', 'latitude', 'longitude',
                     'bearing', 'speed_ms', 'speed_kmh',
                     'schedule_relationship', 'congestion_level',
                     'occupancy_status', 'route_number'])
    writer.writerows(positions_output)


# Visualising data
# -------
# Having CSV data is great, but we can go a step further and add the vehicles to a map directly. For this I'm using the [Bokeh](http://bokeh.pydata.org/en/0.10.0/index.html) module (pip install bokeh) and adapting the [Mapping Geo Data](http://bokeh.pydata.org/en/0.11.1/docs/user_guide/geo.html) tutorial.
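#
# Before mapping anything, note that the CSV stores schedule_relationship, congestion_level and occupancy_status as raw integers. A small lookup, sketched below, turns them back into readable names. The tables are hand-copied from the enum values in the GTFS-realtime reference as I understand them (check the spec for any newer values), and `label` and `decode_row` are hypothetical helper names.

```python
# Enum names as listed in the GTFS-realtime reference (hand-copied; may lag the spec).
SCHEDULE_RELATIONSHIP = {0: 'SCHEDULED', 1: 'ADDED', 2: 'UNSCHEDULED', 3: 'CANCELED'}
CONGESTION_LEVEL = {0: 'UNKNOWN_CONGESTION_LEVEL', 1: 'RUNNING_SMOOTHLY',
                    2: 'STOP_AND_GO', 3: 'CONGESTION', 4: 'SEVERE_CONGESTION'}
OCCUPANCY_STATUS = {0: 'EMPTY', 1: 'MANY_SEATS_AVAILABLE', 2: 'FEW_SEATS_AVAILABLE',
                    3: 'STANDING_ROOM_ONLY', 4: 'CRUSHED_STANDING_ROOM_ONLY',
                    5: 'FULL', 6: 'NOT_ACCEPTING_PASSENGERS'}


def label(table, code):
    """Return the enum name for code (int or numeric string), or a placeholder."""
    return table.get(int(code), 'UNKNOWN({})'.format(code))


def decode_row(row):
    """Return a copy of a positions.csv row (a csv.DictReader dict) with names added."""
    decoded = dict(row)
    for column, table in (('schedule_relationship', SCHEDULE_RELATIONSHIP),
                          ('congestion_level', CONGESTION_LEVEL),
                          ('occupancy_status', OCCUPANCY_STATUS)):
        decoded[column + '_name'] = label(table, row[column])
    return decoded
```

# For example: `with open("positions.csv", newline="") as f: rows = [decode_row(r) for r in csv.DictReader(f)]`.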
#
# Bokeh works with [GeoJSON](http://geojson.org/) data, so we'll have to convert our list of buses to a GeoJSON object, e.g.:
#
#     {
#       "type": "FeatureCollection",
#       "features": [
#         {
#           "geometry": {
#             "type": "Point",
#             "coordinates": [
#               150.97987365723,
#               -33.73038482666
#             ]
#           },
#           "type": "Feature",
#           "properties": {
#             "route": "619"
#           }
#         }
#       ]
#     }
#
# Note that GeoJSON lists coordinates as [longitude, latitude]. We'll create a feature_collection dictionary and add each bus as a point, with its route as a property, then convert that dictionary to a JSON string:

# In[7]:


import json

# create the dictionary
feature_collection = {"type": "FeatureCollection", "features": []}

for bus in positions_output:
    # filter out any buses which report their position as [0,0]
    if bus[7] > 0:
        this_object = {"type": "Feature",
                       "geometry": {"type": "Point",
                                    "coordinates": [bus[7], bus[6]]},  # [longitude, latitude]
                       "properties": {"route": bus[14]}}
        feature_collection['features'].append(this_object)

# dump the dictionary to a JSON string
bus_geojson = json.dumps(feature_collection)
print('Result: "{}..."'.format(bus_geojson[:200]))  # print a little of the result; you could also write it to a file


# Now let's plot the bus_geojson object on Google Maps. The map cell below (In[8]) saves [an HTML file like this one](https://gist.githubusercontent.com/timbennett/7ec739fc619459316859d3875b76a76b/raw/f88c87af59e26edb5d27c50c86b911c52faa9613/sample%2520bus-positions.html) that you can open and explore. Bokeh will do inline plotting for notebooks like this, but I was having trouble getting it to display, so instead I've embedded an image of the resulting map.
#
# The In[8] cell creates a map centred on Sydney, loads bus_geojson as the data source, and adds a blue circle for each bus. Think about how you could extend this: filtering by route number, adding hover labels with route number, changing colour to represent speed, or using an arrow to display the bus's direction.
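#
# As a starting point for the first of those ideas, here is a minimal sketch of a variant of the In[7] loop that keeps only selected routes and carries bearing and speed along as extra properties, which a colour or arrow mapping could later use. `buses_to_geojson` is a hypothetical helper, and it assumes rows in the same column order as `positions_output`.

```python
# Column positions in each positions_output row (see the CSV cell).
LAT, LON, BEARING, SPEED_KMH, ROUTE = 6, 7, 8, 10, 14


def buses_to_geojson(rows, routes=None):
    """Return a GeoJSON FeatureCollection dict for the given bus rows.

    routes: optional collection of route numbers to keep, e.g. {'M50'};
    None keeps every route.
    """
    features = []
    for row in rows:
        if row[LON] <= 0:  # skip buses reporting their position as [0,0]
            continue
        if routes is not None and row[ROUTE] not in routes:
            continue
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         # GeoJSON wants [longitude, latitude]
                         "coordinates": [row[LON], row[LAT]]},
            "properties": {"route": row[ROUTE],
                           "bearing": row[BEARING],
                           "speed_kmh": row[SPEED_KMH]},
        })
    return {"type": "FeatureCollection", "features": features}
```

# For example, `bus_geojson = json.dumps(buses_to_geojson(positions_output, routes={'M50'}))` would produce a source containing only M50 buses.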
# In[8]:


from bokeh.io import output_file
from bokeh.plotting import save
from bokeh.models import (
    GeoJSONDataSource, GMapPlot, GMapOptions, Circle, DataRange1d,
    PanTool, WheelZoomTool, BoxSelectTool
)

# Recent Bokeh versions require a Google Maps API key for GMapPlot;
# replace this placeholder with your own.
google_api_key = 'YOUR_GOOGLE_MAPS_API_KEY'

map_options = GMapOptions(lat=-33.87, lng=151.1, map_type="roadmap", zoom=11)

geo_source = GeoJSONDataSource(geojson=bus_geojson)

# one small blue circle per bus, placed at each feature's x (longitude) and y (latitude)
circle = Circle(x="x", y="y", size=4, fill_color="blue", fill_alpha=0.8, line_color=None)

plot = GMapPlot(
    x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options,
    api_key=google_api_key, title="Sydney",
    width=900, height=900  # older Bokeh releases call these plot_width/plot_height
)
plot.add_glyph(geo_source, circle)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())

output_file('bus-positions.html')
save(plot)


# ![Map of Sydney displaying bus positions](https://cloud.githubusercontent.com/assets/1192790/14663957/944e1e2a-0709-11e6-87f2-6aa91b13ef3d.png "Map of Sydney displaying bus positions")

# Conclusion
# -------
#
# This is just a brief example of what you can do with a single API. The other TfNSW APIs offer loads of data, far more than I can go into here, and I'm honestly still exploring them.
#
# The Open Data service may be in its infancy, but it is already showing great promise. Once the early bugs and missing/truncated data issues are worked out, I hope many people will find interesting and useful purposes for it.