How to access realtime bus positions from Transport for NSW

Transport for NSW (TfNSW) has very recently launched its Open Data website providing API access to its many static and realtime data sets. This notebook demonstrates how to access the current position of all tracked buses on the Sydney network.

Some familiarity with the Python programming language and HTTP requests will be helpful.

Getting access to the Open Data site

Creating your application

An application is a bunch of settings that govern how you can interact with the various TfNSW APIs. For this demonstration, add a new application. Here are the basic steps.

  1. Information tab
    • Add a name and description.
  2. API Management tab
    • Click Add next to Public Transport - Realtime - Vehicle Positions.
    • Accept the Terms and Conditions.
  3. Auth tab
    • You must provide a Callback/Redirect URL (I believe even http://127.0.0.1 will do).
    • Your scope must be user.
    • Your application type must be Confidential.
    • Click Create.

(By now you have probably clicked something and been redirected back to the Dashboard page instead of your destination. This is happening to a lot of people at the moment (20 April) and TfNSW is working on it. All you can do is retrace your steps and try again.)

Getting authorised

Now that your application is created, you'll need to select Edit from the Actions dropdown menu next to your application's name. Go to the Auth tab and copy down the API Key and Shared Secret.

Every API request needs to be authorised to a particular application. For our purposes, the simplest way to do this is to pass TfNSW our API key and shared secret, and have them generate a token we can use to authorise our requests.

As detailed on the API basics page, we can get a bearer token back in response to a POST request like:

https://apikey:sharedsecret@api.transport.nsw.gov.au/auth/oauth/v2/token?grant_type=client_credentials&scope=user
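Putting the key and secret in the URL like this is HTTP Basic authentication: the client base64-encodes key:secret and sends it in an "Authorization: Basic ..." header. The sketch below builds that request by hand with the standard library only, to show what actually goes over the wire. build_token_request is a hypothetical helper; it only builds the URL and headers and does not send anything.

```python
import base64
from urllib.parse import urlencode

# Token endpoint used in this notebook
TOKEN_ENDPOINT = "https://api.transport.nsw.gov.au/auth/oauth/v2/token"


def build_token_request(api_key, shared_secret):
    """Return (url, headers) for the client-credentials token request.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    # HTTP Basic auth is base64("key:secret") in an Authorization header
    credentials = "{}:{}".format(api_key, shared_secret).encode("utf-8")
    headers = {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
    # The grant type and scope travel in the query string
    query = urlencode({"grant_type": "client_credentials", "scope": "user"})
    return "{}?{}".format(TOKEN_ENDPOINT, query), headers


token_url, token_headers = build_token_request("my-api-key", "my-shared-secret")
```

Libraries such as Requests do this encoding for you, which is why the code below never touches base64 directly.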

We'll use Python's Requests for this:

In [1]:
import requests

# Replace with your own information:
api_key = '17xx11111112222222333333344444455555'
shared_secret = 'a1b2c3d4e5f6b9999988877766655544'
# Leave these:
payload = {'grant_type': 'client_credentials', 'scope': 'user'}
auth_url = 'https://api.transport.nsw.gov.au/auth/oauth/v2/token'

# Send a POST request to get the token back. auth= sends the key and secret as
# HTTP Basic credentials, exactly like embedding them in the URL.
token_response = requests.post(auth_url, auth=(api_key, shared_secret), params=payload)
token_response.raise_for_status()  # fail early if the credentials are rejected

TfNSW will send back something like this:

{"access_token": "e1311756-ed35-456d-9f68-d5d970df2c2d",
 "expires_in": 3600,
 "scope": "user",
 "token_type": "Bearer"}

The access_token is what we need. Our requests have to have an HTTP header named Authorization with the value Bearer e1311756-ed35-456d-9f68-d5d970df2c2d (or whatever your bearer token is). So let's create that header now:

In [2]:
bearer_token = "Bearer " + token_response.json()['access_token']
print(bearer_token)

# Set the headers for our next request:
headers = {"Authorization":bearer_token}
Bearer e1311756-ed35-456d-9f68-d5d970df2c2d
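The token response above says the token expires in 3,600 seconds (an hour), so a script that polls the feed for longer than that needs a fresh token. Here is a minimal sketch of caching the token and refreshing it a little before it expires. TokenCache and its fetch callable are hypothetical names; fetch is assumed to return the token JSON as a dict, like token_response.json().

```python
import time


class TokenCache:
    """Hypothetical helper: reuse a bearer token until shortly before it expires."""

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        self._fetch = fetch        # zero-argument callable returning the token JSON as a dict
        self._margin = margin      # refresh this many seconds before expiry
        self._clock = clock        # injectable so the expiry logic can be tested
        self._token = None
        self._expires_at = 0.0

    def headers(self):
        """Return request headers carrying a token that has not (nearly) expired."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            data = self._fetch()
            self._token = data["access_token"]
            self._expires_at = self._clock() + data["expires_in"]
        return {"Authorization": "Bearer " + self._token}
```

fetch could be a small function that repeats the token POST request from the first cell and returns its JSON. Each data request would then pass headers=cache.headers() instead of a fixed headers dictionary.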

Getting the bus data

Here's where we make the actual request. As we can see from the API explorer, we're sending our request to:

https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses

Depending on server load, this request may take some time to fulfill:

In [3]:
bus_positions = requests.get('https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses', headers=headers)
print("Retrieved {} bytes".format(len(bus_positions.content)))
Retrieved 186739 bytes
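Because this request can be slow, and responses sometimes come back incomplete, it can help to retry a couple of times with a pause in between. Below is a minimal standard-library sketch of retrying with exponential backoff. fetch_with_retries is a hypothetical helper that takes any zero-argument callable; the sleep parameter exists only so it can be tested without actually waiting.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() until it succeeds, waiting base_delay, 2*base_delay, ... between tries.

    Hypothetical helper; re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * 2 ** attempt)
```

For example, fetch_with_retries(lambda: requests.get('https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/buses', headers=headers, timeout=60), retry_on=(requests.exceptions.RequestException,)) would retry network errors and timeouts; the 60-second timeout is an arbitrary choice.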

Parsing the response

Realtime bus data comes back in GTFS-realtime, a binary (machine-readable) format built on Google's Protocol Buffers. This makes perfect sense for machines but can be a bit of a bugger to deal with. You'll need two Python libraries, gtfs-realtime-bindings and protobuf; just calling pip install gtfs-realtime-bindings protobuf worked for me.

Let's import the GTFS-realtime library, create a new FeedMessage object and parse the response into it (note: parsing can fail if the API returns a truncated response, which happens sometimes):

In [4]:
from google.transit import gtfs_realtime_pb2
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(bus_positions.content)
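Protocol Buffers encode a message as a sequence of keys (field number plus wire type, stored as a base-128 "varint"), each followed by a value, and sub-messages such as each entity are prefixed with their length in bytes. If a download is cut off part-way through an entity, that entity's length prefix claims more bytes than actually arrived, which is one way parsing fails. The standard-library sketch below walks the top-level fields of a message and checks that none runs past the end of the data. It is a rough check, not a replacement for ParseFromString: it does not look inside nested messages, and a cut that lands exactly between entities would still look complete. read_varint and is_complete_message are hypothetical names.

```python
def read_varint(data, pos=0):
    """Decode a protobuf base-128 varint at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the value
        if not byte & 0x80:                # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def is_complete_message(data):
    """Return False if any top-level field runs past the end of data."""
    pos = 0
    try:
        while pos < len(data):
            key, pos = read_varint(data, pos)
            wire_type = key & 0x07             # field number is key >> 3
            if wire_type == 0:                 # varint
                _, pos = read_varint(data, pos)
            elif wire_type == 1:               # fixed 64-bit
                pos += 8
            elif wire_type == 2:               # length-delimited (sub-messages, strings)
                length, pos = read_varint(data, pos)
                pos += length
            elif wire_type == 5:               # fixed 32-bit
                pos += 4
            else:                              # groups/invalid: treated as broken here
                return False
            if pos > len(data):
                return False
    except ValueError:
        return False
    return True
```

Used on this feed, is_complete_message(bus_positions.content) returning False would be a sign to request the data again before parsing.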

The feed object now holds a list of entities, one per vehicle. Let's take the first five entities and list their latitude, longitude, bearing (direction) and speed:

In [5]:
for entity in feed.entity[:5]:
    print(entity.vehicle.position.latitude,
          entity.vehicle.position.longitude,
          entity.vehicle.position.bearing,
          entity.vehicle.position.speed
         )
(-33.730384826660156, 150.97987365722656, 249.0, 1.2000000476837158)
(-33.766117095947266, 151.01039123535156, 253.0, 22.200000762939453)
(-33.759037017822266, 151.04502868652344, 83.0, 19.200000762939453)
(-34.049461364746094, 150.76402282714844, 129.0, 0.0)
(-34.049415588378906, 150.75608825683594, 204.0, 9.800000190734863)

Field reference

The feed only contains vehicle positions, but other GTFS-realtime feeds can also include trip updates and alerts. Here's a more complete list of fields and what they contain in the context of buses (though you should refer to the GTFS-realtime specification for full details):

  • entity.id: The vehicle's feed-unique identifying code
  • entity.vehicle.timestamp: Moment at which the vehicle's position was measured, in POSIX time (seconds since 1 January 1970 UTC).
  • entity.vehicle.trip.route_id: The bus's route. The characters after the underscore are the route number, e.g. M50 in 2441_M50. Corresponds with a separate GTFS list of routes.
  • entity.vehicle.trip.trip_id: The specific scheduled run along a route that the bus is making. Corresponds with a separate GTFS list of trips.
  • entity.vehicle.trip.start_time: The scheduled start time of the trip, as HH:MM:SS.
  • entity.vehicle.trip.start_date: The scheduled start date of the trip, as YYYYMMDD.
  • entity.vehicle.position.latitude: Last reported latitude, in degrees (WGS-84).
  • entity.vehicle.position.longitude: Last reported longitude, in degrees (WGS-84).
  • entity.vehicle.position.bearing: The vehicle's direction, in degrees clockwise from true north (0 is north, 90 is east).
  • entity.vehicle.position.speed: The momentary vehicle speed, in metres per second (not km/h).
  • entity.vehicle.trip.schedule_relationship: Whether the vehicle is a regular service (0), added (1), unscheduled (2) or cancelled (3).
  • entity.vehicle.congestion_level: How bad the traffic around this vehicle is. (I'm not sure whether this field is actually used; even at high-traffic times it seems to be 1 (RUNNING_SMOOTHLY).) The full list of values is in the GTFS-realtime specification.
  • entity.vehicle.occupancy_status: How full the vehicle is, from 0 (EMPTY) to 6 (NOT_ACCEPTING_PASSENGERS).
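Several of these fields are numeric codes or raw units. The sketch below turns them into something readable: POSIX seconds to a UTC datetime, metres per second to km/h (multiply by 3.6), a bearing to the nearest of eight compass points, and the enum codes to their names as listed in the GTFS-realtime specification. describe_position and compass_point are hypothetical helpers, and the hard-coded tables are my transcription of the spec; in recent protobuf versions the generated bindings should also expose these names through the enum wrappers, e.g. gtfs_realtime_pb2.VehiclePosition.OccupancyStatus.Name(5).

```python
from datetime import datetime, timezone

# Enum values from the GTFS-realtime specification (gtfs-realtime.proto)
SCHEDULE_RELATIONSHIP = {0: "SCHEDULED", 1: "ADDED", 2: "UNSCHEDULED", 3: "CANCELED"}
CONGESTION_LEVEL = {0: "UNKNOWN_CONGESTION_LEVEL", 1: "RUNNING_SMOOTHLY",
                    2: "STOP_AND_GO", 3: "CONGESTION", 4: "SEVERE_CONGESTION"}
OCCUPANCY_STATUS = {0: "EMPTY", 1: "MANY_SEATS_AVAILABLE", 2: "FEW_SEATS_AVAILABLE",
                    3: "STANDING_ROOM_ONLY", 4: "CRUSHED_STANDING_ROOM_ONLY",
                    5: "FULL", 6: "NOT_ACCEPTING_PASSENGERS"}

COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def compass_point(bearing):
    """Round a bearing (degrees clockwise from true north) to the nearest of 8 compass points."""
    return COMPASS_POINTS[int((bearing % 360) / 45 + 0.5) % 8]


def describe_position(timestamp, bearing, speed_ms, schedule_relationship,
                      congestion_level, occupancy_status):
    """Hypothetical helper: turn raw VehiclePosition values into readable ones."""
    return {
        # POSIX seconds -> timezone-aware UTC datetime
        "measured_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "heading": compass_point(bearing),
        "speed_kmh": round(speed_ms * 3.6, 1),  # 1 m/s = 3.6 km/h
        # Codes missing from the tables (e.g. added in later spec versions) become "UNKNOWN"
        "schedule": SCHEDULE_RELATIONSHIP.get(schedule_relationship, "UNKNOWN"),
        "congestion": CONGESTION_LEVEL.get(congestion_level, "UNKNOWN"),
        "occupancy": OCCUPANCY_STATUS.get(occupancy_status, "UNKNOWN"),
    }
```

With the feed loaded, each entity's values can be passed straight in, e.g. describe_position(entity.vehicle.timestamp, entity.vehicle.position.bearing, entity.vehicle.position.speed, entity.vehicle.trip.schedule_relationship, entity.vehicle.congestion_level, entity.vehicle.occupancy_status).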

Exporting to a CSV

Let's put the feed into a file we can use elsewhere:

In [6]:
import csv
positions_output = [] # a list of lists for bus position data

# put each bus's key data into the list
for entity in feed.entity:
    positions_output.append([entity.vehicle.timestamp,
                             entity.id,
                             entity.vehicle.trip.route_id, 
                             entity.vehicle.trip.trip_id,
                             entity.vehicle.trip.start_time,
                             entity.vehicle.trip.start_date,
                             entity.vehicle.position.latitude, 
                             entity.vehicle.position.longitude,
                             entity.vehicle.position.bearing,
                             entity.vehicle.position.speed,
                             entity.vehicle.position.speed*3.6, #speed in km/h, for convenience
                             entity.vehicle.trip.schedule_relationship, 
                             entity.vehicle.congestion_level,
                             entity.vehicle.occupancy_status,
                             entity.vehicle.trip.route_id.split('_', 1)[-1] # route number: everything after the first underscore
                            ])

# write the bus position data to the positions.csv
with open("positions.csv", "w", newline="") as f:  # Python 3: text mode with newline="", as the csv docs recommend
    writer = csv.writer(f)
    writer.writerow(['timestamp','vehicle_id','route_id','trip_id',
                     'start_time','start_date','latitude','longitude',
                     'bearing','speed_ms','speed_kmh', 'schedule_relationship','congestion_level',
                     'occupancy_status','route_number'])
    writer.writerows(positions_output)
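To check the file really is usable elsewhere, here is a sketch that reads it back with csv.DictReader and summarises it per route: how many buses are on each route number and their average speed. One thing to watch is that csv hands every value back as a string, so numeric columns have to be converted explicitly. summarise_routes is a hypothetical helper that assumes the header row written above.

```python
import csv
from collections import Counter, defaultdict


def summarise_routes(path):
    """Return {route_number: (bus_count, average_speed_kmh)} from an exported CSV.

    Hypothetical helper; assumes the header row written by the export cell.
    """
    counts = Counter()
    total_speed = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            route = row["route_number"]
            counts[route] += 1
            # csv returns strings, so convert the speed before adding it up
            total_speed[route] += float(row["speed_kmh"])
    return {route: (n, total_speed[route] / n) for route, n in counts.items()}
```

For example, summarise_routes("positions.csv") after running the export cell.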

Visualising data

Having CSV data is great, but we can go a step further and add the vehicles to a map directly. For this I'm using the Bokeh module (pip install bokeh) and adapting the Mapping Geo Data tutorial.

Bokeh works with GeoJSON data, so we'll have to convert our list of buses to a GeoJSON object, e.g.:

{
  "type": "FeatureCollection",
  "features": [
    {
      "geometry": {
        "type": "Point",
        "coordinates": [
          150.97987365723,
          -33.73038482666
        ]
      },
      "type": "Feature",
      "properties": {
        "route": "619"
      }
    }
  ]
}

We'll create a feature_collection dictionary and add each bus as a point, with its route as a property, then convert that dictionary to a JSON string:

In [7]:
# create the dictionary
feature_collection = {}
feature_collection['type'] = "FeatureCollection"
feature_collection['features'] = []
for bus in positions_output:
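    # bus[6] is latitude and bus[7] is longitude; skipping longitude <= 0 filters out
    # any buses which report their position as [0,0]
    if bus[7] > 0:
        this_object = {"type":"Feature",
                       "geometry":{"type":"Point",
                                   # GeoJSON puts longitude first, then latitude
                                   "coordinates": [bus[7], bus[6]]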
                                  },
                       "properties": {"route":bus[14]}
                      }
        feature_collection['features'].append(this_object)

# dump the dictionary to a json string
import json
bus_geojson = json.dumps(feature_collection)

print('Result: "{}..."'.format(bus_geojson[:200])) # print a little bit of the result; you could also write to a file
Result: "{"type": "FeatureCollection", "features": [{"geometry": {"type": "Point", "coordinates": [150.97987365722656, -33.730384826660156]}, "type": "Feature", "properties": {"route": "619"}}, {"geometry": {"..."
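The loop above picks values out of each row by position (bus[6], bus[7], bus[14]), which is easy to get wrong. A variant that zips each row with the CSV column names and also carries speed and bearing as properties gives the map extensions discussed below something to work with. rows_to_geojson is a hypothetical helper; note that it skips a bus only when both coordinates are 0, which differs slightly from the longitude > 0 test above.

```python
import json

# Column order used by the CSV export cell
COLUMNS = ['timestamp', 'vehicle_id', 'route_id', 'trip_id', 'start_time', 'start_date',
           'latitude', 'longitude', 'bearing', 'speed_ms', 'speed_kmh',
           'schedule_relationship', 'congestion_level', 'occupancy_status', 'route_number']


def rows_to_geojson(rows):
    """Hypothetical helper: build a GeoJSON FeatureCollection string from export rows."""
    features = []
    for values in rows:
        bus = dict(zip(COLUMNS, values))
        if bus['latitude'] == 0 and bus['longitude'] == 0:
            continue  # no position fix reported
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude
            "geometry": {"type": "Point",
                         "coordinates": [bus['longitude'], bus['latitude']]},
            "properties": {"route": bus['route_number'],
                           "speed_kmh": bus['speed_kmh'],
                           "bearing": bus['bearing']},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Calling rows_to_geojson(positions_output) produces a string that can go into GeoJSONDataSource in place of bus_geojson.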

Now let's plot the bus_geojson object onto a Google Map. The following block saves an HTML file that you can open and explore. Bokeh can plot inline in notebooks like this one, but I had trouble getting it to display, so instead I've embedded an image of the resulting map.

The code below creates a map centred on Sydney, loads bus_geojson as the data source, and adds a blue circle for each bus. Think about how you could extend this: filtering by route number, adding hover labels with route number, changing colour to represent speed, or using an arrow to display the bus's direction.

In [8]:
from bokeh.io import output_file, save
from bokeh.models import GeoJSONDataSource, GMapOptions
from bokeh.plotting import gmap

# Google Maps requires an API key; replace with your own:
google_api_key = 'YOUR_GOOGLE_MAPS_API_KEY'

# Centre the map on Sydney
map_options = GMapOptions(lat=-33.87, lng=151.1, map_type="roadmap", zoom=11)
plot = gmap(google_api_key, map_options, title="Sydney",
            width=900, height=900, tools="pan,wheel_zoom,box_select")

# GeoJSONDataSource exposes each point's coordinates as "x" (longitude) and "y" (latitude)
geo_source = GeoJSONDataSource(geojson=bus_geojson)
plot.scatter(x="x", y="y", size=4, fill_color="blue", fill_alpha=0.8,
             line_color=None, source=geo_source)

output_file('bus-positions.html')
save(plot)

Map of Sydney displaying bus positions
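As a starting point for the first of those extensions, filtering by route number, here is a sketch that filters the GeoJSON string before it is handed to GeoJSONDataSource. filter_routes is a hypothetical helper; it only relies on each feature having a "route" property, as in the feature collection built earlier.

```python
import json


def filter_routes(geojson_text, routes):
    """Return a GeoJSON string containing only buses on the given route numbers."""
    collection = json.loads(geojson_text)
    wanted = set(routes)
    # Keep the FeatureCollection wrapper, drop features on other routes
    collection["features"] = [
        feature for feature in collection["features"]
        if feature["properties"].get("route") in wanted
    ]
    return json.dumps(collection)
```

For example, GeoJSONDataSource(geojson=filter_routes(bus_geojson, ["M50", "619"])) would plot only those two routes.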

Conclusion

This is just a brief example of what you can do with a single API. The other TfNSW APIs offer loads of data, far more than I can cover here, and I'm honestly still exploring them.

The Open Data service may be in its infancy, but it is already showing great promise. Once the early bugs and the missing/truncated data issues are worked out, I hope many people will find interesting and useful purposes for it.