Using the Twitter STREAMING API

A basic example

In the previous exercise you learned how to harvest tweets that were already posted by using the REST API. In this exercise we continue by harvesting tweets as they are posted, in (semi) real time. It is just a basic example to get you started.

You already know the lines below from the previous exercise:

In [ ]:
from twython import TwythonStreamer
import string, json, pprint
import urllib
from datetime import datetime
from datetime import date
import os, sys, subprocess, time
import psycopg2

# get access to the twitter API
# (placeholder names: fill in the credentials of your own Twitter app)
APP_KEY = ''
APP_SECRET = ''
OAUTH_TOKEN = ''
OAUTH_TOKEN_SECRET = ''

## just some date and time hack to generate a unique filename if needed
output_file = 'result_' + datetime.now().strftime('%Y%m%d-%H%M%S') + '.csv'

Setting up a streaming class

The new thing is that we are not going to use the Twython interface from the library but the TwythonStreamer interface. In the code below you see a Python class (MyStreamer) which inherits from TwythonStreamer.

This class has a number of methods. The main ones are on_success and on_error. on_success is called when data has been successfully received from the stream. The parameter data (a dictionary, thanks to Twython) contains the tweet, which you can parse the way you did previously. on_error is called when the stream returns an error status code.

In [6]:
# Class to process JSON data coming from the Twitter streaming API. Extracts relevant fields
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        # defaults, so every variable exists even when a field is missing
        tweet_id = None
        tweet_lat = 0.0
        tweet_lon = 0.0
        tweet_name = ""
        tweet_text = ""
        tweet_datetime = None
        retweet_count = 0

        if 'id' in data:
            tweet_id = data['id']
        if 'text' in data:
            # crude clean-up: double the single quotes and drop semicolons
            tweet_text = data['text'].replace("'", "''").replace(';', '')
        if 'coordinates' in data:
            geo = data['coordinates']
            if geo is not None:
                # GeoJSON order: longitude first, then latitude
                latlon = geo['coordinates']
                tweet_lon = latlon[0]
                tweet_lat = latlon[1]
        if 'created_at' in data:
            dt = data['created_at']
            tweet_datetime = datetime.strptime(dt, '%a %b %d %H:%M:%S +0000 %Y')
        if 'user' in data:
            users = data['user']
            tweet_name = users['screen_name']
        if 'retweet_count' in data:
            retweet_count = data['retweet_count']

        if tweet_lat != 0:
            # some elementary output to console
            string_to_write = str(tweet_datetime) + ", " + str(tweet_lat) + ", " + str(tweet_lon) + ": " + tweet_text
            print(string_to_write)

    def on_error(self, status_code, data):
        print("OOPS, ERROR: " + str(status_code))
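The parsing step inside on_success can also be tried out without a live stream. The sketch below is only an illustration: parse_tweet and the sample dictionary are made up for this example and are not part of Twython. It assumes the tweet dictionary has the same keys as used above.

```python
from datetime import datetime

TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'

def parse_tweet(data):
    """Return the fields used in this exercise, or None if the tweet has no coordinates.

    Hypothetical helper, written for this example only.
    """
    geo = data.get('coordinates')
    if geo is None:
        return None
    lon, lat = geo['coordinates']  # GeoJSON order: longitude first
    created = data.get('created_at')
    return {
        'id': data.get('id'),
        'text': data.get('text', ''),
        'lon': lon,
        'lat': lat,
        'created_at': datetime.strptime(created, TWITTER_TIME_FORMAT) if created else None,
        'screen_name': data.get('user', {}).get('screen_name', ''),
    }

# A made-up tweet, reduced to the keys that the parser looks at
sample = {
    'id': 1,
    'text': 'hello',
    'coordinates': {'type': 'Point', 'coordinates': [5.66, 51.97]},
    'created_at': 'Mon Sep 24 03:35:21 +0000 2012',
    'user': {'screen_name': 'example_user'},
}
parsed = parse_tweet(sample)
print(parsed['lat'], parsed['lon'])
```

Returning None for tweets without coordinates keeps the "skip it" decision in one place, instead of testing for a latitude of 0 afterwards.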

Filtering the stream

OK, to do it the Pythonic way: below you see the main procedure, in which the MyStreamer class is instantiated (with all authentication tokens) and then captures only those tweets that fall within a certain bounding box. Have a look at the Twitter API documentation for more information on what and how to filter in the incoming tweet stream.

In [ ]:
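##main procedure
def main():
    try:
        # APP_KEY etc. are placeholder names for your own Twitter credentials;
        # define them before calling main()
        stream = MyStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
        print('Connecting to twitter: will take a minute')
    except ValueError as e:
        print('OOPS! that hurts, something went wrong while making connection with Twitter: ' + str(e))
        return
    # Filter based on bounding box see twitter api documentation for more info
    try:
        # example box (west,south,east,north), roughly the Netherlands; replace with your own
        stream.statuses.filter(locations='3.3,50.7,7.3,53.6')
    except ValueError as e:
        print('OOPS! that hurts, something went wrong while getting the stream from Twitter: ' + str(e))

if __name__ == '__main__':
    main()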
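The bounding box is passed to the filter as one comma-separated string of longitude/latitude pairs, with the south-west corner first. A minimal sketch of how such a string could be built and checked is given below. The helper names and the example box are assumptions made for this illustration, not part of Twython or the Twitter API.

```python
def bbox_to_locations(west, south, east, north):
    """Format a bounding box as 'west,south,east,north' (longitude before latitude).

    Hypothetical helper. It does not handle boxes that cross the 180 degree meridian.
    """
    if west >= east or south >= north:
        raise ValueError('expected west < east and south < north')
    return ','.join(str(v) for v in (west, south, east, north))

def in_bbox(lon, lat, west, south, east, north):
    """Return True when the point lies inside (or on the edge of) the box."""
    return west <= lon <= east and south <= lat <= north

# Example box, roughly around the Netherlands (an assumption, replace with your own)
NL_BBOX = (3.3, 50.7, 7.3, 53.6)
locations = bbox_to_locations(*NL_BBOX)
print(locations)
```

The resulting string can be used as the locations argument of the filter call. in_bbox may be handy as an extra check on your side if you want to keep only tweets whose own coordinates fall inside the box.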

For completeness, here is a basic function to write tweets to a file, although you probably figured that out yourself.

In [ ]:
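def write_tweet(t):
    # append one line to the output file; 'with' closes the file afterwards
    with open(output_file, 'a') as target:
        target.write(t + '\n')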
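Tweet text often contains commas and quotes, which can break a hand-made CSV line. As a possible alternative, the sketch below (written for Python 3) uses the standard csv module to do the quoting. The function name write_tweet_row and the demo file name are assumptions for this example only.

```python
import csv
import os
import tempfile

def write_tweet_row(path, row):
    """Append one tweet as a CSV row; the csv module quotes commas and quotes in the text."""
    with open(path, 'a', newline='', encoding='utf-8') as target:
        csv.writer(target).writerow(row)

# Demo in a temporary directory, so no real output file is touched
demo_path = os.path.join(tempfile.mkdtemp(), 'result_demo.csv')
write_tweet_row(demo_path, ['2012-09-24 03:35:21', 51.97, 5.66, 'hello, "world"'])

# Read the file back to check that the text survived the round trip
with open(demo_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

Reading the file back with csv.reader returns every value as a string, so numbers such as the latitude need to be converted again when you load the file.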

Beyond the basics

If you are bored and need a challenge, it would be nice to write not to a dull text file but to a real PostGIS database.

It is installed in OSGeo-Live (see its documentation for a quick start). The cool thing is that you can connect to the database directly from QGIS and/or run spatial queries directly on the database.

To be able to connect to a PostGIS database you need to import the psycopg2 library. Below you can see how to make a connection to the database (replace the database name etc. by your own). If setting up the connection to the database does not work, have a look at this psycopg2 tutorial or this psycopg2 tutorial with PostgreSQL.

In [ ]:
try:
    conn = psycopg2.connect("dbname=GISDB user=postgres password=admin")
    cur = conn.cursor()
except psycopg2.Error as e:
    print("oops error: " + str(e))

Once you have a connection and a cursor to the database you can execute all kinds of SQL statements, such as:

In [ ]:
## pass the values as parameters: psycopg2 then takes care of the quoting
## (add further fields and %s placeholders in the same way)

cur.execute("""insert into whatevertable (tweet_id_field, tweet_text_field)
            values (%s, %s)""", (tweet_id, tweet_text))
conn.commit()
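Since the target is a PostGIS database, the tweet location can be stored as a point geometry. The sketch below only builds the SQL text and the parameter tuple, so it runs without a database. The table name tweets and its columns are made up for this example. ST_MakePoint and ST_SetSRID are standard PostGIS functions, and 4326 is the SRID for WGS84 longitude/latitude.

```python
# Hypothetical table and column names; adapt them to your own database
INSERT_SQL = (
    "INSERT INTO tweets (tweet_id, tweet_text, geom) "
    "VALUES (%s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"
)

def tweet_params(tweet_id, tweet_text, lon, lat):
    """Return the parameters in the order of the %s placeholders in INSERT_SQL.

    ST_MakePoint expects x (longitude) first, then y (latitude).
    """
    return (tweet_id, tweet_text, lon, lat)

params = tweet_params(1, "it's a tweet", 5.66, 51.97)
print(INSERT_SQL)
print(params)

# With an open psycopg2 cursor and connection this would be used as:
#   cur.execute(INSERT_SQL, params)
#   conn.commit()
```

Because the values travel as parameters, quotes in the tweet text do not need to be escaped by hand.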

Don't forget to close the connection to the database when you don't need it anymore.
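One way to make sure the connection is always closed, even when a statement fails, is a try/finally block. The sketch below uses the sqlite3 module from the standard library purely as a stand-in, so that it runs without a PostGIS server. A psycopg2 connection offers the same cursor(), commit() and close() methods, but uses %s instead of ? as placeholder. The table is made up for this example.

```python
import sqlite3  # stand-in only, so the sketch runs without a PostGIS server

conn = sqlite3.connect(':memory:')
try:
    cur = conn.cursor()
    cur.execute('CREATE TABLE tweets (tweet_id INTEGER, tweet_text TEXT)')
    cur.execute('INSERT INTO tweets VALUES (?, ?)', (1, 'hello'))
    conn.commit()
    cur.execute('SELECT COUNT(*) FROM tweets')
    row_count = cur.fetchone()[0]
    cur.close()
finally:
    # runs whether or not one of the statements above raised an error
    conn.close()

print(row_count)
```

Note that in psycopg2 a "with conn:" block only commits or rolls back the transaction; it does not close the connection, so the explicit close() is still needed.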