Using the Twitter STREAMING API

A basic example

In the previous exercise you learned how to harvest tweets that had already been posted by using the REST API. In this exercise we continue by harvesting tweets as they are posted, in (semi) real time. It is just a basic example to get you started.

You already know the lines below from the previous exercise.

In [ ]:
from twython import TwythonStreamer
import json, os, pprint, string, subprocess, sys, time
import urllib
from datetime import date, datetime
import psycopg2

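# get access to the twitter API
# Placeholders only: fill in the credentials you created in the previous exercise
APP_KEY = 'your-app-key'
APP_SECRET = 'your-app-secret'
OAUTH_TOKEN = 'your-oauth-token'
OAUTH_TOKEN_SECRET = 'your-oauth-token-secret'

## just a date and time hack to generate a unique filename if needed
output_file = 'result_' + datetime.now().strftime('%Y%m%d-%H%M%S') + '.csv'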

Setting up a streaming class

The new thing is that we are not going to use the Twython interface from the library but the TwythonStreamer interface. In the code below you see a Python class (MyStreamer) which inherits from TwythonStreamer.

This class has a number of methods. The main ones are on_success and on_error. on_success is called when data has been successfully received from the stream. Its parameter data (a dictionary, thanks to Twython) contains the tweet, which you can parse the way you did previously. on_error is called when the stream returns an error status code.

In [6]:
# Class to process JSON data coming from the Twitter streaming API and extract the relevant fields
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        # defaults, so every name exists even when a field is missing from the message
        tweet_id = None
        tweet_text = ""
        tweet_datetime = None
        tweet_lat = 0.0
        tweet_lon = 0.0
        tweet_name = ""
        retweet_count = 0

        if 'id' in data:
            tweet_id = data['id']
        if 'text' in data:
            # double the single quotes and drop semicolons
            tweet_text = data['text'].replace("'", "''").replace(';', '')
        if 'coordinates' in data:
            geo = data['coordinates']
            if geo is not None:
                # GeoJSON order: longitude first, then latitude
                latlon = geo['coordinates']
                tweet_lon = latlon[0]
                tweet_lat = latlon[1]
        if 'created_at' in data:
            dt = data['created_at']
            tweet_datetime = datetime.strptime(dt, '%a %b %d %H:%M:%S +0000 %Y')
        if 'user' in data:
            users = data['user']
            tweet_name = users['screen_name']
        if 'retweet_count' in data:
            retweet_count = data['retweet_count']
        if tweet_lat != 0:
            # some elementary output to console, only for tweets that carry a location
            string_to_write = str(tweet_datetime) + ", " + str(tweet_lat) + ", " + str(tweet_lon) + ": " + tweet_text
            print(string_to_write)

    def on_error(self, status_code, data):
        print("OOPS, small error: " + str(status_code))
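
If you want to try the parsing logic without connecting to Twitter, you can feed a hand-made dictionary to a small helper. The sketch below is written for Python 3; the function name extract_fields and the sample message are made up for illustration and only mimic the fields used in MyStreamer above.

```python
from datetime import datetime

def extract_fields(data):
    """Pull the fields used in this exercise out of one tweet dictionary."""
    fields = {'id': data.get('id'), 'text': data.get('text', ''),
              'lon': 0.0, 'lat': 0.0, 'created_at': None}
    geo = data.get('coordinates')
    if geo is not None:
        # GeoJSON order: longitude first, then latitude
        fields['lon'], fields['lat'] = geo['coordinates']
    if 'created_at' in data:
        fields['created_at'] = datetime.strptime(
            data['created_at'], '%a %b %d %H:%M:%S +0000 %Y')
    return fields

# Made-up sample message, not a real tweet
sample = {
    'id': 1,
    'text': 'Hello world',
    'coordinates': {'type': 'Point', 'coordinates': [5.66, 51.97]},
    'created_at': 'Mon Sep 01 12:30:00 +0000 2014',
}
print(extract_fields(sample))
```

Messages without a 'coordinates' entry simply keep the default 0.0 values, which is the same convention the class uses to skip tweets without a location.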

Filtering the stream

OK, to do it nicely the Python way: below you see the main procedure, in which the MyStreamer class is instantiated (with all authentication tokens) and then captures only those tweets that fall within a certain bounding box. Have a look at the Twitter API documentation for more information on what and how to filter the incoming tweet stream.

In [ ]:
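## main procedure
def main():
    try:
        print('Connecting to twitter: will take a minute')
        # APP_KEY, APP_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET are your credentials from the previous exercise
        stream = MyStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    except ValueError as e:
        print('OOPS! that hurts, something went wrong while making connection with Twitter: ' + str(e))
        return
    # Filter based on bounding box, see twitter api documentation for more info
    try:
        # 'west,south,east,north'; example values only (roughly the Netherlands), use your own box
        stream.statuses.filter(locations='3.00,50.00,7.35,53.65')
    except ValueError as e:
        print('OOPS! that hurts, something went wrong while getting the stream from Twitter: ' + str(e))

if __name__ == '__main__':
    main()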
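
The bounding box is passed as one comma-separated string with the south-west corner first. A small helper can build that string for you, and a client-side check can be handy if you want to be strict about which points you keep. This is only a sketch in Python 3: the helper names and the example box are assumptions, not part of Twython.

```python
def bbox_to_locations(west, south, east, north):
    """Format a bounding box as the comma-separated string used for `locations`."""
    return ','.join(str(v) for v in (west, south, east, north))

def in_bbox(lon, lat, west, south, east, north):
    """Return True when the point lies inside (or on the edge of) the box."""
    return west <= lon <= east and south <= lat <= north

# Example box only (roughly the Netherlands); replace it with your own area
box = (3.0, 50.0, 7.35, 53.65)
print(bbox_to_locations(*box))
print(in_bbox(5.66, 51.97, *box))
```

With a connected streamer you would then pass the result as the locations argument, for example stream.statuses.filter(locations=bbox_to_locations(*box)).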

OK, just for completeness: a basic function to write tweets to a file, although you probably figured that out yourself.

In [ ]:
def write_tweet(t):
    # append the tweet as one line to the output file
    with open(output_file, 'a') as target:
        target.write(t + '\n')
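
Tweet text often contains commas and quotes, which makes hand-built CSV lines hard to read back. One possible alternative, sketched here for Python 3, is to let the standard csv module do the quoting. The function name write_tweet_row and the demo values are made up for illustration.

```python
import csv
import os
import tempfile

def write_tweet_row(path, row):
    """Append one tweet (a list of values) as a CSV line."""
    # newline='' lets the csv module control the line endings
    with open(path, 'a', newline='') as target:
        csv.writer(target).writerow(row)

# Write to a temporary file so the example does not touch your own data
demo_file = os.path.join(tempfile.mkdtemp(), 'result_demo.csv')
write_tweet_row(demo_file, ['2014-09-01 12:30:00', 51.97, 5.66, 'Hello, "world"'])
with open(demo_file, newline='') as f:
    print(f.read())
```

Reading the file back with csv.reader returns the text field in one piece, including its comma and quotes.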

Beyond the basics

If you are bored and need a challenge, it would be nice not to write to a dull text file but to a real PostGIS database.

PostGIS is installed on OSGeoLive (see its quick start to get going with it). The cool thing is, of course, that you can connect to it directly from QGIS and/or run spatial queries directly on the database.

To be able to connect to a PostGIS database you need to import the psycopg2 library. Below you can see how to make a connection to the database (replace the database name etc. with your own).

In [ ]:
try:
    conn = psycopg2.connect("dbname=GISDB user=postgres password=admin")
    cur = conn.cursor()
except psycopg2.Error as e:
    print("oops error: " + str(e))

Once you have a connection and a cursor to the database you can execute all kinds of SQL statements, such as:

In [ ]:
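## TODO: still to be checked, the format is not right yet

# whatevertable and the field names are placeholders; add your other fields and values in the same way
cur.execute("""insert into whatevertable (tweet_id_field, tweet_text_field)
            values ('""" + str(tweet_id) + """','""" + tweet_text + """')""")
conn.commit()  # make the insert permanent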
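
Building SQL by gluing strings together is fragile. A possible alternative is to let psycopg2 fill in the values through %s placeholders, in which case the manual doubling of quotes in the tweet text is not needed. The sketch below (Python 3) only prepares the statement and its parameters, so it runs without a database. The table and column names, including the geom column, are hypothetical; ST_MakePoint and ST_SetSRID are PostGIS functions, and 4326 is the SRID for WGS84 longitude/latitude.

```python
# Hypothetical table and column names; adapt them to your own schema
INSERT_SQL = (
    "INSERT INTO tweets (tweet_id, tweet_text, geom) "
    "VALUES (%s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"
)

def tweet_to_params(tweet_id, tweet_text, lon, lat):
    """Return the values in the order of the %s placeholders in INSERT_SQL."""
    # ST_MakePoint expects x (longitude) first, then y (latitude)
    return (tweet_id, tweet_text, lon, lat)

params = tweet_to_params(1, "It's sunny; let's go", 5.66, 51.97)
print(INSERT_SQL)
print(params)

# With an open psycopg2 connection and cursor you would then run:
#     cur.execute(INSERT_SQL, params)
#     conn.commit()
```

Storing the location as a geometry is what makes the spatial queries and the QGIS connection mentioned above possible.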

And don't forget to close the connection to the database when you don't need it anymore.