In the previous exercise you learned how to harvest tweets that had already been posted, using the REST API. In this exercise we continue by harvesting tweets as they are posted, in (near) real time. It is just a basic example to get you started.
The lines below you already know from the previous exercise:
```python
from twython import TwythonStreamer
from datetime import datetime
import psycopg2  # only needed for the PostGIS challenge at the end

# get access to the Twitter API (fill in your own keys and tokens)
APP_KEY = ''
APP_SECRET = ''
OAUTH_TOKEN = ''
OAUTH_TOKEN_SECRET = ''

## just some date and time hack to generate a unique filename if needed
output_file = 'result_' + datetime.now().strftime('%Y%m%d-%H%M%S') + '.csv'
```
The new thing is that we are not going to use the `Twython` interface of the library but the `TwythonStreamer` interface. In the code below you see a Python class (`MyStreamer`) which inherits from the `TwythonStreamer` class.
This class has a number of methods. The main ones are:

- `on_success` is called when data has been successfully received from the stream. Its parameter `data` (a dictionary, thanks to Twython) contains the tweet, which you can parse the way you did previously.
- `on_error` is called when the stream returns an error; `status_code` is the HTTP status code Twitter sent back.
```python
# Class to process JSON data coming from the Twitter stream API. Extract relevant fields
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        tweet_id = None
        tweet_lat = 0.0
        tweet_lon = 0.0
        tweet_name = ""
        tweet_text = ""
        tweet_datetime = None
        retweet_count = 0
        if 'id' in data:
            tweet_id = data['id']
        if 'text' in data:
            # crude escaping for a hand-built SQL string; not needed with parameterised queries
            tweet_text = data['text'].replace("'", "''").replace(';', '')
        if 'coordinates' in data:
            geo = data['coordinates']
            if geo is not None:
                latlon = geo['coordinates']  # GeoJSON order: [longitude, latitude]
                tweet_lon = latlon[0]
                tweet_lat = latlon[1]
        if 'created_at' in data:
            dt = data['created_at']
            tweet_datetime = datetime.strptime(dt, '%a %b %d %H:%M:%S +0000 %Y')
        if 'user' in data:
            tweet_name = data['user']['screen_name']
        if 'retweet_count' in data:
            retweet_count = data['retweet_count']
        if tweet_lat != 0:
            # some elementary output to the console
            string_to_write = str(tweet_datetime) + ", " + str(tweet_lat) + ", " + str(tweet_lon) + ": " + tweet_text
            print(string_to_write)
            # write_tweet(string_to_write)

    def on_error(self, status_code, data, headers=None):
        print("OOPS, small error: " + str(status_code))
        # self.disconnect()  # uncomment to stop the stream on the first error
```
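The field extraction done in `on_success` can be tried without connecting to Twitter at all. Below is a minimal sketch that pulls the same fields out of a hand-made dictionary; `parse_tweet` and the `sample` tweet are hypothetical and heavily shortened, and the sketch assumes the `coordinates` field is a GeoJSON point, which lists longitude before latitude.

```python
from datetime import datetime


def parse_tweet(data):
    """Hypothetical helper mirroring MyStreamer.on_success.

    Returns (created_at, lat, lon, screen_name, text), or None when the
    tweet carries no exact point location."""
    geo = data.get('coordinates')
    if not geo:
        return None  # no GPS point attached to this tweet
    lon, lat = geo['coordinates']  # GeoJSON order: [longitude, latitude]
    created_at = datetime.strptime(data['created_at'], '%a %b %d %H:%M:%S +0000 %Y')
    screen_name = data.get('user', {}).get('screen_name', '')
    return created_at, lat, lon, screen_name, data.get('text', '')


# A hand-made, shortened stand-in for a tweet dictionary
sample = {
    'id': 1,
    'text': 'Hello from Utrecht',
    'created_at': 'Mon Jun 02 14:05:10 +0000 2014',
    'coordinates': {'type': 'Point', 'coordinates': [5.12, 52.09]},
    'user': {'screen_name': 'example_user'},
}

print(parse_tweet(sample))
```

Testing the parsing on a fixed dictionary like this makes it easy to spot a swapped latitude and longitude before you leave the stream running.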
To do it the Python way, below you see the main procedure, where the `MyStreamer` class is instantiated (with all the authentication tokens) and the stream is then filtered so that only tweets within a certain bounding box are captured. Have a look at https://twython.readthedocs.org/en/latest/api.html#streaming-interface for more information on what and how to filter the incoming tweet stream.
```python
## main procedure
def main():
    try:
        stream = MyStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
        print('Connecting to Twitter: this will take a minute')
    except ValueError as e:
        print('OOPS! That hurts, something went wrong while making a connection with Twitter: ' + str(e))
        return  # without a stream object there is nothing to filter
    # Filter on a bounding box (west,south,east,north in degrees); see the Twitter API documentation for more info
    try:
        stream.statuses.filter(locations='3.00,50.00,7.35,53.65')
    except ValueError as e:
        print('OOPS! That hurts, something went wrong while getting the stream from Twitter: ' + str(e))


if __name__ == '__main__':
    main()
```
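The `locations` string in `main` is a flat list of longitude/latitude values: the south-west corner (3.00, 50.00) followed by the north-east corner (7.35, 53.65), a box roughly covering the Netherlands plus parts of Belgium and Germany. Putting latitude first is an easy mistake, so building the string from named values can help. A minimal sketch; `bbox_to_locations` is a hypothetical helper, not part of Twython, and the concatenation of several boxes reflects my reading of Twitter's filter documentation rather than anything in this exercise.

```python
def bbox_to_locations(*boxes):
    """Hypothetical helper: build the value for statuses.filter(locations=...).

    Each box is (west, south, east, north) in WGS84 degrees, i.e. longitude
    before latitude, south-west corner first and north-east corner second."""
    parts = []
    for box in boxes:
        west, south, east, north = box
        # catch corners given in the wrong order and out-of-range values
        if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
            raise ValueError('expected (west, south, east, north), got %r' % (box,))
        parts.append(','.join(str(v) for v in box))
    # several boxes are simply concatenated into one comma-separated list
    return ','.join(parts)


netherlands = (3.00, 50.00, 7.35, 53.65)
print(bbox_to_locations(netherlands))
```

With this helper the filter call becomes `stream.statuses.filter(locations=bbox_to_locations(netherlands))`. Note that the check cannot catch every swap: a box with latitude and longitude exchanged can still be a valid (but wrong) box.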
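Just for completeness: a basic function to write tweets to a file, although you have probably figured that out yourself. Define it above the `if __name__ == '__main__':` block; otherwise it does not exist yet when the running stream tries to call it.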
```python
def write_tweet(t):
    # append one line to the output file; 'with' closes the file even if writing fails
    with open(output_file, 'a', encoding='utf-8') as target:
        target.write(t + '\n')
```
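The output file has a `.csv` extension, but tweet texts often contain commas, quotes and line breaks, so a line glued together with `", "` cannot always be split back into columns reliably. A sketch using the standard `csv` module, which quotes such fields for you; `write_tweet_row` is a hypothetical variant of `write_tweet`, and the temporary file only serves the demonstration.

```python
import csv
import os
import tempfile
from datetime import datetime


def write_tweet_row(path, created_at, lat, lon, text):
    """Hypothetical variant of write_tweet: append one tweet as a quoted CSV row."""
    # newline='' lets the csv module handle line endings itself
    with open(path, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow([created_at.isoformat(), lat, lon, text])


# Demonstration: a text with a comma, quotes and a line break survives a round trip
path = os.path.join(tempfile.mkdtemp(), 'result_example.csv')
write_tweet_row(path, datetime(2014, 6, 2, 14, 5, 10), 52.09, 5.12, 'Hello, "world"\nbye')
with open(path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

In the streamer you would call it with `output_file` and the fields from `on_success` instead of building `string_to_write`.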
If you are bored and need a challenge, it would be nice to write not to a dull text file but to a real PostGIS database.
PostGIS is installed on OSGeoLive (for a quick start, see http://live.osgeo.org/en/quickstart/postgis_quickstart.html). The nice thing is that you can connect QGIS directly to the database and/or run spatial queries on it.
To be able to connect to a PostGIS database you need to import the `psycopg2` library. Below you can see how to make a connection to the database (replace the database name, user and password with your own). If setting up the connection to the database does not work, have a look at this psycopg2 tutorial or this psycopg2 tutorial with PostgreSQL.
```python
try:
    conn = psycopg2.connect("dbname=GISDB user=postgres password=admin")
    cur = conn.cursor()
except psycopg2.Error as e:
    print("oops error: " + str(e))
```
Once you have a connection and a cursor to the database you can execute all kinds of SQL statements, such as:
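```python
# whatevertable and the *_field names are placeholders for your own table and columns;
# psycopg2 fills in the %s placeholders, so quotes in the tweet text need no manual escaping
cur.execute(
    "insert into whatevertable (tweet_id_field, tweet_text_field ...) values (%s, %s ...)",
    (tweet_id, tweet_text, ...))
conn.commit()  # make the insert permanent
```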
Don't forget to close the cursor and the connection (`cur.close()` and `conn.close()`) when you no longer need them.
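Putting the database steps together, here is a minimal sketch assuming a hypothetical `tweets` table with a PostGIS point column; the table, column and function names are assumptions, not part of the exercise. It lets psycopg2 fill in `%s` placeholders instead of concatenating strings, builds the point with `ST_SetSRID(ST_MakePoint(lon, lat), 4326)` so QGIS can display it as WGS84, and closes the connection in a `finally` block so it is released even after an error.

```python
# Hypothetical schema: adapt table and column names to your own database
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id    BIGINT PRIMARY KEY,
    created_at  TIMESTAMP,
    screen_name TEXT,
    tweet_text  TEXT,
    geom        geometry(Point, 4326)
)"""

# psycopg2 fills in the %s placeholders; ST_MakePoint takes (x, y) = (lon, lat)
INSERT_SQL = """
INSERT INTO tweets (tweet_id, created_at, screen_name, tweet_text, geom)
VALUES (%s, %s, %s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"""


def store_tweet(conn, row):
    """Insert one (tweet_id, created_at, screen_name, text, lon, lat) tuple and commit it."""
    with conn.cursor() as cur:  # the cursor is closed when the block ends
        cur.execute(INSERT_SQL, row)
    conn.commit()  # psycopg2 does not autocommit by default


def example():
    """Not run automatically: needs a running PostgreSQL/PostGIS server."""
    import psycopg2
    from datetime import datetime

    conn = psycopg2.connect("dbname=GISDB user=postgres password=admin")
    try:
        with conn.cursor() as cur:
            cur.execute(CREATE_SQL)
        conn.commit()
        store_tweet(conn, (1, datetime(2014, 6, 2, 14, 5, 10), 'example_user', 'Hello', 5.12, 52.09))
    finally:
        conn.close()  # always release the connection
```

In the streamer you would open the connection once in `main`, call `store_tweet` from `on_success`, and close it when the stream stops. Be aware that in psycopg2 `with conn:` only ends the transaction (commit or rollback); it does not close the connection, which is why `conn.close()` is called explicitly.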