Twitter search stream

Using: Python Twitter Tools
Imports
In [1]:
from time import time
from twitter import Twitter, TwitterStream, OAuth
import re
Some regular expressions to remove URLs, user mentions and hashtags (leave these out if you want to keep them in the text)
In [2]:
# raw strings, so that \S reaches the regex engine unchanged
http_pattern = re.compile(r'(https?://\S+)', re.IGNORECASE)
user_pattern = re.compile(r'(@\S+)', re.IGNORECASE)
hash_pattern = re.compile(r'(#\S+)', re.IGNORECASE)
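To see what these patterns strip, here is a small standalone check. The sample tweet text is invented for illustration, and the loop is just one way of applying the three patterns in turn:

```python
import re

# Same three patterns as above, written as raw strings
http_pattern = re.compile(r'(https?://\S+)', re.IGNORECASE)
user_pattern = re.compile(r'(@\S+)', re.IGNORECASE)
hash_pattern = re.compile(r'(#\S+)', re.IGNORECASE)

# Hypothetical tweet text, not real data
sample = "RT @someone: loving #python today http://example.com/abc ok"

stripped = sample
for pattern in (http_pattern, user_pattern, hash_pattern):
    # each pattern removes the whole whitespace-delimited token it matches
    stripped = pattern.sub('', stripped)

print(repr(stripped))
```

Note that `\S+` takes everything up to the next whitespace, so trailing punctuation such as the colon in `@someone:` goes too, and the spaces around each removed token are left behind.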
The tokens are necessary for user authentication; you will need to create an application on the Twitter developer pages to get them. Once you have the tokens, replace the XXXXXX placeholders below. The function returns an OAuth object.
In [3]:
def twitter_auth():
    # these tokens are necessary for user authentication
    # (created within the twitter developer API pages)
    consumer_key = "XXXXXX"
    consumer_secret = "XXXXXXXXXXXX"
    access_key = "XXXXXXXXXXXXXXXXXX"
    access_secret = "XXXXXXXXXXXXXXXXXX"

    # create twitter API object
    auth = OAuth(access_key, access_secret, consumer_key, consumer_secret)
    return auth
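If you would rather not keep the tokens in the notebook itself, one option is to read them from environment variables. This is only a sketch: the variable names and the `load_credentials` helper are made up for this example, and the tuple is ordered to match the `OAuth(access_key, access_secret, consumer_key, consumer_secret)` call above.

```python
import os

# Hypothetical variable names; pick whatever suits your setup
ENV_NAMES = ("TWITTER_ACCESS_KEY", "TWITTER_ACCESS_SECRET",
             "TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET")

def load_credentials(env=None):
    """Return (access_key, access_secret, consumer_key, consumer_secret)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)

# Demonstration with a stand-in mapping instead of the real environment
fake_env = dict((name, "XXXXXX") for name in ENV_NAMES)
credentials = load_credentials(fake_env)
print(len(credentials))

# With Python Twitter Tools installed, this could then become:
#   auth = OAuth(*load_credentials())
```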
Clean the text part of the tweet JSON object
In [4]:
def clean_tweet(status):
    tweet = status['text']
    tweet = re.sub(http_pattern, '', tweet)
    tweet = re.sub(user_pattern, '', tweet)
    tweet = re.sub(hash_pattern, '', tweet)    
    # replace new lines and tabs, then collapse runs of whitespace to one space
    tweet = tweet.replace('\n', ' ')
    tweet = tweet.replace('\t', ' ')
    # str.replace matches literally, so the regex needs re.sub
    tweet = re.sub(r'\s+', ' ', tweet)
    # can contain some weird chars: drop anything non-ASCII, keep the result as text
    tweet = tweet.encode('ascii', 'ignore').decode('ascii')
    return tweet
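One detail worth checking in isolation: `str.replace` matches its argument literally, whereas `re.sub` treats it as a regular expression. A quick sketch with a made-up string shows the difference:

```python
import re

# Hypothetical text with mixed whitespace
sample = "too    many\n\tspaces"

# str.replace looks for the literal characters backslash, s, plus
literal = sample.replace('\\s+', ' ')

# re.sub interprets \s+ as "one or more whitespace characters"
collapsed = re.sub(r'\s+', ' ', sample)

print(literal == sample)  # True: nothing was replaced
print(collapsed)          # too many spaces
```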
Create a Twitter stream and filter the statuses for your search terms (you can track up to 400 terms before having to talk to Twitter)
In [5]:
def download_stream(search_terms, filename, search_time=30):
    stream = TwitterStream(auth=twitter_auth(), secure=True)
    tweet_iter = stream.statuses.filter(track=search_terms)
    end_time = time() + search_time
    # open for append, so it can be run over and over
    with open(filename, "a") as tfile:
        for itweet in tweet_iter:
            # I only wanted English tweets; .get() skips messages without a 'lang' field
            if itweet.get('lang') == 'en':
                tweet = clean_tweet(itweet)
                # skip tweets that are empty after cleaning
                if len(tweet) > 0:
                    tfile.write(tweet + '\n')
                    print(tweet)
            # stop once the requested search time has elapsed
            if time() > end_time:
                print("... done ... ran for %i seconds ..." % search_time)
                break
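The time limit in `download_stream` can also be pulled out into a small generator, which makes it easy to try without a live stream. This is a sketch only: `take_until` and the fake clock are hypothetical helpers for this example, not part of Python Twitter Tools.

```python
from time import time

def take_until(iterable, seconds, clock=time):
    """Yield items from iterable until `seconds` have elapsed."""
    end_time = clock() + seconds
    for item in iterable:
        yield item
        # the clock is only consulted after an item has arrived
        if clock() > end_time:
            break

# Stand-in for the stream: a fake clock that advances one second per call
ticks = iter(range(100))
fake_clock = lambda: next(ticks)

collected = list(take_until(range(50), 3, clock=fake_clock))
print(collected)
```

Because the check only runs when the iterator hands over an item, a quiet stream can run past the requested time; the limit is a lower bound on the run time, not an exact cut-off.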
In [6]:
search_str = ("python,java,artificial intelligence,machine learning,data mining,programming,software,"
              "software development,software design,natural language processing,linear regression,"
              "deep learning,k-means,open source,source code,api,web service,matplotlib,scikit-learn,"
              "ML,ensemble learning,clustering,classification,SVM,nearest neighbors,random forest,PCA,"
              "decision tree,image recognition")
download_stream(search_str, 'test.txt', search_time=10)
 I love it.. it has the blackberry software so it can be controlled through active directory.. BYOD is killing IT people
RT  The NSA has used covert relationships with tech companies to insert vulnerabilities into consumer security software http:/
can use it to manage your installed andor running software program
[GET] Quick Deals Generator   Software Lets You Create Deal Pages In Less Than 5 Minutes! DOWNLOAD 
 [GET] Quick Deals Generator   Software Lets You Create Deal Pages In Less Than 5 Minutes! DOWNLOAD
RT  Unitrends is looking for: Software Developer (User Interface)  
DJ Facebook Chatrooms Chat Software
... done ... ran for 10 seconds ...
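The search string can also be built from a list, which avoids accidentally fusing two terms when a comma is left out between adjacent string literals. A minimal sketch, where `build_track` is a hypothetical helper, the term list is a shortened example, and the 400-term limit is the figure mentioned above (treat it as an assumption):

```python
# Shortened, illustrative list of search terms
terms = ["python", "machine learning", "scikit-learn", "ML", "SVM"]

MAX_TERMS = 400  # limit quoted in the text above; an assumption here

def build_track(terms, max_terms=MAX_TERMS):
    """Join search terms into the comma-separated string used for `track`."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if len(cleaned) > max_terms:
        raise ValueError("too many search terms: %d" % len(cleaned))
    return ",".join(cleaned)

track_str = build_track(terms)
print(track_str)
```

The result could then be passed as the first argument of `download_stream` in place of the hand-written string.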