This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
First, you need the twitter Python package for this recipe. You can install it with pip install twitter (https://pypi.python.org/pypi/twitter).
Then, you need to obtain authentication codes in order to access Twitter data. The procedure is free. In addition to a Twitter account, you also need to create an Application on the Twitter Developers website. Then, you will be able to retrieve the OAuth authentication codes that are required for this recipe. (https://dev.twitter.com/apps)
Note that access to the Twitter API is not unlimited. Most methods can only be called a few times within a given time window. Unless you study small networks or look at small portions of large networks, you will need to throttle your requests. In this recipe, we only consider a small portion of the network, so the API limit should not be reached. If it is, you will have to wait a few minutes until the next time window starts. (https://dev.twitter.com/docs/rate-limiting/1.1/limits)
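If you do need to throttle requests, one possible approach is to wrap each call in a helper that sleeps and retries when the limit is hit. The following is only a minimal sketch: RateLimitError and call_with_throttle are hypothetical names that are not part of the twitter package, and you would substitute whatever error your client actually raises when the limit is reached.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when the API limit is hit."""


def call_with_throttle(func, *args, wait=60.0, max_retries=15,
                       sleep=time.sleep, **kwargs):
    """Call func, sleeping `wait` seconds and retrying on RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except RateLimitError:
            # Give up once the retry budget is exhausted.
            if attempt == max_retries:
                raise
            sleep(wait)
```

The sleep function is passed as a parameter so that the helper can be exercised without actually waiting.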
import math
import json
import twitter
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
Create a twitter.txt text file in the current folder with the four authentication keys, one per line. You will find those in your Twitter Developers Application page, OAuth tool section. (https://dev.twitter.com/apps)
(CONSUMER_KEY, CONSUMER_SECRET,
 OAUTH_TOKEN, OAUTH_TOKEN_SECRET) = open('twitter.txt', 'r').read().splitlines()
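The one-liner above fails with a confusing unpacking error if the file contains a blank line or a missing key. A slightly more defensive variant could look like the sketch below. The load_credentials name is hypothetical, and the order of the keys is assumed to be the same as in the tuple above.

```python
def load_credentials(path='twitter.txt'):
    """Return the four keys found in `path`, ignoring blank lines."""
    with open(path, 'r') as f:
        # Strip surrounding whitespace and skip empty lines.
        keys = [line.strip() for line in f if line.strip()]
    if len(keys) != 4:
        raise ValueError('expected 4 keys in %s, found %d'
                         % (path, len(keys)))
    return tuple(keys)
```

It would be used as (CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET) = load_credentials().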
We now create a Twitter instance that will give us access to the Twitter API.
auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
tw = twitter.Twitter(auth=auth)
The twitter library defines a direct mapping between the REST API and the attributes of the Twitter instance. Here, we execute the account/verify_credentials REST request to obtain the user id of the authenticated user (me here, or you if you execute this notebook yourself!).
me = tw.account.verify_credentials()
myid = me['id']
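To make the idea of mapping attributes to REST endpoints more concrete, here is a toy illustration. It is not the twitter library's implementation: PathBuilder is a hypothetical class that only shows how chained attribute access can be turned into a request path, and it sends no HTTP request.

```python
class PathBuilder:
    """Toy object turning chained attribute access into a REST-style path."""

    def __init__(self, parts=()):
        self._parts = tuple(parts)

    def __getattr__(self, name):
        # Only called for unknown attributes: append one path segment.
        return PathBuilder(self._parts + (name,))

    def __call__(self, **params):
        # A real client would send an HTTP request here.
        # We just return the path and the parameters.
        return '/'.join(self._parts), params


api = PathBuilder()
print(api.account.verify_credentials())
print(api.followers.ids(user_id=42))
```

Under this scheme, api.account.verify_credentials() corresponds to the account/verify_credentials path, which mirrors how tw.account.verify_credentials() reads in the recipe.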
def get_followers_ids(uid=None):
    # Retrieve the list of followers' ids of the specified user.
    return tw.followers.ids(user_id=uid)['ids']
# We get the list of my followers.
my_followers_ids = get_followers_ids()
Since the users/lookup batch request is limited to 100 users per call, and only a small number of calls are allowed within a time window, we only look at a subset of all the followers.
def get_users_info(users_ids, max=500):
    n = min(max, len(users_ids))
    # Get information about those users, using batch requests.
    users = [tw.users.lookup(user_id=users_ids[100*i:100*(i+1)])
             for i in range(int(math.ceil(n/100.)))]
    # We flatten this list of lists.
    users = [item for sublist in users for item in sublist]
    return {user['id']: user for user in users}
users_info = get_users_info(my_followers_ids)
# Let's save this dictionary on the disk.
with open('my_followers.json', 'w') as f:
    json.dump(users_info, f, indent=1)
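The batching logic in get_users_info can be isolated into a small generator, which makes the slicing easier to check on its own. This is a sketch with a hypothetical chunks helper. Unlike the function above, which always requests whole batches of 100, this version caps the total at exactly limit items.

```python
def chunks(items, size=100, limit=None):
    """Yield successive slices of at most `size` items, up to `limit` items."""
    if limit is not None:
        items = items[:limit]
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 fake ids split into batches of at most 100.
batches = list(chunks(list(range(250)), size=100))
print([len(batch) for batch in batches])
```

Each batch could then be passed to a single users/lookup call.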
adjacency = {myid: my_followers_ids}
my_followers_python = [user for user in users_info.values()
                       if 'python' in user['description'].lower()]
my_followers_python_best = sorted(my_followers_python,
    key=lambda u: u['followers_count'])[::-1][:10]
The request for retrieving the followers of a given user is rate-limited. Let's check how many calls we have remaining.
tw.application.rate_limit_status(resources='followers') \
    ['resources']['followers']['/followers/ids']
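The value returned for an endpoint is a small dictionary. Assuming it has the limit, remaining, and reset fields of the Twitter REST API v1.1 (reset being a Unix timestamp), the waiting time before the next window could be computed as in the sketch below. The seconds_until_reset helper is hypothetical and the sample values are made up.

```python
import time


def seconds_until_reset(status, now=None):
    """Return how long to wait before the endpoint can be called again."""
    now = time.time() if now is None else now
    if status['remaining'] > 0:
        # Calls are still available in the current window.
        return 0.0
    return max(0.0, float(status['reset'] - now))


# Made-up sample in the assumed shape of the API response.
sample = {'limit': 15, 'remaining': 0, 'reset': 1400000900}
wait = seconds_until_reset(sample, now=1400000000)
```

With these sample values, the window resets 900 seconds after the chosen now.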
for user in my_followers_python_best:
    # The call to get_followers_ids is rate-limited.
    adjacency[user['id']] = list(set(get_followers_ids(
        user['id'])).intersection(my_followers_ids))
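The loop above keeps, for each selected user, only those followers who also follow me. The same set intersection can be tried offline on toy data, as in this sketch. The restrict_to_circle name and the ids are made up for illustration.

```python
def restrict_to_circle(followers_of, circle):
    """Keep, for each user, only the followers that belong to `circle`."""
    circle = set(circle)
    return {uid: sorted(circle.intersection(ids))
            for uid, ids in followers_of.items()}


# Toy data: users 1 to 4 follow me; 50 and 99 do not.
toy_followers = {1: [2, 3, 99], 2: [1, 50]}
toy_adjacency = restrict_to_circle(toy_followers, circle=[1, 2, 3, 4])
print(toy_adjacency)
```

Users 99 and 50 are dropped because they are outside the circle of my followers.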
g = nx.Graph(adjacency)
# We restrict the graph to the users for which we
# were able to retrieve the profile. We take a copy because
# subgraph() returns a read-only view in recent NetworkX versions.
g = g.subgraph(users_info.keys()).copy()
# We also save this graph on disk.
with open('my_graph.json', 'w') as f:
    json.dump(nx.to_dict_of_lists(g), f, indent=1)
# We remove nodes with at most one connection for simplicity.
# dict() accepts both the dict and the view returned by degree().
g.remove_nodes_from([k for k, d in dict(g.degree()).items()
                     if d <= 1])
# Since I am connected to all nodes, by definition,
# we can remove me for simplicity.
g.remove_nodes_from([myid])
len(g.nodes()), len(g.edges())
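One detail to keep in mind when reloading my_graph.json later: JSON object keys are always strings, so the integer user ids used as dictionary keys come back as strings. A possible way to restore them is sketched below, where load_adjacency is a hypothetical helper.

```python
import json


def load_adjacency(path):
    """Load a dict of lists saved with json.dump, restoring integer keys."""
    with open(path, 'r') as f:
        raw = json.load(f)
    # The neighbor lists keep their integers; only the keys need converting.
    return {int(uid): neighbors for uid, neighbors in raw.items()}
```

The resulting dictionary can be passed to nx.Graph in the same way as the adjacency dictionary above.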
# Update the dictionary.
deg = g.degree()
for user in users_info.values():
    fc = user['followers_count']
    sc = user['statuses_count']
    # Is this user a Pythonista?
    user['python'] = 'python' in user['description'].lower()
    # We compute the node size as a function of the
    # number of followers.
    user['node_size'] = math.sqrt(1 + 10 * fc)
    # The color is a function of the user's activity on Twitter.
    user['node_color'] = 10 * math.sqrt(1 + sc)
    # We only display the name of the most followed users.
    user['label'] = user['screen_name'] if fc > 2000 else ''
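To get a feel for the two formulas, they can be evaluated on a few made-up counts. The square root compresses the range, so an account with thousands of followers is drawn larger without dwarfing the others. The node_style helper below is hypothetical and simply repeats the expressions from the loop.

```python
import math


def node_style(followers_count, statuses_count):
    """Return the node size and color value used in the recipe's formulas."""
    return {'node_size': math.sqrt(1 + 10 * followers_count),
            'node_color': 10 * math.sqrt(1 + statuses_count)}


# Made-up counts: a new account and a popular, moderately active one.
print(node_style(0, 0))
print(node_style(2000, 99))
```

For 2000 followers the size is the square root of 20001, about 141, compared with 1 for an account with no followers.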
Finally, we use the draw function to display the graph. We need to specify the node sizes and colors as lists, and the labels as a dictionary.
node_size = [users_info[uid]['node_size'] for uid in g.nodes()]
node_color = [users_info[uid]['node_color'] for uid in g.nodes()]
labels = {uid: users_info[uid]['label'] for uid in g.nodes()}
plt.figure(figsize=(6,6));
nx.draw(g, cmap=plt.cm.OrRd, alpha=.8,
        node_size=node_size, node_color=node_color,
        labels=labels, font_size=4, width=.1);
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).