Below is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, the platform keeps a record of users' reputations to help members avoid transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1.
Let's see if we can produce some meaningful insights by visualizing a snapshot of the network.
A one-month sample with a broad range of ratings is used.
Ratings are displayed as a directed multigraph (one edge per rating), laid out with the Fruchterman-Reingold force-directed algorithm.
Weighted Signed Directed Bitcoin OTC web of trust network
http://snap.stanford.edu/data/soc-sign-bitcoinotc.html
Each line has one rating, sorted by time, with the following format:
SOURCE, TARGET, RATING, TIME
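As a quick illustration of this format, here is a minimal sketch that parses a single line with the standard library only. The sample line is illustrative rather than quoted from this post, and the sketch assumes that TIME is seconds since the Unix epoch, which is consistent with the scaling used when the file is loaded below.

```python
import csv
from datetime import datetime, timezone

# Illustrative line in SOURCE, TARGET, RATING, TIME order (an assumption, not quoted from the post)
sample_line = "6,2,4,1289241911.72836"

source, target, rating, time = next(csv.reader([sample_line]))
record = {
    "source": int(source),    # id of the user giving the rating
    "target": int(target),    # id of the user being rated
    "rating": int(rating),    # integer from -10 to +10
    # assumed to be seconds since the epoch, converted to an aware UTC datetime
    "time": datetime.fromtimestamp(float(time), tz=timezone.utc),
}
print(record)
```

Parsing one line by hand like this is only meant to make the column meanings concrete; the notebook itself reads the whole file with pandas.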
import os
import requests
import pandas as pd

url = 'https://snap.stanford.edu/data/soc-sign-bitcoinotc.csv.gz'
fname = os.path.basename(url)
# download the archive once and reuse the local copy afterwards
if not os.path.isfile(fname):
    print(f"Downloading...\n{url}")
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail loudly instead of saving an error page
    with open(fname, 'wb') as f:
        f.write(r.content)
else:
    print(f"Already downloaded:\n{url}")
# parse the file
cols = ['source', 'target', 'rating', 'time']
df = pd.read_csv(fname, names=cols, header=None)
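# time is in seconds since the epoch; scale to nanoseconds, the unit to_datetime assumes for numbers
df['time'] = pd.to_datetime(df.time * 1e9)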
df = df.set_index('time')
df.shape
Already downloaded: https://snap.stanford.edu/data/soc-sign-bitcoinotc.csv.gz
(35592, 3)
What does the distribution of ratings look like month to month? Can we subsample to visualize a reasonable number of nodes?
# ratings overview by month
df.resample('M').rating.value_counts().unstack().sample(5)
rating | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||||||||||
2013-01-31 | 59.0 | 1.0 | NaN | NaN | NaN | 10.0 | 2.0 | 4.0 | 6.0 | 15.0 | 457.0 | 147.0 | 76.0 | 40.0 | 27.0 | 10.0 | 4.0 | 15.0 | 2.0 | 21.0 |
2014-10-31 | 3.0 | NaN | NaN | 2.0 | NaN | 2.0 | NaN | NaN | 3.0 | 10.0 | 120.0 | 40.0 | 12.0 | 11.0 | 17.0 | 3.0 | 2.0 | 3.0 | NaN | 2.0 |
2012-10-31 | 134.0 | 1.0 | 2.0 | 1.0 | NaN | 10.0 | NaN | 18.0 | 1.0 | 19.0 | 609.0 | 146.0 | 57.0 | 26.0 | 26.0 | 9.0 | 7.0 | 1.0 | 3.0 | 18.0 |
2012-02-29 | 7.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 2.0 | 5.0 | 288.0 | 107.0 | 30.0 | 15.0 | 21.0 | 1.0 | 5.0 | 3.0 | NaN | 5.0 |
2016-01-31 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 17.0 | 13.0 | NaN | 4.0 | 4.0 | 1.0 | 1.0 | NaN | NaN | NaN |
# later adoption years have a broader range of values
# this month has a good spread of ratings
df_ = df.loc['2014-03']
print(f"Ratings: {len(df_)}")
df_.sample(10)
Ratings: 330
time | source | target | rating |
---|---|---|---|
2014-03-26 23:58:48.204959744 | 4244 | 4235 | -5 |
2014-03-15 02:53:26.228549888 | 5460 | 4649 | 1 |
2014-03-15 06:24:29.530480128 | 5403 | 5422 | 1 |
2014-03-20 22:16:47.716380160 | 35 | 5468 | 1 |
2014-03-31 15:20:48.911369984 | 5484 | 5363 | -10 |
2014-03-16 20:39:53.108669952 | 4291 | 3919 | 1 |
2014-03-14 02:17:13.370089984 | 1437 | 775 | 1 |
2014-03-13 20:31:22.824110080 | 2125 | 4833 | 2 |
2014-03-07 16:04:25.605910016 | 1565 | 905 | -1 |
2014-03-28 20:22:10.796990208 | 5479 | 5483 | 1 |
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
MultiDiGraph is the NetworkX directed graph class that can store multiedges, that is, multiple edges between the same pair of nodes. Each edge can hold optional data or attributes.
https://networkx.github.io/documentation/stable/reference/classes/multidigraph.html
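To make the idea of multiedges concrete, here is a small toy sketch, separate from the Bitcoin OTC data. The node ids and weights are made up for illustration; the calls used (`add_edge`, `number_of_edges`, and dictionary-style access to parallel edges) are standard NetworkX.

```python
import networkx as nx

G = nx.MultiDiGraph()

# Hypothetical ratings: user 1 rates user 2 twice, user 2 rates user 1 once.
G.add_edge(1, 2, weight=1.0)
G.add_edge(1, 2, weight=-10.0)  # a parallel edge, kept alongside the first
G.add_edge(2, 1, weight=3.0)    # the opposite direction is a separate edge

# Number of parallel edges from node 1 to node 2
n_parallel = G.number_of_edges(1, 2)

# G[1][2] maps each edge key to that edge's attribute dict
weights_1_to_2 = sorted(d["weight"] for d in G[1][2].values())

print(G.number_of_nodes(), G.number_of_edges(), n_parallel, weights_1_to_2)
```

A plain DiGraph would overwrite the first rating with the second, which is why a multigraph is the safer choice when one edge is added per rating.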
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import networkx as nx
import math
DG = nx.MultiDiGraph()
edges = [(t.source, t.target, float(t.rating)) for t in df_.itertuples()]
DG.add_weighted_edges_from(edges)
print(f"Nodes: {DG.number_of_nodes()}")
print(f"Edges: {DG.number_of_edges()}")
Nodes: 211 Edges: 330
We expect trusted nodes to transact more with each other, appear larger, and receive more positive reviews on average, and thus have shorter edges. These should cluster together.
Nodes with lower trust should appear smaller and have longer edges. With the force-directed algorithm, these should be pushed away from trusted clusters.
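The intuition that higher ratings give shorter edges can be sketched with the textbook Fruchterman-Reingold forces for an isolated pair of nodes: a repulsion of k²/d and an attraction of w·d²/k, where d is the distance, k the optimal-distance parameter, and w the edge weight. Treating the rating as the weight w is an assumption made for this sketch, and the helper `equilibrium_distance` is hypothetical. A real layout also involves every other node, cooling, and rescaling, so this shows only the direction of the effect.

```python
def equilibrium_distance(k: float, weight: float) -> float:
    """Distance where repulsion k**2/d balances attraction weight*d**2/k.

    Setting the two forces equal gives d**3 = k**3 / weight.
    """
    if weight <= 0:
        # With a non-positive weight the "attraction" never pulls the pair together.
        raise ValueError("no equilibrium for a non-positive weight")
    return k / weight ** (1.0 / 3.0)


k = 0.25  # illustrative value for the optimal-distance parameter

d_low = equilibrium_distance(k, 1.0)   # a rating of +1
d_high = equilibrium_distance(k, 8.0)  # a rating of +8

print(d_low, d_high)
```

Under these assumptions the preferred distance shrinks with the cube root of the weight, so the effect of a higher rating is real but gentle. A negative rating has no equilibrium at all in this simplified model, which matches the expectation that distrusted nodes drift away from trusted clusters.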
# build a lookup of review counts and average trust rating
review_counts = df_.groupby('target').rating.count()
average_reviews = df_.groupby('target').rating.mean()

def get_size(user_id, base=300):
    # node size grows linearly with the number of ratings received
    s = review_counts.get(user_id)
    if s is not None:
        return base * s
    return base

def get_color(user_id):
    s = average_reviews.get(user_id)
    if s is None:
        # no ratings received this month: use the neutral midpoint of the color map
        return 0.5
    # scale the -10..10 rating range to 0.0..1.0 for color maps
    return np.interp(s, (-10, 10), (0, 1))
plt.figure(figsize=(14, 14))
plt.title("Bitcoin Who-Trusts-Whom Network (1-Month Sample)", fontsize=18)
untrusted = mpatches.Patch(color='red', label='Average Negative Reputation')
trusted = mpatches.Patch(color='green', label='Average Positive Reputation')
size = mpatches.Patch(color='white', label='Node Size - Rating Count')
# named edge_note so it does not shadow the edges list built earlier
edge_note = mpatches.Patch(color='white', label='Edge Length - Mistrust of Reviewing Node')
plt.legend(handles=[trusted, untrusted, size, edge_note], loc='lower right')
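# spring_layout implements the Fruchterman-Reingold algorithm; pass seed=<int> for a reproducible layout
pos = nx.spring_layout(DG, k=0.25)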
sizes = [get_size(n) for n in DG]
colors = [get_color(n) for n in DG]
nc = nx.draw_networkx_nodes(
    DG, pos, nodelist=DG.nodes(), node_size=sizes, linewidths=2.0,
    node_color=colors, cmap=plt.cm.RdYlGn, alpha=0.8
)
ec = nx.draw_networkx_edges(DG, pos, arrows=True, alpha=0.08)
ax = plt.axis('off')
plt.show()