This notebook is a workaround for building individual page reports. It defines the main function for displaying a report, display_report(pagename), so that other notebooks can import it and explore the data computed from Wikipedia pages and the relationships derived from them (user-page, page-page and user-user networks). It also includes a summary of time-based analyses such as page view analytics.
This notebook is mainly used by the [page explorer](page explorer.ipynb) notebook.
%run "libraries.ipynb"
%config InlineBackend.figure_formats=['svg']
import networkx as nx
from IPython.display import display, HTML
# list of page names
with codecs.open("data/pagenames.txt", "r", "utf-8-sig") as f:
    pages = [line.strip() for line in f]
# page graph obtained by projecting page-editor bi-partite graph
pages_graph = nx.read_gexf("data/pages-linked-by-coeditors.gexf")
# bipartite graph linking pages to their editors
pages_editors_graph = nx.read_gexf("data/pages-editors.gexf")
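As a rough illustration of what projecting the page-editor bipartite graph means, the sketch below builds page-page edges from a toy page-to-editors mapping and weights each edge by the number of shared editors. The toy data, the "u:" editor prefix, the function name `project_pages`, and the reading of the `coeditors` attribute as a shared-editor count are all assumptions; the GEXF files loaded above were produced elsewhere and may have been built differently.

```python
from itertools import combinations


def project_pages(page_editors):
    """Project a page -> editors mapping onto page-page edges.

    Returns {(page_a, page_b): number of shared editors}, keeping only
    pairs that share at least one editor. Each key is a sorted page pair.
    """
    edges = {}
    for a, b in combinations(sorted(page_editors), 2):
        shared = len(set(page_editors[a]) & set(page_editors[b]))
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy data, not taken from the corpus.
toy = {
    "p:Sphere": {"u:alice", "u:bob", "u:carol"},
    "p:Circle": {"u:bob", "u:carol"},
    "p:Pi": {"u:dave"},
}
projected = project_pages(toy)
print(projected)
```

Only the Circle-Sphere pair survives here, because Pi shares no editor with the other two pages.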
def table_to_html(data, cols=[]):
    html = "<table>"
    html += "<tr>"
    for column_content in cols:
        html += "<th>%s</th>" % (column_content)
    html += "</tr>"
    for d in data:
        html += "<tr>"
        for column_content in d:
            html += "<td>%s</td>" % (column_content)
        html += "</tr>"
    html += "</table>"
    return HTML(html)
def display_top_editors(page):
    top_editors = pages_editors_graph["p:%s" % (page)]
    # sort by edits on this page, then by edits over the whole corpus (both descending)
    top_editors = sorted(top_editors.items(), key=lambda x: (-x[1]["revisions"], -pages_editors_graph.node[x[0]]["revisions"]))
    # print top_editors
    data = []
    for name, edits in top_editors[0:10]:
        # split only on the first colon so that names containing ":" stay intact
        data.append(["<a href=\"http://en.wikipedia.org/wiki/User:{0}\" target=\"_blank\">{0}</a>".format(name.split(":", 1)[1]),
                     edits["revisions"],
                     pages_editors_graph.node[name]["revisions"]])
    display(table_to_html(data, ["editor name", "edits on that page", "edits over the corpus"]))
display_top_editors("Pi")
editor name | edits on that page | edits over the corpus |
---|---|---|
Noleander | 1005 | 1006 |
Disavian | 134 | 135 |
Arthur Rubin | 111 | 245 |
Henning Makholm | 111 | 134 |
Anythingyouwant | 109 | 237 |
Jitse Niesen | 84 | 405 |
Michael Hardy | 81 | 2879 |
Melchoir | 78 | 127 |
Joseph Lindenberg | 77 | 91 |
Glenn L | 70 | 116 |
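The ordering in the table above comes from a two-part sort key: edits on the page, descending, with ties broken by edits over the corpus, descending. Here is a minimal sketch of that key on three rows copied from the table. The tuple layout and the name `rows` are illustrative only and are not the notebook's data structures.

```python
# (editor, edits on the page, edits over the corpus), copied from the table above
rows = [
    ("Henning Makholm", 111, 134),
    ("Disavian", 134, 135),
    ("Arthur Rubin", 111, 245),
]

# Negating both counts sorts descending on each; the second value only
# matters when two editors have the same number of edits on the page.
ranked = sorted(rows, key=lambda r: (-r[1], -r[2]))
names = [r[0] for r in ranked]
print(names)
```

Arthur Rubin and Henning Makholm both have 111 edits on the page, so the corpus-wide count decides their relative order.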
def display_pageviews_revisions(page):
    # read_csv with index_col=0 and parse_dates=True replaces the deprecated DataFrame.from_csv
    pageviews = pd.read_csv("data/pageviews/%s.weekly.csv" % (page), index_col=0, parse_dates=True)
    revisions = pd.read_csv("data/revisions/%s.weekly.csv" % (page), index_col=0, parse_dates=True)
    pageviews.plot(figsize=(12, 2), subplots=False, linewidth=0.5, ylim=0, colormap="Spectral", rot=0)
    revisions.plot(figsize=(12, 2), linewidth=0.5, ylim=0)
    plt.show()
display_pageviews_revisions("Pi")
def display_local_graph(page):
    g1 = nx.read_gexf("data/reading_maps/pages-coedited-reduced-3.gexf")
    nbunch = [ page ]
    nbunch.extend( list(g1.to_undirected()[page]))
    g2 = g1.subgraph(nbunch)
    #nx.draw_spring(g2)
    pos = nx.spring_layout(g2,iterations=50)
    nx.draw_networkx_nodes(g2, pos)
    nx.draw_networkx_edges(g2, pos)
    nx.draw_networkx_labels(g2, pos)
    plt.axis('off')
    plt.show()
display_local_graph("Paraboloid")
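display_local_graph draws what is often called an ego network: the page itself plus its direct neighbours, with the edges among them. The sketch below shows the same selection on a plain adjacency dictionary, without networkx. The function name `ego_subgraph` and the toy edges are hypothetical and are not read from the GEXF file.

```python
def ego_subgraph(adjacency, center):
    """Return the adjacency restricted to `center` and its direct neighbours."""
    keep = {center} | set(adjacency.get(center, ()))
    return {
        node: sorted(n for n in adjacency.get(node, ()) if n in keep)
        for node in sorted(keep)
    }


# Hypothetical undirected toy graph (each edge is listed in both directions).
toy = {
    "Paraboloid": ["Ellipsoid", "Hyperboloid"],
    "Ellipsoid": ["Paraboloid", "Hyperboloid", "Sphere"],
    "Hyperboloid": ["Paraboloid", "Ellipsoid"],
    "Sphere": ["Ellipsoid"],
}
local = ego_subgraph(toy, "Paraboloid")
print(local)
```

"Sphere" is two hops away from "Paraboloid" in this toy graph, so it is dropped along with its edge to "Ellipsoid".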
def display_report(page):
    display(HTML("<h2>%s</h2>" % (page)))
    #display(HTML("<div style=\"float:left\">"))
    display(HTML("<h3>co-edited pages</h3>"))
    # neighbors of the page, sorted by decreasing number of co-editors
    nb = sorted(pages_graph["p:%s" % (page)].items(),
                key=lambda kx: -int(kx[1]["coeditors"]))
    data = []
    # calculate rank in neighbor top co-edited ranking
    for name, info in nb:
        nb_mirror = sorted(pages_graph[name].items(),
                           key=lambda kx: -int(kx[1]["coeditors"]))
        nb_mirror = [ x[0] for x in nb_mirror ]
        editors = pages_editors_graph[name]
        info["editors"] = len(editors)
        info["exclusive editors"] = len([n for n in editors if len(pages_editors_graph[n]) == 1 ])
        info["ranking"] = nb_mirror.index("p:%s" % (page)) + 1
    #print nb
    for name, info in nb[0:10]:
        data.append([ u"<a href=\"http://en.wikipedia.org/wiki/{0}\" target=\"_blank\">{0}</a>".format(name.split(":", 1)[1]),
                      info["editors"],
                      info["coeditors"],
                      float(info["coeditors"]) / float(info["editors"]),
                      info["exclusive editors"],
                      info["ranking"]])
    display(table_to_html(data, ["page name", "editors", "co-editors", "co-editors/editors", "exclusive editors" ,"ranking"]))
    #display(HTML("</div>"))
    #display(HTML("<div style=\"float:left\">"))
    display(HTML("<h3>ranked first in</h3>"))
    nb_list = [ x[0] for x in nb ]
    data = []
    # neighbors in whose own ranking the page appears highest
    nb2 = sorted(nb, key=lambda x: x[1]["ranking"])
    for name, info in nb2[0:10]:
        editors = pages_editors_graph[name]
        info["editors"] = len(editors)
        info["exclusive editors"] = len([n for n in editors if len(pages_editors_graph[n]) == 1 ])
        data.append([ u"<a href=\"http://en.wikipedia.org/wiki/{0}\" target=\"_blank\">{0}</a>".format(name.split(":", 1)[1]),
                      info["editors"],
                      info["coeditors"],
                      float(info["coeditors"]) / float(info["editors"]),
                      info["exclusive editors"],
                      info["ranking"]])
    display(table_to_html(data, ["page name", "editors", "co-editors", "co-editors/editors", "exclusive editors" ,"ranking"]))
    # display(HTML("</div>"))
    # display(HTML("<div style=\"clear:both\"></div>"))
    display(HTML("<h3>pageviews and revisions</h3>"))
    display_pageviews_revisions(page)
    display(HTML("<h3>top editors</h3>"))
    display_top_editors(page)
    display(HTML("<h3>local subgraph</h3>"))
    display_local_graph(page)
#display_report("3-sphere")
if __name__ == "__main__":
    display_report("3-sphere")
page name | editors | co-editors | co-editors/editors | exclusive editors | ranking |
---|---|---|---|---|---|
Pi | 1799 | 47 | 0.0261256253474 | 886 | 56 |
Mathematics | 1509 | 33 | 0.0218687872763 | 676 | 86 |
Sphere | 444 | 30 | 0.0675675675676 | 132 | 66 |
Ellipse | 490 | 29 | 0.0591836734694 | 147 | 80 |
Topology | 420 | 28 | 0.0666666666667 | 118 | 80 |
Symmetry | 390 | 28 | 0.0717948717949 | 107 | 83 |
Pythagorean theorem | 947 | 28 | 0.0295670538543 | 330 | 122 |
Möbius transformation | 144 | 26 | 0.180555555556 | 27 | 25 |
Circle | 807 | 25 | 0.0309789343247 | 253 | 120 |
Tetrahedron | 322 | 25 | 0.0776397515528 | 75 | 87 |
page name | editors | co-editors | co-editors/editors | exclusive editors | ranking |
---|---|---|---|---|---|
Burmester's theory | 16 | 4 | 0.25 | 5 | 12 |
Honeycomb (geometry) | 47 | 12 | 0.255319148936 | 4 | 14 |
Roman surface | 40 | 12 | 0.3 | 6 | 17 |
Point groups in three dimensions | 57 | 16 | 0.280701754386 | 7 | 18 |
Regular Polytopes (book) | 16 | 7 | 0.4375 | 0 | 19 |
Oval (projective plane) | 26 | 9 | 0.346153846154 | 3 | 20 |
Toric variety | 25 | 7 | 0.28 | 4 | 21 |
N-sphere | 154 | 25 | 0.162337662338 | 41 | 24 |
Leech lattice | 50 | 13 | 0.26 | 10 | 24 |
Point group | 51 | 10 | 0.196078431373 | 9 | 24 |
editor name | edits on that page | edits over the corpus |
---|---|---|
Fropuff | 22 | 156 |
Cloudswrest | 17 | 41 |
Zundark | 15 | 264 |
Tomruen | 12 | 3860 |
AxelBoldt | 10 | 362 |
Sam nead | 8 | 34 |
Charles Matthews | 4 | 830 |
Rgdboer | 4 | 601 |
AugPi | 4 | 423 |
KSmrq | 3 | 143 |
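The ranking column of the two page tables above is the position of the reported page in each neighbour's own list of co-edited pages, ordered by decreasing co-editor count. A small sketch of that "mirror" rank on a toy graph follows. The nested-dictionary layout, the function name `mirror_rank`, and the counts are hypothetical stand-ins for the networkx graph used in display_report.

```python
def mirror_rank(graph, page, neighbour):
    """1-based position of `page` among `neighbour`'s co-edited pages,
    ordered by decreasing co-editor count (ties keep insertion order)."""
    ordered = sorted(graph[neighbour], key=lambda other: -graph[neighbour][other])
    return ordered.index(page) + 1


# Hypothetical co-editor counts: graph[a][b] = number of editors shared by a and b.
toy = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "C": 3},
    "C": {"A": 2, "B": 3},
}
ranks = {n: mirror_rank(toy, "A", n) for n in toy["A"]}
print(ranks)
```

A low rank means the reported page is among the neighbour's strongest co-edited links, which is what the "ranked first in" table sorts on.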