Measuring Editor Collaborativeness With Economic Modelling
¶
Max Klein [@notconfusing](https://twitter.com/notconfusing)
Wikimania 2014
Audience Poll
¶
How many people here are:
Wikipedia researchers?
familiar with network/graph theory?
sort of understand PageRank?
New Editors
¶
Leave when they encounter uncooperative Wikipedians
Suggestions Need Input
¶
I wanted to make an input-less suggester
So new editors can browse it.
What To Show New Editors?
¶
Can we fit them into the more functional parts of Wikipedia?
Collaboration
¶
What is collaboration on a Wikipedia page?
How do you measure it?
Collaboration
¶
How does more experience among contributing editors affect article quality?
More experience is not necessarily better.
Collaboration Flipside
¶
File this away for later
How does editing more articles affect editor expertise?
Economic modelling?
¶
These two economics papers got me thinking:
The building blocks of economic complexity. Hidalgo & Hausmann
A Network Analysis of Countries’ Export Flows: Firm Grounds for the Building Blocks of the Economy
Principle
¶
They both use a network-science algorithm on a bipartite graph to rank countries' economic performance.
Key Insight
¶
Lower-GDP countries: just agriculture (ubiquitous products only)
Switzerland: agriculture and watches (ubiquitous and rare products)
So what?
¶
Infer the GDP rankings of the world economy just by knowing which countries export which products, without even knowing the quantities of the exports.
It's notconfusing
¶
"Unlike laws and sausages, those who like Wikis and Tofu should inquire into how they are being made."
Bipartite Network
¶
A bipartite network is a graph with two distinct types of nodes, where every edge connects a node of one type to a node of the other.
In this case, countries and products.
Basically, it's the PageRank Algorithm
¶
Except we have two node types, and an extra variable for the importance of highly connected nodes.
I'll explain more later.
Lay terms
¶
If you know who exports what, then you can rank countries (in economics).
Translations
¶
Instead of countries exporting products
What about editors writing articles
Translations
¶
Instead of rich countries
What about super users
Translations
¶
Instead of ubiquitous products
What about highly edited articles
Translations
¶
Instead of the global economy
What about a Wikipedia category
Editor Article Matrix
¶
It's triangular: the power users edit most of the articles in the category.
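A minimal sketch of how such a matrix could be built and why it looks triangular. The editor names, article names, and edit pairs are made up for illustration, and sorting both axes by how many connections each node has is an assumption about how the picture was produced.

```python
# Hypothetical (editor, article) pairs: one per "this editor touched this article".
edits = [
    ("alice", "A"), ("alice", "B"), ("alice", "C"), ("alice", "D"),
    ("bob", "A"), ("bob", "B"), ("bob", "C"),
    ("carol", "A"), ("carol", "B"),
    ("dave", "A"),
]

def build_matrix(edits):
    """Return (editors, articles, M) where M[i][j] is 1 if editor i edited article j."""
    pairs = set(edits)  # repeated edits collapse to a single 1
    editor_degree = {}
    article_degree = {}
    for editor, article in pairs:
        editor_degree[editor] = editor_degree.get(editor, 0) + 1
        article_degree[article] = article_degree.get(article, 0) + 1
    # Most active editors and most edited articles first; ties broken by name.
    editors = sorted(editor_degree, key=lambda e: (-editor_degree[e], e))
    articles = sorted(article_degree, key=lambda a: (-article_degree[a], a))
    M = [[1 if (e, a) in pairs else 0 for a in articles] for e in editors]
    return editors, articles, M

editors, articles, M = build_matrix(edits)
for editor, row in zip(editors, M):
    print(f"{editor:>6} {row}")
```

With this toy data the ones fill the upper-left triangle: the top row (the power user) covers every article, and each later row covers fewer.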
Iterative Algorithm
¶
A nonmathematical explanation:
Imagine everyone in the room starts with £1
Distribute your money evenly to all your friends
Round 2, some people may have more or less than £1
but again distribute all your money evenly to all your friends.
Repeat over and over again.
Eventually converges.
http://www.scottaaronson.com/blog/?p=1820
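The money game above can be sketched in a few lines. The friendship graph here is hypothetical, and the sketch assumes friendships are mutual and that everyone has at least one friend.

```python
# Hypothetical mutual friendships in the room.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob"],
    "dan": ["bob"],
}

def share_evenly(money, friends):
    """One round: everyone splits all of their money evenly among their friends."""
    new_money = {person: 0.0 for person in friends}
    for person, amount in money.items():
        share = amount / len(friends[person])
        for friend in friends[person]:
            new_money[friend] += share
    return new_money

def iterate(friends, rounds=200):
    """Start everyone on 1 unit of money and repeat the sharing round."""
    money = {person: 1.0 for person in friends}
    for _ in range(rounds):
        money = share_evenly(money, friends)
    return money

money = iterate(friends)
for person, amount in sorted(money.items()):
    print(person, round(amount, 3))
# The amounts settle at ann 1.0, bob 1.5, cat 1.0, dan 0.5
```

No money is created or destroyed, and in this example the amounts settle in proportion to how many friends each person has.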
Iterative Algorithm - One More Variable
¶
Same scenario as above except:
You don't distribute your money evenly
You can give your popular friends a disproportionately larger percentage, or a disproportionately smaller one.
Iterative Algorithm - Notation
¶
In this experiment those are controlled by the $\alpha$ (article popularity exponent) and $\beta$ (editor portfolio size exponent) levels.
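A rough sketch of what a biased version might look like on an editor-article matrix. It is modelled loosely on the biased random walk in the second paper cited earlier and is not taken from the talk's code: the function name `biased_walk`, the toy matrix, the sign convention (positive exponents give less to highly connected nodes, negative exponents give more), the update order, and the final normalisation are all assumptions.

```python
def biased_walk(M, alpha, beta, rounds=100):
    """Weight editors and articles on a 0/1 editor-by-article matrix M.

    Assumes every editor row and every article column contains at least one 1.
    """
    n_editors, n_articles = len(M), len(M[0])
    k_editor = [sum(row) for row in M]  # portfolio size of each editor
    k_article = [sum(row[a] for row in M) for a in range(n_articles)]  # popularity
    w_editor = [1.0] * n_editors
    w_article = [1.0] * n_articles
    for _ in range(rounds):
        # Each article passes its weight to its editors,
        # with shares proportional to k_editor ** -beta.
        new_editor = [0.0] * n_editors
        for a in range(n_articles):
            bias = [M[e][a] * k_editor[e] ** -beta for e in range(n_editors)]
            total = sum(bias)
            for e in range(n_editors):
                new_editor[e] += w_article[a] * bias[e] / total
        w_editor = new_editor
        # Each editor passes their weight to their articles,
        # with shares proportional to k_article ** -alpha.
        new_article = [0.0] * n_articles
        for e in range(n_editors):
            bias = [M[e][a] * k_article[a] ** -alpha for a in range(n_articles)]
            total = sum(bias)
            for a in range(n_articles):
                new_article[a] += w_editor[e] * bias[a] / total
        w_article = new_article
    # Normalise so each set of weights sums to 1 (only the ordering matters).
    editor_sum, article_sum = sum(w_editor), sum(w_article)
    return ([w / editor_sum for w in w_editor],
            [w / article_sum for w in w_article])

# Hypothetical triangular matrix: rows are editors, columns are articles.
M = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
even_editors, even_articles = biased_walk(M, alpha=0.0, beta=0.0)
skew_editors, skew_articles = biased_walk(M, alpha=0.5, beta=-0.5)
print(even_editors)
print(skew_editors)
```

With both exponents at zero this reduces to the even-sharing game, so the weights end up proportional to the number of connections. Non-zero exponents tilt the shares toward or away from highly connected nodes.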
Editors rise and fall over time
¶
End Result of Algorithm
¶
A ranking for Editors
A ranking for Articles
Exogenous Rankings
¶
Getting unrelated metrics for:
Editors
Articles
Exogenous Editor Rankings
¶
Edit count is a bad measure
Use @halfak and @staeiou's "Labour Hours"
Labour Hours: sum of edit sessions
Edit Session: the start and end times of all the edits that occur within 1 hour of another edit.
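A simplified sketch of the labour-hours idea, reading the definition above literally: edits less than an hour apart belong to the same session, and a session lasts from its first edit to its last. The timestamps are invented, and under this reading a lone edit contributes zero time, which the published method may handle differently.

```python
from datetime import datetime, timedelta

def labour_hours(timestamps, gap=timedelta(hours=1)):
    """Sum the durations of edit sessions, in hours.

    A new session starts whenever the time since the previous edit exceeds `gap`.
    """
    if not timestamps:
        return 0.0
    ordered = sorted(timestamps)
    total = timedelta(0)
    session_start = previous = ordered[0]
    for current in ordered[1:]:
        if current - previous > gap:
            total += previous - session_start  # close the finished session
            session_start = current
        previous = current
    total += previous - session_start  # close the last session
    return total.total_seconds() / 3600

# Hypothetical edit times for one editor on one day.
edit_times = [
    datetime(2014, 8, 8, 9, 0),
    datetime(2014, 8, 8, 9, 30),
    datetime(2014, 8, 8, 10, 15),
    datetime(2014, 8, 8, 14, 0),
    datetime(2014, 8, 8, 14, 20),
]
hours = labour_hours(edit_times)
print(round(hours, 2))
```

Here the morning edits form one session of 1 hour 15 minutes and the afternoon edits form a second session of 20 minutes.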
Exogenous Article Rankings
¶
Mix of:
ratio of mark-up to readable text
number of headings
article length
citations per article length
outgoing intrawiki links.
Calibration
¶
Find the values of $\alpha$ and $\beta$ which maximize the rank correlation between model and exogenous rankings.
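The calibration step can be sketched as a grid search over $\alpha$ and $\beta$ that keeps the pair with the highest Spearman rank correlation. The grid search itself, the function names, and `toy_model` (a stand-in for the real ranking model) are assumptions for illustration; the talk does not say how the maximum was found.

```python
from itertools import product

def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as 0 here (assumption)
    return cov / (var_x * var_y) ** 0.5

def calibrate(model, exogenous_scores, alphas, betas):
    """Return (best_rho, alpha, beta) over the grid of candidate exponents."""
    best = None
    for alpha, beta in product(alphas, betas):
        rho = spearman(model(alpha, beta), exogenous_scores)
        if best is None or rho > best[0]:
            best = (rho, alpha, beta)
    return best

# Hypothetical stand-in for the real model: scores for four editors.
portfolio = [4, 3, 2, 1]
penalty = [0, 3, 0, 0]

def toy_model(alpha, beta):
    return [k ** beta - alpha * p for k, p in zip(portfolio, penalty)]

exogenous = [9.0, 7.5, 4.0, 1.0]  # made-up labour hours for the same editors
rho, alpha, beta = calibrate(toy_model, exogenous, alphas=[0.0, 1.0], betas=[-1.0, 1.0])
print(rho, alpha, beta)
```

In practice `model` would be whatever produces the editor or article scores for a given pair of exponents, and the grid would be much finer.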
Calibration on Feminist Writers
¶
High Correlations
¶
We find correlations of around 0.6 to 0.9
Even better than the economics GDP papers, at around 0.4
Snapshotting
¶
Took 13 snapshots of each category
Rank Accuracy
¶
This really works...
Increases over time
Most collaborative
¶
Question: in which category do power editors improve article quality?
American male novelists
2013 films
American women novelists
Nobel Peace Prize laureates
Sexual acts
Economic theories
Feminist writers
Yoga
Military history of the US
Counterculture festivals
Computability theory
Bicycle parts
Most collaborative
¶
Question: in which category do power editors improve article quality?
Military history of the US
Least collaborative
¶
Question: in which category do power editors hurt article quality?
American male novelists
2013 films
American women novelists
Nobel Peace Prize laureates
Sexual acts
Economic theories
Feminist writers
Yoga
Military history of the US
Counterculture festivals
Computability theory
Bicycle parts
Least collaborative
¶
Question: in which category do power editors hurt article quality?
Sexual acts
Full Category Rankings
¶
Edit Count or Touches
¶
Forest Not Trees
¶
If you accept this $\beta$ measure as a measure of collaborativeness, how can we use it?
Detect dysfunction
¶
For learning
Arguing is not necessarily bad.
For intervention?
Detect Where The Wiki is Working
¶
At least where your time invested relates to article quality
Even superlinearly so
A Potential Use
¶
Make a carousel of friendly places for new users