Answering Big Questions with Wikidata

Wikiconference USA Workshop

Friday 30 May 2014

By Max Klein

Twitter @notconfusing

Blog notconfusing.com

Slides http://tiny.cc/answerswikidata

Overview

  1. Motivate
  2. Examples
  3. The shape of Wikidata data.
  4. Extended walkthrough.
    1. Dependencies - Technological and Otherwise.
    2. Wikidata Toolkit
    3. Export and a Python munging pipeline.
    4. Future Directions

Audience Gauge

Who's comfortable with Wikidata as a concept?

Who's comfortable with Wikidata's underlying structure?

Who's comfortable looking at code examples?

Who's ready to start hacking in this session?

Motivations and Past Examples

Answer BIG questions‽

  • Interwiki Comparison
    • Compare coverage between Wikis
      • Which is the most "unique" Wikipedia?
      • Which is the least sex-biased Wikipedia?
  • Aggregate "wikiverse" modelling.
    • The word for every language in every language?
  • New types of Queries
    • How rich is our data?
    • What does a map of all the subways in the world look like?

Interwiki Comparison

The most unique Wikipedias

Interwiki Comparison

The least sex-biased Wikipedias

Aggregate Wikiverse Modelling

The word for every language in every language?

Reminder: it's not just Wikipedias, but also Commons, Wikisource, and Wikivoyage (with more to come).

New Types of Queries

What does the world look like according to data? Attribution: Denny Vrandečić

Wikidata - The Data-en-ing

"Wikidata data is complex." ~ Markus Krötzsch

Each item includes:

  • Labels
  • Descriptions
  • Aliases
  • Sitelinks
  • Properties

Wikidata the Data-en-ing

These are the less semantic properties.

  • Labels
    • What the thing is called (in every language)
  • Descriptions
    • What the thing is, to distinguish if multiple things have the same label.
    • e.g. Chinatowns, but in different cities.
    • (No more disambiguation (with parens (in titles))) (in every language).
  • Aliases
    • Other labels that might refer to this item.
    • Charles Dodgson / Lewis Carroll.
  • Sitelinks
    • The Wikipedia articles in different languages that are associated with this item.
    • Now also connecting Wikivoyage, Commons, and Wikisource.
    • And this also shows Featured article badges.
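
As a rough illustration of these four parts, here is a hand-written Python dict loosely modelled on the JSON layout Wikidata uses for an item. The ID `Q_EXAMPLE`, all the texts, and the helpers `label_in` and `names_for` are made up for this sketch; nothing here is copied from a real dump.

```python
# A toy item, loosely modelled on Wikidata's JSON layout.
# The ID and all texts are illustrative, not copied from a real dump.
item = {
    "id": "Q_EXAMPLE",
    "labels": {
        "en": {"language": "en", "value": "Lewis Carroll"},
        "fr": {"language": "fr", "value": "Lewis Carroll"},
    },
    "descriptions": {
        "en": {"language": "en", "value": "English writer and mathematician"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "Charles Dodgson"}],
    },
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Lewis Carroll", "badges": []},
        "frwiki": {"site": "frwiki", "title": "Lewis Carroll", "badges": []},
    },
}


def label_in(item, lang, fallback="en"):
    """Return the item's label in `lang`, falling back to `fallback`, else None."""
    labels = item["labels"]
    for code in (lang, fallback):
        if code in labels:
            return labels[code]["value"]
    return None


def names_for(item, lang):
    """Every name the item answers to in `lang`: its label plus its aliases."""
    names = []
    if lang in item["labels"]:
        names.append(item["labels"][lang]["value"])
    names.extend(alias["value"] for alias in item["aliases"].get(lang, []))
    return names


print(label_in(item, "de"))        # no German label here, so the English one is used
print(names_for(item, "en"))       # label first, then aliases
print(sorted(item["sitelinks"]))   # the wikis that have a page for this item
```

Labels, descriptions, and aliases are all keyed by language, while sitelinks are keyed by wiki; that second key is what the walkthrough later bins on.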

Wikidata the Data-en-ing

Properties. Prepare for the semantics.

So the triple reads:

[This item / page] [property] [value]

  • Properties
    • They can be many.
      • Not necessarily agreeing
    • Each has a value
      • Which have different datatypes
      • Which have references
        • Which are also triples
          • Which read: [this statement] [property] [value]
            • E.g. [this statement] [stated in] [a reputable journal].
    • And can also be preferred or deprecated.
      • Known as the "rank"; the [property] [value] pair itself is the "snak".
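
The nesting above can be sketched as a small Python data structure. This is a simplification under stated assumptions: the class `Statement`, the helper `best_values`, and the `Q_CITY` data are invented for illustration, and the preferred/deprecated marker is modelled as a plain string (Wikidata's data model calls it the rank).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    """One [item] [property] [value] claim, with its rank and references."""
    item: str
    prop: str
    value: str
    rank: str = "normal"  # "preferred", "normal", or "deprecated"
    # Each reference is itself a (property, value) pair about this statement.
    references: List[Tuple[str, str]] = field(default_factory=list)


# Two statements on the same property that do not agree (toy data).
statements = [
    Statement("Q_CITY", "population", "1000000", rank="preferred",
              references=[("stated in", "2011 census")]),
    Statement("Q_CITY", "population", "900000", rank="deprecated",
              references=[("stated in", "old almanac")]),
]


def best_values(statements, prop):
    """Prefer 'preferred' statements; otherwise return all non-deprecated ones."""
    candidates = [s for s in statements if s.prop == prop]
    preferred = [s for s in candidates if s.rank == "preferred"]
    chosen = preferred or [s for s in candidates if s.rank != "deprecated"]
    return [s.value for s in chosen]


print(best_values(statements, "population"))
```

Keeping the disagreeing values side by side, each with its own references, is the point: the consumer decides which ones to trust.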

Wikidata Toolkit - The Need

  • We can use the JavaScript API - in fact that's what the interface uses.

  • But what about if you want lots of data?

    • Maybe all the data‽
  • Then you just have undocumented XML dumps.

    • Which is where Wikidata Toolkit comes in.
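
To get a feel for why the API stops being practical for "all the data", here is a minimal sketch that only builds request URLs for the `wbgetentities` API module and counts them. It makes no network calls. The batch size of 50 and the function name `batched_request_urls` are assumptions of this sketch, and the item IDs are just consecutive numbers.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def batched_request_urls(qids, batch_size=50):
    """Yield one wbgetentities URL per batch of item IDs.

    batch_size=50 is an assumed per-request cap, not a documented guarantee.
    """
    for start in range(0, len(qids), batch_size):
        batch = qids[start:start + batch_size]
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "format": "json",
        }
        yield API_ENDPOINT + "?" + urlencode(params)


qids = ["Q%d" % n for n in range(1, 121)]   # 120 consecutive IDs, for illustration
urls = list(batched_request_urls(qids))
print(len(urls))      # requests needed for 120 items at this batch size
print(urls[0][:80])   # start of the first request URL
```

The number of requests grows linearly with the number of items, which is what makes a local dump attractive once you want millions of them.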

Wikidata Toolkit

  • It's from an Individual Engagement Grant
  • Principal investigator is Markus Krötzsch
  • The point is WDTK gives you Wikidata as Java objects
  • Why Java?
    • Because it's for researchers
    • And not necessarily Wikimedia hackers

Wikidata Toolkit

Features

  • Uses daily incremental dumps
    • Fresh enough
  • Local, in-memory processing
    • Fast
  • Offline mode
    • As stale as you like

Wikidata Toolkit

Progress

  • Started February 2014
    • This time in 2013 I was using a Python script that was slow and polluted
  • 2 programmers on the project
  • Currently at 0.1.0
    • Next version's focus is on serialization.
  • Can already get RDF dumps at tools.wmflabs.org/wikidatardfdumps (check link)
    • Considered "derived dumps"
    • have simple-statements
    • includes subclass-of and instance-of.

Wikidata Toolkit

"Standard Queries"

There are no real 'standard queries'.

So you could also imagine queries like, "Which sources do I have to believe in order to believe this statement?"

Don't stifle creativity.

So to support "tree-shaped regular conjunctive path queries" and "star-shaped" queries, you get the freedom of a programming language rather than SQL-like chains.
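
As a loose sketch of what "the freedom of a programming language" can look like, the two query ideas above can be written as ordinary functions over an in-memory list. Everything here is invented for illustration: the tuple layout, the IDs, the source names, and the functions `sources_for` and `star_query`. It is not how Wikidata Toolkit represents statements.

```python
# Toy statements: (item, property, value, [sources]). All names are invented.
STATEMENTS = [
    ("Q_ALICE", "occupation", "Q_WRITER", ["Journal A", "Census B"]),
    ("Q_ALICE", "sex or gender", "Q_FEMALE", ["Census B"]),
    ("Q_BOB", "occupation", "Q_WRITER", []),
]


def sources_for(item, prop, value, statements=STATEMENTS):
    """Return the set of sources cited for one specific claim."""
    found = set()
    for s_item, s_prop, s_value, sources in statements:
        if (s_item, s_prop, s_value) == (item, prop, value):
            found.update(sources)
    return found


def star_query(constraints, statements=STATEMENTS):
    """Star-shaped query: items that satisfy every (property, value) constraint."""
    matches = None
    for prop, value in constraints:
        items = {i for i, p, v, _ in statements if (p, v) == (prop, value)}
        matches = items if matches is None else matches & items
    return matches or set()


print(sources_for("Q_ALICE", "occupation", "Q_WRITER"))
print(star_query([("occupation", "Q_WRITER"), ("sex or gender", "Q_FEMALE")]))
```

Neither function needs a query language; each is a loop with a condition, which is the trade the slide is describing.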

How many people came here expecting to look at code today?

To follow along, start downloading these packages.

Additional assumptions

  • You know what Wikidata is, in general.

Wikidata Toolkit Example

Go over all pages. If a page has this property, bin it in two dimensions. By the way, I failed Java classes twice at university, each time ending my formal education in programming.

I understand this is the worst way to do it.
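
Before the Java, the same idea as a minimal Python sketch: walk every item, and for each one that carries P21, count one hit per (site, value) pair. The toy items and the function name `bin_by_site_and_value` are assumptions of this sketch; the `site--value` key format mirrors the one used in the rest of the walkthrough.

```python
from collections import Counter

# Toy items: each has sitelinks and, optionally, P21 ("sex or gender") values.
# The item IDs are invented; Q6581097 and Q6581072 are the male/female values
# that appear in the tables later in this walkthrough.
ITEMS = [
    {"id": "Q_A", "sitelinks": ["enwiki", "dewiki"], "P21": ["Q6581072"]},
    {"id": "Q_B", "sitelinks": ["enwiki"], "P21": ["Q6581097"]},
    {"id": "Q_C", "sitelinks": ["enwiki", "frwiki"]},   # no P21, so it is skipped
    {"id": "Q_D", "sitelinks": ["dewiki"], "P21": ["Q6581097"]},
]


def bin_by_site_and_value(items, prop="P21"):
    """Count (site, value) pairs over all items that carry `prop`."""
    counts = Counter()
    for item in items:
        for value in item.get(prop, []):
            for site in item["sitelinks"]:
                counts[site + "--" + value] += 1
    return counts


counts = bin_by_site_and_value(ITEMS)
for key in sorted(counts):
    print(key, counts[key])
```

An item linked from several wikis is counted once per wiki, so the totals describe each wiki's coverage rather than the number of distinct people.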

Modify DumpProcessingExample

At a minimum, you only need to edit one class: ItemStatisticsProcessor in DumpProcessingExample.

    static class ItemStatisticsProcessor implements EntityDocumentProcessor {

        long countItems = 0;

        HashMap<String,Integer> lang_sexes = new HashMap<String,Integer>(); 


        @Override
        public void processItemDocument(ItemDocument itemDocument) {
            this.countItems++;
            for (StatementGroup sg : itemDocument.getStatementGroups()) {
                for (Statement si: sg.getStatements()) {
                    String PID = si.getClaim().getMainSnak().getPropertyId().getId().toString();
                    if (PID.equals("P21")) {
                        for (String lang_string : itemDocument.getSiteLinks().keySet()) { 
                            /* should do this a better way at some point*/

                            String ms = si.getClaim().getMainSnak().toString();
                            // Split on the entity IRI prefix; the value ID follows its second occurrence.
                            String[] parts = ms.split("http://www.wikidata.org/entity/");
                            String VID = parts[2].substring(0, parts[2].length()-1);
                            String lang_sex_key = lang_string + "--" + VID;
                            if (this.lang_sexes.get(lang_sex_key) != null ) {
                                this.lang_sexes.put(lang_sex_key, this.lang_sexes.get(lang_sex_key) + 1 );
                            }
                            else{
                                this.lang_sexes.put(lang_sex_key, 1);
                            }
                        }
                    }
                }
            }
        }

        // Remaining methods (including the JSON output) are omitted here.
    }

Output to JSON

There's actually some more you need to edit to get the JSON out, but I'll let you see my document at this GitHub link.

Here's one I made earlier...

Ahh lovely json and python

Start the live high-wire demo...

In [3]:
import json
from collections import defaultdict
import pandas as pd
import pywikibot
import decimal
NOPLACES = decimal.Decimal(10) ** 0
TWOPLACES = decimal.Decimal(10) ** -2
%pylab inline
Populating the interactive namespace from numpy and matplotlib
In [4]:
with open('lang_sex.json', 'r') as jsonfile:
    bigdict = json.load(jsonfile)
lang_sex = defaultdict(dict)
for keystring, count in bigdict.items():
    lang, sex = keystring.split('--')
    lang_sex[lang][sex] = count
In [5]:
sex_df = pd.DataFrame.from_dict(lang_sex, orient='index')
sex_df = sex_df.fillna(value=0.0)
sex_df
Out[5]:
Q43445 Q1097630 Q746411 Q639354 Q1052281 Q44148 Q6581097 Q6581072 Q2449503 Q48270 Q8441 Q658
abwiki 0 0 0 0 0 0 58 7 0 0 0 0
acewiki 0 0 0 0 0 0 179 34 0 0 0 0
afwiki 0 0 0 1 0 1 3066 402 0 0 0 0
afwikiquote 0 0 0 0 0 0 88 5 0 0 0 0
akwiki 0 0 0 0 0 0 13 2 0 0 0 0
alswiki 0 0 0 1 0 0 1567 196 0 0 0 0
amwiki 0 0 0 0 0 0 483 56 0 0 0 0
angwiki 0 0 0 0 0 0 205 52 0 0 0 0
anwiki 0 0 0 0 0 0 2881 575 0 0 0 0
arcwiki 0 0 0 0 0 0 90 12 0 0 0 0
arwiki 1 1 0 1 2 2 21320 3868 0 1 0 0
arwikiquote 0 0 0 0 0 0 154 11 0 0 0 0
arwikisource 0 0 0 0 0 0 15 2 0 0 0 0
arzwiki 0 0 0 0 0 0 1692 682 1 0 0 0
astwiki 0 0 0 0 0 0 1224 228 0 0 0 0
aswiki 0 0 0 0 0 0 175 51 0 0 0 0
avwiki 0 0 0 0 0 0 29 0 0 0 0 0
aywiki 0 0 0 0 0 0 187 17 0 0 0 0
azwiki 1 1 0 0 1 1 4747 1002 0 0 0 0
azwikiquote 0 0 0 0 0 0 454 30 0 0 0 0
azwikisource 0 0 0 0 0 0 6 0 0 0 0 0
barwiki 0 0 0 0 0 0 836 134 0 0 0 0
bat_smgwiki 0 1 0 0 0 0 638 123 0 0 0 0
bawiki 0 0 0 0 0 0 337 31 0 0 0 0
bclwiki 0 1 0 0 0 0 414 74 0 0 0 0
be_x_oldwiki 0 0 0 1 0 1 5179 724 0 0 0 0
bewiki 0 0 0 1 0 1 8817 1240 0 0 0 0
bewikiquote 0 0 0 0 0 0 31 0 0 0 0 0
bewikisource 0 0 0 0 0 0 20 1 0 0 0 0
bgwiki 2 1 0 1 2 4 18512 3536 0 0 0 0
bgwikiquote 0 0 0 1 0 0 1592 334 0 0 0 0
bgwikisource 0 0 0 0 0 0 22 2 0 0 0 0
bhwiki 0 0 0 0 0 0 21 3 0 0 0 0
biwiki 0 0 0 0 0 0 37 10 0 0 0 0
bjnwiki 0 0 0 0 0 0 26 14 0 0 0 0
bmwiki 0 0 0 0 0 0 21 3 0 0 0 0
bnwiki 1 0 0 1 1 0 3515 676 0 0 0 0
bnwikisource 0 0 0 0 0 0 4 0 0 0 0 0
bowiki 0 0 0 0 0 0 316 44 0 0 0 0
bpywiki 0 0 0 0 0 0 57 14 0 0 0 0
brwiki 0 0 0 1 0 0 4342 1438 0 0 0 0
brwikiquote 0 0 0 0 0 0 52 2 0 0 0 0
brwikisource 0 0 0 0 0 0 19 2 0 0 0 0
bswiki 1 0 0 0 0 2 3558 610 0 0 0 0
bswikiquote 0 0 0 0 0 0 994 147 0 0 0 0
bswikisource 0 0 0 0 0 0 6 0 0 0 0 0
bugwiki 0 0 0 0 0 0 1 3 0 0 0 0
bxrwiki 0 0 0 0 0 0 109 9 0 0 0 0
cawiki 1 0 0 1 4 5 33846 5005 1 0 0 0
cawikiquote 0 0 0 0 0 0 534 59 0 0 0 0
cawikisource 0 0 0 0 0 0 174 6 0 0 0 0
cbk_zamwiki 0 0 0 0 0 0 119 47 0 0 0 0
cdowiki 0 0 0 0 0 0 36 4 0 0 0 0
cebwiki 0 0 0 0 0 0 404 72 0 0 0 0
cewiki 0 0 0 0 0 0 249 16 0 0 0 0
chrwiki 0 0 0 0 0 0 19 18 0 0 0 0
chwiki 0 0 0 0 0 0 11 1 0 0 0 0
chywiki 0 0 0 0 0 0 23 9 0 0 0 0
ckbwiki 0 1 0 0 1 0 1520 180 0 0 0 0
commonswiki 1 0 0 1 2 6 18146 4221 1 1 0 0
... ... ... ... ... ... ... ... ... ... ... ...

396 rows × 12 columns

In [6]:
# norm_sex is a joke on heteronormativity
norm_sex = sex_df.apply(lambda row: row / row.sum(), axis=1)
norm_sex
Out[6]:
Q43445 Q1097630 Q746411 Q639354 Q1052281 Q44148 Q6581097 Q6581072 Q2449503 Q48270 Q8441 Q658
abwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.892308 0.107692 0.000000 0.000000 0 0
acewiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.840376 0.159624 0.000000 0.000000 0 0
afwiki 0.000000 0.000000 0 0.000288 0.000000 0.000288 0.883573 0.115850 0.000000 0.000000 0 0
afwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.946237 0.053763 0.000000 0.000000 0 0
akwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.866667 0.133333 0.000000 0.000000 0 0
alswiki 0.000000 0.000000 0 0.000567 0.000000 0.000000 0.888322 0.111111 0.000000 0.000000 0 0
amwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.896104 0.103896 0.000000 0.000000 0 0
angwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.797665 0.202335 0.000000 0.000000 0 0
anwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.833623 0.166377 0.000000 0.000000 0 0
arcwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.882353 0.117647 0.000000 0.000000 0 0
arwiki 0.000040 0.000040 0 0.000040 0.000079 0.000079 0.846166 0.153516 0.000000 0.000040 0 0
arwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.933333 0.066667 0.000000 0.000000 0 0
arwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.882353 0.117647 0.000000 0.000000 0 0
arzwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.712421 0.287158 0.000421 0.000000 0 0
astwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.842975 0.157025 0.000000 0.000000 0 0
aswiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.774336 0.225664 0.000000 0.000000 0 0
avwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
aywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
azwiki 0.000174 0.000174 0 0.000000 0.000174 0.000174 0.825135 0.174170 0.000000 0.000000 0 0
azwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.938017 0.061983 0.000000 0.000000 0 0
azwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
barwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.861856 0.138144 0.000000 0.000000 0 0
bat_smgwiki 0.000000 0.001312 0 0.000000 0.000000 0.000000 0.837270 0.161417 0.000000 0.000000 0 0
bawiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.915761 0.084239 0.000000 0.000000 0 0
bclwiki 0.000000 0.002045 0 0.000000 0.000000 0.000000 0.846626 0.151329 0.000000 0.000000 0 0
be_x_oldwiki 0.000000 0.000000 0 0.000169 0.000000 0.000169 0.877053 0.122608 0.000000 0.000000 0 0
bewiki 0.000000 0.000000 0 0.000099 0.000000 0.000099 0.876528 0.123273 0.000000 0.000000 0 0
bewikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bewikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.952381 0.047619 0.000000 0.000000 0 0
bgwiki 0.000091 0.000045 0 0.000045 0.000091 0.000181 0.839242 0.160305 0.000000 0.000000 0 0
bgwikiquote 0.000000 0.000000 0 0.000519 0.000000 0.000000 0.826155 0.173326 0.000000 0.000000 0 0
bgwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
bhwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.875000 0.125000 0.000000 0.000000 0 0
biwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.787234 0.212766 0.000000 0.000000 0 0
bjnwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.650000 0.350000 0.000000 0.000000 0 0
bmwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.875000 0.125000 0.000000 0.000000 0 0
bnwiki 0.000238 0.000000 0 0.000238 0.000238 0.000000 0.838102 0.161183 0.000000 0.000000 0 0
bnwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bowiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.877778 0.122222 0.000000 0.000000 0 0
bpywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.802817 0.197183 0.000000 0.000000 0 0
brwiki 0.000000 0.000000 0 0.000173 0.000000 0.000000 0.751081 0.248746 0.000000 0.000000 0 0
brwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.962963 0.037037 0.000000 0.000000 0 0
brwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.904762 0.095238 0.000000 0.000000 0 0
bswiki 0.000240 0.000000 0 0.000000 0.000000 0.000480 0.853033 0.146248 0.000000 0.000000 0 0
bswikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.871166 0.128834 0.000000 0.000000 0 0
bswikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bugwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.250000 0.750000 0.000000 0.000000 0 0
bxrwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.923729 0.076271 0.000000 0.000000 0 0
cawiki 0.000026 0.000000 0 0.000026 0.000103 0.000129 0.870905 0.128786 0.000026 0.000000 0 0
cawikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.900506 0.099494 0.000000 0.000000 0 0
cawikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.966667 0.033333 0.000000 0.000000 0 0
cbk_zamwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.716867 0.283133 0.000000 0.000000 0 0
cdowiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.900000 0.100000 0.000000 0.000000 0 0
cebwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.848739 0.151261 0.000000 0.000000 0 0
cewiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.939623 0.060377 0.000000 0.000000 0 0
chrwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.513514 0.486486 0.000000 0.000000 0 0
chwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
chywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.718750 0.281250 0.000000 0.000000 0 0
ckbwiki 0.000000 0.000588 0 0.000000 0.000588 0.000000 0.893067 0.105758 0.000000 0.000000 0 0
commonswiki 0.000045 0.000000 0 0.000045 0.000089 0.000268 0.810849 0.188614 0.000045 0.000045 0 0
... ... ... ... ... ... ... ... ... ... ... ...

396 rows × 12 columns
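
As a quick sanity check on the row normalization, the abwiki row can be recomputed by hand from the raw counts in the first table (58 and 7, all other columns zero). This is a standalone sketch in plain Python rather than pandas; the variable names are made up.

```python
# Raw abwiki counts from the first table: 58 for Q6581097, 7 for Q6581072.
abwiki = {"Q6581097": 58, "Q6581072": 7}

# Each share is the count divided by the row total, as in row / row.sum().
total = sum(abwiki.values())
shares = {qid: count / total for qid, count in abwiki.items()}

for qid, share in shares.items():
    print(qid, round(share, 6))
```

The results, 58/65 and 7/65, round to the 0.892308 and 0.107692 shown in the abwiki row of the normalized table, and every row sums to 1 by construction.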

In [7]:
# Transforming QIDs into English labels.
enwp = pywikibot.Site('en','wikipedia')
wikidata = enwp.data_repository()

def english_label(qid):
    page = pywikibot.ItemPage(wikidata, qid)
    data = page.get()
    return data['labels']['en']
In [8]:
sex_qs = [str(q) for q in norm_sex.columns]
sex_labels = [english_label(sex_q) for sex_q in sex_qs]

norm_sex.columns = sex_labels
norm_sex
VERBOSE:pywiki:Found 1 wikidata:wikidata processes running, including this one.
Out[8]:
female animal | intersex | kathoey | Female | transgender female | male animal | male | female | transgender male | genderqueer | man | sodium
abwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.892308 0.107692 0.000000 0.000000 0 0
acewiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.840376 0.159624 0.000000 0.000000 0 0
afwiki 0.000000 0.000000 0 0.000288 0.000000 0.000288 0.883573 0.115850 0.000000 0.000000 0 0
afwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.946237 0.053763 0.000000 0.000000 0 0
akwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.866667 0.133333 0.000000 0.000000 0 0
alswiki 0.000000 0.000000 0 0.000567 0.000000 0.000000 0.888322 0.111111 0.000000 0.000000 0 0
amwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.896104 0.103896 0.000000 0.000000 0 0
angwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.797665 0.202335 0.000000 0.000000 0 0
anwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.833623 0.166377 0.000000 0.000000 0 0
arcwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.882353 0.117647 0.000000 0.000000 0 0
arwiki 0.000040 0.000040 0 0.000040 0.000079 0.000079 0.846166 0.153516 0.000000 0.000040 0 0
arwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.933333 0.066667 0.000000 0.000000 0 0
arwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.882353 0.117647 0.000000 0.000000 0 0
arzwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.712421 0.287158 0.000421 0.000000 0 0
astwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.842975 0.157025 0.000000 0.000000 0 0
aswiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.774336 0.225664 0.000000 0.000000 0 0
avwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
aywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
azwiki 0.000174 0.000174 0 0.000000 0.000174 0.000174 0.825135 0.174170 0.000000 0.000000 0 0
azwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.938017 0.061983 0.000000 0.000000 0 0
azwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
barwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.861856 0.138144 0.000000 0.000000 0 0
bat_smgwiki 0.000000 0.001312 0 0.000000 0.000000 0.000000 0.837270 0.161417 0.000000 0.000000 0 0
bawiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.915761 0.084239 0.000000 0.000000 0 0
bclwiki 0.000000 0.002045 0 0.000000 0.000000 0.000000 0.846626 0.151329 0.000000 0.000000 0 0
be_x_oldwiki 0.000000 0.000000 0 0.000169 0.000000 0.000169 0.877053 0.122608 0.000000 0.000000 0 0
bewiki 0.000000 0.000000 0 0.000099 0.000000 0.000099 0.876528 0.123273 0.000000 0.000000 0 0
bewikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bewikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.952381 0.047619 0.000000 0.000000 0 0
bgwiki 0.000091 0.000045 0 0.000045 0.000091 0.000181 0.839242 0.160305 0.000000 0.000000 0 0
bgwikiquote 0.000000 0.000000 0 0.000519 0.000000 0.000000 0.826155 0.173326 0.000000 0.000000 0 0
bgwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
bhwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.875000 0.125000 0.000000 0.000000 0 0
biwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.787234 0.212766 0.000000 0.000000 0 0
bjnwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.650000 0.350000 0.000000 0.000000 0 0
bmwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.875000 0.125000 0.000000 0.000000 0 0
bnwiki 0.000238 0.000000 0 0.000238 0.000238 0.000000 0.838102 0.161183 0.000000 0.000000 0 0
bnwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bowiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.877778 0.122222 0.000000 0.000000 0 0
bpywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.802817 0.197183 0.000000 0.000000 0 0
brwiki 0.000000 0.000000 0 0.000173 0.000000 0.000000 0.751081 0.248746 0.000000 0.000000 0 0
brwikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.962963 0.037037 0.000000 0.000000 0 0
brwikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.904762 0.095238 0.000000 0.000000 0 0
bswiki 0.000240 0.000000 0 0.000000 0.000000 0.000480 0.853033 0.146248 0.000000 0.000000 0 0
bswikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.871166 0.128834 0.000000 0.000000 0 0
bswikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0 0
bugwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.250000 0.750000 0.000000 0.000000 0 0
bxrwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.923729 0.076271 0.000000 0.000000 0 0
cawiki 0.000026 0.000000 0 0.000026 0.000103 0.000129 0.870905 0.128786 0.000026 0.000000 0 0
cawikiquote 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.900506 0.099494 0.000000 0.000000 0 0
cawikisource 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.966667 0.033333 0.000000 0.000000 0 0
cbk_zamwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.716867 0.283133 0.000000 0.000000 0 0
cdowiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.900000 0.100000 0.000000 0.000000 0 0
cebwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.848739 0.151261 0.000000 0.000000 0 0
cewiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.939623 0.060377 0.000000 0.000000 0 0
chrwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.513514 0.486486 0.000000 0.000000 0 0
chwiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.916667 0.083333 0.000000 0.000000 0 0
chywiki 0.000000 0.000000 0 0.000000 0.000000 0.000000 0.718750 0.281250 0.000000 0.000000 0 0
ckbwiki 0.000000 0.000588 0 0.000000 0.000588 0.000000 0.893067 0.105758 0.000000 0.000000 0 0
commonswiki 0.000045 0.000000 0 0.000045 0.000089 0.000268 0.810849 0.188614 0.000045 0.000045 0 0
... ... ... ... ... ... ... ... ... ... ... ...

396 rows × 12 columns

In [9]:
sex_df['total'] = sex_df.sum(axis=1)

female_sorted_10000_items = norm_sex[sex_df['total']>10000].sort_values('female', ascending=True)
In [11]:
female_sorted_10000_items.plot(kind='bar', stacked=True, legend=True, figsize=(13,8), ylim=(0,1),
                         title= '''Composition of Wikidata Property:P21 "Sex or Gender" by Language
    (Languages with over 10,000 associated P21)''')
Out[11]:
<matplotlib.axes.AxesSubplot at 0x7f272c8852d0>

Go put that on your fridge!

Step 3: Become embroiled in gender-politics debates

Or, optionally, come to the Sunday hackathon, where we can look at producing your idea, or at repeating this same analysis with a time component, probably by decade.
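
For the by-decade idea, one possible starting point is to add a third part to the counting key. This is only a sketch under assumptions: it supposes each item already carries a single year (for example a birth year, which is my guess at the time component), and the names `decade_of` and `bin_by_site_value_decade` and the toy items are invented.

```python
from collections import Counter


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1987 -> 1980."""
    return (year // 10) * 10


def bin_by_site_value_decade(items):
    """Count (site, P21 value, decade) triples; items missing either field are skipped."""
    counts = Counter()
    for item in items:
        if "year" not in item or "P21" not in item:
            continue
        for site in item["sitelinks"]:
            key = "--".join([site, item["P21"], str(decade_of(item["year"]))])
            counts[key] += 1
    return counts


# Toy items; the year field is an assumed, pre-extracted value.
items = [
    {"sitelinks": ["enwiki"], "P21": "Q6581072", "year": 1815},
    {"sitelinks": ["enwiki", "dewiki"], "P21": "Q6581097", "year": 1819},
    {"sitelinks": ["enwiki"], "P21": "Q6581097"},   # no year, so it is skipped
]
print(bin_by_site_value_decade(items))
```

The resulting keys split on `--` the same way as before, with one extra field for the decade.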
