This task looks at getting a better understanding of the skin preferences opted by our users.
Note:
Values like 'chick','simple,','classic' etc. that fall back to the default skin have been modified to reflect that.
The users in each section are all the users who met the edit threshold when their edits from the past year were summed across all projects. Each user's skin was checked on the wiki where they made the most edits during the year (for both the edit threshold and home wiki identification, Wikidata edits were each treated as 1/10th of an edit to account for the greater granularity of edits there).
Skin | Percentage of users (with 5 or more edits) | Percentage of users (with 30 or more edits) | Percentage of users (with 600 or more edits) |
---|---|---|---|
vector | 97.2% | 95.5% | 91.4% |
monobook | 2.0% | 3.4% | 7.2% |
modern | 0.3% | 0.4% | 0.7% |
timeless | 0.3% | 0.5% | 0.5% |
cologneblue | 0.1% | 0.1% | 0.1% |
minerva | 0.1% | 0.1% | 0.0% |
import pandas as pd
import numpy as np
import datetime as dt
from wmfdata import hive, mariadb
You are using wmfdata v1.0.1, but v1.0.3 is available. To update, run `pip install --upgrade git+https://github.com/neilpquinn/wmfdata/wmfdata.git@release`. To see the changes, refer to https://github.com/neilpquinn/wmfdata/blob/release/CHANGELOG.md
We will first begin by taking all users that were active editors i.e. had 5 or more content edits overall in the last year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference.
We will use their user ids to query the mariadb table user_properties to get their skin preferences.
HIVE_SNAPSHOT = "2020-05"
START_OF_DATA = "2019-05-01"
END_OF_DATA = "2020-06-01"
#all active editors from the past one year
active_editor_query = """
WITH yr_proj_edits as (
select
event_user_text as user,
event_user_id as user_id,
wiki_db as proj,
sum(if(wiki_db = "wikidatawiki", 0.1, 1)) as content_edits,
max(event_timestamp) as latest_edit
from wmf.mediawiki_history
where
-- REGISTERED
event_user_is_anonymous = false and
-- NON-BOT
size(event_user_is_bot_by) = 0 and
not array_contains(event_user_groups, "bot") and
-- CONTENT EDITS
event_entity = "revision" and
event_type = "create" and
page_namespace_is_content = true and
-- FROM THE LAST YEAR
event_timestamp >= "{START_OF_DATA}" and event_timestamp < "{END_OF_DATA}" and
-- FROM THE LATEST SNAPSHOT
snapshot = "{hive_snapshot}"
-- PER USER, PER WIKI
group by event_user_text, event_user_id, wiki_db
)
-- FINAL SELECT OF
select
user as user_name,
user_id as user_id,
proj as wiki,
global_edits
from
-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS
(
select
user,
user_id,
proj,
-- in the unlikely event that wikis are tied by edit count and latest edit,
-- row_number() will break it somehow
row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,
sum(content_edits) over (partition by user) as global_edits
from yr_proj_edits
) yr_edits
where
rank = 1
and global_edits>=5
"""
active_editor = hive.run(
active_editor_query.format(
hive_snapshot = HIVE_SNAPSHOT,
START_OF_DATA= START_OF_DATA,
END_OF_DATA=END_OF_DATA
)
)
Total_active_ed = active_editor['user_id'].count()
print('Total number of editors for whom we will be checking skin preferences:' , Total_active_ed)
Total number of editors for whom we will be checking skin preferences: 587111
#Querying user_properties for getting the skin preferences set by the active editors we got in the above query
query='''
SELECT
up_value AS skin,
COUNT(*) AS users
FROM user_properties
WHERE up_user in ({users})
AND up_property = "skin"
GROUP BY up_value
'''
We will be using the user_properties table to identify skin preferences set by the active editors. Note that if a user has not set a preferred skin then there will be no record for that user in this table and skin preference is default i.e. Vector
# Looping through each wiki for the list of users for each skin
wikis=active_editor['wiki'].unique()
up_skin=list()
for wiki in wikis:
user_ids = active_editor[active_editor['wiki'] == wiki]["user_id"]
user_list = ','.join([str(u) for u in user_ids])
prefs = mariadb.run(
query.format(users=user_list),
wiki
)
up_skin.append(prefs)
skin= pd.concat(up_skin)
skin_users = skin['users'].sum()
print('Total number of editors for whom we have preferences set in the user_properties table:' , skin_users)
Total number of editors for whom we have preferences set in the user_properties table: 29587
There are a huge number of users who do not have data for skin preference in the user_preference table indicating that they are set to the default 'Vector' skin OR due to being deleted from the user_preference table. For this analysis, let's default them to 'Vector'.
vectors=np.subtract(Total_active_ed,skin_users)
vectors
557524
new_row={'skin':'vector', 'users':vectors}
skin=skin.append(new_row,ignore_index=True)
We are making the following considerations from the information given on this page :
skin_aliases = {
"":"vector",
"0":"vector",
"1":"vector",
"simple":"vector",
"nostalgia":"vector",
"chick":"vector",
"standard":"vector",
"classic":"vector",
"2":"cologneblue",
"myskin":"monobook",
"minervaneue":"minerva"
}
skin= skin.replace({"skin": skin_aliases})
user_skin=skin.groupby('skin').sum()
user_skin
users | |
---|---|
skin | |
cologneblue | 427 |
minerva | 606 |
modern | 1499 |
monobook | 11727 |
timeless | 1916 |
vector | 570936 |
pct_user_skin=(100. * user_skin / user_skin.sum()).round(1).astype(str) + '%'
pct_user_skin.sort_values(by=['users'],ascending=False)
users | |
---|---|
skin | |
vector | 97.2% |
monobook | 2.0% |
modern | 0.3% |
timeless | 0.3% |
cologneblue | 0.1% |
minerva | 0.1% |
Now let's look at users that were active editors and had 30 or more content edits in the lsat one year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference.
#all active editors from the past one year
active_editor_30_query = """
WITH yr_proj_edits as (
select
event_user_text as user,
event_user_id as user_id,
wiki_db as proj,
sum(if(wiki_db = "wikidatawiki", 0.1, 1)) as content_edits,
max(event_timestamp) as latest_edit
from wmf.mediawiki_history
where
-- REGISTERED
event_user_is_anonymous = false and
-- NON-BOT
size(event_user_is_bot_by) = 0 and
not array_contains(event_user_groups, "bot") and
-- CONTENT EDITS
event_entity = "revision" and
event_type = "create" and
page_namespace_is_content = true and
-- FROM THE LAST YEAR
event_timestamp >= "{START_OF_DATA}" and event_timestamp < "{END_OF_DATA}" and
-- FROM THE LATEST SNAPSHOT
snapshot = "{hive_snapshot}"
-- PER USER, PER WIKI
group by event_user_text, event_user_id, wiki_db
)
-- FINAL SELECT OF
select
user as user_name,
user_id as user_id,
proj as wiki,
global_edits
from
-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS
(
select
user,
user_id,
proj,
-- in the unlikely event that wikis are tied by edit count and latest edit,
-- row_number() will break it somehow
row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,
sum(content_edits) over (partition by user) as global_edits
from yr_proj_edits
) yr_edits
where
rank = 1
and global_edits>=30
"""
active_editor_30 = hive.run(
active_editor_30_query.format(
hive_snapshot = HIVE_SNAPSHOT,
START_OF_DATA= START_OF_DATA,
END_OF_DATA=END_OF_DATA
)
)
Total_active_ed_30 = active_editor_30['user_id'].count()
print('Total number of editors (with greater than 30 edits) for whom we will be checking skin preferences:' ,
Total_active_ed_30)
Total number of editors with greater than 30 edits for whom we will be checking skin preferences: 148973
# Looping through each wiki for the list of users for each skin
wikis=active_editor_30['wiki'].unique()
up_skin=list()
for wiki in wikis:
user_ids = active_editor_30[active_editor_30['wiki'] == wiki]["user_id"]
user_list = ','.join([str(u) for u in user_ids])
prefs = mariadb.run(
query.format(users=user_list),
wiki
)
up_skin.append(prefs)
skin_30= pd.concat(up_skin)
skin_users_30 = skin_30['users'].sum()
print('Total number of editors (with 30 or more edits) for whom we have preferences set in the user_properties table:'
, skin_users_30)
Total number of editors (with 30 or more edits) for whom we have preferences set in the user_properties table: 10001
vectors_30=np.subtract(Total_active_ed_30,skin_users_30)
vectors_30
138972
new_row={'skin':'vector', 'users':vectors_30}
skin_30=skin_30.append(new_row,ignore_index=True)
skin_30= skin_30.replace({"skin": skin_aliases})
user_skin_30=skin_30.groupby('skin').sum()
user_skin_30
users | |
---|---|
skin | |
cologneblue | 153 |
minerva | 134 |
modern | 648 |
monobook | 5014 |
timeless | 740 |
vector | 142284 |
pct_user_skin_30=(100. * user_skin_30 / user_skin_30.sum()).round(1).astype(str) + '%'
pct_user_skin_30.sort_values(by=['users'],ascending=False)
users | |
---|---|
skin | |
vector | 95.5% |
monobook | 3.4% |
timeless | 0.5% |
modern | 0.4% |
cologneblue | 0.1% |
minerva | 0.1% |
Lastly, let's look at users that were very active editors and had 600 or more content edits in the lsat one year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference.
#all active editors from the past one year
active_editor_600_query = """
WITH yr_proj_edits as (
select
event_user_text as user,
event_user_id as user_id,
wiki_db as proj,
sum(if(wiki_db = "wikidatawiki", 0.1, 1)) as content_edits,
max(event_timestamp) as latest_edit
from wmf.mediawiki_history
where
-- REGISTERED
event_user_is_anonymous = false and
-- NON-BOT
size(event_user_is_bot_by) = 0 and
not array_contains(event_user_groups, "bot") and
-- CONTENT EDITS
event_entity = "revision" and
event_type = "create" and
page_namespace_is_content = true and
-- FROM THE LAST YEAR
event_timestamp >= "{START_OF_DATA}" and event_timestamp < "{END_OF_DATA}" and
-- FROM THE LATEST SNAPSHOT
snapshot = "{hive_snapshot}"
-- PER USER, PER WIKI
group by event_user_text, event_user_id, wiki_db
)
-- FINAL SELECT OF
select
user as user_name,
user_id as user_id,
proj as wiki,
global_edits
from
-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS
(
select
user,
user_id,
proj,
-- in the unlikely event that wikis are tied by edit count and latest edit,
-- row_number() will break it somehow
row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,
sum(content_edits) over (partition by user) as global_edits
from yr_proj_edits
) yr_edits
where
rank = 1
and global_edits>=600
"""
active_editor_600 = hive.run(
active_editor_600_query.format(
hive_snapshot = HIVE_SNAPSHOT,
START_OF_DATA= START_OF_DATA,
END_OF_DATA=END_OF_DATA
)
)
Total_active_ed_600 = active_editor_600['user_id'].count()
print('Total number of editors (with greater than 600 edits) for whom we will be checking skin preferences:' ,
Total_active_ed_600)
Total number of editors (with greater than 600 edits) for whom we will be checking skin preferences: 24080
# Looping through each wiki for the list of users for each skin
wikis=active_editor_600['wiki'].unique()
up_skin=list()
for wiki in wikis:
user_ids = active_editor_600[active_editor_600['wiki'] == wiki]["user_id"]
user_list = ','.join([str(u) for u in user_ids])
prefs = mariadb.run(
query.format(users=user_list),
wiki
)
up_skin.append(prefs)
skin_600= pd.concat(up_skin)
skin_users_600 = skin_600['users'].sum()
print('Total number of editors (with 600 or more edits) for whom we have preferences set in the user_properties table:'
, skin_users_600)
Total number of editors (with 600 or more edits) for whom we have preferences set in the user_properties table: 2536
vectors_600=np.subtract(Total_active_ed_600,skin_users_600)
vectors_600
21544
new_row={'skin':'vector', 'users':vectors_600}
skin_600=skin_600.append(new_row,ignore_index=True)
skin_600= skin_600.replace({"skin": skin_aliases})
user_skin_600=skin_600.groupby('skin').sum()
user_skin_600
users | |
---|---|
skin | |
cologneblue | 27 |
minerva | 9 |
modern | 171 |
monobook | 1734 |
timeless | 130 |
vector | 22009 |
pct_user_skin_600=(100. * user_skin_600 / user_skin_600.sum()).round(1).astype(str) + '%'
pct_user_skin_600.sort_values(by=['users'],ascending=False)
users | |
---|---|
skin | |
vector | 91.4% |
monobook | 7.2% |
modern | 0.7% |
timeless | 0.5% |
cologneblue | 0.1% |
minerva | 0.0% |
These results are more or less consistent with the results we got with the analysis that was done in 2017 Statistics about /active/ users of skins on the Wikimedia cluster i.e. not a lot has changed in terms of users' choice of skin on the wikis.
We have kept the edit bucket similar to the last analysis for ease of comparison.