During an initial check, we uncovered the following: "...53.4% of the users (both registered and anonymous) who were bucketed ended up in wikitext default bucket. It turns out that it would be incredibly unlikely (p << 10^-15) to get an imbalance this big if our random assignment was actually 50%–50%. So there's clearly a serious issue somewhere that we need to understand."
Previous bucketing results:
bucket | users |
---|---|
source default | 1,302,187 |
visual default | 1,214,917 |
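The "incredibly unlikely" claim can be checked against the counts in the table above. The following is a minimal sketch, assuming each user is an independent draw from a fair 50/50 assignment and using a normal approximation to the binomial; the variable names are illustrative and not from the original analysis. Note that these table counts give a wikitext share of about 51.7%, so the 53.4% in the quote may have been computed over a different population.

```r
# Counts copied from the previous bucketing table.
source_users <- 1302187
visual_users <- 1214917
n_users <- source_users + visual_users

# Observed share of users in the wikitext (source) default bucket.
share_source <- source_users / n_users

# Normal approximation to the binomial under a fair 50/50 split.
z <- (source_users - 0.5 * n_users) / sqrt(0.25 * n_users)

# Two-sided p-value on the log10 scale, to avoid floating-point underflow.
log10_p <- (log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)) / log(10)

round(c(share_source = share_source, z = z, log10_p = log10_p), 3)
```

Working on the log scale matters here because the p-value itself is far too small to represent as a double.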
What might be preventing contributors from being assigned to the test buckets evenly?
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
library(tidyverse); library(lubridate)
library(scales); library(data.table)
})
query <-
"
SELECT
event.editing_session_id as edit_attempt_id,
wiki,
event.platform as platform,
useragent.browser_family as browser_family,
useragent.os_family as os_family,
event.editor_interface as interface,
if(event.user_id != 0, concat(wiki, '-', event.user_id), event.anonymous_user_token) as user_id,
event.user_id = 0 as user_is_anonymous_byid,
if(event.anonymous_user_token is NULL, false, true) as user_is_anonymous_bytoken,
event.user_id != 0 as user_is_registered,
event.action as action,
event.init_timing as init_timing,
event.bucket,
geocoded_data['country'] as country,
event.user_editcount as user_edit_count
FROM event.editattemptstep
WHERE
event.bucket in ('default-visual', 'default-source') and
year = 2019 and (
(month = 7 and day >= 14) or
month >= 8)
"
sessions <- wmf::query_hive(query)
# Recheck the overall user counts per bucket to see whether the imbalance has changed
sessions_all <- sessions %>%
group_by(bucket) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id))
sessions_all
bucket | users | attempts |
---|---|---|
<chr> | <int> | <int> |
default-source | 1907187 | 2721255 |
default-visual | 1784439 | 2539846 |
Confirmed that a larger percentage of users (about 51.7%) are still being included in the wikitext-default bucket.
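The percentage quoted above can be reproduced directly from the printed counts. This is a small base R sketch that re-enters the `sessions_all` output by hand rather than querying Hive again; the `pct_users` and `pct_attempts` column names are assumptions added for illustration.

```r
# Counts copied from the sessions_all output above.
sessions_all <- data.frame(
  bucket   = c("default-source", "default-visual"),
  users    = c(1907187, 1784439),
  attempts = c(2721255, 2539846)
)

# Share of each bucket among all bucketed users and edit attempts, in percent.
sessions_all$pct_users    <- 100 * sessions_all$users / sum(sessions_all$users)
sessions_all$pct_attempts <- 100 * sessions_all$attempts / sum(sessions_all$attempts)

sessions_all
```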
## Break down by anonymous users to identify any discrepancy that occurs there
sessions_anonymous <- sessions %>%
filter(user_is_anonymous_byid == 'true') %>%
group_by(bucket) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id))
sessions_anonymous
bucket | users | attempts |
---|---|---|
<chr> | <int> | <int> |
default-source | 1891449 | 2663795 |
default-visual | 1769443 | 2482893 |
We also see an imbalance when looking at just anonymous users. About 51.7% of anonymous users are included in the wikitext-default bucket, similar to the share for all users.
## Look at user_is_anonymous_bytoken to confirm it matches user_is_anonymous_byid
sessions_anonymous_bytoken <- sessions %>%
filter(user_is_anonymous_bytoken == "true") %>%
group_by(bucket) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id))
sessions_anonymous_bytoken
bucket | users | attempts |
---|---|---|
<chr> | <int> | <int> |
default-source | 1891449 | 2663795 |
default-visual | 1769443 | 2482893 |
Confirmed: the two definitions of an anonymous user give identical counts.
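Matching aggregate counts do not strictly guarantee that the two flags agree on every row, so a row-level comparison may be a useful extra check. The sketch below uses a hypothetical four-row `toy_sessions` data frame as a stand-in for the real `sessions` data, and assumes the flags arrive as the strings "true" and "false", as the filters above suggest.

```r
# Hypothetical toy rows standing in for the real `sessions` data frame.
toy_sessions <- data.frame(
  user_is_anonymous_byid    = c("true", "true", "false", "true"),
  user_is_anonymous_bytoken = c("true", "true", "false", "true"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two flags; any off-diagonal count is a disagreement.
flag_table <- table(
  by_id    = toy_sessions$user_is_anonymous_byid,
  by_token = toy_sessions$user_is_anonymous_bytoken
)

# Number of rows where the two definitions of "anonymous" disagree.
n_mismatch <- sum(toy_sessions$user_is_anonymous_byid !=
                    toy_sessions$user_is_anonymous_bytoken)

flag_table
n_mismatch
```

On the real data, missing values in either flag would make the comparison return `NA`, so they would need to be handled explicitly.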
## Break down by registered users to identify any discrepancy that occurs there
sessions_registered <- sessions %>%
filter(user_is_registered == "true") %>%
group_by(bucket) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id))
sessions_registered
bucket | users | attempts |
---|---|---|
<chr> | <int> | <int> |
default-source | 9137 | 47809 |
default-visual | 8819 | 47925 |
We were unable to isolate the imbalance to just anonymous or just registered users. About 51.7% of anonymous users and 50.9% of registered users are included in the wikitext-default bucket, similar to the share for all users.
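Because the registered group is far smaller than the anonymous group, it may help to test each split separately against a fair 50/50 assignment. This is a sketch using an exact binomial test on the counts printed above, assuming each user is an independent draw; the `by_user_type` data frame is built by hand for illustration.

```r
# User counts copied from the anonymous and registered tables above.
by_user_type <- data.frame(
  user_type      = c("anonymous", "registered"),
  default_source = c(1891449, 9137),
  default_visual = c(1769443, 8819)
)

# Exact two-sided binomial test of each split against a fair 50/50 assignment.
by_user_type$p_value <- mapply(
  function(s, v) binom.test(s, s + v, p = 0.5)$p.value,
  by_user_type$default_source,
  by_user_type$default_visual
)

# Percentage of each group in the wikitext-default bucket.
by_user_type$pct_source <- 100 * by_user_type$default_source /
  (by_user_type$default_source + by_user_type$default_visual)

by_user_type
```

Under these assumptions, the evidence for an imbalance is much weaker for registered users than for anonymous users, simply because there are far fewer of them.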
sessions_bybrowser <- sessions %>%
group_by(bucket, browser_family) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id)) %>%
arrange(browser_family)
head(sessions_bybrowser)
bucket | browser_family | users | attempts |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
default-source | Amazon Silk | 262 | 395 |
default-visual | Amazon Silk | 280 | 402 |
default-source | Android | 4667 | 6604 |
default-visual | Android | 3887 | 6178 |
default-source | Baidu Browser | 1 | 1 |
default-source | bingbot | 398 | 689 |
sessions_bybrowser$users <- as.numeric(sessions_bybrowser$users)
sessions_bybrowser$attempts <- as.numeric(sessions_bybrowser$attempts)
sessions_bybrowser$bucket[sessions_bybrowser$bucket == "default-source"] <- "default_source"
sessions_bybrowser$bucket[sessions_bybrowser$bucket == "default-visual"] <- "default_visual"
# Look at test bucket user counts for top browsers
sessions_bybrowser_plot <- sessions_bybrowser %>%
filter(browser_family %in% c('Chrome Mobile', 'Mobile Safari', 'Samsung Internet', 'Chrome',
'Chrome Mobile WebView', 'UC Browser', 'Opera Mobile', 'Firefox Mobile',
'Chrome Mobile iOS', 'Facebook', 'Android', 'Mobile Safari UI/WKWebView',
'Edge Mobile', 'Edge')) %>% # filter to top browsers for plot visibility
ggplot(aes(x= browser_family, y = users, fill = bucket)) +
geom_col() +
scale_y_continuous("user counts") +
labs(title = "Editor test bucket user counts by browser") +
ggthemes::theme_tufte(base_size = 10, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position= "none")
sessions_bybrowser_plot
For the major browser types, there is a similar imbalance.
sessions_bybrowser_percent_imbalance <- sessions_bybrowser %>%
select(-attempts) %>% # drop attempts to look at user counts only
spread(bucket,users) %>%
#mutate(percent_diff = round(abs((default_source-default_visual)/default_visual *100), 3)) %>%
mutate(percent_wikitext_users = default_source/(default_source + default_visual) *100) %>%
arrange(desc(percent_wikitext_users))
head(sessions_bybrowser_percent_imbalance)
browser_family | default_source | default_visual | percent_wikitext_users |
---|---|---|---|
<chr> | <dbl> | <dbl> | <dbl> |
bingbot | 398 | 2 | 99.50000 |
BingPreview | 377 | 7 | 98.17708 |
NetFront NX | 18 | 1 | 94.73684 |
Chromium | 7 | 1 | 87.50000 |
oBot | 3 | 1 | 75.00000 |
Vivaldi | 3 | 1 | 75.00000 |
We do notice some much larger differences between buckets for the smaller browsers. Some of these are just due to the small overall number of users; however, there is a striking difference for the bingbot and BingPreview browsers. Both are web-crawling bots, which might be a clue to what is occurring.
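To separate small-sample noise from imbalances that are unlikely to be chance, each browser's split can be tested against 50/50. The sketch below applies an exact binomial test to the six rows printed above, assuming independent users; `browser_counts` is re-entered by hand for illustration.

```r
# Rows copied from the head(sessions_bybrowser_percent_imbalance) output above.
browser_counts <- data.frame(
  browser_family = c("bingbot", "BingPreview", "NetFront NX",
                     "Chromium", "oBot", "Vivaldi"),
  default_source = c(398, 377, 18, 7, 3, 3),
  default_visual = c(2, 7, 1, 1, 1, 1)
)

# Exact two-sided binomial p-value for each browser against a fair 50/50 split.
browser_counts$p_value <- mapply(
  function(s, v) binom.test(s, s + v, p = 0.5)$p.value,
  browser_counts$default_source,
  browser_counts$default_visual
)

browser_counts
```

With these counts, the splits for Chromium, oBot and Vivaldi are compatible with chance at the 5% level, while those for bingbot and BingPreview are not.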
sessions_bycountry <- sessions %>%
group_by(bucket, country) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id)) %>%
arrange(country, users)
head(sessions_bycountry)
bucket | country | users | attempts |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
default-visual | Afghanistan | 53 | 72 |
default-source | Afghanistan | 56 | 65 |
default-visual | Åland | 221 | 284 |
default-source | Åland | 239 | 313 |
default-visual | Albania | 810 | 1073 |
default-source | Albania | 871 | 1145 |
We were unable to isolate the imbalance to a specific country. Across countries, a higher percentage of users (around 51%) are added to the wikitext-default bucket.
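The per-country share can be computed the same way as for browsers. This is a base R sketch using only the six rows shown in the output above, so it illustrates the calculation rather than the full country list; `country_rows` is re-entered by hand.

```r
# Rows copied from the head(sessions_bycountry) output above.
country_rows <- data.frame(
  bucket  = c("default-visual", "default-source", "default-visual",
              "default-source", "default-visual", "default-source"),
  country = c("Afghanistan", "Afghanistan", "Åland", "Åland",
              "Albania", "Albania"),
  users   = c(53, 56, 221, 239, 810, 871)
)

# Pivot to one row per country and one column per bucket.
wide <- xtabs(users ~ country + bucket, data = country_rows)

# Percentage of each country's users in the wikitext-default bucket.
pct_source <- 100 * wide[, "default-source"] / rowSums(wide)

round(sort(pct_source, decreasing = TRUE), 1)
```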
sessions_bywiki <- sessions %>%
group_by(bucket, wiki) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id)) %>%
arrange(wiki)
sessions_bywiki
bucket | wiki | users | attempts |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
default-source | azwiki | 30959 | 45825 |
default-visual | azwiki | 27084 | 39485 |
default-source | bgwiki | 36032 | 54248 |
default-visual | bgwiki | 34397 | 50781 |
default-source | cawiki | 17405 | 23624 |
default-visual | cawiki | 16698 | 22615 |
default-source | dawiki | 40500 | 51981 |
default-visual | dawiki | 38869 | 49837 |
default-source | elwiki | 70126 | 104277 |
default-visual | elwiki | 66026 | 98229 |
default-source | etwiki | 9316 | 12977 |
default-visual | etwiki | 8893 | 11862 |
default-source | fiwiki | 82345 | 120946 |
default-visual | fiwiki | 79193 | 115314 |
default-source | hrwiki | 42845 | 58747 |
default-visual | hrwiki | 40293 | 55096 |
default-source | huwiki | 96830 | 140007 |
default-visual | huwiki | 93882 | 135732 |
default-source | mlwiki | 42919 | 64669 |
default-visual | mlwiki | 37837 | 55544 |
default-source | mswiki | 71688 | 97154 |
default-visual | mswiki | 65501 | 88663 |
default-source | nowiki | 52394 | 68146 |
default-visual | nowiki | 50291 | 65356 |
default-source | ptwiki | 691554 | 994634 |
default-visual | ptwiki | 645544 | 936121 |
default-source | rowiki | 76775 | 106667 |
default-visual | rowiki | 72683 | 100104 |
default-source | srwiki | 51947 | 78163 |
default-visual | srwiki | 48594 | 70192 |
default-source | svwiki | 125415 | 172066 |
default-visual | svwiki | 122535 | 167446 |
default-source | tawiki | 82113 | 124439 |
default-visual | tawiki | 76366 | 112996 |
default-source | thwiki | 250324 | 353956 |
default-visual | thwiki | 226800 | 319519 |
default-source | urwiki | 10885 | 16939 |
default-visual | urwiki | 9567 | 14704 |
default-source | zh_yuewiki | 18214 | 22139 |
default-visual | zh_yuewiki | 17209 | 21222 |
sessions_bywiki$users <- as.numeric(sessions_bywiki$users)
sessions_bywiki$attempts <- as.numeric(sessions_bywiki$attempts)
sessions_bywiki$bucket[sessions_bywiki$bucket == "default-source"] <- "default_source"
sessions_bywiki$bucket[sessions_bywiki$bucket == "default-visual"] <- "default_visual"
sessions_bywiki_percent_imbalance <- sessions_bywiki %>%
select(-attempts) %>% # drop attempts to look at user counts only
spread(bucket,users) %>%
#mutate(percent_diff = round(abs((default_source-default_visual)/default_visual *100), 3)) %>%
mutate(percent_wikitext_users = default_source/(default_source + default_visual) *100) %>%
arrange(desc(percent_wikitext_users))
head(sessions_bywiki_percent_imbalance)
wiki | default_source | default_visual | percent_wikitext_users |
---|---|---|---|
<chr> | <dbl> | <dbl> | <dbl> |
azwiki | 30959 | 27084 | 53.33804 |
urwiki | 10885 | 9567 | 53.22218 |
mlwiki | 42919 | 37837 | 53.14652 |
thwiki | 250324 | 226800 | 52.46519 |
mswiki | 71688 | 65501 | 52.25492 |
tawiki | 82113 | 76366 | 51.81317 |
We were unable to isolate the imbalance to a particular wiki. Every wiki shows a similar imbalance.
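One way to summarize the per-wiki table is a sign test: if assignment were fair, each wiki would be about equally likely to lean toward either bucket. The sketch below re-enters the user counts from the full per-wiki output above and treats the 20 wikis as independent, which is an assumption.

```r
# User counts copied from the sessions_bywiki output above (one entry per wiki).
wiki <- c("azwiki", "bgwiki", "cawiki", "dawiki", "elwiki", "etwiki", "fiwiki",
          "hrwiki", "huwiki", "mlwiki", "mswiki", "nowiki", "ptwiki", "rowiki",
          "srwiki", "svwiki", "tawiki", "thwiki", "urwiki", "zh_yuewiki")
default_source <- c(30959, 36032, 17405, 40500, 70126, 9316, 82345,
                    42845, 96830, 42919, 71688, 52394, 691554, 76775,
                    51947, 125415, 82113, 250324, 10885, 18214)
default_visual <- c(27084, 34397, 16698, 38869, 66026, 8893, 79193,
                    40293, 93882, 37837, 65501, 50291, 645544, 72683,
                    48594, 122535, 76366, 226800, 9567, 17209)

# Percentage of each wiki's users in the wikitext-default bucket.
pct_source <- setNames(100 * default_source / (default_source + default_visual), wiki)

# Sign test: how many wikis lean toward wikitext, and how likely is that by chance?
n_leaning_source <- sum(default_source > default_visual)
sign_test <- binom.test(n_leaning_source, length(wiki), p = 0.5)

round(range(pct_source), 2)
n_leaning_source
sign_test$p.value
```

The size of the imbalance differs somewhat between wikis, but its direction does not.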
Based on the earlier results, we should see an imbalance across all actions as well, but we want to confirm.
sessions_byaction <- sessions %>%
group_by(bucket, action) %>%
summarise(users = n_distinct(user_id),
attempts = n_distinct(edit_attempt_id)) %>%
arrange(action)
sessions_byaction
bucket | action | users | attempts |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
default-source | abort | 1387151 | 1763603 |
default-visual | abort | 835965 | 998167 |
default-source | init | 1906751 | 2720469 |
default-visual | init | 1784041 | 2539217 |
default-source | loaded | 1903903 | 2708243 |
default-visual | loaded | 1256521 | 1708057 |
default-source | ready | 1904072 | 2708547 |
default-visual | ready | 1256752 | 1708386 |
default-source | saveAttempt | 32690 | 88018 |
default-visual | saveAttempt | 27793 | 73583 |
default-source | saveFailure | 8501 | 12529 |
default-visual | saveFailure | 7635 | 11097 |
default-source | saveIntent | 37941 | 98970 |
default-visual | saveIntent | 31076 | 79918 |
default-source | saveSuccess | 28372 | 78882 |
default-visual | saveSuccess | 24132 | 66269 |
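Confirmed. There is an imbalance across all actions as well, although its size varies: the wikitext-default share of users is about 51.7% for init, between about 52.7% and 55.0% for the save actions, about 60.2% for loaded and ready, and about 62.4% for abort.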
Potential areas to investigate further: Is there a load time issue? Can we confirm that both buckets are assigned on the server side? Are they assigned at the same time an init event is recorded on the server? If, for some reason, the wikitext bucket is assigned on the server side and the visual editor bucket on the client side, then that delay might be why we are seeing fewer people in the visual editor bucket.
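A first look at the load-time question is possible with the counts already in the action table, by comparing how many users logged a ready event relative to an init event in each bucket. This is a rough sketch: the per-action counts are distinct users, so the ratio only approximates the share of users who got from init to ready, and `action_counts` is re-entered by hand.

```r
# User counts copied from the init and ready rows of the action table above.
action_counts <- data.frame(
  bucket = c("default-source", "default-visual"),
  init   = c(1906751, 1784041),
  ready  = c(1904072, 1256752)
)

# Ratio of users logging ready to users logging init, per bucket, in percent.
action_counts$pct_ready <- 100 * action_counts$ready / action_counts$init

action_counts
```

This does not by itself explain the imbalance at init, but it suggests the two buckets behave quite differently between init and ready, which fits the idea that editor loading plays a role.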