The New Discussion Tool was deployed as an opt-in beta feature to all logged-in users to improve contributors' workflows for starting new discussion threads on talk pages, across Wikipedia's 16 talk namespaces. See the project page for more details.
Deployment dates:
The purpose of this analysis is to understand how people are engaging with the New Discussion Tool beta feature, to help us determine whether the New Discussion Tool is ready to be made available to all people by default at a subset of wikis. This analysis is intended to help us answer these questions:
Data for this analysis comes from a combination of the following sources:
For this analysis, we reviewed events logged from the date of deployment as a beta feature (18 February 2021) through the end of July (31 July 2021). We calculated each metric overall (across all Wikimedia projects), by experience level (users' cumulative edit count), and for the specific Wikipedias where we are considering opt-out deployments (Arabic and Czech Wikipedia).
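The experience-level grouping used throughout the analysis can be sketched in base R as follows. This is a minimal illustration that assumes the same thresholds as the queries in this notebook (under 100, 100-500, over 500); the helper name `edit_count_group` is hypothetical and does not appear in the original code.

```r
# Hypothetical helper (not part of the original notebook): map a cumulative
# edit count to the three experience groups used in this analysis.
edit_count_group <- function(edit_count) {
  ifelse(edit_count < 100, "under 100",
         ifelse(edit_count <= 500, "100-500", "over 500"))
}

# Boundary values show which group each threshold falls into.
example_counts <- c(0, 99, 100, 500, 501)
groups <- edit_count_group(example_counts)
print(groups)
```

Both 100 and 500 fall in the middle group, so "under 100" means strictly fewer than 100 edits.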
library(IRdisplay)
display_html(
'<script>
code_show=true;
function code_toggle() {
if (code_show){
$(\'div.input\').hide();
} else {
$(\'div.input\').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()">
<input type="submit" value="Click here to toggle on/off the raw code.">
</form>'
)
# load required packages
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
library(tidyverse); library(glue); library(lubridate); library(scales)
})
Purpose: Do people using the New Discussion Tool find it disruptive?
We reviewed how many new discussion tool users explicitly[^1] turned off the feature after making at least one edit.
Data Description and Assumptions:
event.property = 'discussiontools-betaenable'
) that allows a user to explicitly turn on or off all discussion tool beta features. This includes both the reply tool and the new discussion tool; these features are not turned off individually.
[^1]: "Explicitly" turned on indicates users did not have the "Automatically enable all new beta features" preference checked. Note that "explicitly turned off" could include users that were auto-enrolled and then turned off the feature.
query <- "
--find users that opted out of the discussiontool beta feature
WITH opt_out_users AS (
SELECT
event.userid as opt_out_user,
wiki as opt_out_wiki,
min(event.saveTimestamp) as opt_out_time,
sum(cast(event.value = '\"0\"' as int)) as opt_outs
FROM
event.prefupdate
WHERE
event.property = 'discussiontools-betaenable' AND
event.value = '\"0\"' AND
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '2021-05-18' AND
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) <= '2021-07-31'
GROUP BY
event.userid,
wiki
),
-- find users that made at least one edit with the new discussion tool
new_topic_users AS (
SELECT
event_user_id as new_topic_user,
wiki_db as new_topic_wiki,
min(mh.event_timestamp) as first_post,
CASE
WHEN min(event_user_revision_count) < 100 THEN 'under 100'
WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'
ELSE 'over 500'
END AS edit_count_group,
min(event_user_revision_count) AS edit_count
FROM wmf.mediawiki_history AS mh
WHERE
ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic')
AND snapshot = '2021-07'
-- date of first deployment
AND event_timestamp >= '2021-02-18'
AND event_timestamp <= '2021-07-31'
-- only on desktop
AND NOT array_contains(revision_tags, 'iOS')
AND NOT array_contains(revision_tags, 'Android')
AND NOT array_contains(revision_tags, 'Mobile Web')
-- find all edits on talk pages
AND page_namespace_historical % 2 = 1
AND event_entity = 'revision' AND
event_type = 'create'
AND event_user_is_anonymous = FALSE
GROUP BY
event_user_id,
wiki_db
)
-- Main Query --
SELECT
new_topic_wiki AS wiki,
edit_count AS edit_count,
edit_count_group AS edit_count_group,
--find opt out users that opted out following new discussion tool post
SUM(CAST(opt_out_user IS NOT NULL AND first_post < opt_out_time AS INT)) AS opt_out_users,
SUM(CAST(new_topic_user IS NOT NULL AS int)) AS new_topic_contributor
FROM (
SELECT
new_topic_users.first_post,
new_topic_users.new_topic_user,
opt_out_users.opt_out_time,
new_topic_users.new_topic_wiki,
opt_out_users.opt_out_user,
new_topic_users.edit_count,
new_topic_users.edit_count_group
FROM new_topic_users
LEFT JOIN opt_out_users ON
new_topic_users.new_topic_user = opt_out_users.opt_out_user AND
new_topic_users.new_topic_wiki = opt_out_users.opt_out_wiki
WHERE
opt_out_users.opt_outs IS NULL OR
opt_out_users.opt_outs = 1
) sessions
GROUP BY
sessions.new_topic_wiki,
sessions.edit_count,
sessions.edit_count_group
"
opt_out_contributors <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
write_csv(opt_out_contributors, "Data/opt_out_contributors.csv")
opt_out_contributors_overall <- opt_out_contributors %>%
summarise(opt_out_users = sum(opt_out_users),
new_topic_contributors = sum(new_topic_contributor),
pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
)
opt_out_contributors_overall
opt_out_users | new_topic_contributors | pct_opt_out |
---|---|---|
<int> | <int> | <chr> |
427 | 5133 | 8.32% |
opt_out_contributors_byexp <- opt_out_contributors %>%
group_by(edit_count_group) %>%
summarise(opt_out_users = sum(opt_out_users),
new_topic_contributors = sum(new_topic_contributor),
pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
)
opt_out_contributors_byexp
`summarise()` ungrouping output (override with `.groups` argument)
edit_count_group | opt_out_users | new_topic_contributors | pct_opt_out |
---|---|---|---|
<chr> | <int> | <int> | <chr> |
100-500 | 43 | 681 | 6.31% |
over 500 | 237 | 3427 | 6.92% |
under 100 | 147 | 1025 | 14.34% |
Opt-out rates for all three groups are fairly low (below 15%) with Junior Contributors (editors with under 100 edits) having the highest opt-out rate.
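As a quick arithmetic check, the rates in the tables above can be reproduced from the raw counts. The snippet below is a base R sketch that copies the counts from the printed output rather than querying the data; the variable names are illustrative.

```r
# Counts copied from the experience-level table above.
opt_outs     <- c("100-500" = 43,  "over 500" = 237,  "under 100" = 147)
contributors <- c("100-500" = 681, "over 500" = 3427, "under 100" = 1025)

# Opt-out rate per experience group, as a percentage rounded to 2 decimals.
pct_opt_out <- round(opt_outs / contributors * 100, 2)

# The overall rate pools the counts before dividing; it is not the mean
# of the three group rates.
overall_pct <- round(sum(opt_outs) / sum(contributors) * 100, 2)

print(pct_opt_out)
print(overall_pct)
```

Pooling the counts weights each group by its size, which is why the overall rate sits close to the rate for the largest group (over 500 edits).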
Since the higher opt-out rate for Junior Contributors is somewhat unexpected, we further broke down the under 100 edit count group into smaller edit count groups (e.g., 0-10 edits, 11-20 edits, 21-30 edits, etc.) and reviewed the wikis with the highest Junior Contributor opt-out rate. This was done to identify whether the higher opt-out rate for Junior Contributors was due to a specific edit count group or wiki.
By Junior Contributor Experience Level
# Divide edit counts into groups
b <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
names <- c( '0-10 edits', '11-20 edits', '21-30 edits', '31-40 edits',
'41-50 edits', '51-60 edits', '61-70 edits', '71-80 edits', '81-90 edits', '91-100 edits')
jc_opt_out_contributors_byexp <- opt_out_contributors %>%
filter(edit_count <= 100) %>% # only review Junior Contributors
mutate(edit_count = cut(edit_count, breaks = b, labels = names)) %>%
group_by(edit_count) %>%
summarise(opt_out_users = sum(opt_out_users),
new_topic_contributors = sum(new_topic_contributor),
pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
)
jc_opt_out_contributors_byexp
`summarise()` ungrouping output (override with `.groups` argument)
edit_count | opt_out_users | new_topic_contributors | pct_opt_out |
---|---|---|---|
<fct> | <int> | <int> | <chr> |
0-10 edits | 53 | 320 | 16.56% |
11-20 edits | 29 | 190 | 15.26% |
21-30 edits | 14 | 107 | 13.08% |
31-40 edits | 10 | 97 | 10.31% |
41-50 edits | 16 | 82 | 19.51% |
51-60 edits | 10 | 73 | 13.7% |
61-70 edits | 7 | 77 | 9.09% |
71-80 edits | 4 | 70 | 5.71% |
81-90 edits | 12 | 62 | 19.35% |
91-100 edits | 3 | 44 | 6.82% |
Most of the Junior Contributor edit count groups have roughly the same opt-out rate as the one identified for all contributors with under 100 edits (~15%). There are slightly higher opt-out rates for contributors with under 50 edits, but there does not appear to be a specific group that accounts for the higher opt-out rate.
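One detail of the binning above is worth illustrating: by default, `cut()` builds right-closed intervals and excludes the lowest break, so a value equal to the first break (here 0) becomes `NA`. The base R sketch below uses the same breaks and made-up example values; whether any contributor in the real data has an edit count of exactly 0 is not stated in this analysis, so treat that as an assumption.

```r
# Same breaks as the notebook; labels rebuilt programmatically.
b <- seq(0, 100, by = 10)
bin_labels <- paste0(c(0, seq(11, 91, by = 10)), "-", seq(10, 100, by = 10), " edits")

# Made-up example values sitting on the bin boundaries.
example <- c(0, 10, 11, 100)

# Default behaviour: intervals are (0,10], (10,20], ..., so 0 maps to NA.
default_bins <- as.character(cut(example, breaks = b, labels = bin_labels))

# With include.lowest = TRUE the first interval becomes [0,10].
inclusive_bins <- as.character(
  cut(example, breaks = b, labels = bin_labels, include.lowest = TRUE)
)

print(default_bins)
print(inclusive_bins)
```

If zero-edit contributors should be counted in the first group, passing `include.lowest = TRUE` is one way to keep them out of the `NA` bucket.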
Wikis with the highest Junior Contributor Opt-Out Rate
jc_opt_out_contributors_bywiki <- opt_out_contributors %>%
filter(edit_count_group == 'under 100') %>% # only review Junior Contributors
group_by(wiki) %>%
summarise(opt_out_users = sum(opt_out_users),
new_topic_contributors = sum(new_topic_contributor),
pct_opt_out = round(opt_out_users/new_topic_contributors * 100, 2)
) %>%
filter(new_topic_contributors > 1) %>% # review wikis with more than 1 new topic contributor
arrange(desc(pct_opt_out))
head(jc_opt_out_contributors_bywiki, 20)
`summarise()` ungrouping output (override with `.groups` argument)
wiki | opt_out_users | new_topic_contributors | pct_opt_out |
---|---|---|---|
<chr> | <int> | <int> | <dbl> |
bnwiki | 2 | 4 | 50.00 |
zhwikibooks | 1 | 2 | 50.00 |
arwiki | 5 | 11 | 45.45 |
trwiki | 7 | 17 | 41.18 |
simplewiki | 3 | 9 | 33.33 |
svwiki | 1 | 3 | 33.33 |
kowiki | 3 | 10 | 30.00 |
mediawikiwiki | 3 | 10 | 30.00 |
mswiki | 1 | 4 | 25.00 |
thwiki | 1 | 4 | 25.00 |
ruwiki | 1 | 5 | 20.00 |
enwiki | 59 | 316 | 18.67 |
fawiki | 7 | 38 | 18.42 |
commonswiki | 4 | 22 | 18.18 |
viwiki | 2 | 11 | 18.18 |
jawiki | 5 | 28 | 17.86 |
eswiki | 10 | 70 | 14.29 |
zhwiki | 4 | 36 | 11.11 |
ptwiki | 3 | 28 | 10.71 |
itwiki | 6 | 62 | 9.68 |
A review by wiki also does not reveal any surprising trends. The highest opt-out rates are for wikis with only a few new discussion tool users; as a result, these rates do not accurately represent the population.
The rates for larger wikis are around 15 to 18%, similar to the overall opt-out rate identified for Junior Contributors.
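To make the small-sample caveat concrete, the sketch below compares exact binomial confidence intervals for a small wiki and a large wiki, using counts copied from the table above (bnwiki: 2 of 4, enwiki: 59 of 316). It uses `binom.test()` from base R's stats package. This is an illustrative addition, not part of the original analysis, and the choice of an exact (Clopper-Pearson) interval is an assumption.

```r
# Exact binomial 95% confidence intervals for the opt-out proportion.
ci_small <- binom.test(2, 4)$conf.int     # bnwiki: 2 opt-outs out of 4
ci_large <- binom.test(59, 316)$conf.int  # enwiki: 59 opt-outs out of 316

# Interval widths as proportions (multiply by 100 for percentage points).
width_small <- ci_small[2] - ci_small[1]
width_large <- ci_large[2] - ci_large[1]

print(round(c(small = width_small, large = width_large), 3))
```

The interval for the four-contributor wiki spans most of the 0 to 1 range, which is why a 50% opt-out rate there says little about the wiki's wider population.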
Since we are only able to access user-specific opt-out data for the last 90 days, the higher opt-out rate for Junior Contributors is likely because Senior Contributors are more likely to have already tried the tool and decided to opt out before this 90-day window.
opt_out_contributors_bywiki <- opt_out_contributors %>%
filter(wiki %in% c('arwiki', 'cswiki')) %>%
group_by(wiki) %>%
summarise(opt_out_users = sum(opt_out_users),
new_topic_contributors = sum(new_topic_contributor),
pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%"), .groups = 'drop'
)
opt_out_contributors_bywiki
wiki | opt_out_users | new_topic_contributors | pct_opt_out |
---|---|---|---|
<chr> | <int> | <int> | <chr> |
arwiki | 9 | 54 | 16.67% |
cswiki | 0 | 27 | 0% |
Data: Based on data recorded in the mediawiki user_properties table
While we are unable to access user-specific preference change events that occurred more than 90 days ago (before 18 May 2021) in PrefUpdate, I reviewed the user properties database to determine the number of new discussion tool contributors that currently have the discussiontools-betaenable preference set to disabled.
Note: This data reflects just the current nondefault status of user preference and does not provide any details on if the user enabled and disabled the feature multiple times or when they disabled it in relation to their edit. Also, there are contributors that have used the new discussion tool but don't have a preference set in the user properties table, indicated as "no local preference recorded" in the results below. Possible reasons for this include: (1) the user disabled the setting by selecting 'restore all default preferences' in their user preferences or (2) the user enabled discussion tools in their global preferences but not in their local preferences.
Please see the summary of results below and "New_discussion_tool_opt_out_analysis.ipynb" located in the code repository for further details of current user discussion tool preferences using the mediawiki user_properties table.
Overall
Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors |
---|---|
no local preference recorded | 27.14% |
explicitly disabled | 6.05% |
explicitly enabled | 66.81% |
By Edit Count Group
edit_count_group | Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors |
---|---|---|
under 100 | no local preference recorded | 28.32% |
under 100 | explicitly disabled | 4.05% |
under 100 | explicitly enabled | 67.63% |
100-500 | no local preference recorded | 28.87% |
100-500 | explicitly disabled | 3.1% |
100-500 | explicitly enabled | 68.03% |
over 500 | no local preference recorded | 26.5% |
over 500 | explicitly disabled | 7.2% |
over 500 | explicitly enabled | 66.3% |
Arabic and Czech Wikipedias
Wiki | Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors |
---|---|---|
arwiki | explicitly disabled | 4.84% |
arwiki | explicitly enabled | 95.16% |
cswiki | explicitly disabled | 3.33% |
cswiki | explicitly enabled | 96.67% |
From 18 May 2021 through 31 July 2021, 8.32% of contributors that saved at least one new discussion tool edit explicitly opted out of the new discussion tool, indicating that most users of the tool do not find it disruptive. Junior Contributors (users with under 100 edits) had the highest opt-out rate (14.34%).
Further investigation indicates that the higher opt-out rate identified for Junior Contributors is likely due to the timeframe used for the opt-out analysis. We only retain user-specific data on preference updates for 90 days in PrefUpdate due to privacy concerns. As a result, the opt-out analysis only reflects preference changes from 18 May 2021 through 31 July 2021. Senior Contributors are more likely to have already tried the tool and decided to opt out before this 90-day window. A review of data logged in the user properties table shows a slightly lower opt-out rate for Junior Contributors compared to Senior Contributors and still reflects an overall low opt-out rate across all three edit count groups, indicating no significant sign of disruption.
No new discussion tool contributors on Czech Wikipedia have opted out. There was a 16.67% opt-out rate (based on PrefUpdate data) for Arabic Wikipedia. However, each of these wikis had a limited number of contributors that made a new discussion tool edit (54 new discussion tool contributors on Arabic Wikipedia and only 27 on Czech Wikipedia), so this data may not be reflective of the population.
Purpose: Do people NOT using the New Discussion Tool find it disruptive? How does the level of disruption introduced by people using the New Discussion Tool compare to the level of disruption introduced by people using the current experience?
For this analysis, we reviewed data recorded in mediawiki_history to identify the percent of comments posted with the new discussion tool (identified by the revision tag discussiontools-newtopic) on talk pages that are reverted within 48 hours[^revert].
[^revert]: 48 hours is a common cutoff, as research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours. Source: Research: Revert. Mediawiki. https://meta.wikimedia.org/wiki/Research:Revert.
We compared the revert rate for comments published using the new discussion tool to the revert rate for comments made using full page editing (the current editing experience) during the same timeframe. Note: In this analysis, page edits can include any edit made on a talk page not using a discussion tool. This can include both edits to start a new topic and edits to existing comments.
## collect all revert edits for new discussion tool and page editing
query <-
"SELECT
wiki_db AS wiki,
event_user_id AS user_id,
CASE
WHEN min(event_user_revision_count) < 100 THEN 'under 100'
WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'
ELSE 'over 500'
END AS edit_count,
max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,
IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit') AS editor_type,
SUM(CAST(
revision_is_identity_reverted AND
revision_seconds_to_identity_revert <= 172800 -- 48 hours
AS int)) AS num_reverts,
COUNT(*) as num_comments
FROM wmf.mediawiki_history
WHERE
snapshot = '2021-07'
-- exclude reply tool talk page edits
AND NOT (ARRAY_CONTAINS(revision_tags, 'discussiontools-reply'))
-- include only desktop edits
AND NOT array_contains(revision_tags, 'iOS')
AND NOT array_contains(revision_tags, 'Android')
AND NOT array_contains(revision_tags, 'Mobile Web')
-- find all edits on talk pages
AND page_namespace_historical % 2 = 1
AND event_entity = 'revision'
AND event_type = 'create'
-- date deployed
AND event_timestamp >= '2021-02-18'
AND event_timestamp <= '2021-07-31' -- allow two days to avoid data censoring
-- user is not anonymous
AND event_user_is_anonymous = FALSE
GROUP BY
wiki_db,
event_user_id,
IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit')
"
new_dt_reverts <- wmfdata::query_hive(query)
# reformat user_id to include the wiki, to account for duplicate user id instances.
# Users can have the same user_id on different wikis
new_dt_reverts$user_id <-
as.character(paste(new_dt_reverts$user_id,new_dt_reverts$wiki,sep ="-" ))
# set factor levels
new_dt_reverts$editor_type <-
factor(
new_dt_reverts$editor_type,
levels = c("page-edit", "new-discussion-tool"),
labels = c("Page editing", "New Discussion Tool")
)
new_dt_reverts$edit_count <-
factor(new_dt_reverts$edit_count,
levels = c("under 100", "100-500", "over 500"))
# overall revert rate for dt and page edits
new_dt_reverts_overall <- new_dt_reverts %>%
filter(bot_by_group == 'false') %>%
group_by(editor_type) %>%
summarise(total_reverts = sum(num_reverts),
total_comments = sum(num_comments),
revert_rate = paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop')
new_dt_reverts_overall
editor_type | total_reverts | total_comments | revert_rate |
---|---|---|---|
<fct> | <int> | <int> | <chr> |
Page editing | 133142 | 6021037 | 2.21 % |
New Discussion Tool | 1053 | 38249 | 2.75 % |
# revert rate for dt and page edits by experience level
new_dt_reverts_byexp <- new_dt_reverts %>%
filter(bot_by_group == 'false') %>%
group_by(edit_count, editor_type) %>%
summarise(total_reverts = sum(num_reverts),
total_comments = sum(num_comments),
revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop')
new_dt_reverts_byexp
edit_count | editor_type | total_reverts | total_comments | revert_rate |
---|---|---|---|---|
<fct> | <fct> | <int> | <int> | <chr> |
under 100 | Page editing | 57971 | 733391 | 7.9 % |
under 100 | New Discussion Tool | 170 | 2732 | 6.22 % |
100-500 | Page editing | 4602 | 107163 | 4.29 % |
100-500 | New Discussion Tool | 63 | 1481 | 4.25 % |
over 500 | Page editing | 70569 | 5180483 | 1.36 % |
over 500 | New Discussion Tool | 820 | 34036 | 2.41 % |
# revert rate for dt and page edits on Arabic and Czech Wikipedia
new_dt_reverts_bywiki <- new_dt_reverts %>%
filter(bot_by_group == 'false',
wiki %in% c('arwiki', 'cswiki')) %>%
group_by(wiki, editor_type) %>%
summarise(total_reverts = sum(num_reverts),
total_comments = sum(num_comments),
revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop')
new_dt_reverts_bywiki
wiki | editor_type | total_reverts | total_comments | revert_rate |
---|---|---|---|---|
<chr> | <fct> | <int> | <int> | <chr> |
arwiki | Page editing | 2162 | 53863 | 4.01 % |
arwiki | New Discussion Tool | 27 | 1053 | 2.56 % |
cswiki | Page editing | 262 | 19364 | 1.35 % |
cswiki | New Discussion Tool | 3 | 272 | 1.1 % |
Overall, the revert rate for the new discussion tool is only slightly higher than the revert rate for page editing on talk pages (2.75% for the new discussion tool compared to 2.21% for page editing).
However, by experience level, the revert rate for the new discussion tool is lower than page editing for Junior Contributors. For editors with under 100 cumulative edits, the revert rate was 21.3% lower for edits made with the new discussion tool (6.22%) than for page editing (7.9%).
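The relative difference quoted for Junior Contributors can be reproduced from the counts in the experience-level table above. The following base R sketch copies those counts directly; the variable names are illustrative.

```r
# Counts for editors with under 100 edits, copied from the table above.
page_rate <- 57971 / 733391  # reverted page edits / all page edits
tool_rate <- 170 / 2732      # reverted new discussion tool edits / all tool edits

# Relative change of the tool's revert rate against the page editing rate.
relative_change_pct <- (tool_rate - page_rate) / page_rate * 100

print(round(relative_change_pct, 1))
```

The figure is a relative change. In absolute terms, the gap between the two rates is under two percentage points.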
The new discussion tool also had a lower revert rate on both Arabic and Czech Wikipedia compared to page editing on those Wikipedias.
We are also interested in understanding who has been using the new Discussion Tool and how much they have been using it.
For this analysis, we reviewed two metrics:
We first reviewed the percent of distinct contributors that publish at least one new topic with the new discussion tool out of all talk page contributors [^3].
[^3]: This includes anyone that has made at least one talk page edit (including posting new comments or sections or editing existing comments) on any of the talk namespaces during the reviewed time period.
# Collect users new topic edits by user over deployment time period and remove bots
# use mediawiki-history as it includes all saved edits at 100 percent sampling rate
query <- "
SELECT
to_date(event_timestamp) as `date`,
wiki_db AS wiki,
event_user_id AS `user`,
max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,
CASE
WHEN min(event_user_revision_count) < 100 THEN 'under 100'
WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'
ELSE 'over 500'
END AS edit_count,
SUM(CAST(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic') AS INT)) AS new_topic_edits,
COUNT(*) AS all_talk_edits
FROM wmf.mediawiki_history
WHERE
snapshot = '2021-07'
-- include only desktop edits
AND NOT array_contains(revision_tags, 'iOS')
AND NOT array_contains(revision_tags, 'Android')
AND NOT array_contains(revision_tags, 'Mobile Web')
-- review all talk namespaces
AND page_namespace_historical % 2 = 1
-- date of first deployment
AND event_timestamp >= '2021-02-18'
AND event_timestamp <= '2021-07-31'
AND event_entity = 'revision'
AND event_type = 'create'
-- remove logged out users
AND event_user_is_anonymous = FALSE
GROUP BY
to_date(event_timestamp),
wiki_db,
event_user_id
"
discussion_tool_users <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
write_csv(discussion_tool_users, file = 'Data/discussion_tool_users.csv')
discussion_tool_users$date <- as.Date(discussion_tool_users$date, format = "%Y-%m-%d")
# reformat user-id and adjust to include wiki to account for duplicate user id instances.
discussion_tool_users$user <-
as.character(paste(discussion_tool_users$user, discussion_tool_users$wiki, sep ="-"))
# set discussion tool factor levels
discussion_tool_users$edit_count <-
factor(discussion_tool_users$edit_count,
levels = c("under 100", "100-500", "over 500"))
# overall numbers since deployment
new_discussion_contributors <- discussion_tool_users %>%
filter(bot_by_group == 'false') %>% # remove bots
summarise(new_discussion_users = n_distinct(user[new_topic_edits >= 1]) ,
new_discussion_edits = sum(new_topic_edits))
new_discussion_contributors
new_discussion_users | new_discussion_edits |
---|---|
<int> | <int> |
5388 | 38261 |
Since deployment as a beta feature on 18 February 2021, a total of 5,388 distinct users have posted at least one new topic using the new discussion tool. There have been a total of 38,261 edits using the new discussion tool.
To put these numbers into context, we reviewed the percent of contributors that edited a talk page and made at least 1 new topic using the new discussion tool during the reviewed time. Note: For this calculation, we only reviewed the time period when the new discussion tool was available to all wikis.
# pct talk page users
new_discussion_contributors_pct <- discussion_tool_users %>%
filter(bot_by_group == 'false',
date >= '2021-03-17') %>% #day of deployment to all wikis
summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
all_talk_contributors = n_distinct(user),
pct_new_discussion_users = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%')
)
new_discussion_contributors_pct
new_discussion_contributors | all_talk_contributors | pct_new_discussion_users |
---|---|---|
<int> | <int> | <chr> |
5185 | 207384 | 2.5% |
# pct talk page users by experience levels
new_discussion_contributors_pct_byexp <- discussion_tool_users %>%
filter(bot_by_group == 'false',
date >= '2021-03-17') %>% #day of deployment to all wikis
mutate(all_new_discussion_contributors = n_distinct(user[new_topic_edits >= 1])) %>%
group_by(edit_count) %>%
summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
all_talk_contributors = n_distinct(user),
pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors *100, 2), '%'),.groups = 'drop'
) %>%
distinct()
new_discussion_contributors_pct_byexp
edit_count | new_discussion_contributors | all_talk_contributors | pct_new_discussion_contributors |
---|---|---|---|
<fct> | <int> | <int> | <chr> |
under 100 | 1052 | 140859 | 0.75% |
100-500 | 877 | 24444 | 3.59% |
over 500 | 3480 | 48565 | 7.17% |
new_discussion_contributors_pct_bywikis <- discussion_tool_users %>%
filter(bot_by_group == 'false',
wiki %in% c('arwiki', 'cswiki')) %>% #no date filter needed as it was deployed at these wikis since deployment date
group_by(wiki) %>%
summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
all_talk_contributors = n_distinct(user),
pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%'),.groups = 'drop'
)
new_discussion_contributors_pct_bywikis
wiki | new_discussion_contributors | all_talk_contributors | pct_new_discussion_contributors |
---|---|---|---|
<chr> | <int> | <int> | <chr> |
arwiki | 62 | 7081 | 0.88% |
cswiki | 30 | 1674 | 1.79% |
Overall, 2.5% of all talk page contributors have posted at least one new topic using the new discussion tool since March 17th (when available at all wikis as an opt-in beta feature) through the end of July.
Senior contributors are the more frequent users of the tool. 7.2% of users with over 500 edits that edited a talk page during the reviewed time period made an edit with the new discussion tool.
Usage of the new discussion tool on Arabic and Czech Wikipedia is somewhat low, with only 0.88% of talk page editors on Arabic Wikipedia and 1.79% of talk page editors on Czech Wikipedia making an edit with the new discussion tool.
For the analysis below, we also reviewed the percent of distinct contributors that published at least one new topic with the new discussion tool, but limited the review to contributors that created a new topic on a talk page during the reviewed time period.
We used data from EditAttemptStep for this analysis, as it allows us to distinguish edits to existing sections from edits associated with the creation of new sections.
query <-
"
SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as `date`,
wiki AS wiki,
event.user_id AS `user`,
CASE
WHEN min(event.user_editcount) < 100 THEN 'under 100'
WHEN (min(event.user_editcount) >= 100 AND min(event.user_editcount) <= 500) THEN '100-500'
ELSE 'over 500'
END AS edit_count,
-- new page section edits
SUM(CAST(event.integration = 'page' AND (event.init_mechanism = 'url-new' OR event.init_mechanism == 'new') AS INT)) AS page_edit,
-- new discussion tool edits
SUM(CAST(event.integration ='discussiontools' AS INT)) AS dt_edit
FROM event_sanitized.editattemptstep
WHERE
-- section edits
event.action = 'init'
AND event.init_type = 'section'
AND year = 2021
-- review events following deployment
AND dt >= '2021-02-18'
AND dt <= '2021-07-31'
-- review all talk namespaces
AND event.platform = 'desktop'
AND event.page_ns % 2 = 1
AND event.user_id != 0
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
wiki,
event.user_id
"
new_section_contributors <- wmfdata::query_hive(query)
new_section_contributors$date <- as.Date(new_section_contributors$date, format = "%Y-%m-%d")
# reformat user-id and adjust to include wiki to account for duplicate user id instances.
new_section_contributors$user <-
as.character(paste(new_section_contributors$user, new_section_contributors$wiki, sep ="-"))
# set edit count factor levels
new_section_contributors$edit_count <-
factor(new_section_contributors$edit_count,
levels = c("under 100", "100-500", "over 500"))
new_topic_edits <- new_section_contributors %>%
# date released to all wikis
filter(date >= '2021-03-17') %>%
summarize(page_editors = n_distinct(user[page_edit >= 1]),
dt_editor = n_distinct(user[dt_edit >=1]),
pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%')
)
new_topic_edits
page_editors | dt_editor | pct_dt_editors |
---|---|---|
<int> | <int> | <chr> |
19659 | 5688 | 22.44% |
new_topic_edits_byexperience <- new_section_contributors %>%
# date released to all wikis
filter(date >= '2021-03-17') %>%
group_by(edit_count) %>%
summarize(page_editors = n_distinct(user[page_edit >= 1]),
dt_editor = n_distinct(user[dt_edit >=1]),
pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%'),.groups = 'drop'
)
new_topic_edits_byexperience
edit_count | page_editors | dt_editor | pct_dt_editors |
---|---|---|---|
<fct> | <int> | <int> | <chr> |
under 100 | 10364 | 1459 | 12.34% |
100-500 | 1949 | 947 | 32.7% |
over 500 | 7595 | 3496 | 31.52% |
new_topic_edits_bywiki <- new_section_contributors %>%
# limit to Arabic and Czech Wikipedia
filter(wiki %in% c('arwiki', 'cswiki')) %>%
group_by(wiki) %>%
summarize(page_editors = n_distinct(user[page_edit >= 1]),
dt_editor = n_distinct(user[dt_edit >=1]),
pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%'),.groups = 'drop'
)
new_topic_edits_bywiki
wiki | page_editors | dt_editor | pct_dt_editors |
---|---|---|---|
<chr> | <int> | <int> | <chr> |
arwiki | 387 | 93 | 19.38% |
cswiki | 126 | 32 | 20.25% |
During the reviewed time period, 22.4% of all contributors that created a new topic on a talk page posted at least one new topic using the new discussion tool.
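The share above follows the formula used in the notebook code, `dt_editor / (dt_editor + page_editors)`. Because the two counts are computed separately, a contributor who started topics with both workflows appears in both. The base R sketch below reproduces the reported figure from the table and then uses a small set of made-up users (hypothetical data, not from the analysis) to show how that overlap affects the denominator.

```r
# Reported overall figure, reproduced from the counts in the table above.
reported_share <- round(5688 / (5688 + 19659) * 100, 2)

# Hypothetical toy data (not from the analysis): one row per user.
toy <- data.frame(
  user      = c("a", "b", "c"),
  page_edit = c(1, 2, 0),  # new topics started with full page editing
  dt_edit   = c(1, 0, 3),  # new topics started with the new discussion tool
  stringsAsFactors = FALSE
)

page_editors <- length(unique(toy$user[toy$page_edit >= 1]))  # users a and b
dt_editors   <- length(unique(toy$user[toy$dt_edit >= 1]))    # users a and c

# Share as computed in the notebook: user "a" is counted in both groups.
share_notebook <- round(dt_editors / (dt_editors + page_editors) * 100, 2)

# Alternative: divide by the number of distinct new-topic contributors.
all_editors <- length(unique(toy$user[toy$page_edit >= 1 | toy$dt_edit >= 1]))
share_distinct <- round(dt_editors / all_editors * 100, 2)

print(c(reported = reported_share, notebook = share_notebook, distinct = share_distinct))
```

In the toy data the two definitions differ noticeably. How much they differ in the real data depends on how many contributors used both workflows, which the tables here do not show.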
Senior contributors more commonly used the tool at least once to create a new topic compared to Junior Contributors. Nearly a third (31.8%) of contributors with over 100 edits that created a new topic on a talk page posted at least one of their new topics using the new discussion tool.
Similar to the noted proportion across all Wikipedias, 19.4% of Arabic contributors and 20.3% of Czech contributors that posted a new topic used the new discussion tool at least once.
Purpose: How much are they using it? This metric helps us understand how many times people chose to use the New Discussion Tool in relation to the number of opportunities they had to use it. For this analysis, we limited our review to contributors that had access to the tool and used it at least once.
[^4]: This metric has some slight noise as there could be cases where the following people end up looking the same in the data. Person A: added two new topics to talk pages in the reviewed timeframe, one of which was with the new discussion tool; Person B: made a total of 150 new topics to talk pages, 75 of which were with the New Discussion tool.
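The noise described in the footnote can be seen with a short base R sketch. The two people below are the hypothetical examples from the footnote, and the bin breaks and labels match those used in the code that follows.

```r
# Person A: 2 new topics, 1 with the tool. Person B: 150 new topics, 75 with the tool.
person_a <- c(dt_edit = 1,  page_edit = 1)
person_b <- c(dt_edit = 75, page_edit = 75)

# Percent of a person's new topics that were posted with the tool.
pct_dt <- function(p) unname(p["dt_edit"] / sum(p) * 100)

pct_a <- pct_dt(person_a)
pct_b <- pct_dt(person_b)

# Same bins as the notebook: (0,25], (25,50], (50,75], (75,100].
b <- c(0, 25, 50, 75, 100)
bin_names <- c("1-25 percent", "26-50 percent", "51-75 percent", "76-100 percent")
bins <- as.character(cut(c(pct_a, pct_b), breaks = b, labels = bin_names))

print(c(pct_a, pct_b))
print(bins)
```

Both people land in the same bin even though their activity levels differ by a factor of 75, so the proportion is best read alongside the counts of topics posted.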
new_dt_contributors_1edit <- new_section_contributors %>%
filter(date >= '2021-03-17') %>%
summarise(one_time_editors = n_distinct(user[dt_edit ==1]),
all_editors = n_distinct(user[dt_edit >= 1]),
pct_1_dt_edit = paste0(round(one_time_editors/all_editors * 100, 2), "%") )
new_dt_contributors_1edit
one_time_editors | all_editors | pct_1_dt_edit |
---|---|---|
<int> | <int> | <chr> |
5166 | 5688 | 90.82% |
Most contributors (90.82%) that used the new discussion tool posted just one new topic with the tool during the reviewed timeframe.
# Divide new discussion tool edits into groups
b <- c(0, 25, 50, 75, 100)
names <- c('1-25 percent', '26-50 percent', '51-75 percent', '76-100 percent')
new_dt_contributors_prop <- new_section_contributors %>%
  filter(date >= '2021-03-17') %>%
  # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
  filter(dt_edit >= 1,
         page_edit + dt_edit > 1) %>%
  group_by(user) %>%
  summarise(
    dt_edit = sum(dt_edit),
    page_edit = sum(page_edit),
    pct_dt_edit = dt_edit / (dt_edit + page_edit) * 100,
    new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),
    .groups = 'drop'
  )
# Breakdown of contributors by percent use
prop_new_dt_overall <- new_dt_contributors_prop %>%
  group_by(new_discussion_edits_group) %>%
  summarise(n_users = n(), .groups = 'drop') %>%
  mutate(pct_new_discussion_contributors = paste0(round(n_users / sum(n_users) * 100, 2), "%"))
prop_new_dt_overall
new_discussion_edits_group | n_users | pct_new_discussion_contributors |
---|---|---|
<fct> | <int> | <chr> |
26-50 percent | 57 | 2.63% |
51-75 percent | 45 | 2.08% |
76-100 percent | 2065 | 95.29% |
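To make the grouping easier to interpret, here is a small sketch of how base R's `cut()` assigns percentages to the four bins with the same break points. The input values are hypothetical and chosen to sit on or near the bin edges; `bin_labels` is an illustrative name standing in for the `names` vector used in the analysis.

```r
# Same break points and labels as the analysis
b <- c(0, 25, 50, 75, 100)
bin_labels <- c("1-25 percent", "26-50 percent", "51-75 percent", "76-100 percent")

# Hypothetical usage percentages on or near the bin edges
pct <- c(0, 25, 25.5, 50, 75, 75.5, 100)

# By default cut() builds right-closed intervals: (0,25], (25,50], (50,75], (75,100]
binned <- cut(pct, breaks = b, labels = bin_labels)

print(data.frame(pct = pct, bin = binned))
```

With these defaults a value exactly on an edge, such as 50 or 75, falls into the lower bin, and any value above 75 (for example 75.5) is labelled "76-100 percent". A value of 0 would be `NA`, but the analysis only keeps contributors with at least one tool edit, so that case should not occur there.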
new_dt_contributors_prop_exp <- new_section_contributors %>%
  filter(date >= '2021-03-17') %>%
  # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
  filter(dt_edit >= 1,
         page_edit + dt_edit > 1) %>%
  group_by(user, edit_count) %>%
  summarise(
    dt_edit = sum(dt_edit),
    page_edit = sum(page_edit),
    pct_dt_edit = dt_edit / (dt_edit + page_edit) * 100,
    new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),
    .groups = 'drop'
  )
# Breakdown of contributors by percent use
prop_new_dt_byexperience <- new_dt_contributors_prop_exp %>%
  group_by(edit_count, new_discussion_edits_group) %>%
  summarise(n_users = n()) %>%
  mutate(pct_new_discussion_contributors = paste0(round(n_users / sum(n_users) * 100, 2), "%"))
prop_new_dt_byexperience
`summarise()` regrouping output by 'edit_count' (override with `.groups` argument)
edit_count | new_discussion_edits_group | n_users | pct_new_discussion_contributors |
---|---|---|---|
<fct> | <fct> | <int> | <chr> |
under 100 | 26-50 percent | 19 | 5.18% |
under 100 | 51-75 percent | 19 | 5.18% |
under 100 | 76-100 percent | 329 | 89.65% |
100-500 | 26-50 percent | 11 | 3.81% |
100-500 | 51-75 percent | 6 | 2.08% |
100-500 | 76-100 percent | 272 | 94.12% |
over 500 | 26-50 percent | 30 | 1.91% |
over 500 | 51-75 percent | 24 | 1.53% |
over 500 | 76-100 percent | 1518 | 96.56% |
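The percentages in this table sum to 100 within each experience level rather than across the whole table, because `summarise()` leaves the result grouped by `edit_count` and the following `mutate()` computes `sum(n_users)` per group. The sketch below reproduces that calculation in base R from the counts shown in the table; using `ave()` here is an illustrative substitute for the grouped `mutate()`, not the analysis code itself.

```r
# Counts copied from the table above
counts <- data.frame(
  edit_count = rep(c("under 100", "100-500", "over 500"), each = 3),
  new_discussion_edits_group = rep(
    c("26-50 percent", "51-75 percent", "76-100 percent"),
    times = 3
  ),
  n_users = c(19, 19, 329, 11, 6, 272, 30, 24, 1518)
)

# Total number of contributors within each experience level, repeated per row
group_totals <- ave(counts$n_users, counts$edit_count, FUN = sum)

# Percent of contributors within their own experience level
counts$pct <- round(counts$n_users / group_totals * 100, 2)

print(counts)
```

The denominators are 367, 289, and 1572 contributors for the three experience levels, so each percentage describes the share of contributors within that level only.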
# Arabic and Czech Wikipedia only: share of one-time tool users by experience level
new_dt_contributors_1edit_bywiki <- new_section_contributors %>%
  filter(wiki %in% c('arwiki', 'cswiki')) %>%
  group_by(edit_count) %>%
  summarise(
    one_time_editors = n_distinct(user[dt_edit == 1]),
    all_editors = n_distinct(user[dt_edit >= 1]),
    pct_1_dt_edit = paste0(round(one_time_editors / all_editors * 100, 2), "%"),
    .groups = 'drop'
  )
new_dt_contributors_1edit_bywiki
edit_count | one_time_editors | all_editors | pct_1_dt_edit |
---|---|---|---|
<fct> | <int> | <int> | <chr> |
under 100 | 36 | 47 | 76.6% |
100-500 | 8 | 10 | 80% |
over 500 | 64 | 69 | 92.75% |
new_dt_contributors_prop_wiki <- new_section_contributors %>%
  # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
  filter(dt_edit >= 1,
         page_edit + dt_edit > 1,
         wiki %in% c('arwiki', 'cswiki')) %>%
  group_by(user, wiki) %>%
  summarise(
    dt_edit = sum(dt_edit),
    page_edit = sum(page_edit),
    pct_dt_edit = dt_edit / (dt_edit + page_edit) * 100,
    new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),
    .groups = 'drop'
  )
# Breakdown of contributors by percent use
prop_new_dt_bywiki <- new_dt_contributors_prop_wiki %>%
  group_by(wiki, new_discussion_edits_group) %>%
  summarise(n_users = n(), .groups = NULL) %>%
  mutate(percent_new_dt_users = paste0(round(n_users / sum(n_users) * 100, 2), "%"))
prop_new_dt_bywiki
`summarise()` regrouping output by 'wiki' (override with `.groups` argument)
wiki | new_discussion_edits_group | n_users | percent_new_dt_users |
---|---|---|---|
<chr> | <fct> | <int> | <chr> |
arwiki | 26-50 percent | 2 | 5.13% |
arwiki | 51-75 percent | 1 | 2.56% |
arwiki | 76-100 percent | 36 | 92.31% |
cswiki | 51-75 percent | 3 | 13.64% |
cswiki | 76-100 percent | 19 | 86.36% |
Most contributors (90.82%) who used the new discussion tool posted just one new topic with the tool during the reviewed timeframe. Of the contributors who posted more than one new topic on a talk page, 95.3% posted between 76 and 100 percent of their new topics using the new discussion tool, indicating that these contributors chose to use the tool when presented with an opportunity to start a new topic.
For all three levels of editor experience, over 89% of contributors who posted more than one new topic used the new discussion tool to make 76-100 percent of their new topics. Senior contributors (over 500 edits) made the highest proportion of their new topic edits using the new discussion tool (96.56% made 76-100 percent of their new topic edits with the tool) compared to junior contributors (under 100 edits; 89.65% made 76-100 percent of their new topic edits with the tool).
The majority of contributors on Arabic and Czech Wikipedia (92.31% and 86.36%, respectively) also posted 76-100 percent of their new topics using the new discussion tool.