In T256100, we added a skin version field and new values for the inputLocation and extraParams fields to record data as part of the following two AB tests on deployed search changes:
These two A/B tests together support four possible configurations:
skinVersion | inputLocation | extraParams | Description |
---|---|---|---|
legacy | "header-navigation" | Vector Legacy skin with Legacy search (current master) | |
latest | "header-navigation" | Vector Latest skin with Legacy search (current master) | |
latest | "header-moved" | Vector Latest skin with Legacy search and latest location | |
latest | "header-moved" | "WVUI" | Vector Latest skin with Latest search and latest location |
The new location of the search functionarily was deployed to all projects on 28 September 2020. The new location is available by default for anonymous users on our early adopter wikis, and by preference for all other users.
We are performing an A/B test of the new location with logged-in users on our early adopter wikis. 50% of logged-in users are seeing the new experience, while the other 50% are seeing the old experience. The test only applies to users on modern vector.
The SearchSatisfaction Schema will be used to track events from these changes.
Note: Bucketing is done on a search session basis.
I checked the following scenarios:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
library(tidyverse); library(glue); library(lubridate); library(scales)
})
# collect data from SearchSatisfaction for the relevant fields.
query <-
"SELECT
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
event.skinVersion AS vector_version,
event.action AS action_type,
event.source AS source_type,
event.skin AS skin_type,
wiki AS wiki,
Count(*) AS events
FROM event.searchSatisfaction
-- review a few days prior to check when data started to come in
WHERE year = 2020 and ((month = 09 and day >= 28) OR month >= 10)
-- remove bots
AND useragent.is_bot = false
GROUP BY
event.inputLocation,
event.searchSessionId,
event.skinVersion,
event.action,
event.source,
event.skin,
wiki"
search_sessions <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
# overall
search_sessions_bylocation <-search_sessions %>%
# limit to search events completed in two new locations
filter(search_location %in% c('header-moved', 'header-navigation'),
filter) %>%
group_by(search_location) %>%
summarise(unique_sessions = n_distinct(search_session),
total_events = sum(events))
search_sessions_bylocation
`summarise()` ungrouping output (override with `.groups` argument)
search_location | unique_sessions | total_events |
---|---|---|
<chr> | <int> | <int> |
header-moved | 35570589 | 256259728 |
header-navigation | 38855815 | 276620657 |
Looking across all wikis, there should be a larger number of header-navigation search sessions and events as the new header location was only available by default for anonymous users on early adopter wikis and by preference for all ohter users.
While the data show a slightly larger number of header-navigation events, I would expect this difference to be much higher.
Update: Further investigation showed this was due to a bug that incorrectly logged header-navigation events as header-moved events on legacy skin. See further breakdowns below.
search_location_check <- search_sessions %>%
filter(search_location %in% c('header-navigation', 'header-moved')) %>%
group_by(search_location, vector_version, skin_type) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_location_check
`summarise()` regrouping output by 'search_location', 'vector_version' (override with `.groups` argument)
search_location | vector_version | skin_type | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
header-moved | latest | vector | 30677585 | 3823712 |
header-moved | legacy | vector | 225579407 | 31746694 |
header-moved | NULL | NULL | 2736 | 492 |
header-navigation | latest | vector | 3992199 | 576515 |
header-navigation | legacy | vector | 272625862 | 38278963 |
header-navigation | NULL | NULL | 2596 | 499 |
FAILED:
ISSUE: Confirmed that the two header location types are only being recorded for the vector skin type; however, there are a large number header-moved events being recorded with legacy, which is not expected. There are also some header location events being recorded without an associated vector_version or skin.
I will further review by wiki and day below to confirm when these events started occuring.
## collect all search events with new header recorded on legacy by date.
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as date,
event.inputLocation AS search_location,
event.skin as skin_type,
event.skinVersion AS vector_version,
event.searchSessionId AS search_session,
wiki,
Count(*) as events
FROM event.searchSatisfaction
-- review a few days prior to check when data started to come in
WHERE year = 2020 AND ((month = 09 and day >= 20) OR month >= 10)
-- review autocomplete searches on legacy
AND event.action = 'searchResultPage'
AND event.source = 'autocomplete'
AND event.skinVersion = 'legacy'
AND event.skin = 'vector'
AND useragent.is_bot = false
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.skin,
event.skinVersion,
event.searchSessionId,
wiki"
search_sessions_legacy <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_sessions_legacy$date <- as.Date(search_sessions_legacy$date, format = "%Y-%m-%d")
legacy_sessions_withnewsearchloc <- search_sessions_legacy %>%
filter(search_location == 'header-moved') %>%
group_by(date, vector_version) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
arrange(date)
legacy_sessions_withnewsearchloc
`summarise()` regrouping output by 'date' (override with `.groups` argument)
date | vector_version | num_events | num_sessions |
---|---|---|---|
<date> | <chr> | <int> | <int> |
2020-09-22 | legacy | 6 | 3 |
2020-09-23 | legacy | 39 | 8 |
2020-09-24 | legacy | 124 | 8 |
2020-09-25 | legacy | 10 | 3 |
2020-09-26 | legacy | 38 | 6 |
2020-09-27 | legacy | 59 | 6 |
2020-09-28 | legacy | 1077 | 246 |
2020-09-29 | legacy | 4001 | 843 |
2020-09-30 | legacy | 1820084 | 264887 |
2020-10-01 | legacy | 12434002 | 1831724 |
2020-10-02 | legacy | 14168856 | 2014952 |
2020-10-03 | legacy | 13081927 | 1800468 |
2020-10-04 | legacy | 14650240 | 1994717 |
2020-10-05 | legacy | 17662173 | 2457038 |
2020-10-06 | legacy | 18107765 | 2496282 |
2020-10-07 | legacy | 18173882 | 2508217 |
2020-10-08 | legacy | 17938919 | 2467371 |
2020-10-09 | legacy | 16807923 | 2322149 |
2020-10-10 | legacy | 14741838 | 1988172 |
2020-10-11 | legacy | 16162932 | 2164940 |
2020-10-12 | legacy | 19126714 | 2612897 |
2020-10-13 | legacy | 16117012 | 2281531 |
2020-10-14 | legacy | 5395902 | 901123 |
2020-10-15 | legacy | 3424183 | 596037 |
2020-10-16 | legacy | 1799897 | 327429 |
2020-10-17 | legacy | 984685 | 182818 |
2020-10-18 | legacy | 811087 | 154778 |
2020-10-19 | legacy | 829004 | 161731 |
2020-10-20 | legacy | 643590 | 126207 |
2020-10-21 | legacy | 418973 | 82357 |
2020-10-22 | legacy | 186056 | 39050 |
2020-10-23 | legacy | 35077 | 7978 |
2020-10-24 | legacy | 23123 | 5196 |
2020-10-25 | legacy | 19617 | 4450 |
2020-10-26 | legacy | 9896 | 2267 |
p <- search_sessions_legacy %>%
# compare header-moved to header-navigation dates
filter(search_location %in% c('header-moved', 'header-navigation')) %>%
group_by(date, search_location, vector_version) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
ggplot(aes(x=date, y= num_sessions, color = search_location)) +
geom_line(size = 1.5) +
scale_y_continuous() +
labs (y = "Number of autocomplete search sessions per day",
x = "Date",
title = "Daily search sessions by search bar location on legacy vector skin") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
axis.text.x = element_text(angle=45, hjust=1))
p
`summarise()` regrouping output by 'date', 'search_location' (override with `.groups` argument)
ggsave("Figures/daily_legacy_search_events.png", p, width = 16, height = 8, units = "in", dpi = 300)
Header-moved events on legacy start recording around September 28th, the date of deployment. Unlike the latest skin, they appear to drop on Oct 14th and have continued to decrease since then.
Upon further investigation, it was determined that bug leading to new header location events being recorded on legacy was due to a bug in the intrusmentation that was fixed on October 13th. Since the fix, these events are still declining.
## collect all search header events that do not have an associated vector skin
# Check number of events for each to see if roughly even or number of sessions?
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as date,
event.inputLocation AS search_location,
event.skinVersion as vector_version,
event.searchSessionId AS search_session,
wiki,
Count(*) as events
FROM event.searchSatisfaction
-- review a few days prior to check when data started to come in
WHERE year = 2020 AND ((month = 09 and day >= 20) OR month >= 10)
-- further investigate instance where no skin was recorded
AND event.skinVersion is NULL
AND event.skin IS NULL
AND event.action = 'searchResultPage'
AND useragent.is_bot = false
AND event.source = 'autocomplete'
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.skinVersion,
event.searchSessionId,
wiki"
search_sessions_null <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_sessions_null$date <- as.Date(search_sessions_null$date, format = "%Y-%m-%d")
daily_search_sessions_null_events <-search_sessions_null %>%
filter(search_location %in% c('header-navigation', 'header-moved')) %>%
group_by(search_location, vector_version, date) %>%
summarise(num_sessions = n_distinct(search_session),
num_events = sum(events)) %>%
arrange(date)
daily_search_sessions_null_events
`summarise()` regrouping output by 'search_location', 'vector_version' (override with `.groups` argument)
search_location | vector_version | date | num_sessions | num_events |
---|---|---|---|---|
<chr> | <chr> | <date> | <int> | <int> |
header-navigation | NULL | 2020-09-23 | 1 | 1 |
header-navigation | NULL | 2020-09-24 | 23 | 107 |
header-navigation | NULL | 2020-09-25 | 119 | 630 |
header-navigation | NULL | 2020-09-26 | 123 | 704 |
header-navigation | NULL | 2020-09-27 | 104 | 574 |
header-navigation | NULL | 2020-09-28 | 120 | 624 |
header-navigation | NULL | 2020-09-29 | 90 | 453 |
header-moved | NULL | 2020-09-30 | 5 | 30 |
header-navigation | NULL | 2020-09-30 | 70 | 352 |
header-moved | NULL | 2020-10-01 | 39 | 220 |
header-navigation | NULL | 2020-10-01 | 40 | 182 |
header-moved | NULL | 2020-10-02 | 36 | 169 |
header-navigation | NULL | 2020-10-02 | 12 | 59 |
header-moved | NULL | 2020-10-03 | 43 | 268 |
header-navigation | NULL | 2020-10-03 | 8 | 36 |
header-moved | NULL | 2020-10-04 | 45 | 289 |
header-navigation | NULL | 2020-10-04 | 10 | 62 |
header-moved | NULL | 2020-10-05 | 40 | 190 |
header-navigation | NULL | 2020-10-05 | 9 | 59 |
header-moved | NULL | 2020-10-06 | 51 | 344 |
header-navigation | NULL | 2020-10-06 | 6 | 33 |
header-moved | NULL | 2020-10-07 | 34 | 195 |
header-navigation | NULL | 2020-10-07 | 3 | 13 |
header-moved | NULL | 2020-10-08 | 33 | 169 |
header-navigation | NULL | 2020-10-08 | 5 | 26 |
header-moved | NULL | 2020-10-09 | 27 | 159 |
header-moved | NULL | 2020-10-10 | 30 | 135 |
header-moved | NULL | 2020-10-11 | 30 | 166 |
header-moved | NULL | 2020-10-12 | 23 | 134 |
header-navigation | NULL | 2020-10-12 | 2 | 4 |
header-moved | NULL | 2020-10-13 | 21 | 100 |
header-navigation | NULL | 2020-10-13 | 2 | 16 |
header-moved | NULL | 2020-10-14 | 13 | 58 |
header-navigation | NULL | 2020-10-14 | 6 | 33 |
header-moved | NULL | 2020-10-15 | 5 | 16 |
header-navigation | NULL | 2020-10-15 | 11 | 48 |
header-moved | NULL | 2020-10-16 | 2 | 12 |
header-navigation | NULL | 2020-10-16 | 9 | 76 |
header-moved | NULL | 2020-10-17 | 2 | 32 |
header-navigation | NULL | 2020-10-17 | 12 | 59 |
header-moved | NULL | 2020-10-18 | 3 | 10 |
header-navigation | NULL | 2020-10-18 | 17 | 87 |
header-moved | NULL | 2020-10-19 | 4 | 18 |
header-navigation | NULL | 2020-10-19 | 11 | 50 |
header-moved | NULL | 2020-10-20 | 3 | 10 |
header-navigation | NULL | 2020-10-20 | 5 | 16 |
header-navigation | NULL | 2020-10-21 | 13 | 93 |
header-navigation | NULL | 2020-10-22 | 10 | 69 |
header-navigation | NULL | 2020-10-23 | 5 | 16 |
header-moved | NULL | 2020-10-24 | 1 | 1 |
header-navigation | NULL | 2020-10-24 | 9 | 45 |
header-moved | NULL | 2020-10-25 | 1 | 10 |
header-navigation | NULL | 2020-10-25 | 12 | 61 |
header-moved | NULL | 2020-10-26 | 1 | 1 |
header-navigation | NULL | 2020-10-26 | 5 | 24 |
# plot null sessions
options(repr.plot.width = 10, repr.plot.height = 7)
p <- search_sessions_null %>%
# remove content - advanced search page searches
filter(search_location %in% c('header-navigation', 'header-moved')) %>%
#filter(search_location != 'content') %>%
group_by(date, search_location) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
ggplot(aes(x=date, y= num_sessions, color = search_location)) +
geom_line(size = 1.5) +
scale_y_continuous() +
labs (y = "Number of autocomplete search sessions per day",
x = "Date",
title = "Daily search sessions by search bar location without associated vector skin") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
axis.text.x = element_text(angle=45, hjust=1))
p
`summarise()` regrouping output by 'date' (override with `.groups` argument)
ggsave("Figures/daily_null_search_events.png", p, width = 16, height = 8, units = "in", dpi = 300)
Search events and sessions recorded as having a NULL vector skin version have been decreasing since deployment. Based on the trend lines show in chart below it looks like these NULL values are related to caching issues as the new instrumentation was deployed.
We're currently only recording less than 12 NULL sessions per day on each wiki and they seem to be decreasing still so I don't think this is an issue.
## collect all latest search header events by location and date
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as date,
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
wiki,
Count(*) as events
FROM event.searchSatisfaction
-- review a few days prior to check when data started to come in
WHERE year = 2020 AND ((month = 09 and day >= 20) OR month >= 10)
-- only deployed on modern skin vector
AND event.skinVersion = 'latest'
AND event.skin = 'vector'
AND useragent.is_bot = false
AND event.action = 'searchResultPage'
AND event.source = 'autocomplete'
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.searchSessionId,
wiki"
search_sessions_daily <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_sessions_daily$date <- as.Date(search_sessions_daily$date, format = "%Y-%m-%d")
daily_search_sessions <-search_sessions_daily %>%
#remove content- advanced page events
filter(search_location != 'content'
# review events after bug fix to address header-moved events on legacy
) %>%
group_by(search_location, date) %>%
summarise(unique_sessions = n_distinct(search_session),
total_events = sum(events)) %>%
arrange(date)
daily_search_sessions
`summarise()` regrouping output by 'search_location' (override with `.groups` argument)
search_location | date | unique_sessions | total_events |
---|---|---|---|
<chr> | <date> | <int> | <int> |
header | 2020-09-20 | 118534 | 922751 |
header | 2020-09-21 | 143362 | 1092926 |
header | 2020-09-22 | 140899 | 1054336 |
header | 2020-09-23 | 141795 | 1078461 |
header-navigation | 2020-09-23 | 392 | 2904 |
header | 2020-09-24 | 120545 | 926857 |
header-navigation | 2020-09-24 | 22423 | 144811 |
header | 2020-09-25 | 43904 | 278737 |
header-navigation | 2020-09-25 | 96679 | 695054 |
header | 2020-09-26 | 21589 | 129302 |
header-navigation | 2020-09-26 | 95391 | 717155 |
header | 2020-09-27 | 9429 | 55990 |
header-navigation | 2020-09-27 | 119616 | 930422 |
header | 2020-09-28 | 3504 | 21154 |
header-moved | 2020-09-28 | 504 | 2953 |
header-navigation | 2020-09-28 | 140556 | 1062970 |
header | 2020-09-29 | 1986 | 11406 |
header-moved | 2020-09-29 | 1900 | 9595 |
header-navigation | 2020-09-29 | 137554 | 1031666 |
header | 2020-09-30 | 210 | 1119 |
header-moved | 2020-09-30 | 20099 | 141768 |
header-navigation | 2020-09-30 | 124920 | 922669 |
header | 2020-10-01 | 61 | 268 |
header-moved | 2020-10-01 | 118960 | 917933 |
header-navigation | 2020-10-01 | 50194 | 309986 |
header | 2020-10-02 | 30 | 139 |
header-moved | 2020-10-02 | 122555 | 952519 |
header-navigation | 2020-10-02 | 32252 | 186346 |
header | 2020-10-03 | 11 | 54 |
header-moved | 2020-10-03 | 114168 | 928295 |
⋮ | ⋮ | ⋮ | ⋮ |
header | 2020-10-16 | 6 | 57 |
header-moved | 2020-10-16 | 146000 | 1189450 |
header-navigation | 2020-10-16 | 1836 | 9199 |
header | 2020-10-17 | 3 | 12 |
header-moved | 2020-10-17 | 117510 | 931602 |
header-navigation | 2020-10-17 | 1933 | 9416 |
header | 2020-10-18 | 2 | 10 |
header-moved | 2020-10-18 | 139435 | 1125455 |
header-navigation | 2020-10-18 | 2028 | 11159 |
header | 2020-10-19 | 5 | 25 |
header-moved | 2020-10-19 | 166965 | 1300230 |
header-navigation | 2020-10-19 | 2078 | 9639 |
header | 2020-10-20 | 4 | 12 |
header-moved | 2020-10-20 | 167227 | 1312794 |
header-navigation | 2020-10-20 | 2154 | 11209 |
header | 2020-10-21 | 4 | 22 |
header-moved | 2020-10-21 | 164099 | 1286373 |
header-navigation | 2020-10-21 | 2199 | 11460 |
header-moved | 2020-10-22 | 160020 | 1243858 |
header-navigation | 2020-10-22 | 2142 | 10851 |
header-moved | 2020-10-23 | 147414 | 1169522 |
header-navigation | 2020-10-23 | 2037 | 11021 |
header | 2020-10-24 | 1 | 8 |
header-moved | 2020-10-24 | 127152 | 1032914 |
header-navigation | 2020-10-24 | 2063 | 10339 |
header-moved | 2020-10-25 | 145663 | 1166658 |
header-navigation | 2020-10-25 | 2206 | 11965 |
header | 2020-10-26 | 3 | 7 |
header-moved | 2020-10-26 | 80299 | 615795 |
header-navigation | 2020-10-26 | 1044 | 5784 |
p <-daily_search_sessions %>%
ggplot(aes(x=date, y= unique_sessions, color = search_location)) +
geom_line(size = 1.5) +
scale_y_continuous() +
labs (y = "Number of autocomplete search sessions per day",
x = "Date",
title = "Daily search sessions by search bar location on latest vector skin") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
axis.text.x = element_text(angle=45, hjust=1))
p
ggsave("Figures/daily_latest_search_events.png", p, width = 16, height = 8, units = "in", dpi = 300)
The "header" location events are how the original search bar events were recorded prior to the instrumentation - these are events are dropping off as expected since it was replced by "header-navigation" events.
Header-moved(new search location) events start recording on the deployment date 28 September. Following deployment, there is a signficant drop-off of header-navigation events on the latest vector skin. The decrease seems signficant but a decrease was expected since the new header was deployed as opt-out for all logged out users on early adopter wikis and 50% of logged in users. An additional check of logged in vesus logged out users will help clarify.
action_type_check <- search_sessions %>%
filter(search_location %in% c('header-navigation', 'header-moved')) %>%
group_by(search_location, action_type, source_type) %>%
summarise(events = sum(events))
action_type_check
`summarise()` regrouping output by 'search_location', 'action_type' (override with `.groups` argument)
search_location | action_type | source_type | events |
---|---|---|---|
<chr> | <chr> | <chr> | <int> |
header-moved | searchResultPage | autocomplete | 256259728 |
header-navigation | searchResultPage | autocomplete | 276620657 |
PASSED:
The header-moved and header-navigation events are only recorded with action=SearchResultPage
and source='autocomplete'
events as expected.
test_wiki_counts <- search_sessions %>%
filter(search_location %in% c('header-navigation', 'header-moved'),
## review test wikis where deployed
wiki %in% c('euwiki', 'frwiki', 'hewiki', 'ptwikiversity', 'frwiktionary', 'fawiki')) %>%
group_by(search_location, wiki) %>%
summarise(events = sum(events),
sessions = n_distinct(search_session)) %>%
arrange(wiki)
test_wiki_counts
`summarise()` regrouping output by 'search_location' (override with `.groups` argument)
search_location | wiki | events | sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
header-moved | euwiki | 325172 | 34312 |
header-navigation | euwiki | 42852 | 5415 |
header-moved | fawiki | 1199768 | 158652 |
header-navigation | fawiki | 158559 | 24731 |
header-moved | frwiki | 27472480 | 3317233 |
header-navigation | frwiki | 3713332 | 526448 |
header-moved | frwiktionary | 1480194 | 212714 |
header-navigation | frwiktionary | 209442 | 37002 |
header-moved | hewiki | 156401 | 92817 |
header-navigation | hewiki | 27082 | 15833 |
header-moved | ptwikiversity | 2672 | 420 |
header-navigation | ptwikiversity | 378 | 85 |
## Review non test wikis to see what the difference is
nontest_wiki_counts <- search_sessions %>%
filter(search_location %in% c('header-navigation', 'header-moved'),
## review some non-test wikis where deployed
wiki %in% c('enwiki', 'eswiki', 'ruwiki', 'zhwiki', 'dewiki')) %>%
group_by(search_location, wiki) %>%
summarise(events = sum(events),
sessions = n_distinct(search_session)) %>%
arrange(wiki)
nontest_wiki_counts
`summarise()` regrouping output by 'search_location' (override with `.groups` argument)
search_location | wiki | events | sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
header-moved | dewiki | 28332468 | 3958386 |
header-navigation | dewiki | 31971809 | 4523118 |
header-moved | enwiki | 132251938 | 18201935 |
header-navigation | enwiki | 163202407 | 22462494 |
header-moved | eswiki | 12229252 | 1613443 |
header-navigation | eswiki | 13987626 | 1815518 |
header-moved | ruwiki | 11748048 | 1789305 |
header-navigation | ruwiki | 13711217 | 2070640 |
header-moved | zhwiki | 693042 | 301380 |
header-navigation | zhwiki | 856770 | 370056 |
PASSED
On test wikis, there are 4 to 5% more header-moved events than header-navigation events across both skin types. This difference is expected as on the test wikis the new header was also show as default 50% of all logged-in users and by default to all anonymous users.
On a sample of non-partner wikis, there are more header-navigation events and sessions compared to header-moved events. This is also expected as the new location is available as preference on these non-partner wikis. Note: This difference should be even higher but there was an issue in pre 13 October 2020 data of incorrectly logging header-moved events on legacy skin.
The isAnon field was added on 20 October 2020 to distinguish logged in and logged out users. I'll review events since deployment to confirm they are being recorded as expected.
## collect all search header events by location
# Check number of events for each to see if roughly even or number of sessions?
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as date,
event.skinVersion AS vector_version,
event.skin AS skin_type,
event.inputLocation AS search_location,
event.isAnon AS is_anonymous,
event.searchSessionId AS search_session,
wiki,
Count(*) as events
FROM event.searchSatisfaction
-- review a few days prior to check when data started to come in
WHERE year = 2020 AND ((month = 10 and day >= 18))
-- review autocomlete actions
AND event.action = 'searchResultPage'
AND event.source = 'autocomplete'
-- remove bots
-- remove bots
AND useragent.is_bot = false
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.skinVersion,
event.skin,
event.inputLocation,
event.isAnon,
event.searchSessionId,
wiki"
search_sessions_byanon <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_sessions_byanon$date <- as.Date(search_sessions_byanon$date, format = "%Y-%m-%d")
# rename is anon column to clarify logged in status
search_sessions_byanon_clean <- search_sessions_byanon %>%
mutate(logged_in_status = case_when(
is_anonymous == 'NULL' ~ "NULL",
is_anonymous == 'false'~ "logged-in",
is_anonymous == 'true' ~ "logged-out"))
search_sessions_daily_byanon <- search_sessions_byanon_clean %>%
filter(is_anonymous != 'NULL',
skin_type == 'vector',
vector_version == 'latest',
search_location %in% c('header-moved', 'header-navigation')) %>%
group_by(date, search_location, logged_in_status) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
arrange(date)
head(search_sessions_daily_byanon, 10)
`summarise()` regrouping output by 'date', 'search_location' (override with `.groups` argument)
date | search_location | logged_in_status | num_events | num_sessions |
---|---|---|---|---|
<date> | <chr> | <chr> | <int> | <int> |
2020-10-20 | header-moved | logged-in | 10488 | 1819 |
2020-10-20 | header-moved | logged-out | 842856 | 104794 |
2020-10-20 | header-navigation | logged-in | 7561 | 1383 |
2020-10-20 | header-navigation | logged-out | 6 | 2 |
2020-10-21 | header-moved | logged-in | 13966 | 2790 |
2020-10-21 | header-moved | logged-out | 1268413 | 160707 |
2020-10-21 | header-navigation | logged-in | 11430 | 2191 |
2020-10-21 | header-navigation | logged-out | 3 | 1 |
2020-10-22 | header-moved | logged-in | 13864 | 2693 |
2020-10-22 | header-moved | logged-out | 1228378 | 157102 |
# Plot logged in and logged out events by search location
options(repr.plot.width = 10, repr.plot.height = 7)
p <-search_sessions_daily_byanon %>%
ggplot(aes(x=date, y= num_sessions, color = search_location)) +
geom_line(size = 1.5) +
scale_y_continuous() +
facet_wrap(~logged_in_status, scales = "free_y") +
labs (y = "Number of autocomplete search sessions per day",
x = "Date",
title = "Daily search sessions by search bar location on latest vector skin") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
axis.text.x = element_text(angle=45, hjust=1))
p
PASSED: Logged in events start appearing on 20 October, the date of deployment.
The differences between search location sessions for logged out vs logged in users appears as expected. Most logged-out sessions on the latest vector skin are header-moved events, as the new location is available by default for anonymous users on our early adopter wikis.
The new search location was deployed to 50% of logged-in users on early adopter wikis and only preference by other, which is why there is a more event split of sessions for logged-in users.
### Confirm number of events and sessions for two new search locations
search_sessions_bysearchlocation_byanon <- search_sessions_byanon_clean %>%
# Remove null events and only review vector skin events
filter(is_anonymous != 'NULL',
skin_type == 'vector',
search_location %in% c('header-moved', 'header-navigation')) %>%
group_by(vector_version, search_location, logged_in_status ) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_bysearchlocation_byanon
`summarise()` regrouping output by 'vector_version', 'search_location' (override with `.groups` argument)
vector_version | search_location | logged_in_status | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
latest | header-moved | logged-in | 90586 | 16836 |
latest | header-moved | logged-out | 7376053 | 925453 |
latest | header-navigation | logged-in | 69639 | 13154 |
latest | header-navigation | logged-out | 136 | 22 |
legacy | header-moved | logged-in | 33 | 2 |
legacy | header-moved | logged-out | 1048688 | 208045 |
legacy | header-navigation | logged-in | 2697490 | 505192 |
legacy | header-navigation | logged-out | 104985732 | 14152130 |
PASSED: Logged in status recorded for both vector versions (latest and legacy) and both search locations.
The numbers appears as expected.
search_sessions_byskintype_byanon <- search_sessions_byanon %>%
filter(is_anonymous != 'NULL') %>%
group_by(skin_type, vector_version, is_anonymous ) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byskintype_byanon
`summarise()` regrouping output by 'skin_type', 'vector_version' (override with `.groups` argument)
skin_type | vector_version | is_anonymous | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
cologneblue | NULL | false | 1048 | 130 |
modern | NULL | false | 27117 | 4949 |
modern | NULL | true | 35 | 4 |
monobook | NULL | false | 230259 | 42060 |
monobook | NULL | true | 262 | 44 |
timeless | NULL | false | 28029 | 5462 |
timeless | NULL | true | 91 | 19 |
vector | latest | false | 181311 | 30707 |
vector | latest | true | 8293233 | 942578 |
vector | legacy | false | 3081016 | 517489 |
vector | legacy | true | 117898901 | 14550938 |
PASSED: Confirmed that the isAnon field is being logged for multiple skin types. The number of events and sessions seem as expected.
search_sessions_byskintype_byanon <- search_sessions_byanon_clean %>%
filter(is_anonymous != 'NULL',
wiki %in% c('euwiki', 'frwiki', 'hewiki', 'ptwikiversity', 'frwiktionary', 'fawiki')) %>%
group_by(wiki, logged_in_status ) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byskintype_byanon
`summarise()` regrouping output by 'wiki' (override with `.groups` argument)
wiki | logged_in_status | num_events | num_sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
euwiki | logged-in | 3538 | 674 |
euwiki | logged-out | 102687 | 8607 |
fawiki | logged-in | 19891 | 3195 |
fawiki | logged-out | 349904 | 38379 |
frwiki | logged-in | 180096 | 31216 |
frwiki | logged-out | 7320373 | 809328 |
frwiktionary | logged-in | 14569 | 2229 |
frwiktionary | logged-out | 396035 | 53823 |
hewiki | logged-in | 8499 | 2339 |
hewiki | logged-out | 123024 | 32310 |
ptwikiversity | logged-in | 16 | 7 |
ptwikiversity | logged-out | 1082 | 113 |
PASSED
Sessions by logged in status appear as expected for all the wikis. There are 11 to 24% more logged out sessions compared to logged-in sessions on each partner wiki.
We ran a check to confirm that it was deployed to 50% of users (using the modern vector skin) on each wiki as part of the AB test. Bucketing is done on a search session basis so we can review the number of distinct sessions.
During QA, we realized that there was no way to distinguish users in the AB test. An isAnon field was added to the schema on 20 October 2020.
# Review AB events to confirm the buckets are balanced
query <-
"
SELECT
COUNT (DISTINCT(event.searchSessionId) AS search_session,
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0') as date,
event.inputLocation AS search_location,
event.isAnon AS is_anonymous,
wiki
FROM event.searchSatisfaction
-- ab test restarted on Oct 20th when isAnon field was added
WHERE year = 2020 AND ((month = 10 and day >= 20))
-- review autocomplete actions
AND event.action = 'searchResultPage'
AND event.source = 'autocomplete'
AND event.inputLocation IN ('header-moved', 'header-navigation')
-- review test wikis
AND wiki IN ('euwiki', 'frwiki', 'hewiki', 'ptwikiversity', 'frwiktionary', 'fawiki')
-- deployed on on the new vector skin
AND event.skinVersion = 'latest'
AND event.skin = 'vector'
-- remove bots
AND useragent.is_bot = false
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.isAnon,
event.searchSessionId,
wiki"
search_sessions_ab <- wmfdata::query_hive(query)
head(search_sessions_ab)
date | search_location | is_anonymous | search_session | wiki | events | |
---|---|---|---|---|---|---|
<chr> | <chr> | <chr> | <chr> | <chr> | <int> | |
1 | 2020-10-20 | header-moved | NULL | 0095a915b3a1d213c379kghu3f7f | frwiki | 5 |
2 | 2020-10-20 | header-moved | NULL | 01fb82cbe1d905573c0fkgh7luar | frwiktionary | 4 |
3 | 2020-10-20 | header-moved | NULL | 04a0f9c79659f303475ckghlgphd | frwiki | 3 |
4 | 2020-10-20 | header-moved | NULL | 0927dca0e97d2ce502bakghrezns | frwiki | 6 |
5 | 2020-10-20 | header-moved | NULL | 09516839b31bce7bc634kghxdbua | frwiki | 2 |
6 | 2020-10-20 | header-moved | NULL | 0b0c97100c563049ccb0kgh8tgh2 | frwiki | 8 |
search_sessions_ab$date <- as.Date(search_sessions_ab$date, format = "%Y-%m-%d")
# rename is anon column to clarify logged in status
search_sessions_ab_clean <- search_sessions_ab %>%
mutate(logged_in_status = case_when(
is_anonymous == 'NULL' ~ "NULL",
is_anonymous == 'false'~ "logged-in",
is_anonymous == 'true' ~ "logged-out"))
search_sessions_ab_check <- search_sessions_ab_clean %>%
# review only logged in events on wikis
filter(logged_in_status == 'logged-in') %>%
group_by(search_location) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
`summarise()` ungrouping output (override with `.groups` argument)
search_sessions_ab_check
search_location | num_events | num_sessions |
---|---|---|
<chr> | <int> | <int> |
header-moved | 129098 | 22783 |
header-navigation | 133058 | 25205 |
### Distinct Sessions for all users in AB test Per Wiki
search_sessions_ab_check_bywiki <- search_sessions_ab_clean %>%
# review only logged in events on wikis
filter(logged_in_status == 'logged-in') %>%
group_by(search_location, wiki) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
arrange(wiki)
search_sessions_ab_check_bywiki
`summarise()` regrouping output by 'search_location' (override with `.groups` argument)
search_location | wiki | num_events | num_sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
header-moved | euwiki | 3422 | 694 |
header-navigation | euwiki | 2502 | 503 |
header-moved | fawiki | 12314 | 2031 |
header-navigation | fawiki | 11434 | 1906 |
header-moved | frwiki | 101237 | 17690 |
header-navigation | frwiki | 109061 | 20390 |
header-moved | frwiktionary | 10429 | 1498 |
header-navigation | frwiktionary | 7974 | 1330 |
header-moved | hewiki | 1659 | 859 |
header-navigation | hewiki | 2055 | 1069 |
header-moved | ptwikiversity | 37 | 11 |
header-navigation | ptwikiversity | 32 | 7 |
The current difference is within the probable range of a random 50/50 split and the buckets look balanced. Note: There are very few header-moved and header-navigation events recorded for pwikiversity. As a result, it may be difficult to determine any impact of the header move on search sessions initiated during the AB test analysis.
The new search widget was deployed on the test wiki on 17 September 2020 T259798.
I first reviewed events logged on the test wiki to ensure they were recorded as expected prior to the search widget being deployed to the pilot wikis.
Scenarios Tested:
# collect data from SearchSatisfaction for the relevant fields.
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as search_dt,
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
event.skinVersion AS vector_version,
event.action AS action_type,
event.source AS source_type,
event.skin AS skin_type,
event.isAnon AS is_anonymous,
event.extraParams AS search_type,
wiki AS wiki,
Count(*) AS events
FROM event.searchSatisfaction
WHERE year = 2021
AND month = 02
AND day >= 17
-- remove bots
AND useragent.is_bot = false
AND wiki IN ('testwiki', 'test2wiki')
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.searchSessionId,
event.skinVersion,
event.action,
event.source,
event.skin,
event.isAnon,
event.extraParams,
wiki"
search_widget_sessions <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_widget_sessions$search_dt <- as.Date(search_widget_sessions$search_dt)
search_sessions_bydate <- search_widget_sessions %>%
#find new search widget events
filter(action_type == 'searchResultPage',
source_type == 'autocomplete',
search_type =="WVUI") %>%
group_by(search_dt) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
arrange(search_dt)
search_sessions_bydate
`summarise()` ungrouping output (override with `.groups` argument)
search_dt | num_events | num_sessions |
---|---|---|
<date> | <int> | <int> |
2021-02-17 | 19 | 4 |
2021-02-18 | 1 | 1 |
2021-02-22 | 1 | 1 |
A total of 6 new search widge search sessions initiated (as indicated by event.extraParams = 'WVUI'
) have been recorded to date on the testwiki as of 22 February 2020.
search_sessions_bysearchlocation <- search_widget_sessions %>%
# review only autcomplete sessions
filter(action_type == 'searchResultPage',
source_type == 'autocomplete',
skin_type == 'vector') %>%
group_by(search_type, vector_version, search_location) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_bysearchlocation
`summarise()` regrouping output by 'search_type', 'vector_version' (override with `.groups` argument)
search_type | vector_version | search_location | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
NULL | latest | content | 16 | 3 |
NULL | latest | header-moved | 30 | 13 |
NULL | legacy | content | 8 | 3 |
NULL | legacy | header-navigation | 85 | 24 |
WVUI | latest | header-moved | 21 | 6 |
The new search widget events are only logged for the new search location (event.inputLocation = 'header-moved
) as expected. We are also logging old search widget events (as indicated by event.extraParams IS NULL
) for the new search location) as expected.
The old search location only has events recorded for the legacy vector version and does not include any new search widget events.
NOTE:
search_location = content
events are searches done one the advanced search page and not appplicable to the new search widget which is displayed on article pages.search_sessions_byskintype <- search_widget_sessions %>%
# review only autcomplete sessions.
filter(action_type == 'searchResultPage',
source_type == 'autocomplete') %>%
group_by(search_type, skin_type, vector_version) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byskintype
`summarise()` regrouping output by 'search_type', 'skin_type' (override with `.groups` argument)
search_type | skin_type | vector_version | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
NULL | monobook | NULL | 34 | 8 |
NULL | timeless | NULL | 10 | 3 |
NULL | vector | latest | 46 | 15 |
NULL | vector | legacy | 93 | 24 |
WVUI | vector | latest | 21 | 6 |
Confirmed that new search widget events are only logged for the latest vector skin.
search_sessions_byactiontype <- search_widget_sessions %>%
# review only autcomplete sessions.
filter(search_type == 'WVUI') %>%
group_by(search_type, source_type, action_type) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byactiontype
`summarise()` regrouping output by 'search_type', 'source_type' (override with `.groups` argument)
search_type | source_type | action_type | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
WVUI | autocomplete | click | 2 | 2 |
WVUI | autocomplete | searchResultPage | 21 | 6 |
The new search widget events are recorded for autocomplete search sessions intiated as expected. There were also 2 click events that were logged as coming from the new search widget.
Note: Click events include the event.extraParams field to indicate if the new search widget was used but do not include event.inputLocation field to indicate the search location; however, we can use the sessionid to determine which search location sessions included a click to a result.
search_sessions_byanon_newloc <- search_widget_sessions %>%
## review search actions in new search location
filter(action_type == 'searchResultPage',
search_location == 'header-moved') %>%
#search_type == 'WVUI') %>%
group_by(search_type, is_anonymous) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byanon_newloc
`summarise()` regrouping output by 'search_type' (override with `.groups` argument)
search_type | is_anonymous | num_events | num_sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
NULL | false | 30 | 13 |
WVUI | false | 14 | 1 |
WVUI | true | 7 | 5 |
search_sessions_byanon_newsearch <- search_widget_sessions %>%
# ## review search actions using new search widget
filter( action_type == 'searchResultPage',
search_location %in% c('header-moved', 'header-navigation')) %>%
group_by(search_location, is_anonymous, vector_version) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_byanon_newsearch
`summarise()` regrouping output by 'search_location', 'is_anonymous' (override with `.groups` argument)
search_location | is_anonymous | vector_version | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
header-moved | false | latest | 44 | 14 |
header-moved | true | latest | 7 | 5 |
header-navigation | false | legacy | 80 | 23 |
header-navigation | true | legacy | 5 | 1 |
On the latest vector dekstop version, search sessions for logged-out are only recorded for the new search location and the new search widget as expected. Logged-in users see either the new search location and new search widget or the new search location and the old search widget depending on which AB treatment they receive.
New search widget events have been recorded for both logged in (1 session) and logged out (5 sessions) users.
search_sessions_ab_test <- search_widget_sessions %>%
# review search sessions initiated that meet AB bucketing criteria
filter(
search_location %in% c('header-moved', 'header-navigation'),
is_anonymous == 'false',
source_type == 'autocomplete',
action_type == 'searchResultPage',
vector_version == 'latest',
skin_type == 'vector') %>%
group_by(search_type, search_location) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
search_sessions_ab_test
`summarise()` regrouping output by 'search_type' (override with `.groups` argument)
search_type | search_location | num_events | num_sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
NULL | header-moved | 30 | 13 |
WVUI | header-moved | 14 | 1 |
There have not been sufficent number of events to confirm AB Bucketing but I've confirmed that the AB instrumentation is in place as needed.
# collect data from SearchSatisfaction for the relevant fields.
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as search_dt,
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
event.skinVersion AS vector_version,
event.action AS action_type,
event.source AS source_type,
event.isAnon AS is_anonymous,
event.extraParams AS search_type,
event.usereditbucket as edit_bucket,
wiki AS wiki,
Count(*) AS events
FROM event.searchSatisfaction
WHERE year = 2021
AND month = 02
-- change deployed on 24 Feb 2021
AND day >= 24
-- remove bots
AND useragent.is_bot = false
AND event.skin = 'vector'
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.searchSessionId,
event.skinVersion,
event.action,
event.source,
event.isAnon,
event.extraParams,
event.usereditbucket,
wiki"
search_sessions_weditbucket <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
editbucket_events_bywiki <- search_sessions_weditbucket %>%
filter(edit_bucket != 'NULL') %>%
group_by(wiki, edit_bucket) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
editbucket_events_bywiki
`summarise()` regrouping output by 'wiki' (override with `.groups` argument)
wiki | edit_bucket | num_events | num_sessions |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
mediawikiwiki | 0 edits | 1396 | 111 |
mediawikiwiki | 1-4 edits | 6 | 1 |
mediawikiwiki | 100-999 edits | 50 | 2 |
mediawikiwiki | 1000+ edits | 5 | 1 |
mediawikiwiki | 5-99 edits | 58 | 6 |
testwikidatawiki | 0 edits | 2 | 1 |
Search session with the edit bucket field data have been recorded on mediawiki (121 search sessions) and testwikidata (1 session) so far. We have events logged for all 5 edit buckts types and the number of events per edit bucket seem reasonable.
editbucket_events_byloggedin <- search_sessions_weditbucket %>%
filter(edit_bucket != 'NULL') %>%
group_by(wiki, is_anonymous, edit_bucket) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
editbucket_events_byloggedin
`summarise()` regrouping output by 'wiki', 'is_anonymous' (override with `.groups` argument)
wiki | is_anonymous | edit_bucket | num_events | num_sessions |
---|---|---|---|---|
<chr> | <chr> | <chr> | <int> | <int> |
mediawikiwiki | false | 0 edits | 75 | 14 |
mediawikiwiki | false | 1-4 edits | 6 | 1 |
mediawikiwiki | false | 100-999 edits | 50 | 2 |
mediawikiwiki | false | 1000+ edits | 5 | 1 |
mediawikiwiki | false | 5-99 edits | 58 | 6 |
mediawikiwiki | true | 0 edits | 1321 | 99 |
testwikidatawiki | true | 0 edits | 2 | 1 |
All logged out users are recorded as having 0 edits as expected.
editbucket_events_bysearchtype <- search_sessions_weditbucket %>%
# review search sessions initated with edit bucket recorded
filter(edit_bucket != 'NULL',
source_type == 'autocomplete',
action_type == 'searchResultPage') %>%
group_by(edit_bucket, search_location, vector_version, search_type) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session))
editbucket_events_bysearchtype
`summarise()` regrouping output by 'edit_bucket', 'search_location', 'vector_version' (override with `.groups` argument)
edit_bucket | search_location | vector_version | search_type | num_events | num_sessions |
---|---|---|---|---|---|
<chr> | <chr> | <chr> | <chr> | <int> | <int> |
0 edits | content | legacy | NULL | 266 | 22 |
0 edits | header-navigation | legacy | NULL | 432 | 76 |
1-4 edits | header-navigation | legacy | NULL | 5 | 1 |
100-999 edits | header-navigation | legacy | NULL | 18 | 2 |
1000+ edits | header-navigation | legacy | NULL | 2 | 1 |
5-99 edits | header-moved | latest | NULL | 1 | 1 |
5-99 edits | header-navigation | legacy | NULL | 30 | 5 |
Edit bucket events are recorded on both vector versions and both search location types. No new search widget events have been recorded since this feature has only been deployed on testwiki and we do not yet have any events there.
# collect data from SearchSatisfaction for the relevant fields.
query <-
"WITH search_sessions AS (
-- find search session start date
SELECT
min(CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))) as search_start,
event.searchSessionId AS search_session,
wiki AS wiki
FROM event.SearchSatisfaction
WHERE
Year = 2021
AND useragent.is_bot = false
AND event.skin = 'vector'
AND wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki',
'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )
GROUP BY
event.searchSessionId,
wiki)
SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as event_dt,
search_sessions.search_start,
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
event.action AS action_type,
event.source AS source_type,
event.isAnon AS is_anonymous,
event.extraParams AS search_type,
event.usereditbucket as edit_bucket,
ss.wiki AS wiki,
Count(*) AS events
FROM event.searchSatisfaction ss
INNER JOIN search_sessions ON
event.searchSessionId = search_sessions.search_session AND
ss.wiki = search_sessions.wiki
WHERE year = 2021
-- Review following bug fix on March 10.
AND MONTH >= 03
--deployed at 7:00 UTC
AND HOUR >= 19
-- change deployed on 24 Feb 2021
-- remove bots
AND useragent.is_bot = false
AND event.skin = 'vector'
AND ss.wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki',
'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.searchSessionId,
event.action,
event.source,
event.isAnon,
event.extraParams,
event.usereditbucket,
ss.wiki,
search_sessions.search_start"
search_widget_sessions_pilotwikis <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_widget_sessions_pilotwikis$event_dt <- as.Date(search_widget_sessions_pilotwikis$event_dt)
search_widget_sessions_pilotwikis$search_start <- as.Date(search_widget_sessions_pilotwikis$search_start)
I reviewed search sessions initiated by pilot wiki to confirm that the split between each group is close to what would be expected with a 50/50 split.
Note: There was an initial issue with the instrumentation that resulted in a number of the new search widget sessions to be not be logged. This was fixed on March 10th. See https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/670459/
search_sessions_ab_test_pilot <- search_widget_sessions_pilotwikis %>%
# review search sessions initiated that meet AB bucketing criteria
filter(
search_location %in% c('header-moved', 'header-navigation'),
# exclude long running search sessionst that started prior to deployment
search_start >= '2021-03-10',
is_anonymous == 'false',
source_type == 'autocomplete',
action_type == 'searchResultPage') %>%
group_by(wiki,search_location, search_type) %>%
summarise(
num_sessions = n_distinct(search_session))
search_sessions_ab_test_pilot
`summarise()` regrouping output by 'wiki', 'search_location' (override with `.groups` argument)
wiki | search_location | search_type | num_sessions |
---|---|---|---|
<chr> | <chr> | <chr> | <int> |
bnwiki | header-moved | NULL | 58 |
bnwiki | header-moved | WVUI | 94 |
bnwiki | header-navigation | NULL | 209 |
dewikivoyage | header-moved | NULL | 10 |
dewikivoyage | header-moved | WVUI | 11 |
dewikivoyage | header-navigation | NULL | 74 |
euwiki | header-moved | NULL | 274 |
euwiki | header-moved | WVUI | 351 |
euwiki | header-navigation | NULL | 63 |
fawiki | header-moved | NULL | 645 |
fawiki | header-moved | WVUI | 1386 |
fawiki | header-navigation | NULL | 2067 |
frwiki | header-moved | NULL | 9447 |
frwiki | header-moved | WVUI | 13522 |
frwiki | header-navigation | NULL | 16094 |
frwiktionary | header-moved | NULL | 727 |
frwiktionary | header-moved | WVUI | 1193 |
frwiktionary | header-navigation | NULL | 619 |
hewiki | header-moved | NULL | 386 |
hewiki | header-moved | WVUI | 2142 |
hewiki | header-navigation | NULL | 1311 |
kowiki | header-moved | NULL | 199 |
kowiki | header-moved | WVUI | 403 |
kowiki | header-navigation | NULL | 794 |
ptwiki | header-moved | NULL | 1935 |
ptwiki | header-moved | WVUI | 3936 |
ptwiki | header-navigation | NULL | 10495 |
ptwikiversity | header-moved | NULL | 14 |
ptwikiversity | header-moved | WVUI | 11 |
ptwikiversity | header-navigation | NULL | 20 |
srwiki | header-moved | NULL | 209 |
srwiki | header-moved | WVUI | 517 |
srwiki | header-navigation | NULL | 1812 |
trwiki | header-moved | NULL | 682 |
trwiki | header-moved | WVUI | 1418 |
trwiki | header-navigation | NULL | 4529 |
vecwiki | header-moved | NULL | 7 |
vecwiki | header-moved | WVUI | 44 |
vecwiki | header-navigation | NULL | 159 |
# plot search location move trends
p <- search_widget_sessions_pilotwikis %>%
mutate(search_type = ifelse(search_type == 'NULL', "old search", "new search")) %>%
# review search sessions initiated that meet AB bucketing criteria
filter(
search_location %in% c('header-moved', 'header-navigation'),
is_anonymous == 'false',
source_type == 'autocomplete',
action_type == 'searchResultPage',
search_start >= '2021-03-10') %>%
group_by(event_dt, search_type) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
ggplot(aes(x=event_dt, y = num_sessions, color = search_type)) +
geom_line(size = 1.5) +
labs(y = "number of unique search sessions per day",
x = "Date",
title = "Daily Search Sessions Initiated by Logged-In Users on Pilot Wikis") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
legend.position="bottom",
axis.text.x = element_text(angle=45, hjust=1))
p
ggsave("Figures/daily_search_sessions_by_type.png", p, width = 16, height = 8, units = "in", dpi = 300)
`summarise()` regrouping output by 'event_dt' (override with `.groups` argument)
Search sessions appear to be logging as expected following the deployment of the fix on 29 April 2021.
Some explanation of various decreases/increase: There's a decrease in new search widget session on 18 March when the regression occurred, a decrease in old search widget sessions when the AB test ended, and increase in new search widget on 29 April 2021 when the new fix was deployed.
# collect non-pilot search session data
query <-
"
SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as event_dt,
event.inputLocation AS search_location,
event.searchSessionId AS search_session,
event.action AS action_type,
event.source AS source_type,
event.isAnon AS is_anonymous,
event.extraParams AS search_type,
event.skinVersion AS skin_version,
event.usereditbucket as edit_bucket,
wiki AS wiki,
Count(*) AS events
FROM event.searchSatisfaction
WHERE year = 2021
-- Review following bug fix on March 10.
AND Month >= 02
--deployed at 7:00 UTC
-- change deployed on 24 Feb 2021
-- remove bots
AND useragent.is_bot = false
AND event.skin = 'vector'
AND wiki IN ('slwiki', 'bewiki', 'dewiki', 'ruwiktionary', 'eswiktionary', 'cawiki',
'da.wikipedia', 'rowiki', 'arwiki', 'idwiki', 'jawikiversity', 'jawiki', 'eswiki' )
GROUP BY
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
event.inputLocation,
event.skinVersion,
event.searchSessionId,
event.action,
event.source,
event.isAnon,
event.extraParams,
event.usereditbucket,
wiki"
search_widget_sessions_non_pilotwikis <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
search_widget_sessions_non_pilotwikis $event_dt <- as.Date(search_widget_sessions_non_pilotwikis$event_dt)
search_sessions_non_pilot <- search_widget_sessions_non_pilotwikis %>%
# review search sessions initiated that meet AB bucketing criteria
filter(
search_location %in% c('header-moved', 'header-navigation'),
is_anonymous == 'false',
source_type == 'autocomplete',
action_type == 'searchResultPage') %>%
group_by(wiki,search_location, search_type) %>%
summarise(
num_sessions = n_distinct(search_session))
search_sessions_non_pilot
`summarise()` regrouping output by 'wiki', 'search_location' (override with `.groups` argument)
wiki | search_location | search_type | num_sessions |
---|---|---|---|
<chr> | <chr> | <chr> | <int> |
arwiki | header-moved | NULL | 524 |
arwiki | header-moved | WVUI | 11 |
bewiki | header-moved | NULL | 31 |
cawiki | header-moved | NULL | 354 |
cawiki | header-moved | WVUI | 2 |
dewiki | header-moved | NULL | 2522 |
dewiki | header-moved | WVUI | 10 |
eswiki | header-moved | NULL | 350 |
eswiki | header-moved | WVUI | 3 |
eswiki | header-navigation | NULL | 24 |
eswiktionary | header-moved | NULL | 2 |
idwiki | header-moved | NULL | 679 |
idwiki | header-moved | WVUI | 10 |
jawiki | header-moved | NULL | 852 |
jawiki | header-moved | WVUI | 33 |
rowiki | header-moved | NULL | 420 |
ruwiktionary | header-moved | NULL | 826 |
New search widget sessions (indicated by search_type = "WVUI") were recorded on some of these non-test wikis but not as many as would be expected if available to all logged-in users that are opt'in to the new skin.
p <- search_widget_sessions_non_pilotwikis %>%
mutate(search_type = ifelse(search_type == 'NULL', "old search", "new search")) %>%
# review search sessions initiated that meet AB bucketing criteria
filter(
event_dt < '2021-03-15' , #remove date with incomplete data
search_location %in% c('header-moved', 'header-navigation'),
is_anonymous == 'false',
source_type == 'autocomplete',
action_type == 'searchResultPage') %>%
group_by(event_dt, search_type) %>%
summarise(num_events = sum(events),
num_sessions = n_distinct(search_session)) %>%
ggplot(aes(x=event_dt, y = num_sessions, color = search_type)) +
geom_line(size = 1.5) +
labs(y = "number of unique search sessions per day",
x = "Date",
title = "Daily Non-AB Test Search Sessions Initiated by Search Type") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
legend.position="bottom",
axis.text.x = element_text(angle=45, hjust=1))
p
ggsave("Figures/daily_search_sessions_by_type_nonAB.png", p, width = 16, height = 8, units = "in", dpi = 300)
`summarise()` regrouping output by 'event_dt' (override with `.groups` argument)
We're not seeing any new widget sessions for logged in users opt'd in on non pilot wikis after March 2nd. We record events from Feb 25th through March 2nd only.
Update: Per discussions with the Web team, it was clarified that the new search widget was turned off to all non-pilot wikis following start of the AB test to not interfere with the AB test results.
I identified an issue where clicks to the search button were not instrumented for the new search widget. A fix was deployed around 13 May 2021 to address.
query <-
"WITH searches AS (
-- all search sessions
SELECT
MIN(TO_DATE(dt)) as session_start_date,
event.searchSessionId,
event.extraParams,
min(event.usereditbucket) as edit_count,
event.isAnon,
event.skinVersion,
event.inputLocation,
wiki,
CASE
WHEN event.hitsReturned > 0 THEN 'TRUE'
WHEN event.hitsReturned = 0 THEN 'FALSE'
ELSE NULL
END AS results_returned
FROM event.searchSatisfaction
WHERE year = 2021
-- Review following bug fix on March 10 and prior to regression on March 17th
AND MONTH >= 05
AND event.action = 'searchResultPage'
AND event.source = 'autocomplete'
-- review test wikis
AND wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki',
'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )
-- deployed on vector skin
AND event.skin = 'vector'
-- remove bots
AND useragent.is_bot = false
AND event.subTest is NULL
GROUP BY
event.searchSessionId,
event.extraParams,
event.isAnon,
event.skinVersion,
event.inputLocation,
wiki,
CASE
WHEN event.hitsReturned > 0 THEN 'TRUE'
WHEN event.hitsReturned = 0 THEN 'FALSE'
ELSE NULL
END
-- filter out sessions with over 50 searches
HAVING SUM(1) < 50
),
clicks AS (
-- all search sessions where a click event was recorded
SELECT
event.searchSessionId,
event.position,
wiki
FROM event.searchSatisfaction
WHERE
year = 2021
-- Review following bug fix on March 10 and prior to regression on March 17th
AND MONTH >= 05
AND event.action = 'click'
AND event.source = 'autocomplete'
-- review test wikis
AND wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki',
'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )
-- deployed on on the new vector skin
AND event.skin = 'vector'
-- remove bots
AND useragent.is_bot = false
AND event.subTest is NULL
)
-- Main query
SELECT
searches.wiki AS wiki,
searches.session_start_date,
searches.searchSessionId AS search_session,
searches.extraParams AS search_type,
searches.edit_count,
searches.isAnon AS is_anonymous,
searches.skinVersion AS vector_version,
searches.inputLocation As search_location,
searches.results_returned,
IF(clicks.searchSessionId IS NOT NULL, 1, 0) AS clicked_through,
coalesce(clicks.position, -2) AS position
FROM searches
LEFT JOIN clicks
ON searches.searchSessionId = clicks.searchSessionId AND
searches.wiki = clicks.wiki
"
search_sessions_completed <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
# Set edit count factor levels
search_sessions_completed$edit_count <- factor(search_sessions_completed$edit_count ,
levels = c("NULL", "0 edits", "1-4 edits", "5-99 edits", "100-999 edits",
"1000+ edits"))
# set search type factor levels
search_sessions_completed$search_type <- factor(search_sessions_completed$search_type ,
levels = c("NULL", "WVUI"),
labels = c("old_search_widget", "new_search_widget"))
p <- search_sessions_completed %>%
filter(clicked_through == 1) %>% # only successful searches
group_by(position, search_type) %>%
summarize(num_sessions = n_distinct(search_session), .groups = 'drop') %>%
ggplot(aes(x= position, y = num_sessions, fill = search_type)) +
geom_col(position = 'dodge') +
geom_text(aes(label = paste(num_sessions)), position = position_dodge(width = 1), color = "black", vjust=-1.5, size = 5) +
scale_x_continuous(breaks = seq(-1, 10, by = 1)) +
labs (y = "Number of search sessions",
x ="Position of the search result click",
title = "Number of search sessions completed by search result position",
caption = "Reflects wikis included in the AB Test") +
theme_bw() +
scale_fill_manual(values= c("#999999", "#000099"), name = "Search widget type", labels = c("Old search widget", "New search widget")) +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
strip.background =element_rect(fill="white"),
plot.title = element_text(hjust = 0.5),
text = element_text(size=16),
legend.position = "bottom")
p
search_clicks_loggedin_status <- search_sessions_completed %>%
filter(clicked_through == 1, position == -1) %>% # only successful searches
group_by(position, search_type, is_anonymous) %>%
summarize(num_sessions = n_distinct(search_session), .groups = 'drop')
search_clicks_loggedin_status
position | search_type | is_anonymous | num_sessions |
---|---|---|---|
<int> | <chr> | <chr> | <int> |
-1 | NULL | false | 36198 |
-1 | NULL | true | 9548 |
-1 | WVUI | false | 52006 |
-1 | WVUI | true | 2007149 |
options(repr.plot.width = 15, repr.plot.height = 10)
Confirmed we are now recording clicks to the search icon (event.position = -1
) for the new search widget. The total clicks recorded to date for logged-in and logged-out users look correct on the old and new search widget appear correct.
search_clicks_loggedin_status <- search_sessions_completed %>%
filter(clicked_through == 1, position == -1) %>% # only successful searches
group_by(search_type, vector_version, search_location) %>%
summarize(num_sessions = n_distinct(search_session), .groups = 'drop')
search_clicks_loggedin_status
search_type | vector_version | search_location | num_sessions |
---|---|---|---|
<chr> | <chr> | <chr> | <int> |
NULL | latest | content | 7801 |
NULL | latest | header-moved | 2425 |
NULL | legacy | content | 1330 |
NULL | legacy | header | 15 |
NULL | legacy | header-navigation | 35868 |
WVUI | latest | content | 55497 |
WVUI | latest | header-moved | 2056125 |