# Machine learning to help researchers evaluate biases in clinical trials: a prospective, randomized user study¶

Authors list:
Frank Soboczenski (1),
Thomas Trikalinos (2),
Joel Kuiper (3),
Randolph G Bias (4),
Byron C Wallace (5),
Iain J Marshall (1)

(1) School of Population Health and Life Sciences, King's College London, London, UK
(2) Center for Evidence-based Medicine, Brown University, Providence, USA
(3) Vortext Systems, Groningen, Netherlands
(4) School of Information, University of Texas at Austin, Austin, USA
(5) College of Computer and Information Science, Northeastern University, Boston, USA

The System used and evaluated in this study can be found here: RobotReviewer User Study

# 1. Main Analysis¶

This Notebook is unsing an R Kernel!
First setting up the R environment:

In [9]:
#set specific working directory uncomment to refer to the data files
#setwd("~/Desktop/R_STUFF")

# load performance improvement libraries & enable just in time compiler
library(compiler)
enableJIT(1)

#some environmental variables (decimals)
options=7

1
In [10]:
#check if required packages are there - if not the script will install them!
requiredPackages = c('rcompanion','gdata','compiler','car','lsr','sft','nlme', 'lme4', 'bibtex', 'psych', 'likert', 'ggplot2', 'tidyverse')
for(p in requiredPackages){
if(!require(p,character.only = TRUE)) install.packages(p);
library(p,character.only = TRUE);
}
lapply(requiredPackages, require, character.only = TRUE);

# !!!! NOTE: run this cell twice for decluttering

1. TRUE
2. TRUE
3. TRUE
4. TRUE
5. TRUE
6. TRUE
7. TRUE
8. TRUE
9. TRUE
10. TRUE
11. TRUE
12. TRUE
13. TRUE
In [11]:
# Importing the data

In [12]:
# quick check on the data

1 -JjAzwjQakZk-3kIbeMfw iHbGgWtrKNfksdoj9Hxv9 3 297 1 1 9 10+ 12.5 No Yes Yes Yes Yes Yes C1:1added, C2:1added, C4:1added 3 0 15
1 -JjAzwjQakZk-3kIbeMfw cnDXl97I_WoVUqyIH0HbQ 4 610 0 1 9 10+ 12.5 No No No No Yes No C1:1added, C2:0added, C3:0added 1 0 1
2 hNQKTiKtHSY_BmxLyZb9Q XOCbiBddVQK3BYy1Ox4lI 1 306 1 1 8 5to10 7.5 No Yes No Yes No Yes C1:1added 1 0 16
2 hNQKTiKtHSY_BmxLyZb9Q oOAy7INgRLumTV3vZWtVM 2 127 1 1 8 5to10 7.5 No Yes No Yes No Yes 0 0 15

ID = individual participant
PDF = document ID
Order = document order of appearance
Time = Time spent on one document in seconds
Condition = Independent variable (1 = Machine learning recommendations (MLR) present, 0 = no MLR)
CExperience = Experience with Cochrance Risk of Bias Tool? (1 = Yes, 0 = No)
Tasks = Number of tasks (in a systematic review) performed (9 = max)
NReviews = Number of systematic reviews performed
AMeanNReviews = Artithmetic Mean number of systematic reviews performed (0=0, 1-5=3, 5-10=7.5, 10+=12.5)
Error = reported errors (ignore for now)

# 1.1 Data Shape¶

In [13]:
# first we'll look at data: the histogram
hist(data$Time)  In [14]: # However we chose to use a log representation (lot=log of time) of our data to account for lot <- log(data$Time)
hist(lot)

In [15]:
# we perform a shapiro-wilk test to establish if the data is parametric or not:
shapiro.test(data$Time)   Shapiro-Wilk normality test data: data$Time
W = 0.69672, p-value < 2.2e-16


The Shapiro-Wilk test shows significance ($W=0.69$, $p<0.001$) which means it is highly unlikely that the data here was sampled from a normal distribution. Hence our data is non-parametric.

As this analysis is looking for differences between the two groups (machine-leanring and No-machine-learning) and as the study followed a within-jubjects design, a Wilcoxon Rank-Sum test is the suitable test for an initial overview of the data.

An alpha level of .05 was used for all statistical tests. A Shapiro-Wilk test showed that the timing data does not follow a normal distribution $W=0.69$, $p<0.001$. The subsequent Wilcoxon Rank-Sum test showed that there was a significant difference in time between the participants in the semi-automated (machine-learning) and the manual (non-machine learning) condition $W(164)=$ 13530, $p<$ 0.001.

In [16]:
# We used th log of the timeing data for all subsequent analyses
T_log <- log(data$Time) # In this case the Wilcox test does not care about Log or not as it uses Rankings of bins # Wilcox text on the Log Time of the data wilcox.test(T_log, data$Condition, paired = TRUE)
# We also looked at the Timing differences by documents
#wilcox.test(T_log, data$PDF, paired = TRUE) # As well as differences in time in respect to the order wilcox.test(T_log, data$Order, paired = TRUE)

	Wilcoxon signed rank test with continuity correction

data:  T_log and data$Condition V = 13530, p-value < 2.2e-16 alternative hypothesis: true location shift is not equal to 0   Wilcoxon signed rank test with continuity correction data: T_log and data$Order
V = 13530, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

In [17]:
# We'll have a look at the boxplots now:
# For timing data (Logarithmic scale) by Condition (ML or Non-ML)
boxplot(T_log~data$Condition, xlab='Condition 0=Manual, 1=Semi-Automated', ylab="Time in Seconds (Log)", main='Overall timing by condition') #boxplot(data$Time~data$Condition, xlab='Condition', ylab="Time in Seconds", main=axis(1, at=0:1, labels=(["N", "M"]))  In [18]: # also by looking at the scatterplot scatterplot(T_log ~ Condition, data=data)  Eyeballing...there is a slight downward tendencey of time towards the MLR (1.0) condition. In [19]: # We also perfomed a t-test (paired because our within-participants design) to # check robustness t.test(T_log, data$Condition, paired = TRUE, alternative = "two.sided")

	Paired t-test

data:  T_log and data$Condition t = 73.028, df = 163, p-value < 2.2e-16 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 5.668718 5.983796 sample estimates: mean of the differences 5.826257  The t-test also confirms the result is highly significant! The t-test shows that the two groups differ significatnly from each other$t(163) = 73.02, p < 0.001$In [20]: # before we carry on, lets have a look if there is a difference of timing in terms of the # documents. First the box plot: boxplot(T_log~data$PDF)

In [25]:
#Also looking at the order here First the box plot:
boxplot(data$Time~data$Order, xlab="Order of the Documents", ylab="Time in Seconds", main="Overall Time by Order")


This nicely shows the incease in time spent on a PDF from the 1st Document participants see to the 4th.

In [23]:
# Now the same plot in time (Log scale) and separated by condition
boxplot(T_log~data$Order*data$Condition, xlab='0.X = Manual conditions, 1.X = Semi-Automated conditions', ylab="Time in Seconds (Log)", main='Order by Time Separated by Condition')


We can see a almost linear increase in the timing data on the non-machine learning side (1.0, 2.0, 3.0, 4.0) compared to the machine-learning side. Note, as this was a within-participants design fatigue effects are expected and can be seen here in form of the last plot 4.0 & 4.1. The timing in the machine-learning condition almost stays at a constant level except the expected last box.

In [26]:
# Again for robustness we also performed a t-test:
t.test(T_log, data$Order, paired = TRUE, alternative = "two.sided")   Paired t-test data: T_log and data$Order
t = 44.277, df = 163, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.655618 3.996895
sample estimates:
mean of the differences
3.826257


The t-test shows that the Order is significantly different in terms of timing data $t(163) = 44.28, p < 0.001$

# 2. Descriptives¶

In [30]:
# creating two groups (ML (1) and NonML (0) )
Annotations <- subset(data, Condition=='1')
NoAnnotations <- subset(data, Condition=='0')

In [31]:
# checking the data

31 -JjAzwjQakZk-3kIbeMfw iHbGgWtrKNfksdoj9Hxv9 3 297 1 1 9 10+ 12.5 No Yes Yes Yes Yes Yes C1:1added, C2:1added, C4:1added 3 0 15
52 hNQKTiKtHSY_BmxLyZb9Q XOCbiBddVQK3BYy1Ox4lI 1 306 1 1 8 5to10 7.5 No Yes No Yes No Yes C1:1added 1 0 16
62 hNQKTiKtHSY_BmxLyZb9Q oOAy7INgRLumTV3vZWtVM 2 127 1 1 8 5to10 7.5 No Yes No Yes No Yes 0 0 15
103 pyTvUi2W85Nb9handPSvZ gD_JugCFB6D5iV-JIOI0p 2 209 1 1 9 5to10 7.5 No Yes No Yes No Yes 0 0 15
123 pyTvUi2W85Nb9handPSvZ 4gIe5sni3WJJCzNglSdo8 4 756 1 1 9 5to10 7.5 Yes Yes Yes Yes Yes Yes C1:1added all left;C2:1added all left,C3:1added,C4:1added4 0 19
In [32]:
cat("Time Overall: ", sum(data$Time), '\n') cat("Overall Mean Time: ", mean(data$Time), '\n')
cat("Overall SD Time: ", sd(data$Time), '\n') cat('_____________________________________________', '\n') cat("Time in Annotations: ", sum(Annotations$Time), "Time in NoAnnotations: ", sum(NoAnnotations$Time), '\n') cat("Mean Annotations: ", mean(Annotations$Time), "Mean NoAnnotations: ", mean(NoAnnotations$Time), '\n') cat("SD Annotations: ", sd(Annotations$Time), "SD NoAnnotations: ", sd(NoAnnotations$Time), '\n') cat('_____________________________________________', '\n')  Time Overall: 129454 Overall Mean Time: 789.3537 Overall SD Time: 794.4597 _____________________________________________ Time in Annotations: 61913 Time in NoAnnotations: 67541 Mean Annotations: 755.0366 Mean NoAnnotations: 823.6707 SD Annotations: 868.4611 SD NoAnnotations: 716.6 _____________________________________________  Forty-one participants were recruited. All except four had experience of at least one systematic review and all but eight were familiar with the Cochrane Risk of Bias tool. Twenty listed more than one task on how they contributed to previsous systematic reviews. A mean of 755 seconds (SD 868) were taken for semi-automated bias assesmsents and 824 seconds (SD 717) for manual assessments ($p<0.001$). Participants spent in total 129454 seconds ($Mean$=789.35,$SD$=794.46) to complete the study together in both conditions semi-automated and manual. # 3. Tukey Ladder of Powers¶ The Tukey Ladder of Powers was used to transform the response variable (time) to come closer to a normal distribution. The resulting$\lambda =$-0.15 is close to 0 which confirms that the optimal transformation according to the Tukey Ladder of Powers is indeed a logarithmic operation. For the following mixed model analysis the timing data was therefore transformed to a logarithmic scale. In [33]: # Quick look at the data plotNormalHistogram(data$Time)

In [34]:
# cheching the data transformation:
T_tuk = transformTukey(data$Time, plotit=FALSE)   lambda W Shapiro.p.value 395 -0.15 0.9907 0.3626 if (lambda > 0){TRANS = x ^ lambda} if (lambda == 0){TRANS = log(x)} if (lambda < 0){TRANS = -1 * x ^ lambda}  In [35]: # plotting the transformed data plotNormalHistogram(T_tuk)  In [36]: # Now using the log of the data and plotting it: plotNormalHistogram(T_log)  The Tukey Ladder of Powers was used to transform the response variable (time) to come closer to a normal distribution. The resulting$\lambda =$-0.15 is close to 0 which confirms that the optimal transformation according to the Tukey Ladder of Powers is indeed a logarithmic operation. For the following mixed model analysis the timing data was therefore transformed to a logarithmic scale. # 4. Linear Mixed Effects Model Analysis¶ A linear mixed effects model was used to examine the associations between the log transformed time response, the semi-automatic or manual condition, the order in which a document was randomly presented and self reported characteristics of the reviewers. ## 4.1 Primary model analysis¶ In [37]: # first add the log time to the data frame data$logT <- log(data$Time) # create the primary model: Log(Time) by Condition as fixed effects and ID as random effect: ml.p = lmer(logT ~ Condition + (1 | ID), data=data)  In [38]: summary(ml.p)  Linear mixed model fit by REML ['lmerMod'] Formula: logT ~ Condition + (1 | ID) Data: data REML criterion at convergence: 374.2 Scaled residuals: Min 1Q Median 3Q Max -2.29905 -0.63291 0.02993 0.55999 2.47915 Random effects: Groups Name Variance Std.Dev. ID (Intercept) 0.2085 0.4566 Residual 0.4277 0.6540 Number of obs: 164, groups: ID, 41 Fixed effects: Estimate Std. Error t value (Intercept) 6.4678 0.1015 63.73 Condition -0.2832 0.1021 -2.77 Correlation of Fixed Effects: (Intr) Condition -0.503 In [39]: # looking at coefficients by ID coef(ml.p)$ID

(Intercept)Condition
-JjAzwjQakZk-3kIbeMfw6.342909 -0.2831514
-O8MP5AR-esmcnFcENyoF6.402884 -0.2831514
-YoLEIFk1XZUJo3BgVmlp6.723705 -0.2831514
2K-K26QZH9xPJH9RVMltM6.419611 -0.2831514
2snvok14kLm_dRIdL67PF6.223803 -0.2831514
3CDOco99k3UGgia7A9LQo7.275245 -0.2831514
4dkYUV8wlkm_Q5miw0qCJ6.865738 -0.2831514
5h9rXuHepqDSmetOKXM0J6.557449 -0.2831514
5SAqbCbqZP_q7hVWTEJdL5.615216 -0.2831514
8arsi19w5sMwMmpkmM0sX6.415119 -0.2831514
C07qGfUuD1L-1IoLjtOXY6.572443 -0.2831514
C4xWyqDJrsHcIoHU5GtUG7.029055 -0.2831514
Ci3GWDLAOc5ANxkPpNOBc6.409193 -0.2831514
cs1qlmKFvdkFjUcXEkJgT6.431466 -0.2831514
FfUZhXrHdKSHZ3ON2_Dty6.667310 -0.2831514
fQoBVJ60uF5ehuYNAg4v26.417978 -0.2831514
FXvg7iTf7P1bl-Z9dHKPO6.431382 -0.2831514
GR96RbbiPLL4gS3-ilxky7.259352 -0.2831514
hNQKTiKtHSY_BmxLyZb9Q6.227614 -0.2831514
j-6IWj13fqVcUXKjFCo7y6.202480 -0.2831514
jBFO8AR45EErIQ5grz5n-6.324240 -0.2831514
jYha2hNra2RfUfgXnVaj16.436849 -0.2831514
KlKCr9O8A-8WK6gr2vK9k6.455434 -0.2831514
KtfB3jYg9jsg0osJkMWVG5.920122 -0.2831514
L-KBZ1ZC0lK2hQlDSUCgH6.018265 -0.2831514
lllK6Q_LLhHbw7bTJ1v516.135699 -0.2831514
pyTvUi2W85Nb9handPSvZ6.191651 -0.2831514
Rh7QrcKwEzMEORahevriS6.846711 -0.2831514
S2oI3sQr3HnSJ7rgbaMJR6.692770 -0.2831514
sLUZSWii-lFp5oEFHCd7f6.310985 -0.2831514
to4PqjXHBIG3bKQhv9cHr6.173613 -0.2831514
tRdDoOfbjIQw3tUkGktky6.736527 -0.2831514
TTtTg_voG4fV-uDraAWC36.302266 -0.2831514
UbIO7NW7RAUjVfe3_VJ486.232454 -0.2831514
uq4HMVa0zTYYknq8_4SOc7.464664 -0.2831514
UyqDH9vL0gK918t2bEj0z6.266225 -0.2831514
V9_3tiOUC-jab7xnPK1HK6.569870 -0.2831514
WpFHCjeR-3YdyEJJmCT2Y6.208212 -0.2831514
ynh1N1_BZO8W4d0_gaMDy6.345564 -0.2831514
ZAj9Y0K4px17TAcPf3uYG6.092636 -0.2831514
ZXpghXhauDD3Vpv76PGoj6.966419 -0.2831514
In [40]:
# the mean values
coef(summary(ml.p))[,"Estimate"]

(Intercept)
6.46783243288911
Condition
-0.283151403792091
In [41]:
# establishing the confidence intervalls:
confint(ml.p)

Computing profile confidence intervals ...

2.5 %97.5 %
.sig01 0.3074827 0.62121739
.sigma 0.5777147 0.74206725
(Intercept) 6.2682066 6.66745826
Condition-0.4840935 -0.08220934

The confidence intervals are -0.48 to -0.08 -> exp(-0.48)=62% to exp(-0.08)=92% )

In [42]:
# main speed up = 100 - 0.75 = 25%
exp(-.28)

0.755783741455725

The primary model ($m_p$) took the log transformed time and the condition semi-automated or manual) as fixed effects and the individuals as random effect into account. Participants performing bias assessments in the semi-automated condition were on average 25% quicker than the participants in the manual condition (95% CI 62% to 92%).

# 4.2 Exploratory Analysis¶

#### Now continuing with Exploratory Analisys (Likelihood Ratio)¶

In addition, model $m_1$ took also the Order as fixed effect and the document as random effect into account. whereas $m_2$ did only account for the document as random effect to examine the importance of the Order in the model. A close examination for $m_1$ showed that there was a random intercept for every PDF. The random effect variance of the documents is 0.02 (about 1/10th) as high as from the individuals (0.23). Hence, there is more variance across participants than there is across documents. The subsequent likelihood-ratio test showed that the Order was highly significant $\chi^2$(1) = 42.26, $p<$ 0.001. Therefore the Order was kept in the following models.\newline

In [43]:
# What is the effect of the PDF, Order, Condition and Person?
ml.1 = lmer(logT ~ Order + Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)
ml.2 = lmer(logT ~ Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)

In [44]:
summary(ml.1)

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
Data: data

AIC      BIC   logLik deviance df.resid
338.2    356.8   -163.1    326.2      158

Scaled residuals:
Min       1Q   Median       3Q      Max
-2.05956 -0.44213 -0.01847  0.60865  2.15485

Random effects:
Groups   Name        Variance Std.Dev.
PDF      (Intercept) 0.02466  0.1571
ID       (Intercept) 0.23839  0.4883
Residual             0.27691  0.5262
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
(Intr) Order
Order     -0.691
Condition -0.319  0.012
In [45]:
summary(ml.2)

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Condition + (1 | ID) + (1 | PDF)
Data: data

AIC      BIC   logLik deviance df.resid
378.5    394.0   -184.2    368.5      159

Scaled residuals:
Min       1Q   Median       3Q      Max
-2.29806 -0.63552  0.02841  0.55830  2.49665

Random effects:
Groups   Name        Variance  Std.Dev.
PDF      (Intercept) 2.306e-15 4.802e-08
ID       (Intercept) 2.016e-01 4.490e-01
Residual             4.243e-01 6.514e-01
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
Estimate Std. Error t value
(Intercept)   6.4678     0.1005   64.38
Condition    -0.2832     0.1017   -2.78

Correlation of Fixed Effects:
(Intr)
Condition -0.506
In [46]:
# Likelihood-Ratio analysis (ANOVA) to see if the Order is important
anova(ml.1, ml.2)

DfAICBIClogLikdevianceChisqChi DfPr(>Chisq)
ml.25 378.4641 393.9634 -184.2321 368.4641 NA NA NA
ml.16 338.1962 356.7954 -163.0981 326.1962 42.26793 1 7.958708e-11

This tells us (and confirms previous overall results) to keep the order:
The Likelihood-Ratio analysis (Anova) between $m_1$ and $m_2$ tells us that the order is highly significant ($p<0.001$).
That is an indication to keep the order in the model.
Let's again take a look at model $m_1$

In [47]:
summary(ml.1)

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
Data: data

AIC      BIC   logLik deviance df.resid
338.2    356.8   -163.1    326.2      158

Scaled residuals:
Min       1Q   Median       3Q      Max
-2.05956 -0.44213 -0.01847  0.60865  2.15485

Random effects:
Groups   Name        Variance Std.Dev.
PDF      (Intercept) 0.02466  0.1571
ID       (Intercept) 0.23839  0.4883
Residual             0.27691  0.5262
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
(Intr) Order
Order     -0.691
Condition -0.319  0.012

We have a random intercept for every PDF. We can see that the random effect variance of PDF is about 1/10th as high as the variance from the ID. So there is more variance across people than there is across PDF (document) related.
Let's examine a model $m_3$ (removed the PDF random effect):

In [48]:
ml.3 = lmer(logT ~ Order + Condition + (1|ID), data=data, REML=FALSE)

In [49]:
# again likelihood-ratio analysis (ANOVA) now looking at the difference between m_1 and m_3 :
anova(ml.1, ml.3)

DfAICBIClogLikdevianceChisqChi DfPr(>Chisq)
ml.35 336.9199 352.4193 -163.4600326.9199 NA NA NA
ml.16 338.1962 356.7954 -163.0981326.1962 0.723746 1 0.3949179

Model $m_3$ did not have the document as random effect in order to see if it had an influence. The following likelihood-ratio comparisons with $m_1$ showed that there was no significance which suggests that there is little variability from documents and by ignoring this does not detract from the explanatory potential of the model $\chi^2$(1) = 0.72, $p=$ 0.39.

There is no significance! This means we do not care so much for the PDF random effect.
In fact we could loose the PDF random effect without destroying the known universe.
So let's see the random effects coefficients:

In [50]:
coef(ml.3)

$ID (Intercept) Order Condition -JjAzwjQakZk-3kIbeMfw 5.646676 0.2701334 -0.2765628 -O8MP5AR-esmcnFcENyoF 5.715102 0.2701334 -0.2765628 -YoLEIFk1XZUJo3BgVmlp 6.081139 0.2701334 -0.2765628 2K-K26QZH9xPJH9RVMltM 5.734187 0.2701334 -0.2765628 2snvok14kLm_dRIdL67PF 5.510783 0.2701334 -0.2765628 3CDOco99k3UGgia7A9LQo 6.710410 0.2701334 -0.2765628 4dkYUV8wlkm_Q5miw0qCJ 6.243188 0.2701334 -0.2765628 5h9rXuHepqDSmetOKXM0J 5.891452 0.2701334 -0.2765628 5SAqbCbqZP_q7hVWTEJdL 4.816425 0.2701334 -0.2765628 8arsi19w5sMwMmpkmM0sX 5.729063 0.2701334 -0.2765628 C07qGfUuD1L-1IoLjtOXY 5.908558 0.2701334 -0.2765628 C4xWyqDJrsHcIoHU5GtUG 6.429523 0.2701334 -0.2765628 Ci3GWDLAOc5ANxkPpNOBc 5.722301 0.2701334 -0.2765628 cs1qlmKFvdkFjUcXEkJgT 5.747713 0.2701334 -0.2765628 FfUZhXrHdKSHZ3ON2_Dty 6.016796 0.2701334 -0.2765628 fQoBVJ60uF5ehuYNAg4v2 5.732324 0.2701334 -0.2765628 FXvg7iTf7P1bl-Z9dHKPO 5.747617 0.2701334 -0.2765628 GR96RbbiPLL4gS3-ilxky 6.692276 0.2701334 -0.2765628 hNQKTiKtHSY_BmxLyZb9Q 5.515131 0.2701334 -0.2765628 j-6IWj13fqVcUXKjFCo7y 5.486455 0.2701334 -0.2765628 jBFO8AR45EErIQ5grz5n- 5.625375 0.2701334 -0.2765628 jYha2hNra2RfUfgXnVaj1 5.753855 0.2701334 -0.2765628 KlKCr9O8A-8WK6gr2vK9k 5.775059 0.2701334 -0.2765628 KtfB3jYg9jsg0osJkMWVG 5.164302 0.2701334 -0.2765628 L-KBZ1ZC0lK2hQlDSUCgH 5.276278 0.2701334 -0.2765628 lllK6Q_LLhHbw7bTJ1v51 5.410262 0.2701334 -0.2765628 pyTvUi2W85Nb9handPSvZ 5.474099 0.2701334 -0.2765628 Rh7QrcKwEzMEORahevriS 6.221480 0.2701334 -0.2765628 S2oI3sQr3HnSJ7rgbaMJR 6.045844 0.2701334 -0.2765628 sLUZSWii-lFp5oEFHCd7f 5.610252 0.2701334 -0.2765628 to4PqjXHBIG3bKQhv9cHr 5.453520 0.2701334 -0.2765628 tRdDoOfbjIQw3tUkGktky 6.095768 0.2701334 -0.2765628 TTtTg_voG4fV-uDraAWC3 5.600304 0.2701334 -0.2765628 UbIO7NW7RAUjVfe3_VJ48 5.520653 0.2701334 -0.2765628 uq4HMVa0zTYYknq8_4SOc 6.926525 0.2701334 -0.2765628 UyqDH9vL0gK918t2bEj0z 5.559183 0.2701334 -0.2765628 V9_3tiOUC-jab7xnPK1HK 5.905622 0.2701334 -0.2765628 WpFHCjeR-3YdyEJJmCT2Y 5.492995 0.2701334 -0.2765628 ynh1N1_BZO8W4d0_gaMDy 5.649705 0.2701334 -0.2765628 ZAj9Y0K4px17TAcPf3uYG 5.361130 0.2701334 -0.2765628 ZXpghXhauDD3Vpv76PGoj 6.358060 0.2701334 -0.2765628 attr(,"class") [1] "coef.mer" In [51]: summary(ml.3)  Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: logT ~ Order + Condition + (1 | ID) Data: data AIC BIC logLik deviance df.resid 336.9 352.4 -163.5 326.9 159 Scaled residuals: Min 1Q Median 3Q Max -2.19339 -0.52439 -0.01335 0.53696 2.27474 Random effects: Groups Name Variance Std.Dev. ID (Intercept) 0.2320 0.4817 Residual 0.3027 0.5501 Number of obs: 164, groups: ID, 41 Fixed effects: Estimate Std. Error t value (Intercept) 5.78920 0.13663 42.37 Order 0.27013 0.03843 7.03 Condition -0.27656 0.08592 -3.22 Correlation of Fixed Effects: (Intr) Order Order -0.707 Condition -0.322 0.011 The variance of coefficients 0.23 Do different people have a different coefficient when it comes to the Machine learning (random slope)? One option to examine this is to include the condition also as random effect. So there is an overall mean for the machine learning, but how variable does it seem to be? In [52]: # including the condition also as random effect in model m_3 ml.3 = lmer(logT ~ Order + Condition + (Condition | ID), data=data, REML=FALSE)  In [53]: summary(ml.3)  Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: logT ~ Order + Condition + (Condition | ID) Data: data AIC BIC logLik deviance df.resid 328.0 349.7 -157.0 314.0 157 Scaled residuals: Min 1Q Median 3Q Max -2.3985 -0.4850 0.0052 0.5775 2.2962 Random effects: Groups Name Variance Std.Dev. Corr ID (Intercept) 0.1487 0.3856 Condition 0.2000 0.4473 0.29 Residual 0.2361 0.4859 Number of obs: 164, groups: ID, 41 Fixed effects: Estimate Std. Error t value (Intercept) 5.76592 0.12264 47.01 Order 0.27940 0.03678 7.60 Condition -0.27634 0.10314 -2.68 Correlation of Fixed Effects: (Intr) Order Order -0.753 Condition -0.138 0.009 There seems to be variability in the slope. So different people seem to react differently to the machine learning condition. It seems that the residual variance has dropped between the first$m_3$and updated$m_3$models: It was .40 in the first$m_3$now in the updated$m_3$it is .24. Perhaps this model explains a bit more. Let's take a look at the coefficients: In [54]: # coefficients by ID coef(ml.3)$ID

(Intercept)OrderCondition
-JjAzwjQakZk-3kIbeMfw5.645644 0.2794002 -0.31868291
-O8MP5AR-esmcnFcENyoF5.689063 0.2794002 -0.24591965
-YoLEIFk1XZUJo3BgVmlp6.018559 0.2794002 -0.21255577
2K-K26QZH9xPJH9RVMltM5.758266 0.2794002 -0.43458312
2snvok14kLm_dRIdL67PF5.533983 0.2794002 -0.37010571
3CDOco99k3UGgia7A9LQo6.471687 0.2794002 0.25956400
4dkYUV8wlkm_Q5miw0qCJ6.085542 0.2794002 0.09094654
5h9rXuHepqDSmetOKXM0J5.825058 0.2794002 -0.14658113
5SAqbCbqZP_q7hVWTEJdL5.065488 0.2794002 -1.00634718
8arsi19w5sMwMmpkmM0sX5.665417 0.2794002 -0.11211088
C07qGfUuD1L-1IoLjtOXY5.907933 0.2794002 -0.39198176
C4xWyqDJrsHcIoHU5GtUG6.114986 0.2794002 0.61406152
Ci3GWDLAOc5ANxkPpNOBc5.721248 0.2794002 -0.33934149
cs1qlmKFvdkFjUcXEkJgT5.696203 0.2794002 -0.16163754
FfUZhXrHdKSHZ3ON2_Dty5.907440 0.2794002 -0.02371297
fQoBVJ60uF5ehuYNAg4v25.799538 0.2794002 -0.59194148
FXvg7iTf7P1bl-Z9dHKPO5.716290 0.2794002 -0.23548329
GR96RbbiPLL4gS3-ilxky6.409160 0.2794002 0.42701709
hNQKTiKtHSY_BmxLyZb9Q5.593803 0.2794002 -0.57432025
j-6IWj13fqVcUXKjFCo7y5.540374 0.2794002 -0.47586129
jBFO8AR45EErIQ5grz5n-5.589154 0.2794002 -0.18404910
jYha2hNra2RfUfgXnVaj15.650462 0.2794002 0.02656450
KlKCr9O8A-8WK6gr2vK9k5.710093 0.2794002 -0.11988919
KtfB3jYg9jsg0osJkMWVG5.223300 0.2794002 -0.40611143
L-KBZ1ZC0lK2hQlDSUCgH5.348445 0.2794002 -0.48501901
lllK6Q_LLhHbw7bTJ1v515.418836 0.2794002 -0.28901104
pyTvUi2W85Nb9handPSvZ5.496255 0.2794002 -0.35622486
Rh7QrcKwEzMEORahevriS6.114006 0.2794002 -0.08672533
S2oI3sQr3HnSJ7rgbaMJR5.908495 0.2794002 0.07077306
sLUZSWii-lFp5oEFHCd7f5.602679 0.2794002 -0.28475114
to4PqjXHBIG3bKQhv9cHr5.475240 0.2794002 -0.34898562
tRdDoOfbjIQw3tUkGktky6.095288 0.2794002 -0.44385065
TTtTg_voG4fV-uDraAWC35.697105 0.2794002 -0.66402711
UbIO7NW7RAUjVfe3_VJ485.689982 0.2794002 -0.90763414
uq4HMVa0zTYYknq8_4SOc6.665880 0.2794002 0.28053689
UyqDH9vL0gK918t2bEj0z5.595343 0.2794002 -0.43080896
V9_3tiOUC-jab7xnPK1HK5.849366 0.2794002 -0.18756661
WpFHCjeR-3YdyEJJmCT2Y5.609388 0.2794002 -0.70630849
ynh1N1_BZO8W4d0_gaMDy5.831413 0.2794002 -0.98833027
ZAj9Y0K4px17TAcPf3uYG5.459037 0.2794002 -0.60249247
ZXpghXhauDD3Vpv76PGoj6.207454 0.2794002 0.03368090
In [55]:
# histogram about those conditions:
hist(coef(ml.3)$ID[,"Condition"])  So there seems to be some people who do it faster and some slower so there seems to be some heterogeneity. Model$m_4$adds the random effect per person: In [56]: ml.4 = lmer(logT ~ Order + Condition + (Condition|ID), data=data, REML=FALSE)  In [57]: # again likelihood-ratio (ANOVA) to see the differences between models: anova(ml.4, ml.3)  DfAICBIClogLikdevianceChisqChi DfPr(>Chisq) ml.47 328.0184 349.7175 -157.0092314.0184 NA NA NA ml.37 328.0184 349.7175 -157.0092314.0184 0 0 1 Not significant:$\chi^2$(0) = 0.0,$p=$1. In [58]: # Do we need the Order if we have the condition as a random effect? ml.5 = lmer(logT ~ Condition + (Condition|ID), data=data, REML=FALSE)  In [59]: # Differences in models: anova(ml.4, ml.5)  DfAICBIClogLikdevianceChisqChi DfPr(>Chisq) ml.56 372.7248 391.3240 -180.3624 360.7248 NA NA NA ml.47 328.0184 349.7175 -157.0092 314.0184 46.70645 1 8.245712e-12 Yes, significant! The order is still important:$\chi^2$(1) = 46.70,$p<$0.001. In [60]: # examining model m_4 summary(ml.4)  Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: logT ~ Order + Condition + (Condition | ID) Data: data AIC BIC logLik deviance df.resid 328.0 349.7 -157.0 314.0 157 Scaled residuals: Min 1Q Median 3Q Max -2.3985 -0.4850 0.0052 0.5775 2.2962 Random effects: Groups Name Variance Std.Dev. Corr ID (Intercept) 0.1487 0.3856 Condition 0.2000 0.4473 0.29 Residual 0.2361 0.4859 Number of obs: 164, groups: ID, 41 Fixed effects: Estimate Std. Error t value (Intercept) 5.76592 0.12264 47.01 Order 0.27940 0.03678 7.60 Condition -0.27634 0.10314 -2.68 Correlation of Fixed Effects: (Intr) Order Order -0.753 Condition -0.138 0.009 In [61]: # examining model m_5 summary(ml.5)  Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: logT ~ Condition + (Condition | ID) Data: data AIC BIC logLik deviance df.resid 372.7 391.3 -180.4 360.7 158 Scaled residuals: Min 1Q Median 3Q Max -2.34127 -0.62213 0.05227 0.57998 2.21132 Random effects: Groups Name Variance Std.Dev. Corr ID (Intercept) 0.1069 0.3270 Condition 0.1603 0.4003 0.52 Residual 0.3708 0.6090 Number of obs: 164, groups: ID, 41 Fixed effects: Estimate Std. Error t value (Intercept) 6.46783 0.08444 76.60 Condition -0.28315 0.11381 -2.49 Correlation of Fixed Effects: (Intr) Condition -0.298 Interestingly, both of the models have the same variance with different sign (indication for colinearty?). However, the order is not exactly random. Each prticipant has two documents of each. It is a random ordering. Each participant will have for sure two ML and two Non-ML conditions / documents. After we remove the order in model$m_5$the condition variance is roughly the same. Before was -27 now is -28. # 5. Reviewer Jugements & Annotations Analysis¶ ## 5.1 Descriptives¶ In [62]: # Some descriptitves # TOTAL first (Annotations & NonAnnotations together) cat('____________TOTAL DATA:______________________', '\n') cat("Total Annotations added:", sum(data$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(data$TotalAdded), '\n') cat("SD Total Annotations added:", sd(data$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(data$TotalDeleted), '\n') cat("Mean Total Annotations deleted:", mean(data$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(data$TotalDeleted), '\n') cat('\n') cat('____________MACHINE LEARNING:________________', '\n') # Now only for Machine-Learning Condition cat("Total Annotations added:", sum(Annotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(Annotations$TotalAdded), '\n') cat("SD Total Annotations added:", sd(Annotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(Annotations$TotalDeleted), '\n') cat("Mean Total Annotations deleted:", mean(Annotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(Annotations$TotalDeleted), '\n') cat('\n') cat('____________NON-MACHINE LEARNING: ___________', '\n') # And for the Non-Machine-Learning Condition cat("Total Annotations added:", sum(NoAnnotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(NoAnnotations$TotalAdded), '\n') cat("SD Total Annotations added:", sd(NoAnnotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(NoAnnotations$TotalDeleted), '\n') cat("Mean Total Annotations deleted:", mean(NoAnnotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(NoAnnotations$TotalDeleted), '\n')  ____________TOTAL DATA:______________________ Total Annotations added: 486 Mean Total Annotations added: 2.963415 SD Total Annotations added: 3.033333 _____________________________________________ Total Annotations deleted: 127 Mean Total Annotations deleted: 0.7743902 SD Total Annotations deleted: 1.957607 ____________MACHINE LEARNING:________________ Total Annotations added: 103 Mean Total Annotations added: 1.256098 SD Total Annotations added: 1.755492 _____________________________________________ Total Annotations deleted: 127 Mean Total Annotations deleted: 1.54878 SD Total Annotations deleted: 2.549037 ____________NON-MACHINE LEARNING: ___________ Total Annotations added: 383 Mean Total Annotations added: 4.670732 SD Total Annotations added: 3.087429 _____________________________________________ Total Annotations deleted: 0 Mean Total Annotations deleted: 0 SD Total Annotations deleted: 0  ## 5.2 Descriptives Self-Reported characteristics¶ In [63]: # using a subset of the above main data here: data_s <- read.xls("data/subset_selfreported.xlsx", verbose=FALSE, na.strings=c("NA")) # and filtering out douplicates data_s2 <- unique(data_s)  In [64]: # quick check on the data head(data_s2)  IDCExperienceTasksNReviews 1-JjAzwjQakZk-3kIbeMfw1 9 10+ 5hNQKTiKtHSY_BmxLyZb9Q1 8 5to10 9pyTvUi2W85Nb9handPSvZ1 9 5to10 132K-K26QZH9xPJH9RVMltM1 9 10+ 178arsi19w5sMwMmpkmM0sX1 9 10+ 21GR96RbbiPLL4gS3-ilxky1 9 1to5 In [65]: # Calculating Descriptive Self-reported characteristics data: NumberOfReviews0 <- sum(data_s2$NReviews=='0')
NumberOfReviews1_5 <- sum(data_s2$NReviews=='1to5') NumberOfReviews5_0 <- sum(data_s2$NReviews=='5to10')
NumberOfReviews10plus <- sum(data_s2$NReviews=='10+') NumberOfReviews0 NumberOfReviews1_5 NumberOfReviews5_0 NumberOfReviews10plus NumberOfReviews0_percentage <- NumberOfReviews0/41 NumberOfReviews1_5_percentage <- NumberOfReviews1_5/41 NumberOfReviews5_0_percentage <- NumberOfReviews5_0/41 NumberOfReviews10plus_percentage <- NumberOfReviews10plus/41 NumberOfReviews0_percentage NumberOfReviews1_5_percentage NumberOfReviews5_0_percentage NumberOfReviews10plus_percentage TotalSumOfPeopleWithCochraneExperience <- sum(data_s2$CExperience=='1')
TotalSumOfPeopleWithoutCochraneExperience <- sum(data_s2$CExperience=='0') TotalSumOfPeopleWithCochraneExperience TotalSumOfPeopleWithoutCochraneExperience Percentage_TotalSumOfPeopleWithCochraneExperience <- TotalSumOfPeopleWithCochraneExperience/41 Percentage_TotalSumOfPeopleWithoutCochraneExperience <- TotalSumOfPeopleWithoutCochraneExperience/41 Percentage_TotalSumOfPeopleWithCochraneExperience Percentage_TotalSumOfPeopleWithoutCochraneExperience MedianOfTasksPerfomed <- median(data_s2$Tasks)
IRQofTasksPerformed <- quantile(data_s2$Tasks) MedianOfTasksPerfomed IRQofTasksPerformed  5 9 12 15 0.121951219512195 0.219512195121951 0.292682926829268 0.365853658536585 32 9 0.780487804878049 0.219512195121951 8 0% 1 25% 6 50% 8 75% 9 100% 9 ## 5.3 Jugdement Agrement Data¶ In [66]: # Importing the data data_agreement <- read.xls("data/agreement.xlsx", verbose=FALSE, na.strings=c("NA"))  In [67]: # creating two groups (ML (1) and NonML (0) ) ML <- subset(data_agreement, Condition=='1') NoML <- subset(data_agreement, Condition=='0')  In [68]: # checking how many datapoints in one column: count(ML)  n 82 In [69]: # calculating the changed data and percentiles: RSG <- sum(ML$Changed)
cat("RSG changed:", RSG, "RSG %:", RSG/82, '\n')
AC <- sum(ML$Changed.1) cat("AC changed:", AC, "AC %:", AC/82, '\n') BPP <- sum(ML$Changed.2)
cat("BPP changed:", BPP, "BPP %:", BPP/82, '\n')
BOA <- sum(ML$Changed.3) cat("BOA changed:", BOA, "BOA %:", BOA/82, '\n') Overall <- RSG+AC+BPP+BOA cat("Overall changed:", Overall, "RSG %:", Overall/328, '\n')  RSG changed: 7 RSG %: 0.08536585 AC changed: 7 AC %: 0.08536585 BPP changed: 6 BPP %: 0.07317073 BOA changed: 7 BOA %: 0.08536585 Overall changed: 27 RSG %: 0.08231707  ## 5.4 Annotations Data¶ In [70]: # Overall mean annotations data <- read.xls("data/TimeAnalysis2_1.xlsx", verbose=FALSE, na.strings=c("NA")) data2 <- subset(data, Condition=='1') data3 <- subset(data, Condition=='0') mean(data2$TotalSubmitted)
mean(data3$TotalSubmitted)  14.6341463414634 4.67073170731707 In [71]: ############## # annotations data <- read.xls("data/annotationschanged.xlsx", verbose=FALSE, na.strings=c("NA")) data2 <- subset(data, Condition=='1') data3 <- subset(data, Condition=='0') RSG1 <- sum(data2$Changed1=='0')
RSG2 <- sum(data2$Changed1=='1') RSG3 <- sum(data2$Changed1=='2')
RSG1/82
RSG2/82
RSG3/82

AC1 <- sum(data2$Changed2=='0') AC2 <- sum(data2$Changed2=='1')
AC3 <- sum(data2$Changed2=='2') AC1/82 AC2/82 AC3/82 BPP1 <- sum(data2$Changed3=='0')
BPP2 <- sum(data2$Changed3=='1') BPP3 <- sum(data2$Changed3=='2')
BPP1/82
BPP2/82
BPP3/82

BOA1 <- sum(data2$Changed4=='0') BOA2 <- sum(data2$Changed4=='1')
BOA3 <- sum(data2$Changed4=='2') BOA1/82 BOA2/82 BOA3/82 TotalUnchanged <- RSG1+AC1+BPP1+BOA1 TotalUnchanged/328 TotalChangedML <- RSG2+AC2+BPP2+BOA2 TotalChangedML/328 TotalChangedNoML <- RSG3+AC3+BPP3+BOA3 TotalChangedNoML/328  0.560975609756098 0.414634146341463 0.024390243902439 0.621951219512195 0.341463414634146 0.0365853658536585 0.634146341463415 0.329268292682927 0.0365853658536585 0.646341463414634 0.292682926829268 0.0609756097560976 0.615853658536585 0.344512195121951 0.0396341463414634 # 6. QUESTIONNAIRE ANALYSIS¶ In [74]: # loading the data: dataq <- read.xls("data/UXData1.xlsx", verbose=FALSE, na.strings=c("NA"))  ## 6.1 Data Preparation (cleaning and structuring)¶ In [75]: # subsetting data_rel <- as.data.frame(dataq) ss2 <- c(1,3,4, 25:44) data_q2 <- subset(data_rel, select=ss2)  In [76]: # checking the data head(data_q2)  ParticipantNoSequenceConditionCapacityReviewNoOfTasksPerformedHowManyReviewsCochraneRoBExpUseFrequentlyComplexEasyToUseUseQuicklyCumbersomeConfidentNeededLearnTextHelpfulDifficultToNavImproveQualityIrrelevantConfusedContinueUse 1 1010 A develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 9 10+ Yes 5 1 4 5 1 5 1 4 1 4 3 2 5 1 1010 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 1100 A planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 8 5to10 Yes 5 1 4 5 2 5 2 5 1 4 3 1 5 2 1100 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 101 A develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 9 5to10 Yes 5 4 5 5 2 2 2 5 1 5 2 2 5 3 101 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA In [77]: ss3 <- c(1, 3, 8:23) clean_q2 <- subset(data_q2, select=ss3) head(clean_q2)  ParticipantNoConditionUseFrequentlyComplexEasyToUseNeedSupportWellIntegratedInconsistencyUseQuicklyCumbersomeConfidentNeededLearnTextHelpfulDifficultToNavImproveQualityIrrelevantConfusedContinueUse 1 A 5 1 4 1 4 2 5 1 5 1 4 1 4 3 2 5 1 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 A 5 1 4 1 5 1 5 2 5 2 5 1 4 3 1 5 2 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 A 5 4 5 2 5 2 5 2 2 2 5 1 5 2 2 5 3 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA In [78]: clean_q2 <- subset(clean_q2, Condition=="A")  In [79]: head(clean_q2)  ParticipantNoConditionUseFrequentlyComplexEasyToUseNeedSupportWellIntegratedInconsistencyUseQuicklyCumbersomeConfidentNeededLearnTextHelpfulDifficultToNavImproveQualityIrrelevantConfusedContinueUse 11A5141425151414325 32A5141515252514315 53A5452525222515225 74A5254525542524225 95A5252514141313324 116A5333245211255125 In [80]: clean_q2_A <- subset(clean_q2, Condition=='A') clean_q2_NOA <- subset(clean_q2, Condition=='NOA')  In [81]: head(clean_q2_A)  ParticipantNoConditionUseFrequentlyComplexEasyToUseNeedSupportWellIntegratedInconsistencyUseQuicklyCumbersomeConfidentNeededLearnTextHelpfulDifficultToNavImproveQualityIrrelevantConfusedContinueUse 11A5141425151414325 32A5141515252514315 53A5452525222515225 74A5254525542524225 95A5252514141313324 116A5333245211255125 ## 6.2 Qualitative Analysis (likert scales diagram)¶ In [82]: str(clean_q2_A)  'data.frame': 41 obs. of 18 variables:$ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
$Condition : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...$ UseFrequently : int  5 5 5 5 5 5 4 1 2 4 ...
$Complex : int 1 1 4 2 2 3 1 3 2 1 ...$ EasyToUse     : int  4 4 5 5 5 3 5 4 2 4 ...
$NeedSupport : int 1 1 2 4 2 3 2 1 1 1 ...$ WellIntegrated: int  4 5 5 5 5 2 4 4 4 4 ...
$Inconsistency : int 2 1 2 2 1 4 3 1 3 1 ...$ UseQuickly    : int  5 5 5 5 4 5 4 3 3 4 ...
$Cumbersome : int 1 2 2 5 1 2 2 4 2 1 ...$ Confident     : int  5 5 2 4 4 1 4 2 3 4 ...
$NeededLearn : int 1 2 2 2 1 1 2 2 3 1 ...$ TextHelpful   : int  4 5 5 5 3 2 3 5 4 3 ...
$DifficultToNav: int 1 1 1 2 1 5 2 3 1 1 ...$ ImproveQuality: int  4 4 5 4 3 5 3 3 5 2 ...
$Irrelevant : int 3 3 2 2 3 1 4 2 2 4 ...$ Confused      : int  2 1 2 2 2 2 3 2 2 4 ...
$ContinueUse : int 5 5 5 5 4 5 4 4 2 4 ...  In [83]: # need to change this to factors! clean_q2_A$UseFrequently = factor(clean_q2_A$UseFrequently, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Complex = factor(clean_q2_A$Complex, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$EasyToUse = factor(clean_q2_A$EasyToUse, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$NeedSupport = factor(clean_q2_A$NeedSupport, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$WellIntegrated = factor(clean_q2_A$WellIntegrated, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Inconsistency = factor(clean_q2_A$Inconsistency, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$UseQuickly = factor(clean_q2_A$UseQuickly, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Cumbersome = factor(clean_q2_A$Cumbersome, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Confident = factor(clean_q2_A$Confident, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$NeededLearn = factor(clean_q2_A$NeededLearn, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$TextHelpful = factor(clean_q2_A$TextHelpful, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$DifficultToNav = factor(clean_q2_A$DifficultToNav, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$ImproveQuality = factor(clean_q2_A$ImproveQuality, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Irrelevant = factor(clean_q2_A$Irrelevant, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$Confused = factor(clean_q2_A$Confused, levels = c("1", "2", "3", "4", "5"), ordered = TRUE) clean_q2_A$ContinueUse = factor(clean_q2_A$ContinueUse, levels = c("1", "2", "3", "4", "5"), ordered = TRUE)  In [84]: # checkl again str(clean_q2_A)  'data.frame': 41 obs. of 18 variables:$ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
$Condition : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...$ UseFrequently : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 5 5 4 1 2 4 ...
$Complex : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 4 2 2 3 1 3 2 1 ...$ EasyToUse     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 5 5 3 5 4 2 4 ...
$NeedSupport : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 2 4 2 3 2 1 1 1 ...$ WellIntegrated: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 5 2 4 4 4 4 ...
$Inconsistency : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 4 3 1 3 1 ...$ UseQuickly    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 3 3 4 ...
$Cumbersome : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 5 1 2 2 4 2 1 ...$ Confident     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 2 4 4 1 4 2 3 4 ...
$NeededLearn : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 2 1 1 2 2 3 1 ...$ TextHelpful   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 3 2 3 5 4 3 ...
$DifficultToNav: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 2 1 5 2 3 1 1 ...$ ImproveQuality: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 4 3 5 3 3 5 2 ...
$Irrelevant : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 3 3 2 2 3 1 4 2 2 4 ...$ Confused      : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 2 2 3 2 2 4 ...
\$ ContinueUse   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 4 2 4 ...

In [85]:
summary(clean_q2_A)

 ParticipantNo Condition UseFrequently Complex   EasyToUse NeedSupport
Min.   : 1    A  :41    1   : 1       1   :24   1   : 3   1   :25
1st Qu.:11    NOA: 0    2   : 5       2   :10   2   : 1   2   :11
Median :21              3   : 8       3   : 4   3   : 6   3   : 2
Mean   :21              4   :10       4   : 1   4   :13   4   : 1
3rd Qu.:31              5   :15       5   : 0   5   :16   5   : 0
Max.   :41              NA's: 2       NA's: 2   NA's: 2   NA's: 2
WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn
1   : 1        1   :15       1   : 0    1   :20    1   : 2   1   :17
2   : 2        2   :13       2   : 2    2   :11    2   : 5   2   :15
3   : 7        3   : 7       3   : 2    3   : 5    3   : 8   3   : 3
4   :17        4   : 4       4   :17    4   : 2    4   :12   4   : 2
5   :12        5   : 0       5   :18    5   : 1    5   :12   5   : 2
NA's: 2        NA's: 2       NA's: 2    NA's: 2    NA's: 2   NA's: 2
TextHelpful DifficultToNav ImproveQuality Irrelevant Confused  ContinueUse
1   : 1     1   :24        1   : 0        1   : 4    1   :12   1   : 0
2   : 4     2   : 7        2   : 6        2   :12    2   :16   2   : 5
3   :12     3   : 4        3   :13        3   :11    3   : 2   3   : 7
4   :12     4   : 3        4   :11        4   : 8    4   : 8   4   :12
5   :10     5   : 1        5   : 9        5   : 4    5   : 1   5   :15
NA's: 2     NA's: 2        NA's: 2        NA's: 2    NA's: 2   NA's: 2    
In [86]:
#need to remove the 'out of level columns'
ss4 <- c(3:18)
final_data <- subset(clean_q2_A, select=ss4)
summary(final_data)

 UseFrequently Complex   EasyToUse NeedSupport WellIntegrated Inconsistency
1   : 1       1   :24   1   : 3   1   :25     1   : 1        1   :15
2   : 5       2   :10   2   : 1   2   :11     2   : 2        2   :13
3   : 8       3   : 4   3   : 6   3   : 2     3   : 7        3   : 7
4   :10       4   : 1   4   :13   4   : 1     4   :17        4   : 4
5   :15       5   : 0   5   :16   5   : 0     5   :12        5   : 0
NA's: 2       NA's: 2   NA's: 2   NA's: 2     NA's: 2        NA's: 2
UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav
1   : 0    1   :20    1   : 2   1   :17     1   : 1     1   :24
2   : 2    2   :11    2   : 5   2   :15     2   : 4     2   : 7
3   : 2    3   : 5    3   : 8   3   : 3     3   :12     3   : 4
4   :17    4   : 2    4   :12   4   : 2     4   :12     4   : 3
5   :18    5   : 1    5   :12   5   : 2     5   :10     5   : 1
NA's: 2    NA's: 2    NA's: 2   NA's: 2     NA's: 2     NA's: 2
ImproveQuality Irrelevant Confused  ContinueUse
1   : 0        1   : 4    1   :12   1   : 0
2   : 6        2   :12    2   :16   2   : 5
3   :13        3   :11    3   : 2   3   : 7
4   :11        4   : 8    4   : 8   4   :12
5   : 9        5   : 4    5   : 1   5   :15
NA's: 2        NA's: 2    NA's: 2   NA's: 2    
In [87]:
results <- likert(final_data)

In [88]:
# Legend: 5 = Strongly agree 1 = Stronly disagree
plot(results, type='bar')

In [89]:
# Alternative hearmap graph
plot(results,
type="heat",
low.color = "white",
high.color = "blue",
text.color = "black",
text.size = 4,
wrap = 50)