Machine learning to help researchers evaluate biases in clinical trials: a prospective, randomized user study

Authors list:
Frank Soboczenski (1),
Thomas Trikalinos (2),
Joel Kuiper (3),
Randolph G Bias (4),
Byron C Wallace (5),
Iain J Marshall (1)

(1) School of Population Health and Life Sciences, King's College London, London, UK
(2) Center for Evidence-based Medicine, Brown University, Providence, USA
(3) Vortext Systems, Groningen, Netherlands
(4) School of Information, University of Texas at Austin, Austin, USA
(5) College of Computer and Information Science, Northeastern University, Boston, USA

The System used and evaluated in this study can be found here: RobotReviewer User Study

1. Main Analysis

This Notebook uses an R kernel.
First, set up the R environment:

In [9]:
# set a specific working directory; uncomment the line below to point at the data files
#setwd("~/Desktop/R_STUFF")

# load performance improvement libraries & enable just in time compiler
library(compiler)
enableJIT(1)

# set the number of printed decimal digits
options(digits = 7)
1
In [10]:
#check if required packages are there - if not the script will install them!
requiredPackages = c('rcompanion','gdata','compiler','car','lsr','sft','nlme', 'lme4', 'bibtex', 'psych', 'likert', 'ggplot2', 'tidyverse')
for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p);
  library(p,character.only = TRUE);
}
#load all the libraries    
lapply(requiredPackages, require, character.only = TRUE);

# !!!! NOTE: run this cell twice to declutter the output
  1. TRUE
  2. TRUE
  3. TRUE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. TRUE
  9. TRUE
  10. TRUE
  11. TRUE
  12. TRUE
  13. TRUE
In [11]:
# Importing the data 
data <- read.xls("data/TimeAnalysis2_1.xlsx", verbose=FALSE, na.strings=c("NA"))
In [12]:
# quick check on the data
head(data)
ParticipantNo ID PDF Order Time Condition CExperience Tasks NReviews AMeanNReviews AC2 AL2 AC3 AL3 AC4 AL4 Comments TotalAdded TotalDeleted TotalSubmitted
1 -JjAzwjQakZk-3kIbeMfw U2PXr4iMBf8Kt-leQI8Ii 1 435 1 1 9 10+ 12.5 Yes Yes Yes Yes Yes Yes C1:2added, C2:2added,C3:3added,C4:1added 8 0 20
1 -JjAzwjQakZk-3kIbeMfw 4gIe5sni3WJJCzNglSdo8 2 582 0 1 9 10+ 12.5 Yes No Yes No Yes No C1:1added, C2:1added, C3:1added, C4:2added5 0 5
1 -JjAzwjQakZk-3kIbeMfw iHbGgWtrKNfksdoj9Hxv9 3 297 1 1 9 10+ 12.5 No Yes Yes Yes Yes Yes C1:1added, C2:1added, C4:1added 3 0 15
1 -JjAzwjQakZk-3kIbeMfw cnDXl97I_WoVUqyIH0HbQ 4 610 0 1 9 10+ 12.5 No No No No Yes No C1:1added, C2:0added, C3:0added 1 0 1
2 hNQKTiKtHSY_BmxLyZb9Q XOCbiBddVQK3BYy1Ox4lI 1 306 1 1 8 5to10 7.5 No Yes No Yes No Yes C1:1added 1 0 16
2 hNQKTiKtHSY_BmxLyZb9Q oOAy7INgRLumTV3vZWtVM 2 127 1 1 8 5to10 7.5 No Yes No Yes No Yes 0 0 15

Short comments on the data:
ID = individual participant
PDF = document ID
Order = document order of appearance
Time = Time spent on one document in seconds
Condition = Independent variable (1 = Machine learning recommendations (MLR) present, 0 = no MLR)
CExperience = Experience with the Cochrane Risk of Bias tool? (1 = Yes, 0 = No)
Tasks = Number of tasks (in a systematic review) performed (9 = max)
NReviews = Number of systematic reviews performed
AMeanNReviews = Arithmetic mean number of systematic reviews performed (0=0, 1-5=3, 5-10=7.5, 10+=12.5)
Error = reported errors (ignore for now)

1.1 Data Shape

In [13]:
# first we'll look at data: the histogram
hist(data$Time)
In [14]:
# However, we chose to use a log representation (lot = log of time) of our data to account for the strong right skew
lot <- log(data$Time)
hist(lot)
In [15]:
# we perform a shapiro-wilk test to establish if the data is parametric or not:
shapiro.test(data$Time)
	Shapiro-Wilk normality test

data:  data$Time
W = 0.69672, p-value < 2.2e-16

The Shapiro-Wilk test is significant ($W=0.69$, $p<0.001$), which means it is highly unlikely that the data were sampled from a normal distribution. Hence, non-parametric tests are appropriate for these data.

As this analysis looks for differences between the two groups (machine-learning and no-machine-learning) and the study followed a within-subjects design, a Wilcoxon signed-rank test is a suitable test for an initial overview of the data.

An alpha level of .05 was used for all statistical tests. A Shapiro-Wilk test showed that the timing data do not follow a normal distribution, $W=0.69$, $p<0.001$. The subsequent Wilcoxon signed-rank test showed a significant difference in time between the semi-automated (machine-learning) and the manual (no machine learning) condition, $V=$ 13530, $p<$ 0.001 ($n=164$).
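For reference, a minimal sketch (not one of the cells below) of how the two conditions could instead be compared as explicitly paired per-participant samples; it assumes only the ID, Condition, and Time columns described above:

# average log-time per participant within each condition, then pair by ID
per_id <- aggregate(logTime ~ ID + Condition,
                    data = transform(data, logTime = log(Time)), FUN = mean)
ml     <- per_id[per_id$Condition == 1, ]
manual <- per_id[per_id$Condition == 0, ]
ml     <- ml[order(ml$ID), ]
manual <- manual[order(manual$ID), ]
wilcox.test(ml$logTime, manual$logTime, paired = TRUE)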

In [16]:
# We used the log of the timing data for all subsequent analyses
T_log <- log(data$Time)
# In this case the Wilcoxon test does not care whether we use the log or not, as it operates on ranks
# Wilcoxon test on the log time of the data
wilcox.test(T_log, data$Condition, paired = TRUE)
# We also looked at the Timing differences by documents
#wilcox.test(T_log, data$PDF, paired = TRUE)
# As well as differences in time in respect to the order 
wilcox.test(T_log, data$Order, paired = TRUE)
	Wilcoxon signed rank test with continuity correction

data:  T_log and data$Condition
V = 13530, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
	Wilcoxon signed rank test with continuity correction

data:  T_log and data$Order
V = 13530, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
In [17]:
# We'll have a look at the boxplots now:
# For timing data (Logarithmic scale) by Condition (ML or Non-ML)
boxplot(T_log~data$Condition, xlab='Condition 0=Manual, 1=Semi-Automated', ylab="Time in Seconds (Log)", main='Overall timing by condition')
#boxplot(data$Time~data$Condition, xlab='Condition', ylab="Time in Seconds", main=axis(1, at=0:1, labels=(["N", "M"]))
In [18]:
# also by looking at the scatterplot
scatterplot(T_log ~ Condition, data=data)

Eyeballing the plot, there is a slight downward tendency of time towards the MLR (1.0) condition.

In [19]:
# We also performed a t-test (paired, because of our within-participants design) to
# check robustness
t.test(T_log, data$Condition, paired = TRUE, alternative = "two.sided")
	Paired t-test

data:  T_log and data$Condition
t = 73.028, df = 163, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 5.668718 5.983796
sample estimates:
mean of the differences 
               5.826257 

The t-test also confirms that the result is highly significant: the two groups differ significantly from each other, $t(163) = 73.02, p < 0.001$.

In [20]:
# Before we carry on, let's look at whether timing differs across the
# documents. First the box plot:
boxplot(T_log~data$PDF)
In [25]:
# Also looking at the order here. First the box plot:
boxplot(data$Time~data$Order, xlab="Order of the Documents", ylab="Time in Seconds", main="Overall Time by Order")

This nicely shows the increase in time spent on a PDF from the 1st document participants see to the 4th.

In [23]:
# Now the same plot in time (Log scale) and separated by condition 
boxplot(T_log~data$Order*data$Condition, xlab='0.X = Manual conditions, 1.X = Semi-Automated conditions', ylab="Time in Seconds (Log)", main='Order by Time Separated by Condition')

We can see an almost linear increase in the timing data on the non-machine-learning side (1.0, 2.0, 3.0, 4.0) compared to the machine-learning side. Note that, as this was a within-participants design, fatigue effects are expected and can be seen here in the form of the last boxes (4.0 and 4.1). Timing in the machine-learning condition stays at an almost constant level, except for the expected last box.

In [26]:
# Again for robustness we also performed a t-test:
t.test(T_log, data$Order, paired = TRUE, alternative = "two.sided")
	Paired t-test

data:  T_log and data$Order
t = 44.277, df = 163, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.655618 3.996895
sample estimates:
mean of the differences 
               3.826257 

The t-test shows that timing differs significantly by Order, $t(163) = 44.28, p < 0.001$.

2. Descriptives

In [30]:
# creating two groups (ML (1) and NonML (0) )
Annotations <- subset(data, Condition=='1')
NoAnnotations <- subset(data, Condition=='0')
In [31]:
# checking the data
head(Annotations)
ParticipantNo ID PDF Order Time Condition CExperience Tasks NReviews AMeanNReviews AC2 AL2 AC3 AL3 AC4 AL4 Comments TotalAdded TotalDeleted TotalSubmitted
11 -JjAzwjQakZk-3kIbeMfw U2PXr4iMBf8Kt-leQI8Ii 1 435 1 1 9 10+ 12.5 Yes Yes Yes Yes Yes Yes C1:2added, C2:2added,C3:3added,C4:1added 8 0 20
31 -JjAzwjQakZk-3kIbeMfw iHbGgWtrKNfksdoj9Hxv9 3 297 1 1 9 10+ 12.5 No Yes Yes Yes Yes Yes C1:1added, C2:1added, C4:1added 3 0 15
52 hNQKTiKtHSY_BmxLyZb9Q XOCbiBddVQK3BYy1Ox4lI 1 306 1 1 8 5to10 7.5 No Yes No Yes No Yes C1:1added 1 0 16
62 hNQKTiKtHSY_BmxLyZb9Q oOAy7INgRLumTV3vZWtVM 2 127 1 1 8 5to10 7.5 No Yes No Yes No Yes 0 0 15
103 pyTvUi2W85Nb9handPSvZ gD_JugCFB6D5iV-JIOI0p 2 209 1 1 9 5to10 7.5 No Yes No Yes No Yes 0 0 15
123 pyTvUi2W85Nb9handPSvZ 4gIe5sni3WJJCzNglSdo8 4 756 1 1 9 5to10 7.5 Yes Yes Yes Yes Yes Yes C1:1added all left;C2:1added all left,C3:1added,C4:1added4 0 19
In [32]:
cat("Time Overall: ", sum(data$Time), '\n')
cat("Overall Mean Time: ", mean(data$Time), '\n')
cat("Overall SD Time: ", sd(data$Time), '\n')
cat('_____________________________________________', '\n')
cat("Time in Annotations: ", sum(Annotations$Time), "Time in NoAnnotations: ", sum(NoAnnotations$Time), '\n') 
cat("Mean Annotations: ", mean(Annotations$Time), "Mean NoAnnotations: ", mean(NoAnnotations$Time), '\n') 
cat("SD Annotations: ", sd(Annotations$Time), "SD NoAnnotations: ", sd(NoAnnotations$Time), '\n') 
cat('_____________________________________________', '\n')
Time Overall:  129454 
Overall Mean Time:  789.3537 
Overall SD Time:  794.4597 
_____________________________________________ 
Time in Annotations:  61913 Time in NoAnnotations:  67541 
Mean Annotations:  755.0366 Mean NoAnnotations:  823.6707 
SD Annotations:  868.4611 SD NoAnnotations:  716.6 
_____________________________________________ 

Forty-one participants were recruited. All except four had experience of at least one systematic review, and all but eight were familiar with the Cochrane Risk of Bias tool. Twenty listed more than one task describing how they had contributed to previous systematic reviews.

A mean of 755 seconds (SD 868) was taken for semi-automated bias assessments and 824 seconds (SD 717) for manual assessments ($p<0.001$). Participants spent a total of 129454 seconds ($Mean$=789.35, $SD$=794.46) to complete the study across both conditions, semi-automated and manual.

3. Tukey Ladder of Powers

The Tukey Ladder of Powers was used to find the power transformation of the response variable (time) that brings it closest to a normal distribution; the resulting $\lambda$ is examined below.

In [33]:
# Quick look at the data 
plotNormalHistogram(data$Time)
In [34]:
# checking the data transformation:
T_tuk = transformTukey(data$Time, plotit=FALSE)
    lambda      W Shapiro.p.value
395  -0.15 0.9907          0.3626

if (lambda >  0){TRANS = x ^ lambda} 
if (lambda == 0){TRANS = log(x)} 
if (lambda <  0){TRANS = -1 * x ^ lambda} 

In [35]:
# plotting the transformed data
plotNormalHistogram(T_tuk)
In [36]:
# Now using the log of the data and plotting it:
plotNormalHistogram(T_log)

The Tukey Ladder of Powers was used to transform the response variable (time) to come closer to a normal distribution. The resulting $\lambda =$ -0.15 is close to 0 which confirms that the optimal transformation according to the Tukey Ladder of Powers is indeed a logarithmic operation. For the following mixed model analysis the timing data was therefore transformed to a logarithmic scale.
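As a quick check (a sketch, not one of the original cells), the Shapiro-Wilk statistic can be compared for the selected $\lambda$ and the plain log transform:

lam <- -0.15
shapiro.test(-(data$Time^lam))   # transformTukey's transform for a negative lambda
shapiro.test(log(data$Time))     # the log transform used in the analysis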

4. Linear Mixed Effects Model Analysis

A linear mixed effects model was used to examine the associations between the log-transformed time response, the semi-automated or manual condition, the order in which a document was randomly presented, and self-reported characteristics of the reviewers.

4.1 Primary model analysis

In [37]:
# first add the log time to the data frame
data$logT <- log(data$Time)
# create the primary model: Log(Time) by Condition as fixed effects and ID as random effect:
ml.p = lmer(logT ~ Condition + (1 | ID), data=data)
In [38]:
summary(ml.p)
Linear mixed model fit by REML ['lmerMod']
Formula: logT ~ Condition + (1 | ID)
   Data: data

REML criterion at convergence: 374.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.29905 -0.63291  0.02993  0.55999  2.47915 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.2085   0.4566  
 Residual             0.4277   0.6540  
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.4678     0.1015   63.73
Condition    -0.2832     0.1021   -2.77

Correlation of Fixed Effects:
          (Intr)
Condition -0.503
In [39]:
# looking at coefficients by ID
coef(ml.p)$ID
                      (Intercept)   Condition
-JjAzwjQakZk-3kIbeMfw6.342909 -0.2831514
-O8MP5AR-esmcnFcENyoF6.402884 -0.2831514
-YoLEIFk1XZUJo3BgVmlp6.723705 -0.2831514
2K-K26QZH9xPJH9RVMltM6.419611 -0.2831514
2snvok14kLm_dRIdL67PF6.223803 -0.2831514
3CDOco99k3UGgia7A9LQo7.275245 -0.2831514
4dkYUV8wlkm_Q5miw0qCJ6.865738 -0.2831514
5h9rXuHepqDSmetOKXM0J6.557449 -0.2831514
5SAqbCbqZP_q7hVWTEJdL5.615216 -0.2831514
8arsi19w5sMwMmpkmM0sX6.415119 -0.2831514
C07qGfUuD1L-1IoLjtOXY6.572443 -0.2831514
C4xWyqDJrsHcIoHU5GtUG7.029055 -0.2831514
Ci3GWDLAOc5ANxkPpNOBc6.409193 -0.2831514
cs1qlmKFvdkFjUcXEkJgT6.431466 -0.2831514
FfUZhXrHdKSHZ3ON2_Dty6.667310 -0.2831514
fQoBVJ60uF5ehuYNAg4v26.417978 -0.2831514
FXvg7iTf7P1bl-Z9dHKPO6.431382 -0.2831514
GR96RbbiPLL4gS3-ilxky7.259352 -0.2831514
hNQKTiKtHSY_BmxLyZb9Q6.227614 -0.2831514
j-6IWj13fqVcUXKjFCo7y6.202480 -0.2831514
jBFO8AR45EErIQ5grz5n-6.324240 -0.2831514
jYha2hNra2RfUfgXnVaj16.436849 -0.2831514
KlKCr9O8A-8WK6gr2vK9k6.455434 -0.2831514
KtfB3jYg9jsg0osJkMWVG5.920122 -0.2831514
L-KBZ1ZC0lK2hQlDSUCgH6.018265 -0.2831514
lllK6Q_LLhHbw7bTJ1v516.135699 -0.2831514
pyTvUi2W85Nb9handPSvZ6.191651 -0.2831514
Rh7QrcKwEzMEORahevriS6.846711 -0.2831514
S2oI3sQr3HnSJ7rgbaMJR6.692770 -0.2831514
sLUZSWii-lFp5oEFHCd7f6.310985 -0.2831514
to4PqjXHBIG3bKQhv9cHr6.173613 -0.2831514
tRdDoOfbjIQw3tUkGktky6.736527 -0.2831514
TTtTg_voG4fV-uDraAWC36.302266 -0.2831514
UbIO7NW7RAUjVfe3_VJ486.232454 -0.2831514
uq4HMVa0zTYYknq8_4SOc7.464664 -0.2831514
UyqDH9vL0gK918t2bEj0z6.266225 -0.2831514
V9_3tiOUC-jab7xnPK1HK6.569870 -0.2831514
WpFHCjeR-3YdyEJJmCT2Y6.208212 -0.2831514
ynh1N1_BZO8W4d0_gaMDy6.345564 -0.2831514
ZAj9Y0K4px17TAcPf3uYG6.092636 -0.2831514
ZXpghXhauDD3Vpv76PGoj6.966419 -0.2831514
In [40]:
# the mean values
coef(summary(ml.p))[,"Estimate"]
(Intercept)    6.46783243288911
Condition     -0.283151403792091
In [41]:
# establishing the confidence intervals:
confint(ml.p)
Computing profile confidence intervals ...
                  2.5 %      97.5 %
.sig01        0.3074827  0.62121739
.sigma        0.5777147  0.74206725
(Intercept)   6.2682066  6.66745826
Condition    -0.4840935 -0.08220934

The confidence interval for the Condition coefficient is -0.48 to -0.08 on the log scale, i.e. exp(-0.48) = 0.62 to exp(-0.08) = 0.92: semi-automated assessments took 62% to 92% of the manual time.

In [42]:
# main speed-up: 1 - exp(-0.28) = 1 - 0.756, i.e. roughly a 25% reduction in time
exp(-.28)
0.755783741455725

The primary model ($m_p$) modelled the log-transformed time as the response, with the condition (semi-automated or manual) as a fixed effect and the individual participants as a random effect. Participants performing bias assessments in the semi-automated condition were on average about 25% quicker than those in the manual condition; semi-automated assessments took roughly 75% of the manual time (95% CI 62% to 92%).
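A small sketch (using the ml.p fit above) of how the estimate and its profile confidence interval map back from log-seconds to a ratio of times:

est <- fixef(ml.p)["Condition"]        # about -0.28
ci  <- confint(ml.p)["Condition", ]    # profile CI, about -0.48 to -0.08
exp(c(est, ci))                        # time ratio ~0.75 (95% CI ~0.62 to ~0.92)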

4.2 Exploratory Analysis

Now continuing with the exploratory analysis (likelihood ratio tests).

In addition, model $m_1$ also included the Order as a fixed effect and the document (PDF) as a further random effect, whereas $m_2$ kept the document random effect but omitted the Order, to examine the importance of the Order in the model. A closer examination of $m_1$ showed that there was a random intercept for every PDF. The random-effect variance of the documents (0.02) is about 1/10th of that of the individuals (0.24); hence there is more variance across participants than across documents. The subsequent likelihood-ratio test showed that the Order was highly significant, $\chi^2$(1) = 42.26, $p<$ 0.001. Therefore the Order was kept in the following models.

In [43]:
# What is the effect of the PDF, Order, Condition and Person?
ml.1 = lmer(logT ~ Order + Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)
ml.2 = lmer(logT ~ Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)
In [44]:
summary(ml.1)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   338.2    356.8   -163.1    326.2      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.05956 -0.44213 -0.01847  0.60865  2.15485 

Random effects:
 Groups   Name        Variance Std.Dev.
 PDF      (Intercept) 0.02466  0.1571  
 ID       (Intercept) 0.23839  0.4883  
 Residual             0.27691  0.5262  
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.691       
Condition -0.319  0.012
In [45]:
summary(ml.2)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   378.5    394.0   -184.2    368.5      159 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.29806 -0.63552  0.02841  0.55830  2.49665 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 PDF      (Intercept) 2.306e-15 4.802e-08
 ID       (Intercept) 2.016e-01 4.490e-01
 Residual             4.243e-01 6.514e-01
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.4678     0.1005   64.38
Condition    -0.2832     0.1017   -2.78

Correlation of Fixed Effects:
          (Intr)
Condition -0.506
In [46]:
# Likelihood-Ratio analysis (ANOVA) to see if the Order is important
anova(ml.1, ml.2)
     Df      AIC      BIC    logLik deviance    Chisq Chi Df   Pr(>Chisq)
ml.2  5 378.4641 393.9634 -184.2321 368.4641       NA     NA           NA
ml.1  6 338.1962 356.7954 -163.0981 326.1962 42.26793      1 7.958708e-11

This tells us (and confirms the previous overall results) to keep the order:
the likelihood-ratio test (ANOVA) between $m_1$ and $m_2$ shows that the order is highly significant ($p<0.001$),
which is an indication to keep the order in the model.
Let's take another look at model $m_1$:

In [47]:
summary(ml.1)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   338.2    356.8   -163.1    326.2      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.05956 -0.44213 -0.01847  0.60865  2.15485 

Random effects:
 Groups   Name        Variance Std.Dev.
 PDF      (Intercept) 0.02466  0.1571  
 ID       (Intercept) 0.23839  0.4883  
 Residual             0.27691  0.5262  
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.691       
Condition -0.319  0.012

We have a random intercept for every PDF. The random-effect variance of PDF is about 1/10th of the variance of ID, so there is more variance across people than across documents.
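The variance components can also be pulled out of $m_1$ directly (a sketch, not an original cell):

vc <- as.data.frame(VarCorr(ml.1))
vc[, c("grp", "vcov")]                               # PDF ~0.025, ID ~0.238, residual ~0.277
vc$vcov[vc$grp == "PDF"] / vc$vcov[vc$grp == "ID"]   # roughly 1/10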
Let's examine a model $m_3$ (with the PDF random effect removed):

In [48]:
ml.3 = lmer(logT ~ Order + Condition + (1|ID), data=data, REML=FALSE)
In [49]:
# again likelihood-ratio analysis (ANOVA) now looking at the difference between m_1 and m_3 :
anova(ml.1, ml.3)
     Df      AIC      BIC    logLik deviance    Chisq Chi Df Pr(>Chisq)
ml.3  5 336.9199 352.4193 -163.4600 326.9199       NA     NA         NA
ml.1  6 338.1962 356.7954 -163.0981 326.1962 0.723746      1  0.3949179

Model $m_3$ omitted the document as a random effect in order to see whether it had an influence. The likelihood-ratio comparison with $m_1$ showed no significant difference, $\chi^2$(1) = 0.72, $p=$ 0.39, which suggests that there is little variability attributable to documents, and that ignoring it does not detract from the explanatory potential of the model.

There is no significance! This means we do not care much about the PDF random effect;
in fact, we could lose the PDF random effect without destroying the known universe.
So let's see the random-effects coefficients:

In [50]:
coef(ml.3)
$ID
                      (Intercept)     Order  Condition
-JjAzwjQakZk-3kIbeMfw    5.646676 0.2701334 -0.2765628
-O8MP5AR-esmcnFcENyoF    5.715102 0.2701334 -0.2765628
-YoLEIFk1XZUJo3BgVmlp    6.081139 0.2701334 -0.2765628
2K-K26QZH9xPJH9RVMltM    5.734187 0.2701334 -0.2765628
2snvok14kLm_dRIdL67PF    5.510783 0.2701334 -0.2765628
3CDOco99k3UGgia7A9LQo    6.710410 0.2701334 -0.2765628
4dkYUV8wlkm_Q5miw0qCJ    6.243188 0.2701334 -0.2765628
5h9rXuHepqDSmetOKXM0J    5.891452 0.2701334 -0.2765628
5SAqbCbqZP_q7hVWTEJdL    4.816425 0.2701334 -0.2765628
8arsi19w5sMwMmpkmM0sX    5.729063 0.2701334 -0.2765628
C07qGfUuD1L-1IoLjtOXY    5.908558 0.2701334 -0.2765628
C4xWyqDJrsHcIoHU5GtUG    6.429523 0.2701334 -0.2765628
Ci3GWDLAOc5ANxkPpNOBc    5.722301 0.2701334 -0.2765628
cs1qlmKFvdkFjUcXEkJgT    5.747713 0.2701334 -0.2765628
FfUZhXrHdKSHZ3ON2_Dty    6.016796 0.2701334 -0.2765628
fQoBVJ60uF5ehuYNAg4v2    5.732324 0.2701334 -0.2765628
FXvg7iTf7P1bl-Z9dHKPO    5.747617 0.2701334 -0.2765628
GR96RbbiPLL4gS3-ilxky    6.692276 0.2701334 -0.2765628
hNQKTiKtHSY_BmxLyZb9Q    5.515131 0.2701334 -0.2765628
j-6IWj13fqVcUXKjFCo7y    5.486455 0.2701334 -0.2765628
jBFO8AR45EErIQ5grz5n-    5.625375 0.2701334 -0.2765628
jYha2hNra2RfUfgXnVaj1    5.753855 0.2701334 -0.2765628
KlKCr9O8A-8WK6gr2vK9k    5.775059 0.2701334 -0.2765628
KtfB3jYg9jsg0osJkMWVG    5.164302 0.2701334 -0.2765628
L-KBZ1ZC0lK2hQlDSUCgH    5.276278 0.2701334 -0.2765628
lllK6Q_LLhHbw7bTJ1v51    5.410262 0.2701334 -0.2765628
pyTvUi2W85Nb9handPSvZ    5.474099 0.2701334 -0.2765628
Rh7QrcKwEzMEORahevriS    6.221480 0.2701334 -0.2765628
S2oI3sQr3HnSJ7rgbaMJR    6.045844 0.2701334 -0.2765628
sLUZSWii-lFp5oEFHCd7f    5.610252 0.2701334 -0.2765628
to4PqjXHBIG3bKQhv9cHr    5.453520 0.2701334 -0.2765628
tRdDoOfbjIQw3tUkGktky    6.095768 0.2701334 -0.2765628
TTtTg_voG4fV-uDraAWC3    5.600304 0.2701334 -0.2765628
UbIO7NW7RAUjVfe3_VJ48    5.520653 0.2701334 -0.2765628
uq4HMVa0zTYYknq8_4SOc    6.926525 0.2701334 -0.2765628
UyqDH9vL0gK918t2bEj0z    5.559183 0.2701334 -0.2765628
V9_3tiOUC-jab7xnPK1HK    5.905622 0.2701334 -0.2765628
WpFHCjeR-3YdyEJJmCT2Y    5.492995 0.2701334 -0.2765628
ynh1N1_BZO8W4d0_gaMDy    5.649705 0.2701334 -0.2765628
ZAj9Y0K4px17TAcPf3uYG    5.361130 0.2701334 -0.2765628
ZXpghXhauDD3Vpv76PGoj    6.358060 0.2701334 -0.2765628

attr(,"class")
[1] "coef.mer"
In [51]:
summary(ml.3)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   336.9    352.4   -163.5    326.9      159 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.19339 -0.52439 -0.01335  0.53696  2.27474 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.2320   0.4817  
 Residual             0.3027   0.5501  
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78920    0.13663   42.37
Order        0.27013    0.03843    7.03
Condition   -0.27656    0.08592   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.707       
Condition -0.322  0.011

The variance of the per-participant intercepts is 0.23.
Do different people have a different coefficient when it comes to the machine learning (a random slope)?
One option to examine this is to include the condition as a random effect as well: there is an overall mean effect of the machine learning, but how variable does it seem to be?

In [52]:
# including the condition also as random effect in model m_3
ml.3 = lmer(logT ~ Order + Condition + (Condition | ID), data=data, REML=FALSE)
In [53]:
summary(ml.3)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   328.0    349.7   -157.0    314.0      157 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3985 -0.4850  0.0052  0.5775  2.2962 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1487   0.3856       
          Condition   0.2000   0.4473   0.29
 Residual             0.2361   0.4859       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.76592    0.12264   47.01
Order        0.27940    0.03678    7.60
Condition   -0.27634    0.10314   -2.68

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.753       
Condition -0.138  0.009

There seems to be variability in the slope, so different people seem to react differently to the machine-learning condition. The residual variance has also dropped between the first $m_3$ and the updated $m_3$: it was 0.30 in the first $m_3$ and is 0.24 in the updated one. Perhaps this model explains a bit more. Let's take a look at the coefficients:

In [54]:
# coefficients by ID
coef(ml.3)$ID
                      (Intercept)      Order   Condition
-JjAzwjQakZk-3kIbeMfw5.645644 0.2794002 -0.31868291
-O8MP5AR-esmcnFcENyoF5.689063 0.2794002 -0.24591965
-YoLEIFk1XZUJo3BgVmlp6.018559 0.2794002 -0.21255577
2K-K26QZH9xPJH9RVMltM5.758266 0.2794002 -0.43458312
2snvok14kLm_dRIdL67PF5.533983 0.2794002 -0.37010571
3CDOco99k3UGgia7A9LQo6.471687 0.2794002 0.25956400
4dkYUV8wlkm_Q5miw0qCJ6.085542 0.2794002 0.09094654
5h9rXuHepqDSmetOKXM0J5.825058 0.2794002 -0.14658113
5SAqbCbqZP_q7hVWTEJdL5.065488 0.2794002 -1.00634718
8arsi19w5sMwMmpkmM0sX5.665417 0.2794002 -0.11211088
C07qGfUuD1L-1IoLjtOXY5.907933 0.2794002 -0.39198176
C4xWyqDJrsHcIoHU5GtUG6.114986 0.2794002 0.61406152
Ci3GWDLAOc5ANxkPpNOBc5.721248 0.2794002 -0.33934149
cs1qlmKFvdkFjUcXEkJgT5.696203 0.2794002 -0.16163754
FfUZhXrHdKSHZ3ON2_Dty5.907440 0.2794002 -0.02371297
fQoBVJ60uF5ehuYNAg4v25.799538 0.2794002 -0.59194148
FXvg7iTf7P1bl-Z9dHKPO5.716290 0.2794002 -0.23548329
GR96RbbiPLL4gS3-ilxky6.409160 0.2794002 0.42701709
hNQKTiKtHSY_BmxLyZb9Q5.593803 0.2794002 -0.57432025
j-6IWj13fqVcUXKjFCo7y5.540374 0.2794002 -0.47586129
jBFO8AR45EErIQ5grz5n-5.589154 0.2794002 -0.18404910
jYha2hNra2RfUfgXnVaj15.650462 0.2794002 0.02656450
KlKCr9O8A-8WK6gr2vK9k5.710093 0.2794002 -0.11988919
KtfB3jYg9jsg0osJkMWVG5.223300 0.2794002 -0.40611143
L-KBZ1ZC0lK2hQlDSUCgH5.348445 0.2794002 -0.48501901
lllK6Q_LLhHbw7bTJ1v515.418836 0.2794002 -0.28901104
pyTvUi2W85Nb9handPSvZ5.496255 0.2794002 -0.35622486
Rh7QrcKwEzMEORahevriS6.114006 0.2794002 -0.08672533
S2oI3sQr3HnSJ7rgbaMJR5.908495 0.2794002 0.07077306
sLUZSWii-lFp5oEFHCd7f5.602679 0.2794002 -0.28475114
to4PqjXHBIG3bKQhv9cHr5.475240 0.2794002 -0.34898562
tRdDoOfbjIQw3tUkGktky6.095288 0.2794002 -0.44385065
TTtTg_voG4fV-uDraAWC35.697105 0.2794002 -0.66402711
UbIO7NW7RAUjVfe3_VJ485.689982 0.2794002 -0.90763414
uq4HMVa0zTYYknq8_4SOc6.665880 0.2794002 0.28053689
UyqDH9vL0gK918t2bEj0z5.595343 0.2794002 -0.43080896
V9_3tiOUC-jab7xnPK1HK5.849366 0.2794002 -0.18756661
WpFHCjeR-3YdyEJJmCT2Y5.609388 0.2794002 -0.70630849
ynh1N1_BZO8W4d0_gaMDy5.831413 0.2794002 -0.98833027
ZAj9Y0K4px17TAcPf3uYG5.459037 0.2794002 -0.60249247
ZXpghXhauDD3Vpv76PGoj6.207454 0.2794002 0.03368090
In [55]:
# histogram about those conditions:
hist(coef(ml.3)$ID[,"Condition"])

So there seem to be some people who are faster and some who are slower with the machine-learning suggestions, i.e. there is some heterogeneity. Model $m_4$ fits the per-person random slope for the condition (the same specification as the updated $m_3$):

In [56]:
ml.4 = lmer(logT ~ Order + Condition + (Condition|ID), data=data, REML=FALSE)
In [57]:
# again likelihood-ratio (ANOVA) to see the differences between models:
anova(ml.4, ml.3)
     Df      AIC      BIC    logLik deviance Chisq Chi Df Pr(>Chisq)
ml.4  7 328.0184 349.7175 -157.0092 314.0184    NA     NA         NA
ml.3  7 328.0184 349.7175 -157.0092 314.0184     0      0          1

Not significant: $\chi^2$(0) = 0.0, $p=$ 1 (as expected, since the updated $m_3$ already has the same specification as $m_4$).

In [58]:
# Do we need the Order if we have the condition as a random effect?
ml.5 = lmer(logT ~ Condition + (Condition|ID), data=data, REML=FALSE)
In [59]:
# Differences in models:
anova(ml.4, ml.5)
     Df      AIC      BIC    logLik deviance    Chisq Chi Df   Pr(>Chisq)
ml.5  6 372.7248 391.3240 -180.3624 360.7248       NA     NA           NA
ml.4  7 328.0184 349.7175 -157.0092 314.0184 46.70645      1 8.245712e-12

Yes, significant! The order is still important: $\chi^2$(1) = 46.70, $p<$ 0.001.

In [60]:
# examining model m_4
summary(ml.4)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Order + Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   328.0    349.7   -157.0    314.0      157 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3985 -0.4850  0.0052  0.5775  2.2962 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1487   0.3856       
          Condition   0.2000   0.4473   0.29
 Residual             0.2361   0.4859       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.76592    0.12264   47.01
Order        0.27940    0.03678    7.60
Condition   -0.27634    0.10314   -2.68

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.753       
Condition -0.138  0.009
In [61]:
# examining model m_5
summary(ml.5)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: logT ~ Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   372.7    391.3   -180.4    360.7      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.34127 -0.62213  0.05227  0.57998  2.21132 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1069   0.3270       
          Condition   0.1603   0.4003   0.52
 Residual             0.3708   0.6090       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  6.46783    0.08444   76.60
Condition   -0.28315    0.11381   -2.49

Correlation of Fixed Effects:
          (Intr)
Condition -0.298

Interestingly, both of the models have the same variance with a different sign (an indication of collinearity?). However, the order is not exactly random: each participant has two documents in each condition. The ordering is random, but each participant is guaranteed to see two ML and two non-ML documents.

After we remove the order in model $m_5$, the condition coefficient is roughly the same: it was -0.27 before and is -0.28 now.
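For a quick side-by-side comparison of the models discussed above (a sketch; note that ml.3 was overwritten by the random-slope fit, so it is identical to ml.4):

AIC(ml.1, ml.4, ml.5)   # lower is better; ml.4 (Order + Condition, random slope) fits best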

5. Reviewer Judgements & Annotations Analysis

5.1 Descriptives

In [62]:
# Some descriptives
# TOTAL first (Annotations & NonAnnotations together)
cat('____________TOTAL DATA:______________________', '\n')
cat("Total Annotations added:", sum(data$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(data$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(data$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(data$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(data$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(data$TotalDeleted), '\n')
cat('\n')


cat('____________MACHINE LEARNING:________________', '\n')

# Now only for Machine-Learning Condition
cat("Total Annotations added:", sum(Annotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(Annotations$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(Annotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(Annotations$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(Annotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(Annotations$TotalDeleted), '\n')
cat('\n')

cat('____________NON-MACHINE LEARNING: ___________', '\n')

# And for the Non-Machine-Learning Condition
cat("Total Annotations added:", sum(NoAnnotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(NoAnnotations$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(NoAnnotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')

cat("Total Annotations deleted:", sum(NoAnnotations$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(NoAnnotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(NoAnnotations$TotalDeleted), '\n')
____________TOTAL DATA:______________________ 
Total Annotations added: 486 
Mean Total Annotations added: 2.963415 
SD Total Annotations added: 3.033333 
_____________________________________________ 
Total Annotations deleted: 127 
Mean Total Annotations deleted: 0.7743902 
SD Total Annotations deleted: 1.957607 

____________MACHINE LEARNING:________________ 
Total Annotations added: 103 
Mean Total Annotations added: 1.256098 
SD Total Annotations added: 1.755492 
_____________________________________________ 
Total Annotations deleted: 127 
Mean Total Annotations deleted: 1.54878 
SD Total Annotations deleted: 2.549037 

____________NON-MACHINE LEARNING: ___________ 
Total Annotations added: 383 
Mean Total Annotations added: 4.670732 
SD Total Annotations added: 3.087429 
_____________________________________________ 
Total Annotations deleted: 0 
Mean Total Annotations deleted: 0 
SD Total Annotations deleted: 0 
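A compact equivalent of the descriptives above (a sketch, not part of the original cell): per-condition sums, means, and SDs via tapply, where Condition 1 = machine-learning suggestions shown and 0 = manual.

tapply(data$TotalAdded,   data$Condition, function(x) c(sum = sum(x), mean = mean(x), sd = sd(x)))
tapply(data$TotalDeleted, data$Condition, function(x) c(sum = sum(x), mean = mean(x), sd = sd(x)))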

5.2 Descriptives: Self-Reported Characteristics

In [63]:
# using a subset of the above main data here:
data_s <- read.xls("data/subset_selfreported.xlsx", verbose=FALSE, na.strings=c("NA"))
# and filtering out duplicates
data_s2 <- unique(data_s)
In [64]:
# quick check on the data
head(data_s2)
   ID                    CExperience Tasks NReviews
1  -JjAzwjQakZk-3kIbeMfw           1     9      10+
5  hNQKTiKtHSY_BmxLyZb9Q           1     8    5to10
9  pyTvUi2W85Nb9handPSvZ           1     9    5to10
13 2K-K26QZH9xPJH9RVMltM           1     9      10+
17 8arsi19w5sMwMmpkmM0sX           1     9      10+
21 GR96RbbiPLL4gS3-ilxky           1     9     1to5
In [65]:
# Calculating Descriptive Self-reported characteristics data:
NumberOfReviews0 <- sum(data_s2$NReviews=='0')
NumberOfReviews1_5 <- sum(data_s2$NReviews=='1to5')
NumberOfReviews5_0 <- sum(data_s2$NReviews=='5to10')
NumberOfReviews10plus <- sum(data_s2$NReviews=='10+')
NumberOfReviews0 
NumberOfReviews1_5 
NumberOfReviews5_0
NumberOfReviews10plus

NumberOfReviews0_percentage <- NumberOfReviews0/41
NumberOfReviews1_5_percentage <- NumberOfReviews1_5/41
NumberOfReviews5_0_percentage <- NumberOfReviews5_0/41
NumberOfReviews10plus_percentage <- NumberOfReviews10plus/41
NumberOfReviews0_percentage 
NumberOfReviews1_5_percentage
NumberOfReviews5_0_percentage
NumberOfReviews10plus_percentage

TotalSumOfPeopleWithCochraneExperience <- sum(data_s2$CExperience=='1')
TotalSumOfPeopleWithoutCochraneExperience <- sum(data_s2$CExperience=='0')
TotalSumOfPeopleWithCochraneExperience
TotalSumOfPeopleWithoutCochraneExperience

Percentage_TotalSumOfPeopleWithCochraneExperience <- TotalSumOfPeopleWithCochraneExperience/41
Percentage_TotalSumOfPeopleWithoutCochraneExperience <- TotalSumOfPeopleWithoutCochraneExperience/41
Percentage_TotalSumOfPeopleWithCochraneExperience
Percentage_TotalSumOfPeopleWithoutCochraneExperience

MedianOfTasksPerfomed <- median(data_s2$Tasks)
IRQofTasksPerformed <- quantile(data_s2$Tasks)
MedianOfTasksPerfomed
IRQofTasksPerformed
NumberOfReviews0:       5
NumberOfReviews1_5:     9
NumberOfReviews5_0:     12
NumberOfReviews10plus:  15
NumberOfReviews0_percentage:       0.121951219512195
NumberOfReviews1_5_percentage:     0.219512195121951
NumberOfReviews5_0_percentage:     0.292682926829268
NumberOfReviews10plus_percentage:  0.365853658536585
TotalSumOfPeopleWithCochraneExperience:     32
TotalSumOfPeopleWithoutCochraneExperience:  9
Percentage_TotalSumOfPeopleWithCochraneExperience:     0.780487804878049
Percentage_TotalSumOfPeopleWithoutCochraneExperience:  0.219512195121951
MedianOfTasksPerfomed:  8
IRQofTasksPerformed (quantiles):  0% = 1, 25% = 6, 50% = 8, 75% = 9, 100% = 9
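Equivalent one-liners (a sketch, not an original cell): frequency and proportion tables for the self-reported characteristics.

table(data_s2$NReviews)
round(prop.table(table(data_s2$NReviews)), 3)
table(data_s2$CExperience)
summary(data_s2$Tasks)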

5.3 Judgement Agreement Data

In [66]:
# Importing the data 
data_agreement <- read.xls("data/agreement.xlsx", verbose=FALSE, na.strings=c("NA"))
In [67]:
# creating two groups (ML (1) and NonML (0) )
ML <- subset(data_agreement, Condition=='1')
NoML <- subset(data_agreement, Condition=='0')
In [68]:
# checking how many datapoints in one column:
count(ML)
n
82
In [69]:
# calculating the changed judgements and percentages:
RSG <- sum(ML$Changed)
cat("RSG changed:", RSG, "RSG %:", RSG/82, '\n')
AC <- sum(ML$Changed.1)
cat("AC changed:", AC, "AC %:", AC/82, '\n')
BPP <- sum(ML$Changed.2)
cat("BPP changed:", BPP, "BPP %:", BPP/82, '\n')
BOA <- sum(ML$Changed.3)
cat("BOA changed:", BOA, "BOA %:", BOA/82, '\n')
Overall <- RSG+AC+BPP+BOA
cat("Overall changed:", Overall, "RSG %:", Overall/328, '\n')
RSG changed: 7 RSG %: 0.08536585 
AC changed: 7 AC %: 0.08536585 
BPP changed: 6 BPP %: 0.07317073 
BOA changed: 7 BOA %: 0.08536585 
Overall changed: 27 Overall %: 0.08231707 
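RSG, AC, BPP, and BOA presumably refer to the four risk-of-bias domains (random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment). A compact equivalent of the cell above (a sketch, assuming the Changed* column names used in the code):

changed_cols <- c("Changed", "Changed.1", "Changed.2", "Changed.3")
changed <- sapply(ML[changed_cols], sum)
rbind(count = changed, proportion = changed / nrow(ML))
sum(changed) / (4 * nrow(ML))   # overall proportion of judgements changed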

5.4 Annotations Data

In [70]:
# Overall mean annotations
data <- read.xls("data/TimeAnalysis2_1.xlsx", verbose=FALSE, na.strings=c("NA"))
data2 <- subset(data, Condition=='1')
data3 <- subset(data, Condition=='0')

mean(data2$TotalSubmitted)
mean(data3$TotalSubmitted)
14.6341463414634
4.67073170731707
In [71]:
##############
# annotations

data <- read.xls("data/annotationschanged.xlsx", verbose=FALSE, na.strings=c("NA"))
data2 <- subset(data, Condition=='1')
data3 <- subset(data, Condition=='0')



RSG1 <- sum(data2$Changed1=='0')
RSG2 <- sum(data2$Changed1=='1')
RSG3 <- sum(data2$Changed1=='2')
RSG1/82
RSG2/82
RSG3/82

AC1 <- sum(data2$Changed2=='0')
AC2 <- sum(data2$Changed2=='1')
AC3 <- sum(data2$Changed2=='2')
AC1/82
AC2/82
AC3/82

BPP1 <- sum(data2$Changed3=='0')
BPP2 <- sum(data2$Changed3=='1')
BPP3 <- sum(data2$Changed3=='2')
BPP1/82
BPP2/82
BPP3/82

BOA1 <- sum(data2$Changed4=='0')
BOA2 <- sum(data2$Changed4=='1')
BOA3 <- sum(data2$Changed4=='2')
BOA1/82
BOA2/82
BOA3/82

TotalUnchanged <- RSG1+AC1+BPP1+BOA1
TotalUnchanged/328

TotalChangedML <- RSG2+AC2+BPP2+BOA2
TotalChangedML/328

TotalChangedNoML <- RSG3+AC3+BPP3+BOA3
TotalChangedNoML/328
RSG1/82: 0.560975609756098
RSG2/82: 0.414634146341463
RSG3/82: 0.024390243902439
AC1/82:  0.621951219512195
AC2/82:  0.341463414634146
AC3/82:  0.0365853658536585
BPP1/82: 0.634146341463415
BPP2/82: 0.329268292682927
BPP3/82: 0.0365853658536585
BOA1/82: 0.646341463414634
BOA2/82: 0.292682926829268
BOA3/82: 0.0609756097560976
TotalUnchanged/328:   0.615853658536585
TotalChangedML/328:   0.344512195121951
TotalChangedNoML/328: 0.0396341463414634
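A compact equivalent of the cell above (a sketch, assuming the Changed1..Changed4 columns take the values 0, 1, and 2 as in the code):

changed_cols <- paste0("Changed", 1:4)
tab <- sapply(data2[changed_cols], function(x) table(factor(x, levels = 0:2)))
tab / nrow(data2)           # per-domain proportions of values 0 / 1 / 2
rowSums(tab) / sum(tab)     # overall proportions across all four domains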

6. Questionnaire Analysis

In [74]:
# loading the data:
dataq <- read.xls("data/UXData1.xlsx", verbose=FALSE, na.strings=c("NA"))

6.1 Data Preparation (cleaning and structuring)

In [75]:
# subsetting
data_rel <- as.data.frame(dataq)
ss2 <- c(1,3,4, 25:44)
data_q2 <- subset(data_rel, select=ss2)
In [76]:
# checking the data
head(data_q2)
ParticipantNo Sequence Condition CapacityReview NoOfTasksPerformed HowManyReviews CochraneRoBExp UseFrequently Complex EasyToUse UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav ImproveQuality Irrelevant Confused ContinueUse
1 1010 A develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 9 10+ Yes 5 1 4 5 1 5 1 4 1 4 3 2 5
1 1010 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 1100 A planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 8 5to10 Yes 5 1 4 5 2 5 2 5 1 4 3 1 5
2 1100 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 101 A develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review 9 5to10 Yes 5 4 5 5 2 2 2 5 1 5 2 2 5
3 101 NOA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
In [77]:
ss3 <- c(1, 3, 8:23)
clean_q2 <- subset(data_q2, select=ss3)
head(clean_q2)
ParticipantNo Condition UseFrequently Complex EasyToUse NeedSupport WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav ImproveQuality Irrelevant Confused ContinueUse
1 A 5 1 4 1 4 2 5 1 5 1 4 1 4 3 2 5
1 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 A 5 1 4 1 5 1 5 2 5 2 5 1 4 3 1 5
2 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 A 5 4 5 2 5 2 5 2 2 2 5 1 5 2 2 5
3 NOANA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
In [78]:
clean_q2 <- subset(clean_q2, Condition=="A")
In [79]:
head(clean_q2)
   ParticipantNo Condition UseFrequently Complex EasyToUse NeedSupport WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav ImproveQuality Irrelevant Confused ContinueUse
1   1 A 5 1 4 1 4 2 5 1 5 1 4 1 4 3 2 5
3   2 A 5 1 4 1 5 1 5 2 5 2 5 1 4 3 1 5
5   3 A 5 4 5 2 5 2 5 2 2 2 5 1 5 2 2 5
7   4 A 5 2 5 4 5 2 5 5 4 2 5 2 4 2 2 5
9   5 A 5 2 5 2 5 1 4 1 4 1 3 1 3 3 2 4
11  6 A 5 3 3 3 2 4 5 2 1 1 2 5 5 1 2 5
In [80]:
clean_q2_A <- subset(clean_q2, Condition=='A')
clean_q2_NOA <- subset(clean_q2, Condition=='NOA')
In [81]:
head(clean_q2_A)
   ParticipantNo Condition UseFrequently Complex EasyToUse NeedSupport WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav ImproveQuality Irrelevant Confused ContinueUse
1   1 A 5 1 4 1 4 2 5 1 5 1 4 1 4 3 2 5
3   2 A 5 1 4 1 5 1 5 2 5 2 5 1 4 3 1 5
5   3 A 5 4 5 2 5 2 5 2 2 2 5 1 5 2 2 5
7   4 A 5 2 5 4 5 2 5 5 4 2 5 2 4 2 2 5
9   5 A 5 2 5 2 5 1 4 1 4 1 3 1 3 3 2 4
11  6 A 5 3 3 3 2 4 5 2 1 1 2 5 5 1 2 5

6.2 Qualitative Analysis (Likert scale diagrams)

In [82]:
str(clean_q2_A)
'data.frame':	41 obs. of  18 variables:
 $ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Condition     : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...
 $ UseFrequently : int  5 5 5 5 5 5 4 1 2 4 ...
 $ Complex       : int  1 1 4 2 2 3 1 3 2 1 ...
 $ EasyToUse     : int  4 4 5 5 5 3 5 4 2 4 ...
 $ NeedSupport   : int  1 1 2 4 2 3 2 1 1 1 ...
 $ WellIntegrated: int  4 5 5 5 5 2 4 4 4 4 ...
 $ Inconsistency : int  2 1 2 2 1 4 3 1 3 1 ...
 $ UseQuickly    : int  5 5 5 5 4 5 4 3 3 4 ...
 $ Cumbersome    : int  1 2 2 5 1 2 2 4 2 1 ...
 $ Confident     : int  5 5 2 4 4 1 4 2 3 4 ...
 $ NeededLearn   : int  1 2 2 2 1 1 2 2 3 1 ...
 $ TextHelpful   : int  4 5 5 5 3 2 3 5 4 3 ...
 $ DifficultToNav: int  1 1 1 2 1 5 2 3 1 1 ...
 $ ImproveQuality: int  4 4 5 4 3 5 3 3 5 2 ...
 $ Irrelevant    : int  3 3 2 2 3 1 4 2 2 4 ...
 $ Confused      : int  2 1 2 2 2 2 3 2 2 4 ...
 $ ContinueUse   : int  5 5 5 5 4 5 4 4 2 4 ...
In [83]:
# need to change this to factors!
clean_q2_A$UseFrequently = factor(clean_q2_A$UseFrequently,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Complex = factor(clean_q2_A$Complex,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$EasyToUse = factor(clean_q2_A$EasyToUse,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$NeedSupport = factor(clean_q2_A$NeedSupport,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$WellIntegrated = factor(clean_q2_A$WellIntegrated,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Inconsistency = factor(clean_q2_A$Inconsistency,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$UseQuickly = factor(clean_q2_A$UseQuickly,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Cumbersome = factor(clean_q2_A$Cumbersome,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Confident = factor(clean_q2_A$Confident,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$NeededLearn = factor(clean_q2_A$NeededLearn,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$TextHelpful = factor(clean_q2_A$TextHelpful,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$DifficultToNav = factor(clean_q2_A$DifficultToNav,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$ImproveQuality = factor(clean_q2_A$ImproveQuality,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Irrelevant = factor(clean_q2_A$Irrelevant,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$Confused = factor(clean_q2_A$Confused,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)

clean_q2_A$ContinueUse = factor(clean_q2_A$ContinueUse,
                                  levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
In [84]:
# check again
str(clean_q2_A)
'data.frame':	41 obs. of  18 variables:
 $ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Condition     : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...
 $ UseFrequently : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 5 5 4 1 2 4 ...
 $ Complex       : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 4 2 2 3 1 3 2 1 ...
 $ EasyToUse     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 5 5 3 5 4 2 4 ...
 $ NeedSupport   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 2 4 2 3 2 1 1 1 ...
 $ WellIntegrated: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 5 2 4 4 4 4 ...
 $ Inconsistency : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 4 3 1 3 1 ...
 $ UseQuickly    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 3 3 4 ...
 $ Cumbersome    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 5 1 2 2 4 2 1 ...
 $ Confident     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 2 4 4 1 4 2 3 4 ...
 $ NeededLearn   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 2 1 1 2 2 3 1 ...
 $ TextHelpful   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 3 2 3 5 4 3 ...
 $ DifficultToNav: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 2 1 5 2 3 1 1 ...
 $ ImproveQuality: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 4 3 5 3 3 5 2 ...
 $ Irrelevant    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 3 3 2 2 3 1 4 2 2 4 ...
 $ Confused      : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 2 2 3 2 2 4 ...
 $ ContinueUse   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 4 2 4 ...
In [85]:
summary(clean_q2_A)
 ParticipantNo Condition UseFrequently Complex   EasyToUse NeedSupport
 Min.   : 1    A  :41    1   : 1       1   :24   1   : 3   1   :25    
 1st Qu.:11    NOA: 0    2   : 5       2   :10   2   : 1   2   :11    
 Median :21              3   : 8       3   : 4   3   : 6   3   : 2    
 Mean   :21              4   :10       4   : 1   4   :13   4   : 1    
 3rd Qu.:31              5   :15       5   : 0   5   :16   5   : 0    
 Max.   :41              NA's: 2       NA's: 2   NA's: 2   NA's: 2    
 WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn
 1   : 1        1   :15       1   : 0    1   :20    1   : 2   1   :17    
 2   : 2        2   :13       2   : 2    2   :11    2   : 5   2   :15    
 3   : 7        3   : 7       3   : 2    3   : 5    3   : 8   3   : 3    
 4   :17        4   : 4       4   :17    4   : 2    4   :12   4   : 2    
 5   :12        5   : 0       5   :18    5   : 1    5   :12   5   : 2    
 NA's: 2        NA's: 2       NA's: 2    NA's: 2    NA's: 2   NA's: 2    
 TextHelpful DifficultToNav ImproveQuality Irrelevant Confused  ContinueUse
 1   : 1     1   :24        1   : 0        1   : 4    1   :12   1   : 0    
 2   : 4     2   : 7        2   : 6        2   :12    2   :16   2   : 5    
 3   :12     3   : 4        3   :13        3   :11    3   : 2   3   : 7    
 4   :12     4   : 3        4   :11        4   : 8    4   : 8   4   :12    
 5   :10     5   : 1        5   : 9        5   : 4    5   : 1   5   :15    
 NA's: 2     NA's: 2        NA's: 2        NA's: 2    NA's: 2   NA's: 2    
In [86]:
# need to remove the non-Likert columns (ParticipantNo and Condition)
ss4 <- c(3:18)
final_data <- subset(clean_q2_A, select=ss4)
summary(final_data)
 UseFrequently Complex   EasyToUse NeedSupport WellIntegrated Inconsistency
 1   : 1       1   :24   1   : 3   1   :25     1   : 1        1   :15      
 2   : 5       2   :10   2   : 1   2   :11     2   : 2        2   :13      
 3   : 8       3   : 4   3   : 6   3   : 2     3   : 7        3   : 7      
 4   :10       4   : 1   4   :13   4   : 1     4   :17        4   : 4      
 5   :15       5   : 0   5   :16   5   : 0     5   :12        5   : 0      
 NA's: 2       NA's: 2   NA's: 2   NA's: 2     NA's: 2        NA's: 2      
 UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav
 1   : 0    1   :20    1   : 2   1   :17     1   : 1     1   :24       
 2   : 2    2   :11    2   : 5   2   :15     2   : 4     2   : 7       
 3   : 2    3   : 5    3   : 8   3   : 3     3   :12     3   : 4       
 4   :17    4   : 2    4   :12   4   : 2     4   :12     4   : 3       
 5   :18    5   : 1    5   :12   5   : 2     5   :10     5   : 1       
 NA's: 2    NA's: 2    NA's: 2   NA's: 2     NA's: 2     NA's: 2       
 ImproveQuality Irrelevant Confused  ContinueUse
 1   : 0        1   : 4    1   :12   1   : 0    
 2   : 6        2   :12    2   :16   2   : 5    
 3   :13        3   :11    3   : 2   3   : 7    
 4   :11        4   : 8    4   : 8   4   :12    
 5   : 9        5   : 4    5   : 1   5   :15    
 NA's: 2        NA's: 2    NA's: 2   NA's: 2    
In [87]:
results <- likert(final_data)
In [88]:
# Legend: 5 = Strongly agree, 1 = Strongly disagree
plot(results, type='bar')
In [89]:
# Alternative heatmap graph
plot(results, 
     type="heat",
           low.color = "white", 
           high.color = "blue",
           text.color = "black", 
           text.size = 4, 
           wrap = 50)