Use the EPL data for this item. Using Tidyverse functions, create the league table which contains for each team:
Sort the table with the top point scorer on top.
First, we read in the data.
library(tidyverse)
## URL of the file
eplURL <- eplURL <- "https://raw.githubusercontent.com/sens/smalldata/master/soccer/E0.csv"
## read in data frame
epl <- read.csv(eplURL)
head(epl <- as_tibble(epl))
── Attaching core tidyverse packages ──────────────────────────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.4.4 ✔ tibble 3.2.1 ✔ lubridate 1.9.3 ✔ tidyr 1.3.1 ✔ purrr 1.0.2 ── Conflicts ────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Div | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | HTR | ⋯ | BbAv.2.5.1 | BbAH | BbAHh | BbMxAHH | BbAvAHH | BbMxAHA | BbAvAHA | PSCH | PSCD | PSCA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <int> | <int> | <chr> | ⋯ | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
E0 | 08/08/15 | Bournemouth | Aston Villa | 0 | 1 | A | 0 | 0 | D | ⋯ | 1.79 | 26 | -0.5 | 1.98 | 1.93 | 1.99 | 1.92 | 1.82 | 3.88 | 4.70 |
E0 | 08/08/15 | Chelsea | Swansea | 2 | 2 | D | 2 | 1 | H | ⋯ | 1.99 | 27 | -1.5 | 2.24 | 2.16 | 1.80 | 1.73 | 1.37 | 5.04 | 10.88 |
E0 | 08/08/15 | Everton | Watford | 2 | 2 | D | 0 | 1 | A | ⋯ | 1.96 | 26 | -1.0 | 2.28 | 2.18 | 1.76 | 1.71 | 1.75 | 3.76 | 5.44 |
E0 | 08/08/15 | Leicester | Sunderland | 4 | 2 | H | 3 | 0 | H | ⋯ | 1.67 | 26 | -0.5 | 2.00 | 1.95 | 1.96 | 1.90 | 1.79 | 3.74 | 5.10 |
E0 | 08/08/15 | Man United | Tottenham | 1 | 0 | H | 1 | 0 | H | ⋯ | 2.01 | 26 | -1.0 | 2.20 | 2.09 | 1.82 | 1.78 | 1.64 | 4.07 | 6.04 |
E0 | 08/08/15 | Norwich | Crystal Palace | 1 | 3 | A | 0 | 1 | A | ⋯ | 1.67 | 27 | 0.0 | 1.83 | 1.78 | 2.17 | 2.08 | 2.46 | 3.39 | 3.14 |
The data is organized so that each team appears both as the home team in half of its games and as an away team in the other half the games. So, the analysis is simpler, if we split up the computation for each team, first as home team, and then as away team.
First, we select only the variables of interest for further processing.
epl <- select(epl,HomeTeam,AwayTeam,FTHG,FTAG,FTR)
head(epl)
HomeTeam | AwayTeam | FTHG | FTAG | FTR |
---|---|---|---|---|
<chr> | <chr> | <int> | <int> | <chr> |
Bournemouth | Aston Villa | 0 | 1 | A |
Chelsea | Swansea | 2 | 2 | D |
Everton | Watford | 2 | 2 | D |
Leicester | Sunderland | 4 | 2 | H |
Man United | Tottenham | 1 | 0 | H |
Norwich | Crystal Palace | 1 | 3 | A |
We calculate games played, goals for/against, and points scored as home team, and then the same thing as away team.
(home <- group_by(epl,HomeTeam) |>
summarize(hgames=length(FTR),hgoalsfor=sum(FTHG),hgoalsag=sum(FTAG),hpoints=sum(3*(FTR=="H")+1*(FTR=="D"))))
HomeTeam | hgames | hgoalsfor | hgoalsag | hpoints |
---|---|---|---|---|
<chr> | <int> | <int> | <int> | <dbl> |
Arsenal | 19 | 31 | 11 | 40 |
Aston Villa | 19 | 14 | 35 | 11 |
Bournemouth | 19 | 23 | 34 | 20 |
Chelsea | 19 | 32 | 30 | 24 |
Crystal Palace | 19 | 19 | 23 | 21 |
Everton | 19 | 35 | 30 | 23 |
Leicester | 19 | 35 | 18 | 42 |
Liverpool | 19 | 33 | 22 | 32 |
Man City | 19 | 47 | 21 | 38 |
Man United | 19 | 27 | 9 | 41 |
Newcastle | 19 | 32 | 24 | 28 |
Norwich | 19 | 26 | 30 | 23 |
Southampton | 19 | 39 | 22 | 36 |
Stoke | 19 | 22 | 24 | 28 |
Sunderland | 19 | 23 | 20 | 24 |
Swansea | 19 | 20 | 20 | 30 |
Tottenham | 19 | 35 | 15 | 36 |
Watford | 19 | 20 | 19 | 24 |
West Brom | 19 | 20 | 26 | 23 |
West Ham | 19 | 34 | 26 | 34 |
(away <- group_by(epl,AwayTeam) |>
summarize(agames=length(FTR),agoalsfor=sum(FTAG),agoalsag=sum(FTHG),apoints=sum(3*(FTR=="A")+1*(FTR=="D"))))
AwayTeam | agames | agoalsfor | agoalsag | apoints |
---|---|---|---|---|
<chr> | <int> | <int> | <int> | <dbl> |
Arsenal | 19 | 34 | 25 | 31 |
Aston Villa | 19 | 13 | 41 | 6 |
Bournemouth | 19 | 22 | 33 | 22 |
Chelsea | 19 | 27 | 23 | 26 |
Crystal Palace | 19 | 20 | 28 | 21 |
Everton | 19 | 24 | 25 | 24 |
Leicester | 19 | 33 | 18 | 39 |
Liverpool | 19 | 30 | 28 | 28 |
Man City | 19 | 24 | 20 | 28 |
Man United | 19 | 22 | 26 | 25 |
Newcastle | 19 | 12 | 41 | 9 |
Norwich | 19 | 13 | 37 | 11 |
Southampton | 19 | 20 | 19 | 27 |
Stoke | 19 | 19 | 31 | 23 |
Sunderland | 19 | 25 | 42 | 15 |
Swansea | 19 | 22 | 32 | 17 |
Tottenham | 19 | 34 | 20 | 34 |
Watford | 19 | 20 | 31 | 21 |
West Brom | 19 | 14 | 22 | 20 |
West Ham | 19 | 31 | 25 | 28 |
We put the two tables together using a left join. Strictly speaking it would have worked without a join since the teams are sorted in alphabetical order, but it is safer to use a left join, just in case it is not.
allGames <- left_join(home,away,by = c("HomeTeam"="AwayTeam"))
head(allGames)
HomeTeam | hgames | hgoalsfor | hgoalsag | hpoints | agames | agoalsfor | agoalsag | apoints |
---|---|---|---|---|---|---|---|---|
<chr> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | <dbl> |
Arsenal | 19 | 31 | 11 | 40 | 19 | 34 | 25 | 31 |
Aston Villa | 19 | 14 | 35 | 11 | 19 | 13 | 41 | 6 |
Bournemouth | 19 | 23 | 34 | 20 | 19 | 22 | 33 | 22 |
Chelsea | 19 | 32 | 30 | 24 | 19 | 27 | 23 | 26 |
Crystal Palace | 19 | 19 | 23 | 21 | 19 | 20 | 28 | 21 |
Everton | 19 | 35 | 30 | 23 | 19 | 24 | 25 | 24 |
Finally, we put things together by adding games played, goals for/against, goal difference and points. We sort by points, goal difference, and goals for (as is traditional).
leagueTable <- transmute(allGames,Team=HomeTeam,GamesPlayed=hgames+agames,
GoalsFor=hgoalsfor+agoalsfor,
GoalsAgainst=hgoalsag+agoalsag,
GoalDifference=GoalsFor-GoalsAgainst,
Points=hpoints+apoints) |>
arrange(-Points,-GoalDifference,-GoalsFor)
leagueTable
Team | GamesPlayed | GoalsFor | GoalsAgainst | GoalDifference | Points |
---|---|---|---|---|---|
<chr> | <int> | <int> | <int> | <int> | <dbl> |
Leicester | 38 | 68 | 36 | 32 | 81 |
Arsenal | 38 | 65 | 36 | 29 | 71 |
Tottenham | 38 | 69 | 35 | 34 | 70 |
Man City | 38 | 71 | 41 | 30 | 66 |
Man United | 38 | 49 | 35 | 14 | 66 |
Southampton | 38 | 59 | 41 | 18 | 63 |
West Ham | 38 | 65 | 51 | 14 | 62 |
Liverpool | 38 | 63 | 50 | 13 | 60 |
Stoke | 38 | 41 | 55 | -14 | 51 |
Chelsea | 38 | 59 | 53 | 6 | 50 |
Everton | 38 | 59 | 55 | 4 | 47 |
Swansea | 38 | 42 | 52 | -10 | 47 |
Watford | 38 | 40 | 50 | -10 | 45 |
West Brom | 38 | 34 | 48 | -14 | 43 |
Crystal Palace | 38 | 39 | 51 | -12 | 42 |
Bournemouth | 38 | 45 | 67 | -22 | 42 |
Sunderland | 38 | 48 | 62 | -14 | 39 |
Newcastle | 38 | 44 | 65 | -21 | 37 |
Norwich | 38 | 39 | 67 | -28 | 34 |
Aston Villa | 38 | 27 | 76 | -49 | 17 |
Another way to do the same.
pts <- c(0,1,3)
names(pts) <- c("A","D","H")
epl <- epl |> select(HomeTeam,AwayTeam,FTHG,FTAG,FTR) |>
mutate(homepts=pts[FTR],awaypts=3-homepts-1*(homepts==1))
home <- epl |>
group_by(HomeTeam) |>
summarize(games=n(),goalsfor=sum(FTHG),goalsagainst=sum(FTAG),points=sum(homepts))
away <- epl |>
group_by(AwayTeam) |>
summarize(games=n(),goalsfor=sum(FTAG),goalsagainst=sum(FTHG),points=sum(awaypts))
left_join(home,away,by=join_by(HomeTeam==AwayTeam)) |>
transmute(Team=HomeTeam,games=games.x+games.y,goalsFor=goalsfor.x+goalsfor.y,
goalsAgainst=goalsagainst.x+goalsagainst.y,
goalDiff=goalsFor-goalsAgainst,points=points.x+points.y) |>
arrange(desc(points),desc(goalDiff),desc(goalsFor))
Team | games | goalsFor | goalsAgainst | goalDiff | points |
---|---|---|---|---|---|
<chr> | <int> | <int> | <int> | <int> | <dbl> |
Leicester | 38 | 68 | 36 | 32 | 81 |
Arsenal | 38 | 65 | 36 | 29 | 71 |
Tottenham | 38 | 69 | 35 | 34 | 70 |
Man City | 38 | 71 | 41 | 30 | 66 |
Man United | 38 | 49 | 35 | 14 | 66 |
Southampton | 38 | 59 | 41 | 18 | 63 |
West Ham | 38 | 65 | 51 | 14 | 62 |
Liverpool | 38 | 63 | 50 | 13 | 60 |
Stoke | 38 | 41 | 55 | -14 | 51 |
Chelsea | 38 | 59 | 53 | 6 | 50 |
Everton | 38 | 59 | 55 | 4 | 47 |
Swansea | 38 | 42 | 52 | -10 | 47 |
Watford | 38 | 40 | 50 | -10 | 45 |
West Brom | 38 | 34 | 48 | -14 | 43 |
Crystal Palace | 38 | 39 | 51 | -12 | 42 |
Bournemouth | 38 | 45 | 67 | -22 | 42 |
Sunderland | 38 | 48 | 62 | -14 | 39 |
Newcastle | 38 | 44 | 65 | -21 | 37 |
Norwich | 38 | 39 | 67 | -28 | 34 |
Aston Villa | 38 | 27 | 76 | -49 | 17 |
First, we read in the data.
# downloading data file from Dryad
dryadFileDownload <- function(filenum,filename,baseurl="https://datadryad.org/api/v2")
{
download.file(paste(baseurl,"/files/",filenum,"/download",sep=""),filename,mode="wb")
}
# make a temporaty filename with RDA extension
tmpfile <- tempfile(fileext="rda")
# download that file
emergencyFileID <- "27091"
dryadFileDownload(emergencyFileID,tmpfile)
# load it
load(tmpfile)
# assign it to a different name as it has an unhelpful name
emergency <- data
rm(data)
head(emergency)
triage | age | sex | crp | k | na | hb | crea | leu | alb | ⋯ | mort30 | icutime | icustatus | inddage | genindl.1 | saturation | respirationsfrekvens | puls | systoliskblodtryk | gcs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | ⋯ | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <int> | <int> | <int> | <int> | |
2430 | yellow | 75 | female | 281.101300 | 3.266996 | 135.5883 | 8.4 | 34.50495 | 24.61 | 22.25447 | ⋯ | 0 | 999999 | 0 | 23 | 0 | 97 | 20 | 124 | 120 | 15 |
4250 | red | 21 | female | 3.470628 | 4.014938 | 137.3804 | 7.9 | 61.14286 | 18.69 | NA | ⋯ | 0 | 999999 | 0 | 2 | 0 | 99 | 18 | 102 | 164 | 15 |
5002 | green | 83 | female | 19.913540 | 4.254919 | 132.2926 | 7.5 | 61.46023 | 12.25 | 43.55455 | ⋯ | 0 | 999999 | 0 | 1 | 0 | 97 | 16 | 87 | 156 | 15 |
5375 | orange | 71 | male | 32.092750 | 4.900000 | 135.0000 | 9.9 | 89.00000 | 16.39 | 39.17460 | ⋯ | 1 | 0 | 1 | 1 | 0 | NA | NA | NA | NA | 14 |
1424 | yellow | 86 | female | 76.256850 | 3.600000 | 138.0000 | 7.6 | 92.00000 | 14.53 | 40.15953 | ⋯ | 0 | 999999 | 0 | 1 | 0 | 97 | 20 | 69 | 150 | 14 |
1057 | yellow | 84 | female | 2.324502 | 3.300000 | 140.0000 | 7.2 | 76.00000 | 5.17 | 44.07696 | ⋯ | 0 | 999999 | 0 | 3 | 0 | 97 | 20 | 70 | 167 | 15 |
emergency %>% group_by(sex) %>%
summarize(n=n(),meanCRP=mean(crp,na.rm=T),
avg30dMort=round(mean(mort30,na.rm=T),3))
sex | n | meanCRP | avg30dMort |
---|---|---|---|
<fct> | <int> | <dbl> | <dbl> |
female | 3236 | 34.93204 | 0.049 |
male | 3013 | 31.97153 | 0.056 |
emergency %>% group_by(sex,triage) %>%
summarize(n=n(),meanCRP=mean(crp,na.rm=T) |> round(1),
avg30dMort=mean(mort30,na.rm=T)|> round(3))
`summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
sex | triage | n | meanCRP | avg30dMort |
---|---|---|---|---|
<fct> | <fct> | <int> | <dbl> | <dbl> |
female | red | 121 | 47.2 | 0.240 |
female | orange | 798 | 31.9 | 0.060 |
female | yellow | 1241 | 41.7 | 0.046 |
female | green | 1057 | 26.9 | 0.022 |
female | NA | 19 | 48.2 | 0.053 |
male | red | 152 | 30.2 | 0.132 |
male | orange | 778 | 27.3 | 0.085 |
male | yellow | 1141 | 40.5 | 0.047 |
male | green | 915 | 24.7 | 0.031 |
male | NA | 27 | 35.6 | 0.074 |
mort30drisk <- group_by(emergency,sex,triage) %>%
summarize(avg30dMort=mean(mort30,na.rm=T))
mort30drisk
`summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
sex | triage | avg30dMort |
---|---|---|
<fct> | <fct> | <dbl> |
female | red | 0.23966942 |
female | orange | 0.06015038 |
female | yellow | 0.04593070 |
female | green | 0.02175970 |
female | NA | 0.05263158 |
male | red | 0.13157895 |
male | orange | 0.08483290 |
male | yellow | 0.04732691 |
male | green | 0.03060109 |
male | NA | 0.07407407 |
pivot_wider(mort30drisk,names_from = "sex",
values_from="avg30dMort") %>%
mutate(riskRatio=round(male/female,3))
triage | female | male | riskRatio |
---|---|---|---|
<fct> | <dbl> | <dbl> | <dbl> |
red | 0.23966942 | 0.13157895 | 0.549 |
orange | 0.06015038 | 0.08483290 | 1.410 |
yellow | 0.04593070 | 0.04732691 | 1.030 |
green | 0.02175970 | 0.03060109 | 1.406 |
NA | 0.05263158 | 0.07407407 | 1.407 |
We could have chained the two lines together, but it is easier to understand the flow if we break up the computation.
sessionInfo()
R version 4.4.2 (2024-10-31) Platform: x86_64-pc-linux-gnu Running under: Debian GNU/Linux trixie/sid Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.28.so; LAPACK version 3.12.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: America/Chicago tzcode source: system (glibc) attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 [5] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 [9] ggplot2_3.4.4 tidyverse_2.0.0 loaded via a namespace (and not attached): [1] gtable_0.3.4 jsonlite_1.8.8 compiler_4.4.2 renv_1.0.7 [5] crayon_1.5.2 tidyselect_1.2.1 IRdisplay_1.1 scales_1.3.0 [9] uuid_1.2-0 yaml_2.3.10 fastmap_1.1.1 IRkernel_1.3.2 [13] R6_2.5.1 generics_0.1.3 munsell_0.5.0 pillar_1.9.0 [17] tzdb_0.4.0 rlang_1.1.4 utf8_1.2.4 stringi_1.8.3 [21] repr_1.1.7 timechange_0.3.0 cli_3.6.3 withr_3.0.1 [25] magrittr_2.0.3 digest_0.6.37 grid_4.4.2 base64enc_0.1-3 [29] hms_1.1.3 pbdZMQ_0.3-13 lifecycle_1.0.4 vctrs_0.6.5 [33] evaluate_0.23 glue_1.7.0 fansi_1.0.6 colorspace_2.1-0 [37] tools_4.4.2 pkgconfig_2.0.3 htmltools_0.5.7