This document will walk through the steps for mapping QTLs in the RIX lines.
Required Files:
SIG_QTL_Mapping_Workflow.ipynb
): [Download here]rix_qtl_mapping_functions.r
: [Download here]Oas1b_status_recoded.txt
)Mx1_status_recoded.txt
)** Note: this notebook can also be downloaded as an R script (only the code blocks seen below will be included): [Download R script here]
Required R Libraries:
gdata
: https://cran.r-project.org/web/packages/gdata/index.htmlDOQTL
: https://www.bioconductor.org/packages/release/bioc/html/DOQTL.htmlGenomicRanges
: https://bioconductor.org/packages/release/bioc/html/GenomicRanges.htmlVariantAnnotation
: https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.htmlforeach
: https://cran.r-project.org/web/packages/foreach/index.htmldoParallel
: https://cran.r-project.org/web/packages/doParallel/index.htmlRColorBrewer
: https://cran.r-project.org/web/packages/RColorBrewer/index.htmlAll code is available on GitHub: https://github.com/biodev/SIG
## Load R functions and libraries
source('./scripts/rix_qtl_mapping_functions.r')
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED. Attaching package: ‘gdata’ The following object is masked from ‘package:stats’: nobs The following object is masked from ‘package:utils’: object.size The following object is masked from ‘package:base’: startsWith Loading required package: BSgenome.Mmusculus.UCSC.mm10 Loading required package: BSgenome Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB The following object is masked from ‘package:gdata’: combine The following objects are masked from ‘package:stats’: IQR, mad, xtabs The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit Loading required package: S4Vectors Loading required package: stats4 Loading required package: IRanges Attaching package: ‘IRanges’ The following object is masked from ‘package:gdata’: trim Loading required package: GenomeInfoDb Loading required package: GenomicRanges Loading required package: Biostrings Loading required package: XVector Loading required package: rtracklayer Loading required package: VariantAnnotation Loading required package: SummarizedExperiment Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Loading required package: Rsamtools Attaching package: ‘VariantAnnotation’ The following object is masked from ‘package:base’: tabulate Loading required package: iterators
## Read phenotype data
cleaned_data_dir = '/Users/mooneymi/Documents/SIG/WNV/Cleaned_Data_Releases/14-Jun-2016/'
pheno_dir = '/Users/mooneymi/Documents/SIG/Mapping/phenotypes'
mapping_dir = '/Users/mooneymi/Documents/SIG/Mapping'
## For this example we're working with qPCR data
pheno = read.xls(file.path(cleaned_data_dir, 'Gale_qPCR_byMouse_14-Jun-2016_final.xlsx'), header=T, as.is=T)
dim(pheno)
## The phenotype will be D12 WNV (viral transcript) in the brain
pheno = pheno[with(pheno, Virus=='WNV' & Timepoint==12 & Experiment=='WNV' & Tissue=='Brain'), ]
dim(pheno)
## Save the phenotype dataframe (optional)
#save(pheno, file=file.path(pheno_dir, 'wnv_dCt_brain_d12_wnv_17-Jun-2016.rda'))
## View the phenotype distribution accross the mapping population
library(lattice)
dotplot(reorder(pheno[,'UW_Line'], pheno[,'dCt'], mean, na.rm=T) ~
pheno[,'dCt'] | pheno[,'Virus'],
panel = function(x,y,...) {panel.dotplot(x,y,...); panel.abline(v=0, col.line="red")},
pch=19, ylab='UW Line', xlab="D12 WNV dCt")
## Sort pheno dataframe and set rownames
pheno = pheno[with(pheno, order(Mating, RIX_ID)),]
rownames(pheno) = pheno$ID
## Add sex column
pheno$sex = 'M'
## Create covariate dataframe (must include sex)
covar = data.frame(sex = as.numeric(pheno$sex == 'M'))
rownames(covar) = pheno$ID
## Get IDs and Matings for each sample
samples = pheno$ID
matings = unlist(lapply(strsplit(samples, '_'), function(x) {x[1]}))
matings = unique(matings)
## Read strain ID mapping file (with Oas1b status for WNV mapping analyses)
## Note: use the Mx1 file (linked at top) for Flu analyses
strain_map = read.delim(file.path('./data', 'Oas1b_status_recoded.txt'), header=T, as.is=T, sep='\t')
head(strain_map)
Mating | CC_Mating | UW_Line | A | B | C | D | E | F | G | H | CW_prob | P_prob | Null_prob | Oas1b_status | Oas1b | Oas1b_High | Oas1b_Mod | Oas1b_Low | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 13067x16912 | CC003xCC051 | 17 | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 0.5 | 1e-20 | 1e-20 | 0.5 | 0.5 | 1e-20 | 0.5 | High-Low | High | 1 | 0 | 0 |
2 | 13067x5306 | CC003xCC062 | 63 | 1e-20 | 1e-20 | 1.995e-07 | 0.4999998005 | 1e-20 | 1e-20 | 1e-20 | 0.5 | 0.5 | 1e-20 | 0.5 | High-Low | High | 1 | 0 | 0 |
3 | 13140x16680 | CC001xCC055 | 48 | 1e-20 | 0.5 | 1e-20 | 1e-20 | 0.5 | 1e-20 | 1e-20 | 1e-20 | 2e-20 | 1e-20 | 1 | Low | Low | 0 | 0 | 1 |
4 | 13140x3015 | CC001xCC074 | 2 | 1e-20 | 0.5 | 0.5 | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 2e-20 | 1e-20 | 1 | Low | Low | 0 | 0 | 1 |
5 | 13421x16034 | CC007xCC070 | 78 | 1e-20 | 1e-20 | 1.156e-06 | 0.499998844 | 1e-20 | 1e-20 | 0.5 | 1e-20 | 2e-20 | 0.5 | 0.5 | Mod-Low | Mod | 0 | 1 | 0 |
6 | 13421x16296 | CC007xCC049 | NA | 1e-20 | 0.125 | 1.2215e-06 | 0.8749987785 | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 2e-20 | 1e-20 | 1 | Low | Low | 0 | 0 | 1 |
## Check that matings match the strain mapping file
setdiff(matings, strain_map$Mating)
## Update covar with Oas1b status (only for WNV analyses)
covar$Mating = pheno$Mating
covar$Oas1b = sapply(covar$Mating, function(x) {if (x %in% strain_map$Mating) strain_map$Oas1b[strain_map$Mating == x] else NA})
covar$Oas1b_High = sapply(covar$Mating, function(x) {if (x %in% strain_map$Mating) strain_map$Oas1b_High[strain_map$Mating == x] else NA})
covar$Oas1b_Mod = sapply(covar$Mating, function(x) {if (x %in% strain_map$Mating) strain_map$Oas1b_Mod[strain_map$Mating == x] else NA})
covar$Oas1b_Low = sapply(covar$Mating, function(x) {if (x %in% strain_map$Mating) strain_map$Oas1b_Low[strain_map$Mating == x] else NA})
pheno$Oas1b = covar$Oas1b
pheno$Oas1b_High = covar$Oas1b_High
pheno$Oas1b_Mod = covar$Oas1b_Mod
pheno$Oas1b_Low = covar$Oas1b_Low
## Load universal model probabilities (loads a model.probs object containing all RIX lines)
data_dir = './xdata'
load(file.path(data_dir, 'rix_universal_model_prob_males_27-Jun-2016.rda'))
## Create model.probs array specific to the mapping population
model.probs = model.probs[pheno$Mating, , ]
dimnames(model.probs)[[1]] = pheno$ID
## Check model.probs object
dim(model.probs)
names(dimnames(model.probs))
dim(model.probs)[1] == dim(pheno)[1]
model.probs[1,,1:5]
UNC6 | JAX00000010 | JAX00240603 | JAX00240610 | JAX00240613 | |
---|---|---|---|---|---|
A | 6.00e-06 | 5.25e-06 | 4.50e-06 | 3.75e-06 | 3.00e-06 |
B | 0.4999962 | 0.4999970 | 0.4999977 | 0.4999981 | 0.4999985 |
C | 4.2e-08 | 1.0e-20 | 1.0e-20 | 1.0e-20 | 1.0e-20 |
D | 4.2e-08 | 1.0e-20 | 1.0e-20 | 1.0e-20 | 1.0e-20 |
E | 0.4999961 | 0.4999970 | 0.4999977 | 0.4999981 | 0.4999985 |
F | 4.2e-08 | 1.0e-20 | 1.0e-20 | 1.0e-20 | 1.0e-20 |
G | 4.2e-08 | 1.0e-20 | 1.0e-20 | 1.0e-20 | 1.0e-20 |
H | 1.50075e-06 | 7.50750e-07 | 7.50000e-10 | 7.50000e-10 | 1.00000e-20 |
## Fix very small genotype probabilities
model.probs[model.probs < 0.005] = 1e-20
## Check model.probs object
model.probs[1,,1:5]
UNC6 | JAX00000010 | JAX00240603 | JAX00240610 | JAX00240613 | |
---|---|---|---|---|---|
A | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
B | 0.4999962 | 0.4999970 | 0.4999977 | 0.4999981 | 0.4999985 |
C | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
D | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
E | 0.4999961 | 0.4999970 | 0.4999977 | 0.4999981 | 0.4999985 |
F | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
G | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
H | 1e-20 | 1e-20 | 1e-20 | 1e-20 | 1e-20 |
## Create kinship probability matrix
K = kinship.probs(model.probs)
SNP 0 SNP 1000 SNP 2000 SNP 3000 SNP 4000 SNP 5000 SNP 6000 SNP 7000 SNP 8000 SNP 9000 SNP 10000 SNP 11000 SNP 12000 SNP 13000 SNP 14000 SNP 15000 SNP 16000 SNP 17000 SNP 18000 SNP 19000 SNP 20000 SNP 21000 SNP 22000 SNP 23000 SNP 24000 SNP 25000 SNP 26000 SNP 27000 SNP 28000 SNP 29000 SNP 30000 SNP 31000 SNP 32000 SNP 33000 SNP 34000 SNP 35000 SNP 36000 SNP 37000 SNP 38000 SNP 39000 SNP 40000 SNP 41000 SNP 42000 SNP 43000 SNP 44000 SNP 45000 SNP 46000 SNP 47000 SNP 48000 SNP 49000 SNP 50000 SNP 51000 SNP 52000 SNP 53000 SNP 54000 SNP 55000 SNP 56000 SNP 57000 SNP 58000 SNP 59000 SNP 60000 SNP 61000 SNP 62000 SNP 63000 SNP 64000 SNP 65000 SNP 66000 SNP 67000 SNP 68000 SNP 69000 SNP 70000 SNP 71000 SNP 72000 SNP 73000 SNP 74000 SNP 75000 SNP 76000
## Garbage collection might help free memory
gc()
used | (Mb) | gc trigger | (Mb) | max used | (Mb) | |
---|---|---|---|---|---|---|
Ncells | 4024269.0 | 215.0 | 6861544.0 | 366.5 | 5167425.0 | 276.0 |
Vcells | 99667044.0 | 760.4 | 345600613.0 | 2636.8 | 390134154.0 | 2976.5 |
## Check kinship matrix
K[1:5, 1:5]
13067x16912_133 | 13067x16912_134 | 13067x16912_135 | 13067x5306_47 | 13067x5306_58 | |
---|---|---|---|---|---|
13067x16912_133 | 1.0000000 | 1.0000000 | 1.0000000 | 0.5946385 | 0.5946385 |
13067x16912_134 | 1.0000000 | 1.0000000 | 1.0000000 | 0.5946385 | 0.5946385 |
13067x16912_135 | 1.0000000 | 1.0000000 | 1.0000000 | 0.5946385 | 0.5946385 |
13067x5306_47 | 0.5946385 | 0.5946385 | 0.5946385 | 1.0000000 | 1.0000000 |
13067x5306_58 | 0.5946385 | 0.5946385 | 0.5946385 | 1.0000000 | 1.0000000 |
## First get marker positions
marker_pos = read.csv(file.path(data_dir, 'CC001-Uncb38V01.csv'), as.is=T)
marker_pos = marker_pos[,1:3]
marker_pos$position_cM = NA
head(marker_pos)
marker | chromosome | position.B38. | position_cM | |
---|---|---|---|---|
1 | UNC6 | 1 | 3010274 | NA |
2 | JAX00000010 | 1 | 3135418 | NA |
3 | JAX00240603 | 1 | 3252796 | NA |
4 | JAX00240610 | 1 | 3266608 | NA |
5 | JAX00240613 | 1 | 3323400 | NA |
6 | JAX00240636 | 1 | 3389563 | NA |
## Run QTL scan
qtl = scanone(pheno=pheno, pheno.col='dCt', probs=model.probs, K=K,
addcovar=covar[, c('sex'), drop=F], snps=marker_pos)
[1] "Mapping with 158 samples." [1] "Mapping with 76600 markers." [1] "dCt"
## Run permutations to calculate significance threshold
## Note: nperm=10 is for demonstration purposes only. Normally, at least 1000 permutations should be done.
perms = scanone.perm(pheno=pheno, pheno.col='dCt', probs=model.probs,
addcovar=covar[, c('sex'), drop=F], snps=marker_pos, nperm = 10)
[1] "dCt" [1] "1 of 10" [1] "2 of 10" [1] "3 of 10" [1] "4 of 10" [1] "5 of 10" [1] "6 of 10" [1] "7 of 10" [1] "8 of 10" [1] "9 of 10" [1] "10 of 10" [1] "1 of 10" [1] "2 of 10" [1] "3 of 10" [1] "4 of 10" [1] "5 of 10" [1] "6 of 10" [1] "7 of 10" [1] "8 of 10" [1] "9 of 10" [1] "10 of 10"
## Get the significance thresholds
sig.thr = get.sig.thr(perms[,,'dCt'], alpha=0.05)
sugg.thr = get.sig.thr(perms[,,'dCt'], alpha=0.1)
## Plot QTL results
plot(qtl, sig.thr = sig.thr, main = 'Day 12 Brain WNV dCt')
## Add lines for suggestive peaks (alpha = 0.1)
lines(x = c(0, 2462), y = rep(sugg.thr[1,"A"], 2), lwd = 2, lty=2, col = 'red')
lines(x = c(2462, 2633), y = rep(sugg.thr[1,"X"], 2), lwd = 2, lty=2, col = 'red')
## Identify the Bayes Credible Interval for the peak on chromosome 5
interval = bayesint_v2(qtl, chr = 5)
interval
marker | chromosome | position.B38. | position_cM | perc.var | lrs | lod | p | neg.log10.p | |
---|---|---|---|---|---|---|---|---|---|
23173 | UNC10027338 | 5 | 119.898431 | NA | 19.290635 | 33.861861 | 7.35300965868645 | 1.82806638316241e-05 | 4.73800803764066 |
23255 | UNC10043432 | 5 | 121.263782 | NA | 21.918455 | 39.0918 | 8.48867647727879 | 1.87722070165332e-06 | 5.72648466511146 |
23291 | UNC10062151 | 5 | 122.900748 | NA | 17.674955 | 30.73018 | 6.67297371217539 | 6.97164944526511e-05 | 4.15666445859798 |
## Create founder effects (coefficient) plot for peak interval
coefplot_v2(qtl, chr=5, start=interval[1,3], end=interval[3,3])
## Get marker at peak
get_min_marker(qtl, chr=5)
marker | chromosome | position.B38. | position_cM | perc.var | lrs | lod | p | neg.log10.p | |
---|---|---|---|---|---|---|---|---|---|
23255 | UNC10043432 | 5 | 121.263782 | NA | 21.9184548398778 | 39.0917998316627 | 8.48867647727879 | 1.87722070165332e-06 | 5.72648466511146 |
## Create probability plot
prob.plot(pheno=pheno, pheno.col='dCt', probs=model.probs, marker='UNC10043432', qtl=qtl)