Karl Broman , Department of Biostatistics & Medical Informatics, University of Wisconsin–Madison

Our aim in this tutorial is to demonstrate how to map quantitative trait loci (QTL) in the BXD mouse recombinant inbred lines using the R/qtl2 software. We will first show how to download BXD phenotypes from GeneNetwork2 using its API, via the R package R/GNapi. At the end, we will use the R/qtl2browse package to display genome scan results using the Genetics Genome Browser.

We will first use the GeneNetwork2 API to acquire BXD phenotypes to use for mapping. We will use the R package R/GNapi.

We first need to install the package, which is not available on CRAN, but is available via a private repository.

```
install.packages("GNapi", repos="http://rqtl.org/qtl2cran")
```

We then load the package using `library()`

.

In [1]:

```
library(GNapi)
```

The R/GNapi has a variety of functions. For an overview, see its vignette. Here we will just do one thing: use the function `get_pheno()`

to grab BXD phenotype data. You provide a data set and a phenotype. Phenotype 10038 concerns "habituation", measured as a difference in locomotor activity between day 1 and day 3 in a 5 minute test trial.

In [2]:

```
phe <- get_pheno("BXD", "10038")
head(phe)
```

We will use just the column "value", but we need to include the strain names so that R/qtl2 can line up these phenotypes with the genotypes.

In [3]:

```
pheno <- setNames(phe$value, phe$sample_name)
head(pheno)
```

We now want to get genotype data for the BXD panel. We first need to install the R/qtl2 package. As with R/GNapi, it is not available on CRAN, but rather is distributed via a private repository.

```
install.packages("qtl2", repos="http://rqtl.org/qtl2cran")
```

We then load the package with `library()`

.

In [4]:

```
library(qtl2)
```

R/qtl2 uses a special file format for QTL data (described here). There are a variety of sample datasets on Github, including genotypes for the mouse BXD lines, taken from GeneNetwork2. We'll load those data directly into R using the function `read_cross2()`

.

In [5]:

```
bxd_file <- "https://raw.githubusercontent.com/rqtl/qtl2data/master/BXD/bxd.zip"
bxd <- read_cross2(bxd_file)
```

We get a warning message about heterozygous genotypes being omitted. A number of the newer BXD lines have considerable heterozygosity. But these lines weren't among those phenotyped in the data we downloaded above, and so we don't need to worry about it here.

The data are read into the object `bxd`

, which has class `"cross2"`

. It contains the genotypes and well as genetic and physical marker maps. There are also phenotype data (which we will ignore).

We can get a quick summary of the dataset with `summary()`

. For reasons that I don't understand, it gets printed as a big mess within this Jupyter notebook, and so here we need to surround it with `print()`

to get the intended output.

In [6]:

```
print( summary(bxd) )
```

The first step in QTL analysis is to calculate genotype probabilities at putative QTL positions across the genome, conditional on the observed marker data. This allows us that consider positions between the genotyped markers and to allow for the presence of genotyping errors.

First, we need to define the positions that we will consider. We will take the observed marker positions and insert a set of "pseudomarkers" (marker-like positions that are not actually markers). We do this with the function `insert_pseudomarkers()`

. We pull the genetic map (`gmap`

) out of the `bxd`

data as our basic map; `step=0.2`

and `stepwidth="max"`

mean to insert pseudomarkers so that no two adjacent markers or pseudomarkers are more than 0.2 cM apart. That is, in any marker interval that is greater than 0.2 cM, we will insert one or more evenly spaced pseudomarkers, so that the intervals between markers and pseudomarkers are no more than 0.2 cM.

In [7]:

```
gmap <- insert_pseudomarkers(bxd$gmap, step=0.2, stepwidth="max")
```

We will be interested in results with respect to the physical map (in Mbp), and so we need to create a corresponding map that includes the pseudomarker positions. We do this with the function `interp_map()`

, which uses linear interpolation to get estimated positions for the inserted pseudomarkers.

In [8]:

```
pmap <- interp_map(gmap, bxd$gmap, bxd$pmap)
```

We can now proceed with calculating genotype probabilities for all BXD strains at all markers and pseudomarkers, conditional on the observed marker genotypes and assuming a 0.2% genotyping error rate. We use the Carter-Falconer map function to convert between cM and recombination fractions; it assumes a high degree of crossover interference, appropriate for the mouse.

In [9]:

```
pr <- calc_genoprob(bxd, gmap, error_prob=0.002, map_function="c-f")
```

In the QTL analysis, we will fit a linear mixed model to account for polygenic background effects. We will use the "leave one chromosome out" (LOCO) method for this. When we scan a chromosome for a QTL, we include a polygenic term with a kinship matrix derived from all other chromosomes.

We first need to calculate this set of kinship matrices, which we do with the function `calc_kinship()`

. The second argument, `"loco"`

, indicates that we want to calculate a vector of kinship matrices, each derived from the genotype probabilities but leaving one chromosome out.

In [10]:

```
k <- calc_kinship(pr, "loco")
```

Now, finally, we're ready to perform the genome scan, which we do with the function `scan1()`

. It takes the genotype probabilities and a set of phenotypes (here, just one phenotype). If kinship matrices are provided (here, as `k`

), the scan is performed using a linear mixed model. To make the calculations faster, the residual polygenic variance is first estimated without including any QTL effect and is then taking to be fixed and known during the scan.

In [11]:

```
out <- scan1(pr, pheno, k)
```

The output of `scan1()`

is a matrix of LOD scores; the rows are marker/pseudomarker positions and the columns are phenotypes. We can plot the results using `plot.scan1()`

, and we can just use `plot()`

because it uses the class of its input to determine what plot to make.

Here I'm using the package repr to control the height and width of the plot that's created. I installed it with `install.packages("repr")`

. You can ignore that part, if you want.

In [12]:

```
library(repr)
options(repr.plot.height=4, repr.plot.width=8)
par(mar=c(5.1, 4.1, 0.6, 0.6))
plot(out, pmap)
```

There's a clear QTL on chromosome 8. We can make a plot of just that chromosome with the argument `chr=15`

.

In [13]:

```
par(mar=c(5.1, 4.1, 0.6, 0.6))
plot(out, pmap, chr=15)
```

Let's create a plot of the phenotype vs the genotype at the inferred QTL. We first need to identify the QTL location, which we can do using `max()`

. We then use `maxmarg()`

to get inferred genotypes at the inferred QTL.

In [14]:

```
mx <- max(out, pmap)
g_imp <- maxmarg(pr, pmap, chr=mx$chr, pos=mx$pos, return_char=TRUE)
```

We can use `plot_pxg()`

to plot the phenotype as a function of QTL genotype. We use `swap_axes=TRUE`

to have the phenotype on the x-axis and the genotype on the y-axis, rather than the other way around. Here we see that the BB and DD genotypes are completely separated, phenotypically.

In [15]:

```
par(mar=c(5.1, 4.1, 0.6, 0.6))
plot_pxg(g_imp, pheno, swap_axes=TRUE, xlab="Habituation phenotype")
```

The Genetics Genome Browser is a fast, lightweight, [purescript]-based genome browser developed for browsing GWAS or QTL analysis results. We'll use the R package R/qtl2browse to view our QTL mapping results in the GGB.

We first need to install the R/qtl2browse package, again from a private CRAN-like repository.

```
install.packages("qtl2browse", repos="http://rqtl.org/qtl2cran")
```

We then load the package and use its one function, `browse()`

, which takes the `scan1()`

output and corresponding physical map (in Mbp). This will open the Genetics Genome Browser in a separate tab in your web browser.

In [16]:

```
library(qtl2browse)
browse(out, pmap)
```