Basic Concepts

Crossing Over and Recombination

An odd number of crossovers between two loci results in a recombination between them. Because crossovers occur at random along the chromosome, the probability of recombination is higher for loci that are farther apart than for loci that are closer together. This is the basis of genetic linkage analysis, in which recombination rates between loci are used to order genes on chromosomes. For example, if the recombination rate between loci $A$ and $B$ is $r_{AB}=0.1$, between $B$ and $C$ is $r_{BC}=0.1$, and between $A$ and $C$ is $r_{AC}=0.19$, we can arrange the loci in the order $ABC$, because $A$ and $C$, the pair with the largest recombination rate, must be the outer loci. Note that $r_{AC}<r_{AB}+r_{BC}$. This is because a recombination in both the $A$-$B$ and the $B$-$C$ intervals corresponds to an even number of crossovers between $A$ and $C$, and hence to no recombination between $A$ and $C$.
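The ordering argument can be sketched in Julia, the language used in this notebook. The helper `order_three_loci` is hypothetical (it is not part of any package). It assumes that recombination rates increase with distance, so the pair with the largest rate must be the two outer loci, and that this largest rate is unique.

```julia
# Hypothetical helper: order three loci from their pairwise recombination
# rates, assuming rates increase with distance and the largest rate is unique.
function order_three_loci(rates::Dict{Set{String},Float64})
    loci = union(keys(rates)...)                # all locus names
    outer_pair, rmax = Set{String}(), -Inf
    for (pair, r) in rates                      # find the most distant pair
        if r > rmax
            outer_pair, rmax = pair, r
        end
    end
    middle = only(setdiff(loci, outer_pair))    # the remaining locus goes in the middle
    left, right = sort(collect(outer_pair))     # orientation of the map is arbitrary
    return [left, middle, right]
end

rates = Dict(Set(["A", "B"]) => 0.10,
             Set(["B", "C"]) => 0.10,
             Set(["A", "C"]) => 0.19)
println(order_three_loci(rates))                # ["A", "B", "C"]
```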

Interference

Interference is the lack of independence between recombinations in different intervals of a chromosome. Consider three loci ordered as $ABC$. If recombination in the $A$-$B$ interval is independent of recombination in the $B$-$C$ interval, the probability of a double recombinant, denoted by $g_{11}$, is $$g_{11}=r_{AB}r_{BC}$$ where $r_{ij}$ is the probability of a recombination between loci $i$ and $j$. If recombinations in the two intervals are not independent, this probability is given by

$$g_{11}=cr_{AB}r_{BC}$$

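where $c$ is called the coefficient of coincidence. Interference is quantified as $I=1-c$. Thus, under independence, $c=1$ and $I=0$. When $c<1$ (so $I>0$), a recombination in one interval reduces the chance of a recombination in the adjacent interval.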
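As a worked example, $c$ can be recovered from the three rates in the crossing-over example. A gamete that is recombinant in the $A$-$B$ interval either is or is not also recombinant in the $B$-$C$ interval, so $g_{10}=r_{AB}-g_{11}$ and $g_{01}=r_{BC}-g_{11}$, giving $r_{AC}=g_{10}+g_{01}=r_{AB}+r_{BC}-2g_{11}$. The Julia sketch below (the function name `coincidence` is hypothetical) solves this for $g_{11}$ and then computes $c=g_{11}/(r_{AB}r_{BC})$.

```julia
# Hypothetical helper: coefficient of coincidence c and interference I = 1 - c
# for three loci ordered ABC, using r_AC = r_AB + r_BC - 2*g11.
function coincidence(rAB, rBC, rAC)
    g11 = (rAB + rBC - rAC) / 2      # probability of a double recombinant
    c = g11 / (rAB * rBC)            # observed vs. expected under independence
    return (g11 = g11, c = c, I = 1 - c)
end

res = coincidence(0.10, 0.10, 0.19)
println(res)                          # c and I are both close to 0.5
```

With $r_{AB}=r_{BC}=0.1$ and $r_{AC}=0.19$, this gives $g_{11}\approx0.005$, $c\approx0.5$ and $I\approx0.5$: only half as many double recombinants as expected under independence.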

Map Distance

The map distance $x$ between two loci, in Morgan units, is defined as the expected number of crossovers between them. Unlike recombination rates, map distances are additive.

Map distances between loci provide a convenient set of parameters for models used in linkage analysis. Consider the gametes produced by a parent heterozygous at each of $k$ loci. Each such gamete corresponds to a recombination event that can be indexed by a $(k-1)\times1$ vector $\epsilon$, where element $j$ of $\epsilon$ is $1$ if there was a recombination between loci $j$ and $j+1$ and $0$ otherwise. Thus, in linkage analysis with $k$ loci, there are $2^{k-1}$ recombination events that need to be modeled. The probability of each of these events could be treated as a separate parameter; because these probabilities sum to one, this approach would give rise to $2^{k-1}-1$ parameters to be estimated. However, using the relationship between map distance and recombination rate, the probabilities of the $2^{k-1}$ recombination events can be computed from the $k-1$ map distances between adjacent loci, which then become the parameters for linkage analysis. The relationship between map distance and recombination rate is discussed next.
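To make the parameter reduction concrete, here is a minimal Julia sketch that enumerates all $2^{k-1}$ event vectors and computes their probabilities from the $k-1$ adjacent map distances. It assumes no interference, so adjacent intervals recombine independently and each interval's rate comes from Haldane's map function (derived in the next section); with interference, this simple product form does not hold. The names `haldane` and `event_probabilities` are illustrative.

```julia
# Sketch assuming no interference (Haldane's model): the probability of an
# event vector ε is prod_j r_j^ε_j * (1 - r_j)^(1 - ε_j).
haldane(x) = 0.5 * (1 - exp(-2x))      # recombination rate for distance x (Morgans)

function event_probabilities(x::Vector{Float64})
    m = length(x)                       # m = k - 1 adjacent intervals
    r = haldane.(x)                     # recombination rate of each interval
    probs = Dict{Vector{Int},Float64}()
    for code in 0:(2^m - 1)             # enumerate all 2^(k-1) events
        ε = [(code >> (j - 1)) & 1 for j in 1:m]
        probs[ε] = prod(ε[j] == 1 ? r[j] : 1 - r[j] for j in 1:m)
    end
    return probs
end

p = event_probabilities([0.10, 0.20, 0.05])   # k = 4 loci, 3 map distances
println(length(p))                            # 8
println(p[[0, 0, 0]])                         # probability of no recombination
```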

Map Functions

Map functions provide a transformation from map distance to recombination rate. Two approaches have been used to derive map functions. In the first, a probability model is assumed for the number of crossovers in an interval of length $x$, and the recombination rate is calculated as the probability of an odd number of crossovers in the interval. In the second, recombination events in two adjacent intervals are modeled, allowing for interference. This model is then used to develop a differential equation whose solution yields the map function. Both approaches are described in detail below.

Suppose that $P_{t}$ is the probability of $t$ crossovers in a chromosomal interval of length $x$ Morgans. Recall that a recombination is observed when an odd number of crossovers occurs in this interval. Thus, probability $r_{x}$ of a recombination in an interval of length $x$ is

$$\begin{eqnarray} r_{x} & =& P_{1}+P_{3}+P_{5}+\cdots\\ & =& {\frac{1}{2}}(1-\sum_{t}P_{t}(-1)^{t})\\ & =& {\frac{1}{2}}(1-P(-1)) \end{eqnarray} \tag{2}$$

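where $P(S)=\sum_{t}P_{t}S^{t}$ is the probability generating function of the distribution of crossovers. The second line of (2) follows because $\sum_{t}P_{t}=1$ and $\sum_{t}P_{t}(-1)^{t}=\sum_{t\,\text{even}}P_{t}-\sum_{t\,\text{odd}}P_{t}$, so the sum of the odd terms equals $\frac{1}{2}(1-\sum_{t}P_{t}(-1)^{t})$.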
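A small Julia sketch can confirm that summing the odd terms and evaluating (2) agree for any crossover distribution. The distribution `P` below is made up purely for illustration, and the function names are hypothetical.

```julia
# Sketch: recombination rate from a crossover-count distribution, two ways.
# P[t + 1] holds P_t for t = 0, 1, ..., length(P) - 1 (Julia arrays are 1-based).
r_odd(P) = sum(P[2:2:end])                        # P_1 + P_3 + P_5 + ...
pgf(P, S) = sum(P .* S .^ (0:length(P) - 1))      # P(S) = sum_t P_t S^t
r_pgf(P) = 0.5 * (1 - pgf(P, -1))                 # eq. (2)

P = [0.5, 0.3, 0.15, 0.05]   # made-up distribution over 0 to 3 crossovers
println(r_odd(P), "  ", r_pgf(P))                 # both about 0.35
```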

Haldane (1919) used the Poisson distribution for $P_{t}$. This implies that crossovers in one interval are independent of those in another and that the probability of a crossover in a very short interval is proportional to the length of the interval. Under the Poisson distribution, the probability of $t$ crossovers in an interval of length $x$ (in Morgans) is

$$P_{t}=\frac{(\lambda x)^{t}e^{-\lambda x}}{t!}$$

The parameter $\lambda$ of the Poisson distribution is the expected number of events per unit interval, so the expected number of crossovers in an interval of length $x$ is $\lambda x$. Because the map distance between two loci is defined as the expected number of crossovers between them, $\lambda x=x$, so $\lambda=1$ and

$$P_{t}=\frac{x^{t}e^{-x}}{t!}\tag{4}$$

The probability generating function for (4) is

$$\begin{split} P(S) & =\sum_{t}\frac{x^{t}e^{-x}S^{t}}{t!}\\ & =\sum_{t}\frac{(xS)^{t}e^{-xS}}{t!}\frac{e^{-x}}{e^{-xS}}\\ & =e^{x(S-1)} \end{split} \tag{5}$$
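The closed form in (5) can be checked numerically. This Julia sketch builds the Poisson probabilities (4) with the recurrence $P_{t}=P_{t-1}x/t$, truncates the series at $T=60$ terms (an assumption that is ample for distances of a few Morgans), and compares the truncated sum with $e^{x(S-1)}$ at a few values of $S$. The function names are illustrative.

```julia
# Numerical check of eq. (5): with λ = 1, the Poisson PGF equals exp(x(S - 1)).
function poisson_pmf(x, T)
    P = zeros(T + 1)
    P[1] = exp(-x)                    # P_0 = e^{-x}
    for t in 1:T
        P[t + 1] = P[t] * x / t       # P_t = P_{t-1} * x / t; avoids factorial overflow
    end
    return P
end

truncated_pgf(P, S) = sum(P[t + 1] * S^t for t in 0:length(P) - 1)

x = 0.7
P = poisson_pmf(x, 60)
for S in (-1.0, 0.0, 0.5)
    println(S, ": ", truncated_pgf(P, S), " vs ", exp(x * (S - 1)))
end
```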

Setting $S=-1$ in (5) gives $P(-1)=e^{-2x}$; substituting this into (2) gives Haldane’s map function:

$$r_{x}={\frac{1}{2}}(1-e^{-2x}) \tag{6}$$

The inverse of (6) is

$$x=\begin{cases} -\frac{1}{2}\ln(1-2r_{x}) & \text{if } 0\leq r_{x}<\frac{1}{2}\\ \infty & \text{if } r_{x}=\frac{1}{2} \end{cases}$$
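Haldane's map function and its inverse translate directly into Julia. This is a sketch; the names `haldane` and `haldane_inverse` are illustrative, and the inverse rejects rates outside $[0,\frac{1}{2}]$.

```julia
# Haldane's map function (6) and its inverse; x in Morgans.
haldane(x) = 0.5 * (1 - exp(-2x))

function haldane_inverse(r)
    0 <= r <= 0.5 || throw(DomainError(r, "recombination rate must lie in [0, 1/2]"))
    return r == 0.5 ? Inf : -0.5 * log(1 - 2r)
end

x = 0.25
r = haldane(x)
println(r)                     # about 0.1967
println(haldane_inverse(r))    # recovers 0.25 (up to rounding)
```

Because $1-2r_{x}=e^{-2x}$, distances add exactly under this model: `haldane(0.1 + 0.2)` equals `r1 + r2 - 2 * r1 * r2` with `r1 = haldane(0.1)` and `r2 = haldane(0.2)`, consistent with map distances being additive.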

Karlin (1984) used the binomial distribution with parameters $N$ and $p$ for $P_{t}$. Thus, $t$ is the number of successes in $N$ Bernoulli trials, each having probability $p$ of success. From the definition of map distance, it follows that $x={\mbox{E}}(t)=Np$, so $p=x/N$. The probability of $t$ crossovers in an interval of length $x$ is then

$$P_{t}=\binom{N}{t}(x/N)^{t}(1-x/N)^{N-t} \tag{7}$$

The probability generating function for (7) is

$$\begin{split} P(S) & =\sum_{t}\binom{N}{t}(x/N)^{t}(1-x/N)^{N-t}S^{t}\\ & =\sum_{t}\binom{N}{t}(xS/N)^{t}(1-x/N)^{N-t}\\ & =[xS/N+(1-x/N)]^{N} \end{split} \tag{8}$$

because $\sum_{t}\binom{N}{t}a^{t}b^{N-t}=(a+b)^{N}$ (the binomial theorem). Setting $S=-1$ in (8) gives $P(-1)=(1-2x/N)^{N}$; substituting this into (2) gives the binomial map function (Karlin's map function):

$$r_{x}= \begin{cases} \frac{1}{2}[1-(1-2x/N)^{N}] & \text{if } x<N/2\\ \frac{1}{2} & \text{if } x \geq N/2 \end{cases} \tag{9}$$
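The following Julia sketch implements (9) and checks it against the direct sum $P_{1}+P_{3}+\cdots$ of the binomial probabilities (7). It also illustrates that, for large $N$, $(1-2x/N)^{N}\to e^{-2x}$, so Karlin's function approaches Haldane's. The function names are illustrative.

```julia
# Karlin's binomial map function, eq. (9); x in Morgans, N Bernoulli trials.
karlin(x, N) = x < N / 2 ? 0.5 * (1 - (1 - 2x / N)^N) : 0.5

# Direct sum of odd terms of the binomial pmf (7), with p = x / N.
function r_binomial_direct(x, N)
    p = x / N
    return sum(binomial(N, t) * p^t * (1 - p)^(N - t) for t in 1:2:N)
end

println(karlin(0.6, 4), " vs ", r_binomial_direct(0.6, 4))    # should agree
println(karlin(0.3, 10_000), " vs ", 0.5 * (1 - exp(-0.6)))   # close to Haldane
```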

```julia
using Gadfly, Interact

# Karlin's (f1) and Haldane's (f2) map functions; the slider varies N from 2 to 10.
@manipulate for N = 2:10
    plot([x -> x < N / 2 ? 0.5 * (1 - (1 - 2x / N)^N) : 0.5,   # Karlin, eq. (9)
          x -> 0.5 * (1 - exp(-2x))],                          # Haldane, eq. (6)
         0, 2.0,
         Guide.title("Karlin's (f<sub>1</sub>) and Haldane's (f<sub>2</sub>) Map Functions"),
         Guide.ylabel("Recombination Rate"), Guide.xlabel("Map Distance"),
         Guide.colorkey(title = "Map Function"))
end
```

The inverse of (9) is

$$x={\frac{1}{2}}N[1-(1-2r_{x})^{1/N}] \tag{10}$$
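A Julia sketch of (10), with a round-trip check against (9), is shown below; the names are illustrative. At $r_{x}=\frac{1}{2}$ the inverse returns $N/2$, the smallest distance at which (9) reaches $\frac{1}{2}$.

```julia
# Karlin's map function (9) and its inverse (10); x in Morgans, N trials.
karlin(x, N) = x < N / 2 ? 0.5 * (1 - (1 - 2x / N)^N) : 0.5

function karlin_inverse(r, N)
    0 <= r <= 0.5 || throw(DomainError(r, "recombination rate must lie in [0, 1/2]"))
    return 0.5 * N * (1 - (1 - 2r)^(1 / N))     # eq. (10)
end

x, N = 0.4, 3
println(karlin_inverse(karlin(x, N), N))        # recovers 0.4 (up to rounding)
println(karlin_inverse(0.5, N))                 # 1.5, i.e. N/2
```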

In the second approach for deriving map functions, recombination is modeled in two adjacent intervals. Suppose three loci $A$, $B$, and $C$ are ordered as $ABC$, with a map distance of $x$ between $A$ and $B$ and a distance of $h$ between $B$ and $C$. Let $M(x)$ be the map function, to be derived, that transforms map distances into recombination rates. It is assumed that for sufficiently small $x$, $r_{x}=M(x)\approx x$; more precisely, $M(h)/h\to1$ as $h\to0$.

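Also, let $g_{\epsilon_{1}\epsilon_{2}}$ denote the probability of the recombination event indexed by $(\epsilon_{1},\epsilon_{2})$, where $\epsilon_{1}$ and $\epsilon_{2}$ indicate recombination in the first ($A$-$B$) and second ($B$-$C$) intervals; for example, $g_{10}$ is the probability of a recombination in the first interval and no recombination in the second interval.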

Using this notation, the probability $r_{AC}$ of a recombination between $A$ and $C$ can be written as

$$r_{AC}=g_{10}+g_{01}$$

If there is no interference,

$$\begin{split} r_{AC} & =g_{10}+g_{01}\\ & =r_{AB}(1-r_{BC})+(1-r_{AB})r_{BC}\\ & =r_{AB}+r_{BC}-2r_{AB}r_{BC} \end{split} \tag{11}$$

Recall that $g_{11}=r_{AB}r_{BC}$ is the probability of a double recombination when interference is absent. When interference is present, the probability of a double recombination is instead $g_{11}=cr_{AB}r_{BC}$. Because $g_{10}=r_{AB}-g_{11}$ and $g_{01}=r_{BC}-g_{11}$, the probability of a recombination between $A$ and $C$ in the presence of interference can be written as

$$r_{AC}=r_{AB}+r_{BC}-2cr_{AB}r_{BC} \tag{12}$$

where $c$ is the coefficient of coincidence. Now, (12) is rewritten using the map function $M(\cdot)$ in place of the recombination rates:

$$M(x+h)=M(x)+M(h)-2cM(x)M(h) \tag{13}$$

The above equation can be rearranged as

$$\frac{M(x+h)-M(x)}{h}=\frac{M(h)-2cM(x)M(h)}{h} \tag{14}$$

As $h\to0$, $M(h)/h\to1$. Thus, taking the limit as $h\to0$ on both sides of (14), and writing $r_{x}=M(x)$, gives

$$\frac{dr_{x}}{dx}=1-2cr_{x} \tag{15}$$

Letting $c=1$ and solving (15) with the initial condition $r_{0}=0$ gives Haldane’s map function (6). When $c=1$, recombinations in the two intervals are independent; this assumption is implicit in the Poisson distribution.
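Equation (15) can also be integrated numerically, which is useful when an assumed form of $c$ has no convenient closed-form solution. The Julia sketch below uses a fixed-step fourth-order Runge-Kutta scheme starting from $r_{0}=0$. Passing $c$ as a function of $r_{x}$ is an assumption of this sketch that covers both $c=1$ and $c=2r_{x}$; the name `solve_map_function` and the default step count are illustrative.

```julia
# Sketch: solve eq. (15), dr/dx = 1 - 2 c(r) r with r(0) = 0, by fixed-step
# fourth-order Runge-Kutta. `c` is a function of the current recombination rate.
function solve_map_function(c::Function, x::Real; steps::Int = 1000)
    f(rate) = 1 - 2 * c(rate) * rate   # right-hand side of eq. (15)
    h = x / steps                      # step size in Morgans
    r = 0.0                            # no recombination at distance 0
    for _ in 1:steps
        k1 = f(r)
        k2 = f(r + h / 2 * k1)
        k3 = f(r + h / 2 * k2)
        k4 = f(r + h * k3)
        r += h / 6 * (k1 + 2k2 + 2k3 + k4)
    end
    return r
end

x = 0.4
println(solve_map_function(r -> 1.0, x), " vs ", 0.5 * (1 - exp(-2x)))   # c = 1: Haldane
println(solve_map_function(r -> 2r, x), " vs ", 0.5 * tanh(2x))          # c = 2r
```

With `c = r -> 2r`, the numerical solution matches $\frac{1}{2}\tanh(2x)$, which is the Kosambi map function (16).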

Letting $c=2r_{x}$ gives the Kosambi map function

$$r_{x}={\frac{1}{2}}\frac{e^{4x}-1}{e^{4x}+1} \tag{16}$$

with inverse

$$x=\frac{1}{4}\ln\frac{1+2r_{x}}{1-2r_{x}} \tag{17}$$
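Kosambi's map function and its inverse can be sketched in Julia as follows (names illustrative). The check at the end uses a central finite difference to confirm that (16) satisfies (15) with $c=2r_{x}$, that is, $dr_{x}/dx=1-4r_{x}^{2}$.

```julia
# Kosambi's map function (16) and its inverse (17); x in Morgans.
# For very large x, exp(4x) overflows; 0.5 * tanh(2x) is an equivalent, safer form.
kosambi(x) = 0.5 * (exp(4x) - 1) / (exp(4x) + 1)
kosambi_inverse(r) = 0.25 * log((1 + 2r) / (1 - 2r))   # valid for 0 <= r < 1/2

x = 0.3
r = kosambi(x)
println(kosambi_inverse(r))                             # recovers 0.3 (up to rounding)

# Central finite-difference slope vs. the right-hand side of (15) with c = 2r.
δ = 1e-6
slope = (kosambi(x + δ) - kosambi(x - δ)) / (2δ)
println(slope, " vs ", 1 - 4r^2)
```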

Several other map functions, derived from (15) under different assumptions about $c$, are given in AHGL (pp. 14–17). Map functions derived from (15) with $c\neq1$ are not suitable for linkage analysis with more than three loci (AHGL, pp. 124–127).

```julia
# Adds Kosambi's map function (f3) to the comparison; the slider varies N from 2 to 10.
@manipulate for N = 2:10
    plot([x -> x < N / 2 ? 0.5 * (1 - (1 - 2x / N)^N) : 0.5,   # Karlin, eq. (9)
          x -> 0.5 * (1 - exp(-2x)),                           # Haldane, eq. (6)
          x -> 0.5 * (exp(4x) - 1) / (exp(4x) + 1)],           # Kosambi, eq. (16)
         0, 2.0,
         Guide.title("Karlin's (f<sub>1</sub>), Haldane's (f<sub>2</sub>) and Kosambi's (f<sub>3</sub>) Map Functions"),
         Guide.ylabel("Recombination Rate"), Guide.xlabel("Map Distance"),
         Guide.colorkey(title = "Map Function"))
end
```